Conll05st

class paddle.text.Conll05st(data_file=None, word_dict_file=None, verb_dict_file=None, target_dict_file=None, emb_file=None, download=True)

Implementation of Conll05st test dataset.

Note: only the test dataset can be downloaded automatically, because only the test set of Conll05st is public.

Parameters
  • data_file (str) – Path to the data tar file. Can be None if download is True. Default: None.

  • word_dict_file (str) – Path to the word dictionary file. Can be None if download is True. Default: None.

  • verb_dict_file (str) – Path to the verb dictionary file. Can be None if download is True. Default: None.

  • target_dict_file (str) – Path to the target dictionary file. Can be None if download is True. Default: None.

  • emb_file (str) – Path to the embedding dictionary file, used only by get_embedding. Can be None if download is True. Default: None.

  • download (bool) – Whether to download the dataset automatically if any of data_file, word_dict_file, verb_dict_file or target_dict_file is not set. Default: True.
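
When the files are already on disk, the parameters above can be passed explicitly together with download=False. The following is a minimal sketch, not part of the library: the helper names `missing_local_files` and `load_local_conll05st` are hypothetical, and the sketch assumes that every path must point to an existing file once downloading is disabled.

```python
import os

# Parameter names taken from the Conll05st signature.
FILE_PARAMS = (
    "data_file",
    "word_dict_file",
    "verb_dict_file",
    "target_dict_file",
    "emb_file",
)


def missing_local_files(**paths):
    """Return the parameter names whose path is unset or not an existing file.

    Hypothetical helper for this sketch; it is not part of paddle.
    """
    return [
        name
        for name in FILE_PARAMS
        if not paths.get(name) or not os.path.isfile(paths[name])
    ]


def load_local_conll05st(**paths):
    """Build the dataset from local files only (assumes paddle is installed)."""
    missing = missing_local_files(**paths)
    if missing:
        raise FileNotFoundError("missing local files for: " + ", ".join(missing))
    # Imported lazily so the path check above also works without paddle.
    from paddle.text.datasets import Conll05st

    return Conll05st(download=False, **paths)
```

Checking the paths first gives a single error that names every missing file, rather than failing on the first one inside the constructor.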

Returns

instance of conll05st dataset

Return type

Dataset

Examples

>>> import paddle
>>> from paddle.text.datasets import Conll05st

>>> class SimpleNet(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...
...     def forward(self, pred_idx, mark, label):
...         return paddle.sum(pred_idx), paddle.sum(mark), paddle.sum(label)


>>> conll05st = Conll05st()

>>> for i in range(10):
...     pred_idx, mark, label = conll05st[i][-3:]
...     pred_idx = paddle.to_tensor(pred_idx)
...     mark = paddle.to_tensor(mark)
...     label = paddle.to_tensor(label)
...
...     model = SimpleNet()
...     pred_idx, mark, label = model(pred_idx, mark, label)
...     print(pred_idx.item(), mark.item(), label.item())
65840 5 1991
92560 5 3686
99120 5 457
121960 5 3945
4774 5 2378
14973 5 1938
36921 5 1090
26908 5 2329
62965 5 2968
97755 5 2674
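
In the output above, the second column (the sum of mark) is always 5. One plausible reading, stated here as an assumption rather than documented behaviour, is that mark is a 0/1 vector flagging the predicate and up to two tokens on either side of it. The sketch below reproduces that idea on plain Python lists; the function name `predicate_window_mark` and its parameters are hypothetical.

```python
def predicate_window_mark(num_tokens, verb_index, radius=2):
    """Return a 0/1 list flagging tokens within `radius` of the predicate.

    Assumption for illustration: this mirrors how `mark` appears to behave
    in the example output; it is not taken from the library's API.
    """
    if not 0 <= verb_index < num_tokens:
        raise ValueError("verb_index must point inside the sentence")
    start = max(0, verb_index - radius)
    stop = min(num_tokens, verb_index + radius + 1)
    return [1 if start <= i < stop else 0 for i in range(num_tokens)]


# A predicate in the middle of a 9-token sentence flags 5 positions.
mark = predicate_window_mark(num_tokens=9, verb_index=4)
print(mark, sum(mark))
```

Under this reading, a predicate at the very start or end of a sentence would give a sum below 5, because the window is clipped at the sentence boundary.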

get_dict()

Get the word, verb (predicate) and label dictionaries of the Wikipedia corpus.

Examples

>>> from paddle.text.datasets import Conll05st

>>> conll05st = Conll05st()
>>> word_dict, predicate_dict, label_dict = conll05st.get_dict()
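
The returned dictionaries are typically used to turn the integer ids in a sample back into readable strings. A minimal sketch, assuming each dictionary maps a string to an integer id; the toy `label_dict` below is made up for illustration and the helper `invert_dict` is hypothetical.

```python
def invert_dict(token_to_id):
    """Build an id -> token lookup from a token -> id dictionary."""
    id_to_token = {idx: token for token, idx in token_to_id.items()}
    if len(id_to_token) != len(token_to_id):
        raise ValueError("ids are not unique, cannot invert")
    return id_to_token


# Toy stand-in for the label dictionary; real contents come from get_dict().
label_dict = {"B-A0": 0, "I-A0": 1, "B-V": 2, "O": 3}
id_to_label = invert_dict(label_dict)

label_ids = [0, 1, 2, 3, 3]  # e.g. the label field of one sample
print([id_to_label[i] for i in label_ids])
```

With the real dataset, `label_dict` would come from `conll05st.get_dict()` and `label_ids` from the last element of a sample.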

get_embedding()

Get the embedding dictionary file.

Examples

>>> from paddle.text.datasets import Conll05st

>>> conll05st = Conll05st()
>>> emb_file = conll05st.get_embedding()