Conll05st
- class paddle.text.Conll05st(data_file=None, word_dict_file=None, verb_dict_file=None, target_dict_file=None, emb_file=None, download=True)
-
Implementation of the Conll05st test dataset.
Note: only the test dataset can be downloaded automatically, because only the test dataset of Conll05st is public.
- Parameters
-
data_file (str) – path to the data tar file; can be None if download is True. Default: None.
word_dict_file (str) – path to the word dictionary file; can be None if download is True. Default: None.
verb_dict_file (str) – path to the verb dictionary file; can be None if download is True. Default: None.
target_dict_file (str) – path to the target dictionary file; can be None if download is True. Default: None.
emb_file (str) – path to the embedding dictionary file, used only by get_embedding; can be None if download is True. Default: None.
download (bool) – whether to download the dataset automatically if data_file, word_dict_file, verb_dict_file or target_dict_file is not set. Default: True.
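The rule shared by the path parameters (a path may stay None only when download is True) can be sketched in plain Python. This is an illustration of the documented contract, not the library's implementation: the helper names `unset_paths` and `check_arguments` and the file name `data.tar.gz` are hypothetical.

```python
from typing import Dict, List, Optional


def unset_paths(paths: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of path arguments that were left as None."""
    return [name for name, value in paths.items() if value is None]


def check_arguments(download: bool, **paths: Optional[str]) -> None:
    """Hypothetical check: a path may be None only when download is True."""
    missing = unset_paths(paths)
    if missing and not download:
        raise ValueError(
            "download=False requires explicit paths for: " + ", ".join(missing)
        )


# Leaving every path unset is fine when downloading is allowed.
check_arguments(True, data_file=None, word_dict_file=None)

# With download disabled, the unset path is reported.
message = ""
try:
    check_arguments(False, data_file="data.tar.gz", word_dict_file=None)
except ValueError as err:
    message = str(err)
print(message)
```

In practice this means that passing download=False is only sensible when every required file path is supplied explicitly.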
- Returns
-
An instance of the Conll05st dataset.
- Return type
-
Dataset
Examples
>>> import paddle
>>> from paddle.text.datasets import Conll05st

>>> class SimpleNet(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...
...     def forward(self, pred_idx, mark, label):
...         return paddle.sum(pred_idx), paddle.sum(mark), paddle.sum(label)

>>> conll05st = Conll05st()

>>> for i in range(10):
...     pred_idx, mark, label = conll05st[i][-3:]
...     pred_idx = paddle.to_tensor(pred_idx)
...     mark = paddle.to_tensor(mark)
...     label = paddle.to_tensor(label)
...
...     model = SimpleNet()
...     pred_idx, mark, label = model(pred_idx, mark, label)
...     print(pred_idx.item(), mark.item(), label.item())
65840 5 1991
92560 5 3686
99120 5 457
121960 5 3945
4774 5 2378
14973 5 1938
36921 5 1090
26908 5 2329
62965 5 2968
97755 5 2674
- get_dict()
-
Get the word, verb and label dictionaries of the Wikipedia corpus.
Examples
>>> from paddle.text.datasets import Conll05st
>>> conll05st = Conll05st()
>>> word_dict, predicate_dict, label_dict = conll05st.get_dict()
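A common follow-up is to turn label ids back into readable tags. The sketch below assumes that the dictionaries returned by get_dict map strings to integer ids; that assumption, the helper names `invert` and `decode`, and the toy dictionary contents are illustrative and should be checked against the real objects.

```python
from typing import Dict, List


def invert(mapping: Dict[str, int]) -> Dict[int, str]:
    """Build an id -> string lookup from a string -> id dictionary."""
    return {index: token for token, index in mapping.items()}


def decode(ids: List[int], mapping: Dict[str, int], unknown: str = "<unk>") -> List[str]:
    """Translate ids to strings, using a placeholder for unseen ids."""
    id_to_token = invert(mapping)
    return [id_to_token.get(i, unknown) for i in ids]


# Toy stand-in for label_dict; the real one would come from conll05st.get_dict().
toy_label_dict = {"B-A0": 0, "I-A0": 1, "B-V": 2, "O": 3}
decoded = decode([0, 1, 2, 3, 9], toy_label_dict)
print(decoded)
```

Building the inverse lookup once and reusing it is preferable when decoding many samples.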
- get_embedding()
-
Get the embedding dictionary file.
Examples
>>> from paddle.text.datasets import Conll05st
>>> conll05st = Conll05st()
>>> emb_file = conll05st.get_embedding()