Imikolov
- class paddle.text.Imikolov(data_file=None, data_type='NGRAM', window_size=-1, mode='train', min_word_freq=50, download=True)
-
Implementation of imikolov dataset.
- Parameters
-
data_file (str) – Path to the data tar file. Can be None if download is True. Default: None.
data_type (str) – ‘NGRAM’ or ‘SEQ’. Default: ‘NGRAM’.
window_size (int) – Sliding window size for ‘NGRAM’ data. Default: -1.
mode (str) – ‘train’ or ‘test’. Default: ‘train’.
min_word_freq (int) – Minimum word frequency for building the word dictionary. Default: 50.
download (bool) – Whether to download the dataset automatically if data_file is not set. Default: True.
- Returns
-
instance of imikolov dataset
- Return type
-
Dataset
Examples
>>> import paddle
>>> from paddle.text.datasets import Imikolov

>>> class SimpleNet(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...
...     def forward(self, src, trg):
...         return paddle.sum(src), paddle.sum(trg)

>>> imikolov = Imikolov(mode='train', data_type='SEQ', window_size=2)

>>> for i in range(10):
...     src, trg = imikolov[i]
...     src = paddle.to_tensor(src)
...     trg = paddle.to_tensor(trg)
...
...     model = SimpleNet()
...     src, trg = model(src, trg)
...     print(src.item(), trg.item())
2076 2075
2076 2075
675 674
4 3
464 463
2076 2075
865 864
2076 2075
2076 2075
1793 1792
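The min_word_freq and window_size parameters may be easier to picture with a small standalone sketch. The code below is plain Python and does not call Paddle. The helper names build_word_dict and ngram_windows are hypothetical, and the strict "more than min_word_freq" threshold, the ‘<unk>’ fallback entry, and the omission of sentence boundary markers are assumptions made for illustration rather than a description of the library's internals.

```python
from collections import Counter


def build_word_dict(sentences, min_word_freq):
    """Map words seen more than min_word_freq times to integer ids (hypothetical helper)."""
    counts = Counter(word for sentence in sentences for word in sentence)
    # Assumption: a strict ">" threshold; rarer words are dropped from the dictionary.
    kept = sorted(w for w, c in counts.items() if c > min_word_freq)
    word_idx = {word: i for i, word in enumerate(kept)}
    word_idx["<unk>"] = len(word_idx)  # catch-all id for words that were dropped
    return word_idx


def ngram_windows(sentence, word_idx, window_size):
    """Return every run of window_size consecutive word ids in one sentence."""
    unk = word_idx["<unk>"]
    ids = [word_idx.get(word, unk) for word in sentence]
    # Slide a fixed-size window one position at a time over the id sequence.
    return [tuple(ids[i:i + window_size]) for i in range(len(ids) - window_size + 1)]


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
word_idx = build_word_dict(corpus, min_word_freq=1)
windows = ngram_windows(["the", "cat", "sat", "down"], word_idx, window_size=2)
print(word_idx)
print(windows)
```

In this toy corpus only "the", "cat" and "sat" occur more than once, so "down" falls back to the ‘<unk>’ id, and a sentence shorter than the window yields no samples.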