Imdb
class paddle.text.Imdb(data_file=None, mode='train', cutoff=150, download=True)

Implementation of the IMDB dataset.
- Parameters
    data_file (str) – Path to the data tar file. Can be None if download is True. Default: None.
    mode (str) – Either 'train' or 'test'. Default: 'train'.
    cutoff (int) – Cutoff number used when building the word dictionary. Default: 150.
    download (bool) – Whether to download the dataset automatically if data_file is not set. Default: True.
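To make the role of cutoff more concrete, the following is a minimal sketch of a frequency-cutoff word dictionary in plain Python. It is an illustration under stated assumptions, not Paddle's implementation: the helper names build_word_dict and encode, the strict "more than cutoff occurrences" rule, the ordering by descending frequency, and the '<unk>' entry are all hypothetical choices made for this example.

```python
from collections import Counter


def build_word_dict(docs, cutoff):
    """Map each word seen more than `cutoff` times to an integer id.

    Hypothetical helper: the strict '>' threshold and the '<unk>' entry
    are assumptions for illustration, not Paddle's documented behavior.
    """
    freq = Counter(word for doc in docs for word in doc)
    # Keep frequent words, ordered by descending count, then alphabetically.
    kept = sorted(
        (w for w, c in freq.items() if c > cutoff),
        key=lambda w: (-freq[w], w),
    )
    word_idx = {w: i for i, w in enumerate(kept)}
    # Reserve the last id for out-of-vocabulary words.
    word_idx["<unk>"] = len(word_idx)
    return word_idx


def encode(doc, word_idx):
    """Convert a tokenized document into a list of word ids."""
    unk = word_idx["<unk>"]
    return [word_idx.get(w, unk) for w in doc]


docs = [["a", "good", "movie"], ["a", "bad", "movie"], ["a", "good", "plot"]]
word_idx = build_word_dict(docs, cutoff=1)
print(word_idx)                                # {'a': 0, 'good': 1, 'movie': 2, '<unk>': 3}
print(encode(["a", "bad", "plot"], word_idx))  # [0, 3, 3]
```

In this sketch a larger cutoff yields a smaller dictionary, and more words collapse into the out-of-vocabulary id.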
- Returns
    An instance of the IMDB dataset.
- Return type
    Dataset
Examples
>>> import paddle
>>> from paddle.text.datasets import Imdb

>>> class SimpleNet(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...
...     def forward(self, doc, label):
...         # Toy forward pass: sum the word ids and pass the label through.
...         return paddle.sum(doc), label

>>> imdb = Imdb(mode='train')

>>> for i in range(10):
...     doc, label = imdb[i]
...     doc = paddle.to_tensor(doc)
...     label = paddle.to_tensor(label)
...
...     model = SimpleNet()
...     doc_sum, label = model(doc, label)
...     print(doc.shape, label.shape)
[121] [1]
[115] [1]
[386] [1]
[471] [1]
[585] [1]
[206] [1]
[221] [1]
[324] [1]
[166] [1]
[598] [1]
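
The shapes printed in the example differ from sample to sample, so the documents cannot be stacked into a single batch as they are. One common workaround is to pad them to a shared length first. The sketch below shows the idea in plain Python; the helper name pad_batch and the padding id 0 are assumptions for illustration and are not part of the Imdb API.

```python
def pad_batch(docs, pad_id=0, max_len=None):
    """Right-pad (or truncate) id sequences to a common length.

    Hypothetical helper: returns the padded sequences together with the
    original (post-truncation) lengths so padding can be masked later.
    """
    if max_len is None:
        max_len = max(len(d) for d in docs)
    padded, lengths = [], []
    for d in docs:
        d = list(d)[:max_len]          # truncate overly long documents
        lengths.append(len(d))
        padded.append(d + [pad_id] * (max_len - len(d)))
    return padded, lengths


batch, lengths = pad_batch([[5, 2, 9], [7], [4, 4]])
print(batch)    # [[5, 2, 9], [7, 0, 0], [4, 4, 0]]
print(lengths)  # [3, 1, 2]
```

If 0 is also a valid word id in the dictionary, a dedicated padding id would be the safer choice; the returned lengths can then be used to ignore padded positions.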