Dataset
class paddle.io.Dataset
An abstract class to encapsulate methods and behaviors of datasets.
All map-style datasets (datasets whose samples can be retrieved by a given key) should subclass paddle.io.Dataset. Every subclass should implement the following methods:
- __getitem__: returns the sample at a given index. paddle.io.DataLoader requires this method to read samples from the dataset.
- __len__: returns the number of samples in the dataset. Some implementations of paddle.io.BatchSampler require this method.

See paddle.io.DataLoader.

Examples
>>> import numpy as np
>>> from paddle.io import Dataset

>>> # define a random dataset
>>> class RandomDataset(Dataset):
...     def __init__(self, num_samples):
...         self.num_samples = num_samples
...
...     def __getitem__(self, idx):
...         image = np.random.random([784]).astype('float32')
...         label = np.random.randint(0, 9, (1, )).astype('int64')
...         return image, label
...
...     def __len__(self):
...         return self.num_samples
...
>>> dataset = RandomDataset(10)
>>> for i in range(len(dataset)):
...     image, label = dataset[i]
...     # do something
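
To illustrate why both methods matter, the following plain-Python sketch mimics how a loader might consume a map-style dataset: `__len__` is used to build the list of indices, and `__getitem__` is used to fetch each sample in a batch. `SquaresDataset` and `iter_batches` are hypothetical names invented for this illustration and are not part of Paddle. The sketch does not reproduce the actual logic of paddle.io.DataLoader or paddle.io.BatchSampler. In real use the dataset would subclass paddle.io.Dataset and be passed to paddle.io.DataLoader.

```python
import random


class SquaresDataset:
    """Hypothetical map-style dataset: sample i is the pair (i, i * i)."""

    def __init__(self, num_samples):
        self.num_samples = num_samples

    def __getitem__(self, idx):
        # Map-style access: return the sample stored under the given index.
        if not 0 <= idx < self.num_samples:
            raise IndexError(idx)
        return idx, idx * idx

    def __len__(self):
        # Total number of samples; batching needs this to enumerate indices.
        return self.num_samples


def iter_batches(dataset, batch_size, shuffle=False, seed=0):
    """Hypothetical stand-in for a batch sampler plus loader (not Paddle code)."""
    indices = list(range(len(dataset)))  # relies on __len__
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        yield [dataset[i] for i in batch_indices]  # relies on __getitem__


batches = list(iter_batches(SquaresDataset(5), batch_size=2))
for batch in batches:
    print(batch)
```

Without `__len__` the index list could not be built up front, and without `__getitem__` the samples for each batch could not be looked up, which is the reason the class documentation asks subclasses to provide both.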