DataFeeder¶

class paddle.fluid.data_feeder.DataFeeder(feed_list, place, program=None) [source]

Api_attr: Static Graph
DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor. The reader is usually a Python generator that returns a list of mini-batch data entries.
Parameters

- feed_list (list) – Variables or names of Variables that need to be fed.
- place (CPUPlace|CUDAPlace) – place indicates the device (CPU or GPU) the data will be fed into. If you want to feed data into GPU, please use fluid.CUDAPlace(i) (i represents the GPU id); if you want to feed data into CPU, please use fluid.CPUPlace().
- program (Program, optional) – the Program into which data will be fed. If program is None, it will use default_main_program(). Default None.

Raises

- ValueError
Example

```python
import numpy as np
import paddle
import paddle.fluid as fluid

place = fluid.CPUPlace()

def reader():
    for _ in range(4):
        yield np.random.random([4]).astype('float32'), np.random.random([3]).astype('float32'),

main_program = fluid.Program()
startup_program = fluid.Program()

with fluid.program_guard(main_program, startup_program):
    data_1 = fluid.data(name='data_1', shape=[None, 2, 2], dtype='float32')
    data_2 = fluid.data(name='data_2', shape=[None, 1, 3], dtype='float32')
    out = fluid.layers.fc(input=[data_1, data_2], size=2)
    # ...

feeder = fluid.DataFeeder([data_1, data_2], place)

exe = fluid.Executor(place)
exe.run(startup_program)

feed_data = feeder.feed(reader())

# print feed_data to view feed results
# print(feed_data['data_1'])
# print(feed_data['data_2'])

outs = exe.run(program=main_program,
               feed=feed_data,
               fetch_list=[out])
print(outs)
```
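Conceptually, the conversion can be sketched without Paddle at all. The helper below is hypothetical, not Paddle's implementation: it only shows the core idea that each sample yielded by the reader is a tuple of per-variable arrays, and feed stacks the i-th field of every sample into one batched array per variable name.

```python
import numpy as np

def simple_feed(feed_names, samples):
    """Stack per-sample arrays into one batched array per variable name.

    Hypothetical, framework-free sketch of what DataFeeder.feed does; the
    real method also handles LoD information and device placement.
    """
    columns = list(zip(*samples))  # group the i-th field of every sample
    return {name: np.stack(col) for name, col in zip(feed_names, columns)}

# four samples, each a (data_1, data_2) tuple of per-sample arrays
samples = [(np.ones([2, 2], dtype='float32') * i,
            np.ones([1, 3], dtype='float32') * i) for i in range(4)]
batch = simple_feed(['data_1', 'data_2'], samples)
print(batch['data_1'].shape)  # (4, 2, 2): batch size 4, per-sample shape [2, 2]
```

The resulting dict maps each name in feed_list to one batched array, which is the shape of thing Executor.run expects as its feed argument.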
feed(iterable)¶
According to the feed_list of DataFeeder and the iterable, converts the input into a data structure that can be fed into Executor.

Parameters

- iterable (generator) – user-defined Python generator used to read the raw input data.

Returns

A dict that contains (variable name - converted tensor) pairs.

Return type

dict
Example

```python
# In this example, reader - a generator - will return a list of ndarrays of 3 elements.
# The feed API will convert each ndarray input into a tensor.
# The returned result is a dict with keys: data_1, data_2, data_3.
# result['data_1'] is a LoD-Tensor with shape [5, 2, 1, 3]. 5 is the batch size,
# and [2, 1, 3] is the real shape of data_1.
# result['data_2'], result['data_3'] are similar.
import numpy as np
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(1, limit + 1):
        yield np.ones([6]).astype('float32') * i, np.ones([1]).astype('int64') * i, np.random.random([9]).astype('float32')

data_1 = fluid.data(name='data_1', shape=[None, 2, 1, 3])
data_2 = fluid.data(name='data_2', shape=[None, 1], dtype='int64')
data_3 = fluid.data(name='data_3', shape=[None, 3, 3], dtype='float32')
feeder = fluid.DataFeeder(['data_1', 'data_2', 'data_3'], fluid.CPUPlace())

result = feeder.feed(reader())
print(result['data_1'])
print(result['data_2'])
print(result['data_3'])
```
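The shape comment in the example above hinges on one detail: a flat [6] sample for data_1 ends up as [2, 1, 3], i.e. feed reshapes each sample to the variable's declared per-sample shape. The helper below is a hypothetical numpy-only sketch of that reshape, not Paddle's code:

```python
import numpy as np

def simple_feed_with_shape(feed_specs, samples):
    """Stack samples, then reshape each field to its declared per-sample shape.

    Hypothetical sketch: feed_specs is a list of (name, per_sample_shape)
    pairs; a flat [6] sample becomes [2, 1, 3] because that is the shape
    declared for data_1 above.
    """
    columns = list(zip(*samples))
    out = {}
    for (name, shape), col in zip(feed_specs, columns):
        batch = np.stack(col)                          # (batch, 6)
        out[name] = batch.reshape((len(col),) + tuple(shape))
    return out

samples = [(np.ones([6], dtype='float32') * i,) for i in range(1, 6)]
result = simple_feed_with_shape([('data_1', [2, 1, 3])], samples)
print(result['data_1'].shape)  # (5, 2, 1, 3), matching the shape described above
```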
feed_parallel(iterable, num_places=None)¶
Similar to the feed method, feed_parallel is used with multiple devices (CPU|GPU). Here iterable is a list of Python generators. The data returned by each generator in the list will be fed into a separate device.

Parameters

- iterable (list|tuple) – list of user-defined Python generators. The number of elements should match num_places.
- num_places (int, optional) – the number of devices. If not provided (None), all available devices on the machine will be used. Default None.

Returns

A generator that generates dicts containing (variable name - converted tensor) pairs; the total number of dicts generated matches num_places.

Return type

generator
Note

The number of devices, num_places, should be equal to the number of generators (elements of iterable).

Example
```python
import numpy as np
import paddle.fluid as fluid

def generate_reader(batch_size, base=0, factor=1):
    def _reader():
        for i in range(batch_size):
            yield np.ones([4]) * factor + base, np.ones([4]) * factor + base + 5
    return _reader()

x = fluid.data(name='x', shape=[None, 2, 2])
y = fluid.data(name='y', shape=[None, 2, 2], dtype='float32')
z = fluid.layers.elementwise_add(x, y)

feeder = fluid.DataFeeder(['x', 'y'], fluid.CPUPlace())
place_num = 2
places = [fluid.CPUPlace() for x in range(place_num)]
data = []
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
program = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(places=places)

# print a sample feed_parallel result
# for item in list(feeder.feed_parallel([generate_reader(5, 0, 1), generate_reader(3, 10, 2)], 2)):
#     print(item['x'])
#     print(item['y'])

reader_list = [generate_reader(5, 0, 1), generate_reader(3, 10, 2)]
res = exe.run(program=program,
              feed=list(feeder.feed_parallel(reader_list, 2)),
              fetch_list=[z])
print(res)
```
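The pairing of generators to devices can be sketched in plain numpy. simple_feed_parallel below is hypothetical and ignores device placement; it only shows that generator i produces the feed dict for device i:

```python
import numpy as np

def simple_feed_parallel(feed_names, readers):
    """Yield one feed dict per device: reader i supplies device i.

    Hypothetical stand-in for DataFeeder.feed_parallel; the real method
    also converts each batch for its target device's memory.
    """
    for reader in readers:  # one generator per device
        samples = list(reader)
        columns = list(zip(*samples))  # group the i-th field of every sample
        yield {name: np.stack(col) for name, col in zip(feed_names, columns)}

def make_reader(batch_size, base):
    for i in range(batch_size):
        yield np.ones([4]) * base, np.ones([4]) * (base + 5)

dicts = list(simple_feed_parallel(['x', 'y'], [make_reader(5, 0), make_reader(3, 10)]))
print(len(dicts))           # 2 dicts, one per device
print(dicts[0]['x'].shape)  # (5, 4): device 0 gets the 5-sample batch
print(dicts[1]['x'].shape)  # (3, 4): device 1 gets the 3-sample batch
```

This also illustrates the Note above: one dict is produced per generator, so the number of generators must equal num_places.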
decorate_reader(reader, multi_devices, num_places=None, drop_last=True)¶
Decorate the reader (generator) to fit multiple devices. The reader generates multiple mini-batches. Each mini-batch will be fed into a single device.

Parameters

- reader (generator) – a user-defined Python generator used to get mini-batches of data. A mini-batch can be regarded as a Python generator that returns batches of input entities, just like _mini_batch in the code example below.
- multi_devices (bool) – indicates whether to use multiple devices or not.
- num_places (int, optional) – if multi_devices is True, you can specify the number of devices (CPU|GPU) to use; if num_places is None, the function will use all the devices of the current machine. Default None.
- drop_last (bool, optional) – whether to drop the last round of data if it is not enough to feed all devices. Default True.

Returns

A new generator which returns converted dicts that can be fed into Executor.

Return type

generator

Raises

- ValueError – If drop_last is False and the data cannot fit the devices perfectly.
Example

```python
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler

def reader():
    def _mini_batch(batch_size):
        for i in range(batch_size):
            yield np.random.random([16]).astype('float32'), np.random.randint(10, size=[1])
    for _ in range(10):
        yield _mini_batch(np.random.randint(1, 10))

place_num = 3
places = [fluid.CPUPlace() for _ in range(place_num)]

# a simple network sample
data = fluid.data(name='data', shape=[None, 4, 4], dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
hidden = fluid.layers.fc(input=data, size=10)

feeder = fluid.DataFeeder(place=places[0], feed_list=[data, label])
reader = feeder.decorate_reader(reader, multi_devices=True, num_places=3, drop_last=True)

exe = fluid.Executor(places[0])
exe.run(fluid.default_startup_program())
compiled_prog = compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(places=places)

for i, data in enumerate(reader()):
    # print data if you like
    # print(i, data)
    ret = exe.run(compiled_prog,
                  feed=data,
                  fetch_list=[hidden])
    print(ret)
```
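The grouping and drop_last behavior described above can be sketched in plain Python. simple_decorate_reader below is a hypothetical stand-in that skips the tensor conversion and keeps only the rounds-of-num_places logic, including the ValueError for an incomplete final round:

```python
def simple_decorate_reader(reader, num_places, drop_last=True):
    """Group mini-batches into rounds of num_places, one batch per device.

    Hypothetical sketch of decorate_reader's grouping logic only; the
    real method also converts each mini-batch via feed().
    """
    def decorated():
        round_ = []
        for mini_batch in reader():
            round_.append(mini_batch)
            if len(round_) == num_places:
                yield round_   # one full round: one mini-batch per device
                round_ = []
        if round_:  # leftover batches that cannot fill every device
            if not drop_last:
                raise ValueError("data cannot fit devices perfectly")
    return decorated

reader = lambda: iter(range(7))  # stand-in: 7 mini-batches for 3 devices
rounds = list(simple_decorate_reader(reader, 3)())
print(rounds)  # [[0, 1, 2], [3, 4, 5]] - the incomplete last round is dropped
```

With drop_last=False the same leftover round raises ValueError instead of being silently discarded, mirroring the Raises entry above.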