DataFeedDesc
class paddle.fluid.data_feed_desc.DataFeedDesc(proto_file)
Api_attr: Static Graph
Data feed descriptor that describes the format of the input training data. This class is currently used only by AsyncExecutor (see the comments for class AsyncExecutor for a brief introduction).
A DataFeedDesc must be initialized from a valid protobuf message stored on disk.
See paddle/fluid/framework/data_feed.proto for the message definition. A typical message might look like:
    import paddle.fluid as fluid

    # Write a text-format data feed description with two sparse slots.
    with open("data.proto", "w") as f:
        f.write('name: "MultiSlotDataFeed"\n')
        f.write('batch_size: 2\n')
        f.write('multi_slot_desc {\n')
        f.write('  slots {\n')
        f.write('    name: "words"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('  slots {\n')
        f.write('    name: "label"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('}\n')

    data_feed = fluid.DataFeedDesc('data.proto')
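The message follows a regular pattern, so its text can also be generated from a list of slot names. The following is a minimal sketch in plain Python. `build_data_feed_text` is a hypothetical helper, not part of Paddle, and it assumes only the fields that appear in the example message (name, batch_size, and per-slot name, type, is_dense, is_used).

```python
def build_data_feed_text(slot_names, batch_size=2, slot_type="uint64"):
    """Return a text-format MultiSlotDataFeed message for the given slots.

    Hypothetical helper for illustration only; it emits just the fields
    shown in the example message, with every slot sparse and used.
    """
    lines = [
        'name: "MultiSlotDataFeed"',
        "batch_size: %d" % batch_size,
        "multi_slot_desc {",
    ]
    for slot_name in slot_names:
        lines += [
            "  slots {",
            '    name: "%s"' % slot_name,
            '    type: "%s"' % slot_type,
            "    is_dense: false",
            "    is_used: true",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Build the same two-slot description used in the example.
message_text = build_data_feed_text(["words", "label"])
```

Writing `message_text` to a file such as data.proto should give a description equivalent to the hand-written one, which can then be passed to fluid.DataFeedDesc.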
However, users usually do not need to care about the message format; instead, they are encouraged to use Data Generator as a tool to generate a valid data description while converting their raw log files into training files acceptable to AsyncExecutor.
DataFeedDesc can also be changed at runtime. Once you are familiar with what each field means, you can modify it to better suit your needs. E.g.:
    import paddle.fluid as fluid

    data_feed = fluid.DataFeedDesc('data.proto')
    data_feed.set_batch_size(128)
    data_feed.set_dense_slots(['words'])  # The slot named 'words' will be dense
    data_feed.set_use_slots(['words'])    # The slot named 'words' will be used
Finally, the content can be dumped for debugging purposes:
    print(data_feed.desc())
Parameters
    proto_file (string) – Disk file containing a data feed description.
set_batch_size(batch_size)
Set batch_size in DataFeedDesc. batch_size can be changed during training.
Example
    import paddle.fluid as fluid

    # Write a text-format data feed description with two sparse slots.
    with open("data.proto", "w") as f:
        f.write('name: "MultiSlotDataFeed"\n')
        f.write('batch_size: 2\n')
        f.write('multi_slot_desc {\n')
        f.write('  slots {\n')
        f.write('    name: "words"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('  slots {\n')
        f.write('    name: "label"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('}\n')

    data_feed = fluid.DataFeedDesc('data.proto')
    data_feed.set_batch_size(128)  # Override the batch_size of 2 from the file
Parameters
    batch_size (int) – The new batch size.
Returns
    None.
set_dense_slots(dense_slots_name)
Set the slots named in dense_slots_name as dense slots. Note: by default, all slots are sparse.
Features for a dense slot will be fed into a Tensor, while those for a sparse slot will be fed into a LoDTensor.
Example
    import paddle.fluid as fluid

    # Write a text-format data feed description with two sparse slots.
    with open("data.proto", "w") as f:
        f.write('name: "MultiSlotDataFeed"\n')
        f.write('batch_size: 2\n')
        f.write('multi_slot_desc {\n')
        f.write('  slots {\n')
        f.write('    name: "words"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('  slots {\n')
        f.write('    name: "label"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('}\n')

    data_feed = fluid.DataFeedDesc('data.proto')
    data_feed.set_dense_slots(['words'])  # 'words' becomes a dense slot
Parameters
    dense_slots_name (list(str)) – A list of slot names which will be set dense.
Returns
    None.
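Because dense_slots_name is documented as a list of names, a bare string is easy to pass by mistake. The sketch below uses a small stand-in function, `mark_dense`, which is hypothetical and is not Paddle's implementation. It assumes only that names are looked up one at a time while iterating over the argument, and shows why a bare string would then be treated as a sequence of single-character names.

```python
def mark_dense(slots, dense_slots_name):
    """Set is_dense=True for every name in dense_slots_name.

    `slots` maps slot name -> dict of flags. This is a stand-in that
    mimics a per-name lookup; it is not Paddle's implementation.
    """
    for name in dense_slots_name:
        if name not in slots:
            raise KeyError("unknown slot: %r" % name)
        slots[name]["is_dense"] = True
    return slots


slots = {
    "words": {"is_dense": False, "is_used": True},
    "label": {"is_dense": False, "is_used": True},
}

# A list of names marks exactly the requested slot.
mark_dense(slots, ["words"])

# A bare string is iterated character by character, so the first
# lookup is for the name "w", which does not exist.
try:
    mark_dense(slots, "words")
    bare_string_error = None
except KeyError as exc:
    bare_string_error = exc
```

Under that assumption, wrapping a single name in a list, as in `['words']`, is the safe form.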
set_use_slots(use_slots_name)
Set which slots will be used for training. A dataset may contain many features; this function selects which ones are used for a specific model.
Example
    import paddle.fluid as fluid

    # Write a text-format data feed description with two sparse slots.
    with open("data.proto", "w") as f:
        f.write('name: "MultiSlotDataFeed"\n')
        f.write('batch_size: 2\n')
        f.write('multi_slot_desc {\n')
        f.write('  slots {\n')
        f.write('    name: "words"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('  slots {\n')
        f.write('    name: "label"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('}\n')

    data_feed = fluid.DataFeedDesc('data.proto')
    data_feed.set_use_slots(['words'])  # Mark 'words' as used for training
Parameters
    use_slots_name (list(str)) – A list of slot names which will be used in training.
Note
    By default, all slots are marked as not used.
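To make the selection behaviour concrete, here is a minimal sketch in plain Python. `select_used_slots` is a hypothetical stand-in, not Paddle's implementation, and the slot name "user_id" is invented for illustration. It assumes every slot starts as not used, as the note states, and that only the listed names are switched on.

```python
def select_used_slots(all_slot_names, use_slots_name):
    """Return {slot name: is_used} after enabling the requested slots.

    Stand-in for illustration: every slot starts as not used, and only
    the names in use_slots_name are switched on.
    """
    is_used = {name: False for name in all_slot_names}
    for name in use_slots_name:
        if name not in is_used:
            raise KeyError("unknown slot: %r" % name)
        is_used[name] = True
    return is_used


# The dataset carries three features, but this model trains on two.
flags = select_used_slots(["words", "label", "user_id"], ["words", "label"])
used = [name for name, on in flags.items() if on]
```

In this model, a slot that is never listed stays unused, so each model can read the same dataset while consuming a different subset of its features.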
desc()
Returns a protobuf message for this DataFeedDesc.
Example
    import paddle.fluid as fluid

    # Write a text-format data feed description with two sparse slots.
    with open("data.proto", "w") as f:
        f.write('name: "MultiSlotDataFeed"\n')
        f.write('batch_size: 2\n')
        f.write('multi_slot_desc {\n')
        f.write('  slots {\n')
        f.write('    name: "words"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('  slots {\n')
        f.write('    name: "label"\n')
        f.write('    type: "uint64"\n')
        f.write('    is_dense: false\n')
        f.write('    is_used: true\n')
        f.write('  }\n')
        f.write('}\n')

    data_feed = fluid.DataFeedDesc('data.proto')
    print(data_feed.desc())  # Dump the current description as text
Returns
    A string message.