QueueDataset
- class paddle.fluid.dataset.QueueDataset
-
QueueDataset processes data in a streaming fashion.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
-
local_shuffle()
-
Local shuffle data.
Local shuffle is not supported in QueueDataset; calling this method raises NotImplementedError.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.local_shuffle()
- Raises
-
NotImplementedError – QueueDataset does not support local shuffle
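Code that is shared between dataset types may want to guard the shuffle call. The sketch below does not import Paddle: `FakeQueueDataset` is a hypothetical stand-in that only mimics the documented behavior (raising NotImplementedError), and `try_local_shuffle` is an illustrative helper, not part of the library.

```python
class FakeQueueDataset:
    """Hypothetical stand-in that mimics only the documented behavior."""

    def local_shuffle(self):
        raise NotImplementedError(
            "QueueDataset does not support local shuffle")


def try_local_shuffle(dataset):
    """Return True if the dataset was shuffled, False if unsupported."""
    try:
        dataset.local_shuffle()
    except NotImplementedError:
        return False
    return True


shuffled = try_local_shuffle(FakeQueueDataset())
print(shuffled)
```

The same pattern would apply to any object that exposes a `local_shuffle()` method.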
-
global_shuffle(fleet=None)
-
Global shuffle data.
Global shuffle is not supported in QueueDataset; calling this method raises NotImplementedError.
- Parameters
-
fleet (Fleet) – fleet singleton. Default None.
Examples
import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.global_shuffle(fleet)
- Raises
-
NotImplementedError – QueueDataset does not support global shuffle
-
desc()
-
Returns the protobuf message of this dataset's DataFeedDesc.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
print(dataset.desc())
- Returns
-
A string message
-
set_batch_size(batch_size)
-
Set the batch size. It takes effect during training.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_batch_size(128)
- Parameters
-
batch_size (int) – batch size
-
set_download_cmd(download_cmd)
-
Set a customized download command: download_cmd.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_download_cmd("./read_from_afs")
- Parameters
-
download_cmd (str) – customized download command
-
set_fea_eval(record_candidate_size, fea_eval=True)
-
Set fea eval mode for slots shuffle, which is used to debug the importance level of slots (features). fea_eval must be set to True for slots shuffle.
- Parameters
-
record_candidate_size (int) – number of candidate instances used to shuffle one slot
fea_eval (bool) – whether to enable fea eval mode, which enables slots shuffle. Default is True.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_fea_eval(1000000, True)
-
set_filelist(filelist)
-
Set file list in current worker.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_filelist(['a.txt', 'b.txt'])
- Parameters
-
filelist (list) – file list
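Since the file list is set per worker, a multi-worker job needs some way to give each worker its own share of the files. The sketch below shows one possible round-robin split in plain Python. `split_filelist`, `worker_index`, and `worker_num` are hypothetical names introduced for illustration, and the file names are placeholders; none of them come from the library.

```python
def split_filelist(filelist, worker_index, worker_num):
    """Round-robin split: worker i takes files i, i + worker_num, ..."""
    if not 0 <= worker_index < worker_num:
        raise ValueError("worker_index must be in [0, worker_num)")
    return filelist[worker_index::worker_num]


# Placeholder file names, for illustration only.
all_files = ["part-0.txt", "part-1.txt", "part-2.txt",
             "part-3.txt", "part-4.txt"]
worker_files = split_filelist(all_files, worker_index=1, worker_num=2)
print(worker_files)
```

The resulting `worker_files` list is what a worker could then pass to `dataset.set_filelist(...)`.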
-
set_hdfs_config(fs_name, fs_ugi)
-
Set hdfs config: fs name and ugi.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
- Parameters
-
fs_name (str) – fs name
fs_ugi (str) – fs ugi
-
set_pipe_command(pipe_command)
-
Set pipe command of current dataset. A pipe command is a UNIX pipeline command that can be used only
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pipe_command("python my_script.py")
- Parameters
-
pipe_command (str) – pipe command
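A pipe command such as `python my_script.py` typically acts as a filter: it reads raw lines from standard input and writes converted lines to standard output. The sketch below shows what such a script might contain. Both the input layout (a label followed by feature ids) and the output layout (each slot written as a count followed by its values) are assumptions made for illustration, so check them against the data feed your program actually uses. The function names are hypothetical as well.

```python
import io


def convert_line(line):
    """Convert 'label id1 id2 ...' into '<n_ids> id1 id2 ... 1 label'."""
    fields = line.split()
    if not fields:
        return None  # blank line: nothing to emit
    label, ids = fields[0], fields[1:]
    return " ".join([str(len(ids))] + ids + ["1", label])


def run_filter(stream_in, stream_out):
    """Copy converted lines from stream_in to stream_out."""
    for raw in stream_in:
        converted = convert_line(raw)
        if converted is not None:
            stream_out.write(converted + "\n")


# In-memory demo; the input lines are made up for illustration.
demo_in = io.StringIO("1 101 205\n\n0 7\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

In a real script, `run_filter(sys.stdin, sys.stdout)` would replace the in-memory demo.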
-
set_pv_batch_size(pv_batch_size)
-
Set pv batch size. It takes effect when pv merge is enabled (enable_pv_merge).
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pv_batch_size(128)
- Parameters
-
pv_batch_size (int) – pv batch size
-
set_rank_offset(rank_offset)
-
Set rank_offset for merge_pv. It sets the message of Pv.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_rank_offset("rank_offset")
- Parameters
-
rank_offset (str) – rank_offset’s name
-
set_so_parser_name(so_parser_name)
-
Set so parser name of current dataset
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_so_parser_name("./abc.so")
- Parameters
-
so_parser_name (str) – so parser name
-
set_thread(thread_num)
-
Set the thread num, which is the number of readers.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_thread(12)
- Parameters
-
thread_num (int) – thread num
-
set_use_var(var_list)
-
Set Variables which you will use.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
# data and label are Variables defined elsewhere in the program
dataset.set_use_var([data, label])
- Parameters
-
var_list (list) – variable list
-
slots_shuffle(slots)
-
Slots shuffle is a slot-level shuffle method, usually used on sparse features with a large number of instances. It compares a metric, e.g. AUC, obtained while shuffling one or several slots against the baseline to evaluate the importance level of slots (features).
- Parameters
-
slots (list[string]) – the set of slots (strings) to shuffle.
Examples
import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_merge_by_lineid()
# suppose there is a slot 0
dataset.slots_shuffle(['0'])
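The idea behind slots shuffle can be illustrated in plain Python. Permuting one slot's values across instances breaks that slot's relationship with the label while leaving the other slots intact, so a drop in a metric such as AUC hints at the slot's importance. This is a conceptual sketch, not Paddle's implementation. The function below, the dict-based instance layout, and the `seed` parameter are all assumptions made for illustration.

```python
import random


def shuffle_slots(instances, slots, seed=None):
    """Return a copy of instances with the given slots permuted across rows."""
    rng = random.Random(seed)
    shuffled = [dict(inst) for inst in instances]  # leave the input untouched
    for slot in slots:
        values = [inst[slot] for inst in instances]
        rng.shuffle(values)
        for inst, value in zip(shuffled, values):
            inst[slot] = value
    return shuffled


# Made-up instances with two slots ("0" and "1") and a label.
instances = [
    {"0": [11], "1": [21], "label": 1},
    {"0": [12], "1": [22], "label": 0},
    {"0": [13], "1": [23], "label": 1},
]
result = shuffle_slots(instances, ["0"], seed=0)
for row in result:
    print(row)
```

Only slot "0" is permuted here; slot "1" and the labels stay aligned with their original rows.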
-
local_shuffle
(
)