QueueDataset

class paddle.fluid.dataset.QueueDataset

QueueDataset processes data in a streaming fashion.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
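To illustrate what processing data as a stream means, here is a minimal sketch in plain Python. It is not Paddle's implementation, and the helper name `stream_batches` is hypothetical. Records are read lazily and grouped into batches, so the whole input is never held in memory at once.

```python
from itertools import islice


def stream_batches(records, batch_size):
    """Yield lists of up to batch_size records, pulling them lazily.

    Hypothetical helper for illustration only; not part of Paddle.
    """
    it = iter(records)
    while True:
        # islice pulls at most batch_size items from the iterator.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A generator stands in for a data source that is read once, in order.
    source = ("record-%d" % i for i in range(5))
    for batch in stream_batches(source, 2):
        print(batch)
```

In this sketch each record is consumed once, in arrival order, and the last batch may be smaller than `batch_size`.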

local_shuffle()

Local shuffle data.

Local shuffle is not supported in QueueDataset; calling this method raises NotImplementedError.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.local_shuffle()

Raises: NotImplementedError – QueueDataset does not support local shuffle

global_shuffle(fleet=None)

Global shuffle data.

Global shuffle is not supported in QueueDataset; calling this method raises NotImplementedError.

Parameters: fleet (Fleet) – fleet singleton. Default None.

Examples

import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.global_shuffle(fleet)

Raises: NotImplementedError – QueueDataset does not support global shuffle

desc()

Returns a protobuf message for this DataFeedDesc.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
print(dataset.desc())

Returns: A string message

set_batch_size(batch_size)

Set the batch size. It takes effect during training.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_batch_size(128)

Parameters: batch_size (int) – batch size

set_download_cmd(download_cmd)

Set a customized download command: download_cmd.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_download_cmd("./read_from_afs")

Parameters: download_cmd (str) – customized download command

set_fea_eval(record_candidate_size, fea_eval=True)

Set fea eval mode for slots shuffle, which is used to debug the importance level of slots (features). fea_eval must be set to True for slots shuffle.

Parameters

record_candidate_size (int) – number of candidate instances used to shuffle one slot
fea_eval (bool) – whether to enable fea eval mode, which enables slots shuffle. Default is True.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_fea_eval(1000000, True)

set_filelist(filelist)

Set the file list for the current worker.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_filelist(['a.txt', 'b.txt'])

Parameters: filelist (list) – file list
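Because the file list is set per worker, a multi-worker job needs some way to decide which files each worker receives. The sketch below shows one possible round-robin split in plain Python. The helper `shard_filelist` and its parameters `worker_index` and `worker_num` are hypothetical and not part of Paddle.

```python
def shard_filelist(filelist, worker_index, worker_num):
    """Return the files assigned to one worker by round-robin.

    Hypothetical helper for illustration only; not part of Paddle.
    """
    if not 0 <= worker_index < worker_num:
        raise ValueError("worker_index must be in [0, worker_num)")
    # Worker i takes files i, i + worker_num, i + 2 * worker_num, ...
    return filelist[worker_index::worker_num]


files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
print(shard_filelist(files, 0, 2))  # ['a.txt', 'c.txt', 'e.txt']
print(shard_filelist(files, 1, 2))  # ['b.txt', 'd.txt']
```

The resulting list for the current worker could then be passed to set_filelist.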

set_hdfs_config(fs_name, fs_ugi)

Set the HDFS config: fs name and ugi.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")

Parameters

fs_name (str) – fs name
fs_ugi (str) – fs ugi

set_pipe_command(pipe_command)

Set the pipe command of the current dataset. A pipe command is a UNIX pipeline command.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pipe_command("python my_script.py")

Parameters: pipe_command (str) – pipe command
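As a rough illustration of a script that could be used in a pipe command such as "python my_script.py", the sketch below reads raw lines from standard input and writes converted lines to standard output. Both formats are assumptions made for this example and are not specified in this document: the input is taken to be `label<TAB>id id id`, and the output writes, for each slot, a count followed by that many values. Check the format your data feed actually expects.

```python
import sys


def convert_line(line):
    """Convert 'label<TAB>id id ...' into '<n> id ... 1 label'.

    Both layouts are assumptions made for this illustration.
    """
    label, ids = line.rstrip("\n").split("\t")
    feats = ids.split()
    # First slot: feature ids, prefixed by their count.
    # Second slot: the single label value, prefixed by 1.
    return " ".join([str(len(feats))] + feats + ["1", label])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Stream line by line so the script works inside a UNIX pipeline.
    for line in stdin:
        if line.strip():
            stdout.write(convert_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Keeping the conversion in a small function such as `convert_line` makes the script easy to test without running it inside a pipeline.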

set_pv_batch_size(pv_batch_size)

Set the pv batch size. It takes effect when enable_pv_merge is used.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pv_batch_size(128)

Parameters: pv_batch_size (int) – pv batch size

set_rank_offset(rank_offset)

Set rank_offset for merge_pv. It sets the message of Pv.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_rank_offset("rank_offset")

Parameters: rank_offset (str) – rank_offset’s name

set_so_parser_name(so_parser_name)

Set the so parser name of the current dataset.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_so_parser_name("./abc.so")

Parameters: so_parser_name (str) – so parser name

set_thread(thread_num)

Set the thread number, which is the number of readers.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_thread(12)

Parameters: thread_num (int) – thread num

set_use_var(var_list)

Set the Variables that you will use.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
# data and label are Variables that must already be defined in your program
dataset.set_use_var([data, label])

Parameters: var_list (list) – variable list

slots_shuffle(slots)

Slots shuffle is a slot-level shuffle method, usually used for sparse features with a large number of instances. It lets you compare a metric, e.g. AUC, obtained while shuffling one or several slots against the baseline, in order to evaluate the importance level of those slots (features).

Parameters: slots (list[string]) – the set of slots (strings) on which to do slots shuffle.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_merge_by_lineid()
# suppose there is a slot 0
dataset.slots_shuffle(['0'])
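The idea behind slots shuffle can be sketched in plain Python. This is a conceptual illustration only, not Paddle's implementation. The helper `shuffle_slot` is hypothetical, and each instance is assumed to be a dict mapping slot names to feature values. Permuting one slot's values across instances breaks that slot's link to the labels while leaving the other slots intact, so the resulting drop in a metric hints at how important the slot is.

```python
import random


def shuffle_slot(instances, slot, seed=None):
    """Return copies of instances with the values of one slot permuted.

    Hypothetical helper for illustration only; not part of Paddle.
    """
    rng = random.Random(seed)
    values = [inst[slot] for inst in instances]
    rng.shuffle(values)
    # Build new dicts so the original instances are left unchanged.
    return [{**inst, slot: value} for inst, value in zip(instances, values)]


instances = [
    {"0": [11], "1": [5]},
    {"0": [12], "1": [6]},
    {"0": [13], "1": [7]},
]
shuffled = shuffle_slot(instances, "0", seed=0)
for inst in shuffled:
    print(inst)
```

Slot "1" keeps its original per-instance values, while slot "0" holds the same set of values in a possibly different order.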