QueueDataset

class paddle.fluid.dataset.QueueDataset

QueueDataset processes data in a streaming fashion.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
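
As a rough illustration of what streaming processing means, the sketch below reads lines lazily and emits fixed-size batches without ever holding the whole input in memory. It is plain Python and does not use Paddle; stream_batches is a hypothetical helper, not part of the QueueDataset API.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def stream_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size lines, reading lazily."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(lines)
    while True:
        # Pull only the next batch_size items from the source.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Any iterable works here, including an open file object.
    for batch in stream_batches(["a", "b", "c", "d", "e"], batch_size=2):
        print(batch)
```

Because only one batch is materialized at a time, this style of reading cannot reorder the full dataset, which is consistent with the shuffle methods below being unsupported.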
local_shuffle ( )

Shuffle data locally.

Local shuffle is not supported in QueueDataset; calling this method raises NotImplementedError.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.local_shuffle()
Raises

NotImplementedError – QueueDataset does not support local shuffle

global_shuffle ( fleet=None )

Shuffle data globally.

Global shuffle is not supported in QueueDataset; calling this method raises NotImplementedError.

Parameters

fleet (Fleet) – fleet singleton. Default None.

Examples

import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.parameter_server.pslib import fleet
dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
dataset.global_shuffle(fleet)
Raises

NotImplementedError – QueueDataset does not support global shuffle

desc ( )

Returns a protobuf message for this DataFeedDesc.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
print(dataset.desc())
Returns

A string message

set_batch_size ( batch_size )

Set the batch size. It takes effect during training.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_batch_size(128)
Parameters

batch_size (int) – batch size

set_download_cmd ( download_cmd )

Set a customized download command.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_download_cmd("./read_from_afs")
Parameters

download_cmd (str) – customized download command

set_fea_eval ( record_candidate_size, fea_eval=True )

Set fea eval mode for slots shuffle, which is used to debug the importance of slots (features). fea_eval must be set to True for slots shuffle to work.

Parameters
  • record_candidate_size (int) – number of candidate instances used to shuffle one slot

  • fea_eval (bool) – whether to enable fea eval mode, which is required for slots shuffle. Default is True.

Examples


import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_fea_eval(1000000, True)

set_filelist ( filelist )

Set the file list for the current worker.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_filelist(['a.txt', 'b.txt'])
Parameters

filelist (list) – file list
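
Because the file list applies to the current worker only, a distributed job typically gives each worker its own slice of the full list. The sketch below shows one simple way to do that. split_filelist, worker_index, and worker_num are hypothetical names, and the round-robin policy is an assumption for illustration, not necessarily how any Paddle utility divides files.

```python
from typing import List


def split_filelist(files: List[str], worker_index: int, worker_num: int) -> List[str]:
    """Return the files assigned to one worker under a round-robin policy."""
    if worker_num <= 0:
        raise ValueError("worker_num must be positive")
    if not 0 <= worker_index < worker_num:
        raise ValueError("worker_index must be in [0, worker_num)")
    # Worker i takes files i, i + worker_num, i + 2 * worker_num, ...
    return files[worker_index::worker_num]


if __name__ == "__main__":
    all_files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
    print(split_filelist(all_files, worker_index=0, worker_num=2))
```

The returned list could then be passed to dataset.set_filelist(...) on each worker.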

set_hdfs_config ( fs_name, fs_ugi )

Set the hdfs config: fs name and ugi.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_hdfs_config("my_fs_name", "my_fs_ugi")
Parameters
  • fs_name (str) – fs name

  • fs_ugi (str) – fs ugi

set_pipe_command ( pipe_command )

Set the pipe command of the current dataset. A pipe command is a UNIX pipeline command that can be used only

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pipe_command("python my_script.py")
Parameters

pipe_command (str) – pipe command
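
A UNIX pipeline command typically reads raw lines on standard input and writes converted lines on standard output. The sketch below shows what a script such as my_script.py might look like. The input layout (a label followed by feature ids) and the output layout (each slot written as a value count followed by its values) are assumptions made for illustration; check the data feed format your job actually expects. All function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, TextIO


def convert_line(line: str) -> str:
    """Convert 'label id1 id2 ...' into '<n> id1 id2 ... 1 label'."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a label and at least one feature id")
    label, feature_ids = fields[0], fields[1:]
    # Assumed layout: every slot is a value count followed by the values.
    return " ".join([str(len(feature_ids)), *feature_ids, "1", label])


def convert_stream(lines: Iterable[str]) -> Iterator[str]:
    """Convert non-empty lines one by one, without buffering the input."""
    for line in lines:
        if line.strip():
            yield convert_line(line)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read raw lines from stdin and write converted lines to stdout."""
    for converted in convert_stream(stdin):
        stdout.write(converted + "\n")
```

To use the sketch as an actual pipe command, the script would end with a call to main().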

set_pv_batch_size ( pv_batch_size )

Set the pv batch size. It takes effect when enable_pv_merge is on.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_pv_batch_size(128)
Parameters

pv_batch_size (int) – pv batch size

set_rank_offset ( rank_offset )

Set rank_offset for merge_pv. It sets the message of Pv.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_rank_offset("rank_offset")
Parameters

rank_offset (str) – rank_offset’s name

set_so_parser_name ( so_parser_name )

Set the so parser name of the current dataset.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_so_parser_name("./abc.so")
Parameters

so_parser_name (str) – so parser name

set_thread ( thread_num )

Set the number of threads, which is the number of readers.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_thread(12)
Parameters

thread_num (int) – thread num

set_use_var ( var_list )

Set the Variables that you will use.

Examples

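import paddle.fluid as fluid
# Example input variables; the names, shapes, and dtypes are placeholders.
data = fluid.layers.data(name="data", shape=[1], dtype="int64")
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_use_var([data, label])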
Parameters

var_list (list) – variable list

slots_shuffle ( slots )

Slots shuffle is a slot-level shuffle method, usually used for sparse features with a large number of instances. A metric such as AUC, measured after shuffling one or several slots, is compared with the baseline to evaluate the importance of those slots (features).

Parameters

slots (list[string]) – the slots, given as strings, to shuffle.

Examples

import paddle.fluid as fluid
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_merge_by_lineid()
# suppose there is a slot 0
dataset.slots_shuffle(['0'])
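
To make the idea concrete, the sketch below shuffles the values of selected slots across instances while leaving every other slot untouched. It is plain Python operating on a list of dicts; shuffle_slots and the record layout are hypothetical and say nothing about how Paddle implements the operation internally.

```python
import random
from typing import Any, Dict, List, Optional, Sequence


def shuffle_slots(
    records: List[Dict[str, Any]],
    slots: Sequence[str],
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Return copies of records with the given slots permuted across instances."""
    rng = random.Random(seed)
    # Shallow copies keep the caller's records unchanged.
    shuffled = [dict(record) for record in records]
    for slot in slots:
        # Collect this slot's values, permute them, and write them back.
        values = [record[slot] for record in shuffled]
        rng.shuffle(values)
        for record, value in zip(shuffled, values):
            record[slot] = value
    return shuffled


if __name__ == "__main__":
    data = [{"0": i, "1": i * 10} for i in range(5)]
    for row in shuffle_slots(data, ["0"], seed=1):
        print(row)
```

Training or evaluating on the shuffled records and comparing a metric such as AUC with the unshuffled baseline then indicates how much the model relies on the shuffled slot.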