launch
paddle.distributed.launch()

Paddle distributed training entry point, invoked as: python -m paddle.distributed.launch
- Usage:

    python -m paddle.distributed.launch [-h] [--log_dir LOG_DIR] [--nproc_per_node NPROC_PER_NODE] [--run_mode RUN_MODE] [--gpus GPUS] [--selected_gpus GPUS] [--ips IPS] [--servers SERVERS] [--workers WORKERS] [--heter_workers HETER_WORKERS] [--worker_num WORKER_NUM] [--server_num SERVER_NUM] [--heter_worker_num HETER_WORKER_NUM] [--http_port HTTP_PORT] [--elastic_server ELASTIC_SERVER] [--job_id JOB_ID] [--np NP] [--scale SCALE] [--host HOST] [--force FORCE] training_script ...
- Base Parameters:

    --log_dir: The path of each process's log. e.g., --log_dir=output_dir. Default --log_dir=log.

    --nproc_per_node: The number of processes to launch on a node. For gpu training, it should be less than or equal to the number of gpus on your system (or the number set by --gpus). e.g., --nproc_per_node=8.

    --run_mode: The run mode of the job; one of collective/ps/ps-heter. e.g., --run_mode=ps. Default --run_mode=collective.

    --gpus: For gpu training. e.g., --gpus=0,1,2,3 will launch four training processes, each bound to one gpu (see the sketch after this list).

    --selected_gpus: Alias for --gpus; --gpus is recommended.

    --xpus: For xpu training, if xpu is available. e.g., --xpus=0,1,2,3.

    --selected_xpus: Alias for --xpus; --xpus is recommended.

    training_script: The full path of the single-gpu training program/script to be launched in parallel, followed by all the arguments of the training script. e.g., train.py.

    training_script_args: The arguments of training_script. e.g., --lr=0.1.
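A minimal sketch combining the base parameters above; the log directory name my_logs, the script name train.py, and its --lr flag are placeholders, following the examples below:

    # Launch four gpu training processes on this node and write each process's log under ./my_logs
    python -m paddle.distributed.launch --log_dir=my_logs --gpus=0,1,2,3 train.py --lr=0.01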
- Collective Parameters:

    --ips: The IPs of the Paddle cluster nodes. e.g., --ips=192.168.0.16,192.168.0.17. Default --ips=127.0.0.1.
- Parameter-Server Parameters:

    --servers: User-defined servers as ip:port. e.g., --servers="192.168.0.16:6170,192.168.0.17:6170".

    --workers: User-defined workers as ip:port. e.g., --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172".

    --heter_workers: User-defined heter workers as ip1:port1;ip2:port2. e.g., --heter_workers="192.168.0.16:6172;192.168.0.17:6172".

    --worker_num: The number of workers (recommended when emulating a distributed environment on a single node).

    --server_num: The number of servers (recommended when emulating a distributed environment on a single node).

    --heter_worker_num: The number of heter workers in each stage (recommended when emulating a distributed environment on a single node).

    --heter_devices: The type of heter device in each stage.

    --http_port: The port of the Gloo http server (see the sketch after this list).
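A single-node simulation sketch for the parameter-server options above; the --http_port value 6174 is only an illustrative free port, and train.py with --lr is a placeholder as in the examples below:

    # Emulate 2 servers and 4 cpu workers on one node, pinning the Gloo http server to port 6174
    python -m paddle.distributed.launch --server_num=2 --worker_num=4 --http_port=6174 train.py --lr=0.01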
- Elastic Parameters:

    --elastic_server: The etcd server host:port. e.g., --elastic_server=127.0.0.1:2379.

    --job_id: A unique job id. e.g., --job_id=job1.

    --np: The number of job pods/nodes. e.g., --np=2.

    --host: The host to bind to; defaults to the POD_IP environment variable (see the sketch after this list).
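A sketch of the elastic options above with an explicit bind host, modeled on Example 9 below; the bind address 192.168.0.16 is only illustrative, and --host can normally be omitted so that POD_IP is used:

    python -m paddle.distributed.launch --elastic_server=127.0.0.1:2379 --np=2 --job_id=job1 --host=192.168.0.16 --gpus=0,1,2,3 train.py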
- Returns:

    None
- Examples 1 (collective, single node):

    # For training on a single node using 4 gpus.

    python -m paddle.distributed.launch --gpus=0,1,2,3 train.py --lr=0.01
- Examples 2 (collective, multi node):

    # The values of --gpus and --ips must be consistent on each node.
    # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17.

    # On 192.168.0.16:

    python -m paddle.distributed.launch --gpus=0,1,2,3 --ips=192.168.0.16,192.168.0.17 train.py --lr=0.01

    # On 192.168.0.17:

    python -m paddle.distributed.launch --gpus=0,1,2,3 --ips=192.168.0.16,192.168.0.17 train.py --lr=0.01
- Examples 3 (ps, cpu, single node):

    # To simulate a distributed environment on a single node, e.g., 2 servers and 4 workers.

    python -m paddle.distributed.launch --server_num=2 --worker_num=4 train.py --lr=0.01
- Examples 4 (ps, cpu, multi node):

    # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17, with 1 server and 2 workers on each node.

    # On 192.168.0.16:

    python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" train.py --lr=0.01

    # On 192.168.0.17:

    python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" train.py --lr=0.01
- Examples 5 (ps, gpu, single node):
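A minimal sketch, by analogy with Examples 3 and 6: simulate 2 servers and 4 workers on a single node, each worker using one gpu. The exact flag combination below is an assumption based on those examples.

    # To simulate a distributed environment on a single node, e.g., 2 servers and 4 gpu workers.

    export CUDA_VISIBLE_DEVICES=0,1,2,3
    python -m paddle.distributed.launch --server_num=2 --worker_num=4 train.py --lr=0.01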
- Examples 6 (ps, gpu, multi node):

    # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17, with 1 server and 2 workers on each node.

    # On 192.168.0.16:

    export CUDA_VISIBLE_DEVICES=0,1
    python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" train.py --lr=0.01

    # On 192.168.0.17:

    export CUDA_VISIBLE_DEVICES=0,1
    python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" train.py --lr=0.01
- Examples 7 (ps-heter, cpu + gpu, single node):

    # To simulate a distributed environment on a single node, e.g., 2 servers and 4 workers, where two workers use gpu and two workers use cpu.

    export CUDA_VISIBLE_DEVICES=0,1
    python -m paddle.distributed.launch --server_num=2 --worker_num=2 --heter_worker_num=2 train.py --lr=0.01
- Examples 8 (ps-heter, cpu + gpu, multi node):

    # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17, with 1 server, 1 gpu worker, and 1 cpu worker on each node.

    # On 192.168.0.16:

    export CUDA_VISIBLE_DEVICES=0
    python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.17:6171" --heter_workers="192.168.0.16:6172,192.168.0.17:6172" train.py --lr=0.01

    # On 192.168.0.17:

    export CUDA_VISIBLE_DEVICES=0
    python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.17:6171" --heter_workers="192.168.0.16:6172,192.168.0.17:6172" train.py --lr=0.01
- Examples 9 (elastic):

    python -m paddle.distributed.launch --elastic_server=127.0.0.1:2379 --np=2 --job_id=job1 --gpus=0,1,2,3 train.py