## Multi-node example

### Manual deployment
As in the single-node example, the controller process should be launched first; any number of extra policy and rollout replicas can then join at any time (see the wait-loop sketch after the node B commands below). In this example, three policy replicas with 4 GPUs each and two rollout replicas with 2 GPUs each are launched.
```bash
### On node A
# 1. Launch the controller process
./cosmos_rl/launcher/launch_controller.sh --port 8000 --config configs/qwen3/qwen3-8b-p-tp4-r-tp2-pp1-grpo.toml

export COSMOS_CONTROLLER_HOST=localhost:8000

# 2. Launch one policy replica of 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/launch_replica.sh --type policy --ngpus 4

# 3. Launch two rollout replicas of 2 GPUs each
CUDA_VISIBLE_DEVICES=4,5 ./tools/launch_replica.sh --type rollout --ngpus 2
CUDA_VISIBLE_DEVICES=6,7 ./tools/launch_replica.sh --type rollout --ngpus 2
```
```bash
### On node B
# Launch two more policy replicas of 4 GPUs each
export COSMOS_CONTROLLER_HOST={node-A-host-or-ip}:8000

CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/launch_replica.sh --type policy --ngpus 4
CUDA_VISIBLE_DEVICES=4,5,6,7 ./tools/launch_replica.sh --type policy --ngpus 4
```
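Because replicas register with the controller independently, a replica launch can be gated on the controller being reachable first. Below is a minimal sketch (not part of cosmos-rl) that assumes only that the controller listens on the TCP port configured in `COSMOS_CONTROLLER_HOST`:

```bash
# Wait until the controller accepts TCP connections, then launch a replica.
# Uses bash's built-in /dev/tcp; no cosmos-rl API is assumed.
HOST=${COSMOS_CONTROLLER_HOST%:*}
PORT=${COSMOS_CONTROLLER_HOST#*:}
until (exec 3<>"/dev/tcp/${HOST}/${PORT}") 2>/dev/null; do
  echo "waiting for controller at ${HOST}:${PORT} ..."
  sleep 2
done
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/launch_replica.sh --type policy --ngpus 4
```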
> **Note:** A single replica can also span multiple nodes. In that case, `--rdzv-endpoint={MASTER_ADDR}:{MASTER_PORT}` must be passed to `launch_replica.sh` on every node of that replica so that the nodes can rendezvous with each other.
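For example, a single 16-GPU policy replica spanning two 8-GPU nodes might be launched as follows. This is a sketch, not a verified invocation: it assumes `--ngpus` takes the replica's total GPU count and that the same command is run on both nodes.

```bash
# Run on BOTH nodes of the same replica. {MASTER_ADDR}:{MASTER_PORT} are
# placeholders and must be identical on both nodes so they can rendezvous.
export COSMOS_CONTROLLER_HOST={node-A-host-or-ip}:8000
./tools/launch_replica.sh --type policy --ngpus 16 \
    --rdzv-endpoint={MASTER_ADDR}:{MASTER_PORT}
```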
### DGXC-Lepton job example

The DGXC-Lepton launcher takes care of the number of GPUs required by each policy and rollout replica: it calculates the total number of nodes necessary and allocates them for the job.
```bash
>>> cosmos-rl \
    --config ./configs/qwen3/qwen3-8b-p-tp4-r-tp2-pp1-grpo.toml \
    --lepton-mode \
    --lepton-job-name cosmos-multi-node-test \
    --lepton-container-image {your-docker-image} \
    --lepton-resource-shape gpu.8xh100-sxm \
    --lepton-node-group {your-node-group} \
    --lepton-secret HUGGING_FACE_HUB_TOKEN={your-hugging-face-token} \
    --policy 8 \
    --rollout 4 \
    /workspace/cosmos_rl/tools/dataset/math_dapo.py
```
In total, 8*4 + 4*2 = 40 GPUs are used (five 8xH100 nodes will be allocated for this job).
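The node count follows from simple arithmetic; the snippet below is illustrative only (the launcher performs the equivalent calculation internally):

```bash
# Allocation arithmetic for the example above:
# 8 TP4 policy replicas + 4 TP2 rollout replicas on 8-GPU nodes.
N_POLICY=8;  GPUS_PER_POLICY=4
N_ROLLOUT=4; GPUS_PER_ROLLOUT=2
GPUS_PER_NODE=8
TOTAL=$(( N_POLICY * GPUS_PER_POLICY + N_ROLLOUT * GPUS_PER_ROLLOUT ))
NODES=$(( (TOTAL + GPUS_PER_NODE - 1) / GPUS_PER_NODE ))   # ceiling division
echo "${TOTAL} GPUs -> ${NODES} nodes"                      # prints: 40 GPUs -> 5 nodes
```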
For more details about DGXC-Lepton, please refer to the DGXC-Lepton Job documentation.
### Slurm job example

The Cosmos Slurm launcher takes care of the number of GPUs required by each policy and rollout replica: it calculates the total number of nodes necessary, then allocates and launches the job on the Slurm cluster.
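The `${YOUR_...}` variables in the command below are placeholders. For illustration only, they might be set like this (the container path is assumed here to be an enroot/pyxis squashfs image; all values are hypothetical):

```bash
# Illustrative placeholder values; adjust for your cluster.
export YOUR_COSMOS_REPO_PATH=$HOME/cosmos-rl
export YOUR_COSMOS_LOG_OUTPUT_PATH=$HOME/cosmos-logs
export YOUR_COSMOS_SQSQ_PATH=$HOME/images/cosmos-rl.sqsh   # squashfs container image (assumption)
export YOUR_SLURM_PARTITION=batch                          # example partition name
export YOUR_SLURM_ACCOUNT=my-account                       # example account name
```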
```bash
>>> python tools/slurm/dispatch_job.py \
    --ngpu-per-node 8 \
    --n-policy-replicas 8 \
    --n-rollout-replicas 4 \
    --config-path configs/qwen2-5/qwen2-5-32b-p-fsdp2-tp4-r-tp4-pp1-grpo-gsm8k.toml \
    --repo-root-path ${YOUR_COSMOS_REPO_PATH} \
    --output-root-path ${YOUR_COSMOS_LOG_OUTPUT_PATH} \
    --cosmos-container ${YOUR_COSMOS_SQSQ_PATH} \
    --slurm-partition ${YOUR_SLURM_PARTITION} \
    --slurm-account ${YOUR_SLURM_ACCOUNT} \
    tools/dataset/gsm8k_grpo.py
```
In total, 8*8 + 4*4 = 80 GPUs are used (ten 8xH100 nodes will be allocated for this job, following the same arithmetic as in the Lepton example above).
For more details about Slurm job submission, please refer to the Slurm Job documentation.