Single node example
This example demonstrates how to run Qwen3-8B on a single node with 8 GPUs.
>>> cosmos-rl \
--config configs/qwen3/qwen3-8b-p-tp4-r-tp2-pp1-grpo.toml \
--policy 1 \
--rollout 2
Explanation of the command:
--config
: the path to the training config file.
--policy
: the number of policy replicas.
--rollout
: the number of rollout replicas.
As the TOML file name suggests, this example uses the Qwen3-8B model with:
4-way tensor parallelism for the policy model
2-way tensor parallelism for the rollout model
With 1 policy replica and 2 rollout replicas specified, this uses 1 × 4 + 2 × 2 = 8 GPUs in total.
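The GPU count can be checked with a short calculation. The sketch below is only illustrative: `total_gpus` is a hypothetical helper, not part of cosmos-rl, and it assumes each replica occupies (tensor-parallel size × pipeline-parallel size) GPUs, with pipeline parallelism defaulting to 1.

```python
def total_gpus(policy_replicas, policy_tp, rollout_replicas, rollout_tp,
               policy_pp=1, rollout_pp=1):
    """Hypothetical helper: GPUs needed for a given replica/parallelism layout.

    Assumes each replica uses tp * pp GPUs and that policy and rollout
    replicas do not share devices.
    """
    policy_gpus = policy_replicas * policy_tp * policy_pp
    rollout_gpus = rollout_replicas * rollout_tp * rollout_pp
    return policy_gpus + rollout_gpus


# Layout from this example: --policy 1 (TP=4) and --rollout 2 (TP=2).
needed = total_gpus(policy_replicas=1, policy_tp=4,
                    rollout_replicas=2, rollout_tp=2)
print(needed)
```

Under the same assumption, changing `--rollout` to 1 would leave two of the eight GPUs unused, which is a quick way to sanity-check a layout before launching.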
If everything goes well, you should see training logs like this:
[rank0]:[cosmos] 2025-06-09 22:14:29,220 - cosmos - INFO - Step: 1/4670, Loss: 0.00000
[rank1]:[cosmos] 2025-06-09 22:14:29,219 - cosmos - INFO - Step: 1/4670, Loss: 0.00000
...
Note
You may see loss values of 0.0 when the GRPO advantage is zero. This is expected, since this is a toy math example.
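To see why a zero advantage can arise, here is a minimal sketch of the group-relative advantage as it is commonly described for GRPO: each reward is normalized by the mean and standard deviation of its group. This is an illustration under that assumption, not cosmos-rl's actual implementation; `group_advantages` and `eps` are hypothetical names.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled completions.

    Assumed form: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every completion in the group earned the same reward:
# each advantage is zero, so the advantage-weighted loss term vanishes.
same = group_advantages([1.0, 1.0, 1.0, 1.0])

# Rewards differ within the group: advantages are non-zero.
mixed = group_advantages([1.0, 0.0, 0.0, 1.0])

print(same)
print(mixed)
```

When all completions for a prompt are scored identically (all correct or all wrong), the group carries no learning signal, which is consistent with the 0.00000 loss shown in the log above.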