Welcome to cosmos-rl’s documentation!
cosmos-rl is fully compatible with PyTorch and is designed for large-scale distributed training.
Main Features
- 6D Parallelism: Sequence, Tensor, Context, Pipeline, FSDP, DDP.
- Elastic & Fault Tolerance: a set of techniques that improve the robustness of distributed training.
- Async RL (see the sketches after this list)
  - Flexible
    - Rollout and Policy are decoupled into independent processes/GPUs.
    - No colocation of Rollout and Policy is required.
    - The numbers of Rollout and Policy instances can be scaled independently.
  - Fast
    - IB/NVLink is used for high-speed weight synchronization.
    - Policy training and Rollout weight synchronization are parallelized.
  - Robust
    - Supports AIPO for stable off-policy training.
    - The async/sync strategy can be selected according to the user's preference.
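
The decoupled design above can be made concrete with a small sketch. The following is purely illustrative Python, not cosmos-rl's actual API: the class and field names (`AsyncRLDeployment`, `num_rollout_replicas`, etc.) are hypothetical assumptions showing how independently scaled Rollout and Policy instances and an async/sync switch might be expressed.

```python
from dataclasses import dataclass

# Hypothetical sketch of the decoupled Rollout/Policy deployment described
# above. These names are assumptions, NOT cosmos-rl's real API.


@dataclass
class AsyncRLDeployment:
    """Rollout and Policy run as independent processes/GPU groups."""

    num_policy_replicas: int   # scaled independently of rollouts
    num_rollout_replicas: int  # scaled independently of policies
    mode: str = "async"        # "async" (off-policy, e.g. with AIPO) or "sync"

    def __post_init__(self) -> None:
        if self.mode not in ("async", "sync"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if min(self.num_policy_replicas, self.num_rollout_replicas) < 1:
            raise ValueError("need at least one replica of each role")


# No colocation required: e.g. 2 policy trainers and 6 rollout workers.
deployment = AsyncRLDeployment(num_policy_replicas=2,
                               num_rollout_replicas=6,
                               mode="async")
print(deployment)
```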
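"Parallelized" here means the Policy can keep training while updated weights stream to the Rollout workers. Below is a rough, framework-free illustration of that overlap, using a background thread as a stand-in for an IB/NVLink transfer; nothing in it is cosmos-rl code.

```python
import threading
import time

# Hypothetical illustration: overlap the next training step with weight
# synchronization to rollout workers instead of blocking on the transfer.


def train_step(step: int) -> None:
    time.sleep(0.1)  # pretend to run one optimizer step
    print(f"policy: finished training step {step}")


def sync_weights(step: int) -> None:
    time.sleep(0.1)  # pretend to broadcast weights to rollout workers
    print(f"rollout: received weights from step {step}")


threads = []
for step in range(3):
    train_step(step)
    t = threading.Thread(target=sync_weights, args=(step,))
    t.start()  # weight sync proceeds while the next train_step runs
    threads.append(t)
for t in threads:
    t.join()
```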
Note
6D Parallelism is fully supported by the Policy model. For the Rollout model, only Tensor Parallelism and Pipeline Parallelism are supported.
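
As a concrete sketch of that restriction, the snippet below models a parallelism configuration and rejects dimensions the Rollout side does not support. It is illustrative only; the field names (`tp`, `pp`, `cp`, ...) are assumptions and not cosmos-rl's actual configuration schema.

```python
from dataclasses import dataclass

# Hypothetical parallelism config; field names are illustrative only.


@dataclass
class Parallelism:
    tp: int = 1    # Tensor Parallelism
    pp: int = 1    # Pipeline Parallelism
    cp: int = 1    # Context Parallelism
    sp: int = 1    # Sequence Parallelism
    fsdp: int = 1  # FSDP sharding degree
    ddp: int = 1   # DDP replication degree


def check_rollout(p: Parallelism) -> None:
    """Rollout supports only Tensor and Pipeline Parallelism."""
    if any(d != 1 for d in (p.cp, p.sp, p.fsdp, p.ddp)):
        raise ValueError("rollout supports only tp/pp; set other dims to 1")


policy = Parallelism(tp=4, pp=2, cp=2, fsdp=8)  # full 6D space allowed
rollout = Parallelism(tp=4, pp=2)               # tp/pp only
check_rollout(rollout)  # passes
```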
Contents

- Multi-node training
- Elastic & Fault Tolerance
- Async RL
- Parallelism