Welcome to cosmos-rl’s documentation!
cosmos-rl is fully compatible with PyTorch and is designed for the future of distributed training.
Main Features
- 6D Parallelism: Sequence, Tensor, Context, Pipeline, FSDP, DDP.
- Elastic & Fault Tolerance: a set of techniques to improve the robustness of distributed training.
- Async RL
  - Flexible
    - Rollout and Policy are decoupled into independent processes/GPUs.
    - No colocation of Rollout and Policy is required.
    - The number of Rollout/Policy instances can be scaled independently.
  - Fast
    - IB/NVLink are used for high-speed weight synchronization.
    - Policy training and Rollout weight synchronization run in parallel (see the first sketch after this list).
  - Robust
    - Supports AIPO for stable off-policy training.
    - Users can choose between async and sync strategies.
- Multiple Training Algorithms
  - Supports state-of-the-art LLM RL algorithms (e.g., GRPO, DAPO).
  - A well-architected design ensures high extensibility; implementing a custom training algorithm requires only minimal configuration (see the second sketch after this list).
- Diversified Model Support
  - Natively supports the LLaMA/Qwen/Qwen-VL/Qwen3-MoE model series.
  - Compatible with all Hugging Face LLMs.
  - Easily extensible to other model architectures by customizing the model interface.
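To make the decoupled Async RL layout concrete, here is a minimal, framework-agnostic sketch in plain Python asyncio. All names (`rollout_worker`, `policy_trainer`, the shared `weights` dict) are illustrative, not cosmos-rl APIs: in the real system the two sides run as separate processes/GPUs and weights move over IB/NVLink. The sketch shows rollout generation and policy training running as independent tasks, with weight updates published without blocking generation.

```python
import asyncio
import random

# Illustrative only: a toy model of the decoupled Rollout/Policy layout.
# In a real deployment the two loops live in separate processes/GPUs, and
# trajectory/weight transfer goes over NCCL/IB rather than an asyncio.Queue.

async def rollout_worker(traj_queue: asyncio.Queue, weights: dict) -> None:
    """Generates trajectories using whatever weight version it currently has."""
    while True:
        await asyncio.sleep(0.01)  # stand-in for generation time
        traj = {"reward": random.random(), "weight_version": weights["version"]}
        await traj_queue.put(traj)

async def policy_trainer(traj_queue: asyncio.Queue, weights: dict) -> None:
    """Consumes trajectories and periodically publishes new weights."""
    for step in range(1, 6):
        batch = [await traj_queue.get() for _ in range(4)]
        await asyncio.sleep(0.02)  # stand-in for a training step
        weights["version"] = step  # "weight sync": publish without stopping rollouts
        print(f"step {step}: trained on weight versions "
              f"{sorted({t['weight_version'] for t in batch})}")

async def main() -> None:
    traj_queue: asyncio.Queue = asyncio.Queue(maxsize=16)
    weights = {"version": 0}
    rollout = asyncio.create_task(rollout_worker(traj_queue, weights))
    await policy_trainer(traj_queue, weights)
    rollout.cancel()  # shut the producer down

asyncio.run(main())
```

The printed weight versions illustrate why off-policy robustness (e.g., AIPO) matters: because training and generation overlap, some trajectories in a batch were produced by slightly stale weights.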
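Since GRPO is named above, here is a short sketch of its core idea: group-relative advantages. Rewards for several rollouts of the same prompt are normalized within that group, so no separate value network is needed. This is a generic illustration of the published algorithm, not cosmos-rl's implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as used by GRPO.

    rewards: shape (num_prompts, group_size) -- one row per prompt,
    one column per sampled rollout of that prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four rollouts each: the best rollout in a group gets a
# positive advantage and the worst a negative one, regardless of reward scale.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                        [10.0, 9.0, 9.5, 9.5]])
print(grpo_advantages(rewards))
```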
Note
6D parallelism is fully supported for the Policy model. The Rollout model supports only Tensor Parallelism and Pipeline Parallelism.
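As a rule of thumb when sizing these parallelism degrees (a general property of n-D parallelism, not a cosmos-rl API), the product of the per-dimension degrees must equal the number of GPUs in the job. A tiny sanity check, with purely hypothetical numbers:

```python
from math import prod

# Hypothetical degrees for a 32-GPU job; the key names are illustrative only.
# Sequence parallelism typically reuses the tensor-parallel group, so it
# contributes no extra factor here.
degrees = {
    "dp_replicate": 2,  # DDP replicas
    "dp_shard": 2,      # FSDP shards
    "cp": 2,            # context parallel
    "tp": 4,            # tensor parallel
    "pp": 1,            # pipeline parallel
}
world_size = 32

assert prod(degrees.values()) == world_size, (
    f"parallelism degrees {degrees} do not multiply to {world_size} GPUs")
print("layout OK:", degrees)
```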
Quick Start
Multi-node training
Elastic & Fault Tolerance
Async RL
Parallelism
Quantization