Welcome to cosmos-rl’s documentation!
cosmos-rl is fully compatible with PyTorch and is designed for large-scale distributed training.
Main Features
- Natively Designed for Physical AI
Cosmos-RL supports training several physical AI paradigms, e.g., LLM/VLM, world foundational models, and VLA.
- Multiple Training Algorithms
Supports state-of-the-art LLM RL algorithms (e.g., GRPO, DAPO), RL algorithms for world foundational models (e.g., FlowGRPO, DDRL, DiffusionNFT), and VLA-specific algorithms; a sketch of the GRPO advantage computation follows this feature list.
The well-architected design ensures high extensibility, so custom training algorithms can be implemented with only minimal configuration.
- Diversified Model Support
- For LLM/VLM:
Natively supports LLaMA/Qwen/Qwen-VL/Qwen3-MoE series models.
Compatible with all Huggingface LLMs.
- For world foundational models:
Natively supports SD3/Cosmos-Predict2.5/SANA.
Compatible with mainstream Huggingface world foundational models based on diffusers.
- For VLA (Vision-Language-Action):
Natively supports OpenVLA, OpenVLA-OFT, and PI0.5 series models.
Integrated with LIBERO and BEHAVIOR-1K simulators.
Easily extensible to other model architectures by customizing the interface.
- 6D Parallelism: Sequence, Tensor, Context, Pipeline, FSDP, DDP.
- Elastic & Fault Tolerance: a set of techniques to improve the robustness of distributed training.
- Async RL
- Flexible
Rollout and Policy are decoupled into independent processes/GPUs.
No colocation of Rollout and Policy is required.
Number of Rollout/Policy instances can be scaled independently.
- Fast
IB/NVLink are used for high-speed weight synchronization.
Policy training and Rollout weight synchronization run in parallel.
- Robust
Supports AIPO for stable off-policy training.
Async or sync mode can be selected according to the user's preference (a minimal sketch of the decoupled Rollout/Policy design follows this list).
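The decoupled design above can be pictured with a minimal, self-contained sketch built on plain Python multiprocessing rather than the cosmos-rl API; every name in it (rollout_worker, policy_trainer, NUM_ROLLOUT_WORKERS) is hypothetical. Rollout workers push trajectories into a shared queue while a separate policy process consumes them in batches, so the two sides run as independent processes and can be scaled separately.

```python
# Minimal sketch of decoupled Rollout/Policy processes using plain Python
# multiprocessing. This is NOT the cosmos-rl API; all names are hypothetical
# and only illustrate the decoupled architecture described above.
import multiprocessing as mp
import random

NUM_ROLLOUT_WORKERS = 4   # scaled independently of the policy side
TRAIN_BATCH_SIZE = 8

def rollout_worker(worker_id: int, queue, num_samples: int) -> None:
    """Generates trajectories and pushes them to the queue (stands in for rollout GPUs)."""
    for step in range(num_samples):
        trajectory = {"worker": worker_id, "step": step, "reward": random.random()}
        queue.put(trajectory)

def policy_trainer(queue, total_samples: int) -> None:
    """Consumes trajectories in batches and "trains" (stands in for the policy GPUs)."""
    batch, consumed = [], 0
    while consumed < total_samples:
        batch.append(queue.get())
        consumed += 1
        if len(batch) == TRAIN_BATCH_SIZE:
            mean_reward = sum(t["reward"] for t in batch) / len(batch)
            print(f"train step on {len(batch)} trajectories, mean reward={mean_reward:.3f}")
            batch.clear()

if __name__ == "__main__":
    samples_per_worker = 16
    queue = mp.Queue()
    rollouts = [mp.Process(target=rollout_worker, args=(i, queue, samples_per_worker))
                for i in range(NUM_ROLLOUT_WORKERS)]
    trainer = mp.Process(target=policy_trainer,
                         args=(queue, NUM_ROLLOUT_WORKERS * samples_per_worker))
    for p in rollouts + [trainer]:
        p.start()
    for p in rollouts + [trainer]:
        p.join()
```

In cosmos-rl, Rollout and Policy similarly run as independent processes/GPUs whose instance counts are set separately, and weight synchronization over IB/NVLink proceeds in parallel with policy training.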
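Of the LLM RL algorithms named in the feature list, GRPO's defining step is the group-relative advantage: a group of responses is sampled per prompt, each response is scored, and every reward is normalized by the mean and standard deviation of its own group. Below is a standalone NumPy illustration of that computation, not cosmos-rl code.

```python
# Group-relative advantage as used by GRPO-style algorithms.
# Standalone NumPy illustration; not cosmos-rl code.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards has shape (num_prompts, group_size); returns per-response advantages."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts with 4 sampled responses each.
rewards = np.array([[1.0, 0.0, 0.5, 0.5],
                    [0.2, 0.8, 0.8, 0.2]])
print(group_relative_advantages(rewards))
```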
Note
6D Parallelism is fully supported by the Policy model; the Rollout model supports only Tensor Parallelism and Pipeline Parallelism.
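For orientation, the parallel degrees compose multiplicatively into a device mesh over the available GPUs: data-parallel replicas (sharded via FSDP or replicated via DDP), tensor, context, and pipeline parallelism, with sequence parallelism typically reusing the tensor-parallel group. The sketch below uses hypothetical names, not the cosmos-rl configuration schema; it only checks that a chosen combination of degrees matches the GPU count.

```python
# Illustrative check that chosen parallelism degrees compose to the GPU count.
# Field names are hypothetical and do not reflect the cosmos-rl config schema.
from dataclasses import dataclass

@dataclass
class ParallelDims:
    dp: int = 1  # data-parallel replicas (FSDP-sharded or DDP-replicated)
    tp: int = 1  # tensor parallel (sequence parallelism typically reuses this group)
    cp: int = 1  # context parallel
    pp: int = 1  # pipeline parallel

    def world_size(self) -> int:
        # The device mesh is the Cartesian product of the degrees,
        # so their product must equal the number of GPUs in use.
        return self.dp * self.tp * self.cp * self.pp

dims = ParallelDims(dp=2, tp=4, cp=1, pp=2)
assert dims.world_size() == 16, "parallel degrees must multiply to the GPU count"
print(f"mesh covers {dims.world_size()} GPUs")
```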
Quick Start
Multi-node training
Elastic & Fault Tolerance
Async RL
Parallelism
Quantization
World Foundational Models