Welcome to cosmos-rl’s documentation!

cosmos-rl is fully compatible with PyTorch and is designed for modern, large-scale distributed training.

Main Features

  • 6D Parallelism: Sequence, Tensor, Context, Pipeline, FSDP, DDP.

  • Elasticity & Fault Tolerance: a set of techniques that improve the robustness of distributed training.

  • Async RL (a minimal sketch of the decoupled design follows this feature list)
    • Flexible
      • Rollout and Policy are decoupled into independent processes/GPUs.

      • No colocation of Rollout and Policy is required.

      • The number of Rollout/Policy instances can be scaled independently.

    • Fast
      • InfiniBand/NVLink are used for high-speed weight synchronization.

      • Policy training and Rollout weight synchronization are parallelized.

    • Robust
      • Supports AIPO for stable off-policy training.

      • Users can choose between async and sync strategies.

  • Multiple Training Algorithms
    • Supports state-of-the-art LLM RL algorithms such as GRPO and DAPO.

    • A well-architected design keeps the framework highly extensible: custom training algorithms can be implemented with only minimal configuration.

  • Diverse Model Support
    • Natively supports LLaMA/Qwen/Qwen-VL/Qwen3-MoE series models.

    • Compatible with all Hugging Face LLMs.

    • Easily extensible to other model architectures by customizing the model interface.
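
To make the decoupled Async RL design above concrete, here is a toy Python sketch of two independent processes: a policy that trains and pushes weight snapshots, and a rollout that keeps generating with the newest snapshot it has received. This is not cosmos-rl's API; the queue-based transport and all names (rollout_worker, policy_worker) are invented for illustration, and in cosmos-rl the weight transfer runs over InfiniBand/NVLink rather than a Python queue.

    # Toy sketch of decoupled Rollout/Policy processes (NOT cosmos-rl's actual API).
    # It only illustrates the architecture: the policy trains and periodically
    # pushes weights; the rollout generates with whatever snapshot it last
    # received, so neither side blocks on the other.
    import multiprocessing as mp
    import time

    def rollout_worker(weight_q, sample_q):
        weights = {"version": 0}  # stand-in for model weights
        while True:
            # Refresh weights without blocking: keep generating with the
            # latest snapshot received so far (this is the "async" part).
            while not weight_q.empty():
                weights = weight_q.get()
            sample_q.put({"weights_version": weights["version"]})
            time.sleep(0.01)  # stand-in for generation latency

    def policy_worker(weight_q, sample_q, steps):
        for step in range(1, steps + 1):
            batch = sample_q.get()           # consume rollout samples
            time.sleep(0.02)                 # stand-in for one training step
            weight_q.put({"version": step})  # push fresh weights to rollout
            print(f"step {step}: trained on samples from weights v{batch['weights_version']}")

    if __name__ == "__main__":
        weight_q, sample_q = mp.Queue(), mp.Queue()
        rollout = mp.Process(target=rollout_worker, args=(weight_q, sample_q), daemon=True)
        rollout.start()
        policy_worker(weight_q, sample_q, steps=5)

Because the rollout never waits for training to finish, samples may be generated with slightly stale weights; this is exactly the off-policy gap that the AIPO support mentioned above is meant to stabilize.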

Note

6D parallelism is fully supported for the Policy model. The Rollout model supports only Tensor Parallelism and Pipeline Parallelism (a hypothetical configuration sketch follows).
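
As a purely hypothetical illustration of this constraint, a configuration could expose separate parallelism settings for the two models. The key names below (policy, rollout, tp_size, and so on) are invented for this sketch and are not cosmos-rl's actual configuration schema; see the Parallelism section below for the real options.

    # Hypothetical configuration sketch (NOT cosmos-rl's real schema):
    # the Policy model may combine all six parallelism dimensions, while
    # the Rollout model is limited to Tensor and Pipeline Parallelism.
    parallelism = {
        "policy": {                # full 6D parallelism is available here
            "dp_replicate": 2,     # DDP replicas
            "dp_shard": 2,         # FSDP sharding degree
            "tp_size": 4,          # Tensor Parallelism
            "pp_size": 2,          # Pipeline Parallelism
            "cp_size": 2,          # Context Parallelism
            "sp_enabled": True,    # Sequence Parallelism
        },
        "rollout": {               # only TP and PP are supported here
            "tp_size": 4,
            "pp_size": 2,
        },
    }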

Parallelism