Welcome to cosmos-rl’s documentation!

cosmos-rl is fully compatible with PyTorch and is designed for the future of distributed training.

Main Features

  • Natively Designed for Physical AI
    • cosmos-rl supports training several physical AI paradigms, e.g., LLMs/VLMs, world foundation models, and VLAs.

    • Multi-training Algorithms
      • Supports state-of-the-art LLM RL algorithms (e.g., GRPO and DAPO), RL algorithms for world foundation models (e.g., FlowGRPO, DDRL, and DiffusionNFT), and VLA-specific algorithms.

      • A well-architected design keeps the framework highly extensible: custom training algorithms require only minimal configuration to implement (a minimal illustrative sketch follows this list).

    • Diversified Model Support
      • For LLM/VLM:
        • Natively supports LLaMA/Qwen/Qwen-VL/Qwen3-MoE series models.

        • Compatible with all Hugging Face LLMs.

      • For world foundation models:
        • Natively supports SD3/Cosmos-Predict2.5/SANA.

        • Compatible with mainstream Hugging Face world foundation models built on diffusers.

      • For VLA (Vision-Language-Action):
        • Natively supports OpenVLA, OpenVLA-OFT, and PI0.5 series models.

        • Integrated with LIBERO and BEHAVIOR-1K simulators.

      • Easily extensible to other model architectures by customizing the model interface.

  • 6D Parallelism: Sequence, Tensor, Context, Pipeline, FSDP, DDP.

  • Elasticity & Fault Tolerance: a set of techniques that improve the robustness of distributed training.

  • Async RL
    • Flexible
      • Rollout and Policy are decoupled into independent processes/GPUs.

      • No colocation of Rollout and Policy is required.

      • The number of Rollout/Policy instances can be scaled independently.

    • Fast
      • InfiniBand (IB) and NVLink are used for high-speed weight synchronization.

      • Policy training and Rollout weight synchronization run in parallel (see the overlap sketch after the note below).

    • Robust
      • Supports AIPO for stable off-policy training.

      • The async or sync strategy can be selected according to the user's preference.
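
To give a concrete sense of what a custom training algorithm amounts to, here is a minimal GRPO-style clipped surrogate loss in plain PyTorch. This is an illustrative sketch only: the function name, tensor shapes, and plug-in surface are assumptions, not cosmos-rl's actual API.

```python
# Illustrative sketch (NOT the cosmos-rl API): the kind of per-token
# objective a custom RL algorithm would supply.
import torch

def grpo_loss(
    logprobs: torch.Tensor,      # (G, T) current per-token log-probs for a group of G rollouts
    old_logprobs: torch.Tensor,  # (G, T) per-token log-probs recorded at rollout time
    rewards: torch.Tensor,       # (G,)  one scalar reward per rollout
    mask: torch.Tensor,          # (G, T) 1.0 for response tokens, 0.0 for padding
    clip_eps: float = 0.2,
) -> torch.Tensor:
    # Group-relative advantage: normalize rewards within the rollout group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    adv = adv.unsqueeze(-1)                     # broadcast over tokens
    ratio = torch.exp(logprobs - old_logprobs)  # importance ratio
    # PPO-style clipped surrogate, averaged over unmasked tokens.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    per_token = -torch.minimum(unclipped, clipped)
    return (per_token * mask).sum() / mask.sum()
```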

Note

6D parallelism is fully supported for the Policy model. The Rollout model supports only Tensor Parallelism and Pipeline Parallelism.
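
To make the async overlap described above concrete, the following sketch overlaps a policy optimization step with a background push of the updated weights to rollout workers. Everything here is illustrative: push_weights_to_rollout is a hypothetical stand-in for the IB/NVLink transfer, and the real system coordinates independent processes rather than a thread inside the trainer.

```python
# Illustrative sketch (NOT cosmos-rl's implementation): overlap one policy
# optimization step with pushing the previous step's weights to rollout workers.
import threading
import torch

def push_weights_to_rollout(snapshot: dict) -> None:
    # Hypothetical stand-in for the high-speed IB/NVLink weight transfer.
    pass

policy = torch.nn.Linear(16, 16)   # toy stand-in for the policy model
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)
sync_thread = None

for step in range(4):
    # One optimization step on a dummy batch.
    x = torch.randn(8, 16)
    loss = policy(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Wait for the previous push, then launch the next one in the background
    # so the transfer overlaps with the following training step.
    if sync_thread is not None:
        sync_thread.join()
    snapshot = {k: v.detach().clone() for k, v in policy.state_dict().items()}
    sync_thread = threading.Thread(target=push_weights_to_rollout, args=(snapshot,))
    sync_thread.start()

if sync_thread is not None:
    sync_thread.join()
```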

Parallelism