Welcome to cosmos-rl’s documentation!

cosmos-rl is fully compatible with PyTorch and is designed for the future of distributed training.

Main Features

  • 6D Parallelism: Sequence, Tensor, Context, Pipeline, FSDP, DDP.

  • Elasticity & Fault Tolerance: A set of techniques that improve the robustness of distributed training.

  • Async RL
    • Flexible
      • Rollout and Policy are decoupled into independent processes/GPUs.

      • No colocation of Rollout and Policy is required.

      • The number of Rollout/Policy instances can be scaled independently.

    • Fast
      • InfiniBand (IB)/NVLink are used for high-speed weight synchronization.

      • Policy training and Rollout weight synchronization are parallelized (see the sketch after this list).

    • Robust
      • Supports AIPO for stable off-policy training.

      • Either an async or a sync strategy can be selected, at the user’s choice.
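To make the decoupling concrete, below is a minimal, framework-agnostic sketch of the async loop described above: a Rollout worker keeps producing trajectories while the Policy trainer consumes them, and weight synchronization is overlapped with the next training step. The names (rollout_worker, policy_trainer, sync_weights) and the thread/queue mechanics are illustrative assumptions, not cosmos-rl’s actual API; in cosmos-rl, Rollout and Policy run as independent processes/GPUs and weights move over IB/NVLink.

    import queue
    import threading
    import time

    # Shared state standing in for the Rollout workers' weight copies.
    # (Illustrative only: not cosmos-rl's API.)
    sample_queue: queue.Queue = queue.Queue(maxsize=8)
    weights_lock = threading.Lock()
    weights = {"version": 0}

    def sync_weights(version: int) -> None:
        # Stands in for the high-speed IB/NVLink weight broadcast.
        time.sleep(0.02)  # pretend transfer latency
        with weights_lock:
            weights["version"] = version

    def rollout_worker() -> None:
        # Generates trajectories with its own (possibly stale) weight copy.
        while True:
            with weights_lock:
                version = weights["version"]
            time.sleep(0.05)  # pretend to run inference
            sample_queue.put({"trajectory": [0, 1, 2], "weight_version": version})

    def policy_trainer(steps: int) -> None:
        # Consumes trajectories; weight sync overlaps the next training step.
        for step in range(steps):
            batch = sample_queue.get()  # may lag the newest weights (off-policy)
            time.sleep(0.1)  # pretend to run a training step on `batch`
            threading.Thread(target=sync_weights, args=(step + 1,), daemon=True).start()

    threading.Thread(target=rollout_worker, daemon=True).start()
    policy_trainer(steps=5)

Because Rollout keeps generating while Policy trains, some samples are produced with slightly stale weights; this off-policy gap is exactly what AIPO is meant to keep stable.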

Note

6D Parallelism is fully supported by the Policy model. For the Rollout model, only Tensor Parallelism and Pipeline Parallelism are supported.
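For illustration, the two parallelism configurations might look like the following sketch. The key names (tp_size, cp_size, pp_size, dp_shard_size, dp_replicate_size) are assumptions made for this example, not necessarily cosmos-rl’s actual configuration schema.

    # Hypothetical parallelism settings; key names are assumed, not
    # cosmos-rl's confirmed schema.
    policy_parallelism = {
        "dp_shard_size": 2,      # FSDP sharding
        "dp_replicate_size": 2,  # DDP replication
        "tp_size": 2,            # Tensor Parallelism
        "cp_size": 2,            # Context Parallelism
        "pp_size": 2,            # Pipeline Parallelism
        # Sequence parallelism is typically enabled alongside tensor parallelism.
    }

    rollout_parallelism = {
        "tp_size": 4,  # Tensor Parallelism
        "pp_size": 2,  # Pipeline Parallelism
        # Other dimensions are not supported for the Rollout model.
    }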

Parallelism