Cosmos Cookbook
Overview
NVIDIA Cosmos™ is a platform comprising state-of-the-art generative world foundation models (WFMs), guardrails, and an accelerated data processing and curation pipeline. This cookbook is a practical guide to the Cosmos open models, offering step-by-step workflows, technical recipes, and concrete examples for building, adapting, and deploying WFMs. It helps developers reproduce successful Cosmos model deployments and customize them for their own domains.
The Cosmos ecosystem supports the complete Physical AI development lifecycle, from inference with pre-trained models to custom post-training for domain adaptation. Inside, you'll find:
- Quick-start inference examples to get up and running fast.
- Advanced post-training workflows for domain-specific fine-tuning.
- Proven recipes for scalable, production-ready deployments.
Open Source Community Platform
The Cosmos Cookbook is an open-source resource where NVIDIA and the broader Physical AI community share practical workflows, proven techniques, and domain-specific adaptations.
📂 Repository: https://github.com/nvidia-cosmos/cosmos-cookbook
We welcome contributions, from new examples and workflow improvements to bug fixes and documentation updates. Together, we can evolve best practices and accelerate the adoption of Cosmos models across domains.
📊 Physical AI Datasets: Access curated datasets for autonomous vehicles, intelligent transportation systems, robotics, smart spaces, and warehouse environments on the NVIDIA Physical AI Collection on Hugging Face.
Case Study Recipes
The cookbook includes comprehensive use cases demonstrating real-world applications across the Cosmos platform.
Cosmos Predict
Future state prediction and generation
| Workflow | Description | Link |
|---|---|---|
| Inference | Text2Image synthetic data generation for intelligent transportation systems | ITS Synthetic Data Generation |
| Training | Traffic anomaly generation with improved realism and prompt alignment | Traffic Anomaly Generation |
| Training | Synthetic trajectory data generation for humanoid robot learning | GR00T-Dreams |
| Training | LoRA post-training for sports video generation with improved player dynamics and rule coherence | Sports Video Generation |
Advanced Topics: For model compression and deployment optimization, see Distilling Cosmos Predict 2.5, which shows how to distill the model into a 4-step student using DMD2.
Cosmos Transfer
Multi-control video generation and augmentation
| Workflow | Description | Link |
|---|---|---|
| Inference | CARLA simulator-to-real augmentation for traffic anomaly scenarios | CARLA Sim2Real |
| Inference | Multi-control video editing for background replacement, lighting, and object transformation | Real-World Video Manipulation |
| Inference | Domain transfer pipeline for scarce biological datasets using edge-based control and FiftyOne | BioTrove Moths Augmentation |
| Inference | Weather augmentation pipeline for simulation data using multi-modal controls | Weather Augmentation |
| Inference | CG-to-real conversion for multi-view warehouse environments | Warehouse Simulation |
| Inference | Sim2Real data augmentation for robotics navigation tasks | X-Mobility Navigation |
| Inference | Synthetic manipulation motion generation for humanoid robots | GR00T-Mimic |
Cosmos Reason
Vision-language reasoning and quality control
| Workflow | Description | Link |
|---|---|---|
| Training | Physical plausibility check for video quality assessment | Video Rewards |
| Training | Spatial AI understanding for warehouse environments | Spatial AI Warehouse |
| Training | Intelligent transportation scene understanding and analysis | Intelligent Transportation |
| Training | AV video captioning and visual question answering for autonomous vehicles | AV Video Caption VQA |
| Training | Temporal localization for MimicGen robot learning data generation | Temporal Localization |
Cosmos Curator
| Workflow | Description | Link |
|---|---|---|
| Curation | Curate video data for Cosmos Predict 2 post-training | Predict 2 Data Curation |
End-to-End Workflows
| Workflow | Description | Link |
|---|---|---|
| SDG Pipeline | Complete synthetic data generation pipeline for traffic scenarios using CARLA, Cosmos Transfer 2.5, and Cosmos Reason 1 | Smart City SDG |
Cosmos Models for Physical AI
The Cosmos family of open models consists of five core repositories, each targeting specific capabilities in the AI development workflow:
Cosmos Curator - A GPU-accelerated video curation pipeline built on Ray. Supports multi-model analysis, content filtering, annotation, and deduplication for both inference and training data preparation.
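Deduplication is one of the Curator stages listed above. The sketch below is a minimal, hypothetical illustration of one common approach: greedy near-duplicate removal over per-clip embeddings using cosine similarity. It does not use the Cosmos Curator API; the helper name `dedup_by_cosine`, the toy embeddings, and the 0.95 threshold are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_by_cosine(embeddings: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Greedy near-duplicate removal (hypothetical helper, not the Curator API).

    Keeps a clip only if its cosine similarity to every previously kept clip
    is below `threshold`. Returns the indices of the kept clips in input order.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy per-clip embeddings: clip 1 is a near-copy of clip 0.
clips = [[1.0, 0.0, 0.0], [0.99, 0.01, 0.0], [0.0, 1.0, 0.0]]
print(dedup_by_cosine(clips))  # [0, 2]
```

The greedy pass is quadratic in the number of clips; at pipeline scale, an implementation would typically cluster embeddings first and compare only within clusters.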
Cosmos Predict - A diffusion transformer for future state prediction. Provides text-to-image and video-to-world generation capabilities, with specialized variants for robotics and simulation. Supports custom training for domain-specific prediction tasks.
Cosmos Transfer - A multi-control video generation system with ControlNet and MultiControlNet conditioning (including depth, segmentation, LiDAR, and HDMap). Includes 4K upscaling capabilities and supports training for custom control modalities and domain adaptation.
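To make multi-control conditioning more concrete, here is a hypothetical sketch of combining per-modality control-branch outputs using user-chosen weights. It assumes that each control branch (edge, depth, and so on) produces a feature vector of the same size, and that the weights are normalized to sum to 1. The function `blend_controls` is illustrative only. It is not the Cosmos Transfer API or its configuration schema.

```python
from typing import Dict, List


def blend_controls(features: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted combination of per-modality control features (hypothetical).

    `features` maps a modality name (e.g. "edge", "depth") to the flat feature
    vector from that modality's control branch; `weights` sets each modality's
    strength. Modalities with a zero weight or no features are skipped, and the
    remaining weights are normalized to sum to 1 (an assumption of this sketch).
    """
    active = {m: w for m, w in weights.items() if w > 0 and m in features}
    if not active:
        raise ValueError("at least one control modality needs a positive weight")
    lengths = {len(features[m]) for m in active}
    if len(lengths) != 1:
        raise ValueError("all active control features must have the same length")
    total = sum(active.values())
    blended = [0.0] * lengths.pop()
    for modality, w in active.items():
        for i, value in enumerate(features[modality]):
            blended[i] += (w / total) * value
    return blended


# Edge control at 3x the strength of depth; segmentation is switched off.
features = {"edge": [1.0, 0.0], "depth": [0.0, 1.0], "seg": [0.5, 0.5]}
print(blend_controls(features, {"edge": 3.0, "depth": 1.0, "seg": 0.0}))  # [0.75, 0.25]
```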
Cosmos Reason - A 7B vision-language model for physically grounded reasoning. Handles spatial/temporal understanding and chain-of-thought tasks, with fine-tuning support for embodied AI applications and domain-specific reasoning.
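Cosmos Reason 1 can also serve as a quality filter for rejection sampling of synthetic data. The following sketch shows only the control flow of such a filter, and it rests on stated assumptions. `generate` and `score` are hypothetical stand-ins for a video generator and a Reason-based plausibility critic, and the threshold value is arbitrary.

```python
import itertools
from typing import Callable, List


def rejection_sample(
    generate: Callable[[], str],
    score: Callable[[str], float],
    threshold: float,
    num_wanted: int,
    max_attempts: int,
) -> List[str]:
    """Keep generated samples whose critic score is at least `threshold`.

    `generate` stands in for a video generator and `score` for a VLM critic
    such as Cosmos Reason; both are hypothetical callables in this sketch.
    Stops once `num_wanted` samples pass or `max_attempts` is exhausted.
    """
    accepted: List[str] = []
    for _ in range(max_attempts):
        if len(accepted) >= num_wanted:
            break
        sample = generate()
        if score(sample) >= threshold:
            accepted.append(sample)
    return accepted


counter = itertools.count()


def fake_generate() -> str:
    # Hypothetical generator: returns sample IDs instead of videos.
    return f"clip-{next(counter)}"


def fake_score(sample: str) -> float:
    # Hypothetical critic: pretend every even-numbered clip is plausible.
    return 1.0 if int(sample.split("-")[1]) % 2 == 0 else 0.0


print(rejection_sample(fake_generate, fake_score, threshold=0.5, num_wanted=3, max_attempts=10))
# ['clip-0', 'clip-2', 'clip-4']
```

Capping attempts with `max_attempts` prevents an endless loop when the critic rejects almost everything.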
Cosmos RL - A distributed training framework supporting both supervised fine-tuning (SFT) and reinforcement learning (RL). Features elastic policy rollout, FP8/FP4 precision support, and optimizations for large-scale VLM and LLM training.
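The sketch below illustrates one ingredient of RL post-training for VLMs and LLMs: group-relative advantages in the style of GRPO (Group Relative Policy Optimization). In this approach, rewards for several responses to the same prompt are normalized within their group. The sketch is generic and self-contained. This page does not document which RL algorithms Cosmos RL exposes or how it computes advantages, so treat the function as illustrative only.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one group of responses to the same prompt.

    Each response's advantage is its reward minus the group mean, divided by
    the group's population standard deviation plus `eps` for numerical
    stability (implementations differ on sample vs. population std).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one question, rewarded 1.0 if correct else 0.0.
print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 0.0, 1.0])])
# [1.0, -1.0, -1.0, 1.0]
```

When every response in a group gets the same reward, all advantages are zero, so that group contributes no policy-gradient signal.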
All models include pre-trained checkpoints and support custom training for domain-specific adaptation. The diagram below illustrates component interactions across inference and training workflows.

ML/Gen AI Concepts
The cookbook is organized around key concepts spanning inference (including controlled generation) and training use cases:
1. Control Modalities - Master precise control over video generation with Cosmos Transfer 2.5 using Edge, Depth, Segmentation, and Vis modalities. This covers structural preservation, semantic replacement, lighting consistency, and multi-control approaches for achieving high-fidelity, controllable video transformations.
2. Data Curation - Use Cosmos Curator to prepare your datasets with modular, scalable processing pipelines. This includes splitting, captioning, filtering, deduplication, task-specific sampling, and cloud-native or local execution.
3. Model Post-Training - Fine-tune foundation models on your curated data. This covers domain adaptation for Predict (2 and 2.5), Transfer (1 and 2.5), and Reason 1; setting up supervised fine-tuning, LoRA, or reinforcement learning; and using Cosmos RL for large-scale distributed rollout.
4. Evaluation and Quality Control - Ensure your post-trained models are aligned and robust through metrics, visualization, and qualitative inspection. Use Cosmos Reason 1 as a quality filter (e.g., for rejection sampling of synthetic data).
5. Model Distillation - Compress large foundation models into smaller, efficient variants while preserving performance. This includes knowledge distillation techniques for Cosmos models, teacher-student training setups, and deployment optimization for edge devices and resource-constrained environments.
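LoRA, one of the post-training options above, freezes a pre-trained weight matrix and learns a low-rank update. The sketch below shows the standard LoRA forward pass, h = W0 x + (alpha / r) B A x, in plain Python. It is a minimal illustration only. The exact scaling and the choice of which layers get adapters in the Cosmos post-training scripts are not covered here, and the matrices in the example are toy values.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def lora_forward(W0: Matrix, A: Matrix, B: Matrix, x: List[float], alpha: float) -> List[float]:
    """Compute h = W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are
    trained, so the learned update B A has rank at most r.
    """
    r = len(A)
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


W0 = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen weights (identity)
A = [[1.0, 1.0]]               # rank r = 1
x = [1.0, 2.0]
print(lora_forward(W0, A, [[0.0], [0.0]], x, alpha=1.0))  # [1.0, 2.0]
print(lora_forward(W0, A, [[0.5], [0.0]], x, alpha=1.0))  # [2.5, 2.0]
```

Initializing `B` to zeros, as in the first call, makes the adapted layer start out identical to the frozen base layer. Training therefore begins from the pre-trained behavior.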
Gallery
Visual examples of Cosmos Transfer results across Physical AI domains:
- Robotics Domain Adaptation - Sim-to-real transfer for robotic manipulation with varied materials, lighting, and environments.
- Autonomous Vehicle Domain Adaptation - Multi-control video generation for driving scenes across different weather, lighting, and time-of-day conditions.
Quick Start Paths
This cookbook provides flexible entry points for both inference and training workflows. Each section contains runnable scripts, technical recipes, and complete examples.
- Inference workflows: Getting Started for setup and immediate model deployment
- Physical AI datasets: NVIDIA Physical AI Collection on Hugging Face for curated datasets across domains
- Data processing: Data Processing & Analysis for content analysis workflows
- Training workflows: Model Training & Fine-tuning for domain adaptation
- Case study recipes: Case Study Recipes organized by application area