Hugging Face Model Support
Overview
Cosmos-RL provides comprehensive support for Hugging Face models through its generic HFModel
wrapper. This design enables seamless integration of any LLMs or VLMs compatible with HF Transformers into the training and inference pipeline.
Architecture Design
The HFModel
class, implemented in cosmos_rl/policy/model/hf_models/__init__.py
, serves as a universal adapter for Hugging Face models. Key design principles include:
- Universal Compatibility
The wrapper supports any LLM or VLM architecture available through Hugging Face’s model ecosystem.
- Automatic Fallback Mechanism
When a model lacks custom implementation in the
cosmos_rl/policy/model
directory, the system automatically defaults to the HF Model wrapper, ensuring broad compatibility without requiring custom code.- Seamless Integration
Model configuration loading via
AutoConfig.from_pretrained
Weight loading through
model_class.from_pretrained
(model_class
is defined in “architectures” of config.json)
- Distributed Training Ready
The wrapper is optimized for distributed training strategies, particularly FSDP (Fully Sharded Data Parallel), with built-in support for: - Weight synchronization - Rotary embedding handling - Parallelism compatibility checks
- Extensibility
Users can rapidly experiment with new or custom Hugging Face models without waiting for explicit cosmos-rl support, making the framework future-proof and adaptable to the evolving AI model landscape.
Distributed Training Support
Currently, the Policy module exclusively supports Fully Sharded Data Parallel (FSDP) for distributed training and inference operations.
Training Examples
SFT
Note:
For LLMs like Mistral that require special handling to avoid role alternation errors, use the provided
dummy_sft.py
script.For VLMs, use the provided
hf_vlm_sft.py
script.
# LLM SFT
cosmos-rl --config configs/mistral/mistral-7b-fsdp4-sft.toml ./tools/dataset/dummy_sft.py
# VLM SFT
cosmos-rl --config configs/gemma/gemma3-12b-vlm-fsdp4-sft.toml ./tools/dataset/hf_vlm_sft.py
GRPO
# LLM GRPO
cosmos-rl --config configs/phi/phi4-14b-p-fsdp4-r-tp2-grpo.toml ./tools/dataset/gsm8k_grpo.py
Supported Models
The following models have been thoroughly tested and validated:
LLMs
Mistral
Gemma
Phi
GPT-OSS
VLMs
Gemma VLM
Llama VLM
LLava
Note
While models above have been extensively tested, other Hugging Face models should also be compatible. If you encounter issues with untested models, please report them by opening an issue in our repository.
Tip
The generic HF Model wrapper enables support for most Hugging Face models without additional configuration. For optimal performance, consider using models that have been specifically tested and optimized.