Humanoid VLA stack
GR00T-style Humanoid Fine-tuning
Foundation models for generalist humanoid robots. Fine-tune from a LeRobot v2 dataset plus a modality.json that maps state, action, and video keys to the policy's body-part expectations.
Export format
GR00T-LeRobot
Control freq.
20–50 Hz typical
Action repr.
Per body part; the policy handles chunking and tokenization internally
Required data
- LeRobot v2 (meta/info.json, episodes.jsonl, tasks.jsonl)
- Per-camera MP4 video chunks
- Parquet state/action chunks
- meta/modality.json
- Embodiment tag
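As a rough illustration of the body-part mapping, the sketch below builds a modality.json-style dictionary and checks that the state and action slices tile the full vector. It assumes a layout where each state/action entry gives a half-open [start, end) slice and each video entry points back to a dataset column through `original_key`. The body-part names, slice widths, camera key, and the `check_slices` helper are hypothetical, so verify the layout against the GR00T release you are targeting.

```python
import json

# Hypothetical body-part layout for a 14-DOF upper body; names and widths are
# illustrative, not taken from any specific embodiment.
modality = {
    "state": {
        "left_arm": {"start": 0, "end": 7},
        "right_arm": {"start": 7, "end": 14},
    },
    "action": {
        "left_arm": {"start": 0, "end": 7},
        "right_arm": {"start": 7, "end": 14},
    },
    "video": {
        "ego_view": {"original_key": "observation.images.ego_view"},
    },
}


def check_slices(parts: dict, vector_dim: int) -> None:
    """Verify that the slices tile [0, vector_dim) with no gaps or overlaps."""
    cursor = 0
    for name, span in sorted(parts.items(), key=lambda kv: kv[1]["start"]):
        if span["start"] != cursor or span["end"] <= span["start"]:
            raise ValueError(f"bad slice for {name}: {span}")
        cursor = span["end"]
    if cursor != vector_dim:
        raise ValueError(f"slices cover {cursor} dims, expected {vector_dim}")


check_slices(modality["state"], vector_dim=14)
check_slices(modality["action"], vector_dim=14)
modality_json = json.dumps(modality, indent=2)  # candidate contents for meta/modality.json
```

Running a check like this before training catches off-by-one slice errors that would otherwise feed the wrong joints to a body part.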
When to use
Humanoid, bimanual, or semi-humanoid robots where you want a foundation-model VLA baseline with whole-body control.
Recommended SignIQ samples
Generalist VLA stack
openpi (π0 / π0.5) Fine-tuning
Three-step recipe: convert the data to a LeRobot dataset, define a training config and run training, then serve the trained policy for inference. Strong real-world transfer on ALOHA bimanual and Franka-class single-arm setups.
Control freq.
10–50 Hz depending on action chunking
Action repr.
Action chunks; flow-matching output for π0 and π0.5
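To make the flow-based action representation concrete, here is a minimal sketch of how a flow policy can turn Gaussian noise into an action chunk by integrating a velocity field with Euler steps. This is not openpi's code: `sample_chunk`, the time convention (t=0 is noise, t=1 is the action), the step count, and the toy velocity field are all assumptions, and a real policy would condition the velocity network on images, language, and robot state.

```python
import numpy as np


def sample_chunk(velocity_fn, horizon: int, action_dim: int,
                 num_steps: int = 10, seed: int = 0) -> np.ndarray:
    """Integrate da/dt = velocity_fn(a, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((horizon, action_dim))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        actions = actions + dt * velocity_fn(actions, t)
    return actions


# Toy stand-in for the learned network: the exact velocity field that carries
# any sample along a straight line to one fixed target chunk.
target = np.linspace(0.0, 1.0, 50 * 7).reshape(50, 7)


def toy_velocity(actions: np.ndarray, t: float) -> np.ndarray:
    return (target - actions) / (1.0 - t)


chunk = sample_chunk(toy_velocity, horizon=50, action_dim=7)
```

With this toy field the final Euler step lands on `target`; a trained network would instead produce a different chunk for each observation.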
Required data
- LeRobot dataset
- Task config (action space, observation keys)
- Optional: human-video co-training data
When to use
When you want a flow-policy baseline with strong real-world transfer (DROID-class single-arm or ALOHA bimanual).
Recommended SignIQ samples
Bimanual imitation learning
ACT — Action Chunking Transformer
Action-chunked transformer policy designed for low-cost bimanual hardware. Trains directly on HDF5 episodes with multi-camera RGB and joint targets.
Action repr.
Chunks of 16–100 future actions
Required data
- Multi-camera RGB (front, left wrist, right wrist)
- Joint qpos
- Action chunks
- Episode-level task name
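The action-chunk targets listed above can be derived from a per-step action array. The sketch below is one way to do it, assuming zero-padding past the end of the episode and a boolean padding mask; the `make_action_chunks` helper, the 14-DOF layout, and the chunk length of 16 are illustrative choices rather than fixed parts of ACT.

```python
import numpy as np


def make_action_chunks(actions: np.ndarray, chunk_len: int):
    """Return (chunks, is_pad) with chunks[t] = actions[t : t + chunk_len].

    Windows that run past the end of the episode are zero-padded, and is_pad
    marks the padded steps so a loss can ignore them.
    """
    num_steps, action_dim = actions.shape
    chunks = np.zeros((num_steps, chunk_len, action_dim), dtype=actions.dtype)
    is_pad = np.ones((num_steps, chunk_len), dtype=bool)
    for t in range(num_steps):
        valid = min(chunk_len, num_steps - t)
        chunks[t, :valid] = actions[t:t + valid]
        is_pad[t, :valid] = False
    return chunks, is_pad


# Toy episode: 120 steps of 14-DOF joint targets (two 7-DOF arms).
episode_actions = np.arange(120 * 14, dtype=np.float32).reshape(120, 14)
chunks, is_pad = make_action_chunks(episode_actions, chunk_len=16)
```

Each training sample then pairs the observation at step t with `chunks[t]` and masks the loss with `is_pad[t]`.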
When to use
Bimanual ALOHA-class manipulation with 50–500 demos per task. A strong baseline to try before reaching for VLAs.
Recommended SignIQ samples
Multimodal action distributions
Diffusion Policy
Policies that model multimodal action distributions with denoising diffusion. Excels when the correct action is not unique (multiple valid grasps or contact strategies).
Export format
HDF5 / Zarr / LeRobot
Action repr.
Predicted action chunk via diffusion sampler
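As a sketch of what sampling a chunk with a diffusion sampler involves, the code below runs a plain DDPM ancestral sampling loop over an action chunk. It assumes a linear beta schedule with 100 steps and replaces the learned, observation-conditioned noise predictor with a toy oracle for a single fixed target, so it shows the mechanics only and not the multimodal behavior. `ddpm_sample` and `toy_eps` are hypothetical names.

```python
import numpy as np


def ddpm_sample(eps_fn, shape, betas: np.ndarray, seed: int = 0) -> np.ndarray:
    """Ancestral DDPM sampling: start from noise and denoise for len(betas) steps."""
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps_hat = eps_fn(x, t)  # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0  # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x


betas = np.linspace(1e-4, 0.02, 100)  # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
target = np.tile(np.linspace(-1.0, 1.0, 16)[:, None], (1, 2))  # 16-step chunk of 2-D actions


def toy_eps(x: np.ndarray, t: int) -> np.ndarray:
    """Oracle noise predictor for a data distribution concentrated on `target`."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])


chunk = ddpm_sample(toy_eps, target.shape, betas)
```

In a trained policy, `eps_fn` would be a network conditioned on the observation history, and different noise seeds could land on different valid action modes.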
Required data
- Observation history (image stack or state)
- Action chunks
- Optional: language instruction
When to use
Single-arm or bimanual manipulation with multimodal action distributions.
Recommended SignIQ samples
Open VLA pretraining
OpenVLA-style VLA
Tokenized-action VLA pretrained on cross-embodiment manipulation data. It maps image + language to a discretized 7-DOF action, which enables drop-in fine-tuning on language-conditioned data.
Export format
RLDS / LeRobot (OXE-style)
Action repr.
Discretized 7-DOF tokens: 6-DOF EEF delta (xyz, rpy) + gripper
Required data
- Image observation
- Language instruction
- Discretized 7-DOF action (xyz, rpy, gripper)
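The discretized action in the list above can be produced by uniform binning of each dimension. The following is a minimal sketch, assuming 256 bins per dimension and fixed per-dimension bounds; in practice the bounds usually come from dataset statistics such as low and high quantiles. The bounds, units, and helper names here are hypothetical.

```python
import numpy as np

NUM_BINS = 256  # assumed bin count per action dimension


def discretize(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map each continuous dimension to an integer bin index in [0, NUM_BINS - 1]."""
    clipped = np.clip(action, low, high)
    scaled = (clipped - low) / (high - low)  # in [0, 1]
    return np.minimum((scaled * NUM_BINS).astype(np.int64), NUM_BINS - 1)


def undiscretize(tokens: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map bin indices back to the continuous value at each bin center."""
    return low + (tokens + 0.5) / NUM_BINS * (high - low)


# Hypothetical per-dimension bounds: xyz delta (m), rpy delta (rad), gripper.
low = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
high = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])

action = np.array([0.01, -0.02, 0.0, 0.1, 0.0, -0.2, 1.0])
tokens = discretize(action, low, high)
recovered = undiscretize(tokens, low, high)
```

The round trip loses at most half a bin width per dimension, which is the resolution limit to keep in mind when choosing the bounds.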
When to use
When you need an open VLA pretraining baseline or a cross-embodiment fine-tuning starting point.
Recommended SignIQ samples
Video + action dynamics
Robot World Models
Predict future video frames conditioned on actions or latent actions. Used for planning, evaluation, and pretraining on human video.
Export format
Video + Parquet
Action repr.
Continuous or learned latent actions
Required data
- Continuous video (10 s+)
- Action sequences
- State sequences
- Optional: latent action labels
When to use
When you want to scale beyond available robot actions by leveraging human-video corpora for planning and dynamics priors.
Recommended SignIQ samples
Whole-body control
Humanoid Whole-Body Policy
Policies that control full-body humanoids, covering locomotion, balance, and manipulation posture, often built on retargeted motion priors.
Export format
Custom LeRobot / HDF5 with body-channel mapping
Control freq.
50–500 Hz (low-level)
Action repr.
Joint torques or target positions per body chain
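The two action representations are related: when a policy outputs target positions, a low-level loop typically converts them to torques. The sketch below uses a standard PD law and splits the result per body chain. The chain names, the 12-DOF layout, and the gains are hypothetical and not tuned for any robot.

```python
import numpy as np

# Hypothetical body chains as slices into a 12-DOF joint vector.
CHAINS = {"left_leg": slice(0, 6), "right_leg": slice(6, 12)}


def pd_torques(q_target, q, qd, kp, kd):
    """tau = kp * (q_target - q) - kd * qd, evaluated per joint."""
    return kp * (q_target - q) - kd * qd


q = np.zeros(12)              # measured joint positions (rad)
qd = np.zeros(12)             # measured joint velocities (rad/s)
q_target = np.full(12, 0.1)   # target positions from the policy (rad)
kp = np.full(12, 40.0)        # illustrative gains only
kd = np.full(12, 1.0)

tau = pd_torques(q_target, q, qd, kp, kd)
tau_by_chain = {name: tau[idx] for name, idx in CHAINS.items()}
```

A loop like this would run at the low-level control rate, while the policy that produces `q_target` can run more slowly.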
Required data
- Full-body proprioception
- Retargeted human motion data
- Tactile / contact signals
- Multi-view RGB
When to use
Whole-body humanoid control combining retargeted motion priors with target-robot teleop.
Recommended SignIQ samples