Model Recipes

From dataset to deployed policy.

Every recipe answers four questions: what data does the model need, what format, what control frequency, and which SignIQ or public sample makes a good starting point.

Humanoid VLA stack

GR00T-style Humanoid Fine-tuning

Foundation models for generalist humanoid robots. Fine-tune from a LeRobot v2 dataset plus a modality.json that maps state, action, and video keys to the policy's body-part expectations.

Export format: GR00T-LeRobot
Control frequency: 20–50 Hz typical
Action representation: Per body part; the policy handles chunking and tokenization internally

Required data

  • LeRobot v2 (meta/info.json, meta/episodes.jsonl, meta/tasks.jsonl)
  • Per-camera MP4 video chunks
  • Parquet state/action chunks
  • meta/modality.json
  • Embodiment tag
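
As a rough illustration of the modality.json listed above, the sketch below builds a minimal mapping for a hypothetical bimanual robot. The layout (state and action entries with start/end slice indices, video entries with an original_key) follows the GR00T-LeRobot convention as we understand it. The body-part names, dimensions, and camera key are made up for illustration and should be checked against your embodiment and the GR00T documentation.

```python
import json

# Hypothetical body-part layout: names and sizes are illustrative only.
BODY_PARTS = [("left_arm", 7), ("right_arm", 7), ("left_hand", 6), ("right_hand", 6)]


def build_slices(parts):
    """Assign each body part a contiguous [start, end) slice of the flat vector."""
    slices, cursor = {}, 0
    for name, dim in parts:
        slices[name] = {"start": cursor, "end": cursor + dim}
        cursor += dim
    return slices


def build_modality(parts, camera_keys):
    """Build a modality mapping for state, action, and video keys."""
    return {
        "state": build_slices(parts),
        "action": build_slices(parts),
        "video": {
            # Short name -> column name in the LeRobot dataset (assumed naming).
            name: {"original_key": f"observation.images.{name}"}
            for name in camera_keys
        },
    }


def split_by_part(vector, slices):
    """Split a flat state or action vector into per-body-part lists."""
    return {name: vector[s["start"]:s["end"]] for name, s in slices.items()}


modality = build_modality(BODY_PARTS, ["ego_view"])
print(json.dumps(modality, indent=2))
```

The same slices can be reused to sanity-check a dataset row: split_by_part(row, modality["state"]) should return one list per body part with the expected length.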

When to use

Humanoid, bimanual, or semi-humanoid robots where you want a foundation-model VLA baseline with whole-body control.

Generalist VLA stack

openpi (π0 / π0.5) Fine-tuning

Three-step recipe: convert the data to LeRobot format, define a training config, and run a policy server for inference. Strong real-world transfer on ALOHA bimanual and Franka-class single-arm setups.

Export format: LeRobot
Control frequency: 10–50 Hz depending on action chunking
Action representation: Action chunks; flow-matching output for π0 and π0.5

Required data

  • LeRobot dataset
  • Task config (action space, observation keys)
  • Optional: human-video co-training data
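
The link between control frequency and action chunking can be sketched as a receding-horizon loop: the policy is queried once per chunk, and the robot executes part of the returned chunk before asking again. This is a generic illustration, not the openpi client API. `fake_policy`, `chunk_len`, and `execute_steps` are hypothetical names.

```python
def fake_policy(observation, chunk_len):
    """Stand-in for a policy server call: returns chunk_len future actions."""
    return [observation + i for i in range(chunk_len)]


def run_episode(policy, total_steps, chunk_len, execute_steps):
    """Execute execute_steps actions from each predicted chunk, then re-query."""
    if not 1 <= execute_steps <= chunk_len:
        raise ValueError("execute_steps must be between 1 and chunk_len")
    executed, queries, step = [], 0, 0
    while step < total_steps:
        chunk = policy(step, chunk_len)  # the observation is just the step index here
        queries += 1
        for action in chunk[:execute_steps]:
            if step >= total_steps:
                break
            executed.append(action)
            step += 1
    return executed, queries


def policy_query_rate(control_hz, execute_steps):
    """How often the policy must run to sustain a given control frequency."""
    return control_hz / execute_steps


actions, n_queries = run_episode(fake_policy, total_steps=100, chunk_len=50, execute_steps=25)
print(n_queries, policy_query_rate(50, 25))
```

Executing fewer steps per chunk makes the robot more reactive but raises the rate at which the policy server must respond.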

When to use

When you want a flow-matching policy baseline with strong real-world transfer (DROID-class single-arm or ALOHA bimanual).

Bimanual imitation learning

ACT (Action Chunking with Transformers)

Action-chunked transformer policy designed for low-cost bimanual hardware. Trains directly on HDF5 episodes with multi-camera RGB and joint targets.

Export format: HDF5
Control frequency: 30–50 Hz
Action representation: Action chunks of 16–100 future steps

Required data

  • Multi-camera RGB (front, left wrist, right wrist)
  • Joint qpos
  • Action chunks
  • Episode-level task name
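
Action chunks are usually derived from the per-step action stream rather than recorded directly. Below is a minimal sketch, assuming the common approach of zero-padding the tail of an episode and masking the padded steps in the loss. `make_chunk` and its arguments are illustrative names, not part of any ACT codebase.

```python
def make_chunk(actions, start, chunk_len):
    """Return (chunk, is_pad) for the chunk_len actions starting at index start.

    Steps past the end of the episode are zero-filled and flagged in is_pad
    so that a training loss can ignore them.
    """
    action_dim = len(actions[0])
    chunk, is_pad = [], []
    for t in range(start, start + chunk_len):
        if t < len(actions):
            chunk.append(list(actions[t]))
            is_pad.append(False)
        else:
            chunk.append([0.0] * action_dim)
            is_pad.append(True)
    return chunk, is_pad


# Toy episode: 5 steps of a 2-DOF joint target.
episode = [[0.1 * t, -0.1 * t] for t in range(5)]
chunk, is_pad = make_chunk(episode, start=3, chunk_len=4)
print(chunk)
print(is_pad)
```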

When to use

Bimanual ALOHA-class manipulation with 50–500 demos per task. A strong baseline to try before reaching for VLAs.

Multimodal action distributions

Diffusion Policy

Multimodal action-distribution policies via denoising diffusion. Excels when actions are non-unique (multiple valid grasps, contact strategies).

Export format: HDF5 / Zarr / LeRobot
Control frequency: 10–30 Hz
Action representation: Predicted action chunk via diffusion sampler

Required data

  • Observation history (image stack or state)
  • Action chunks
  • Optional: language instruction

When to use

Single-arm or bimanual manipulation with multimodal action distributions.
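
To see why non-unique actions matter, consider a toy case in which half the demonstrations pass an obstacle on the left and half on the right. The sketch below uses illustrative numbers and no real policy. It shows that the least-squares prediction is the mean of the two modes, an action no demonstrator ever took, whereas sampling from the action distribution returns one of the demonstrated modes.

```python
import random

# Toy demonstrations: lateral offset when passing an obstacle located at 0.0.
demos = [-1.0] * 50 + [1.0] * 50


def mse(prediction, targets):
    """Mean squared error of a single constant prediction."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# The constant that minimizes MSE is the mean of the targets,
# which here points straight at the obstacle.
mean_action = sum(demos) / len(demos)

# Sampling from the empirical distribution always returns a demonstrated mode.
rng = random.Random(0)
sampled_action = rng.choice(demos)

print(mean_action, mse(mean_action, demos))
print(sampled_action)
```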

Open VLA pretraining

OpenVLA-style VLA

Tokenized-action VLA pretrained on cross-embodiment manipulation data. The model maps an image and a language instruction to a discretized 7-DOF action, which enables drop-in fine-tuning on language-conditioned data.

Export format: RLDS / LeRobot (OXE-style)
Control frequency: 5–15 Hz
Action representation: Discretized 7-DOF tokens (6-DOF EEF delta plus gripper)

Required data

  • Image observation
  • Language instruction
  • Discretized 7-DOF action (xyz, rpy, gripper)
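
Discretizing a continuous action into tokens can be sketched as uniform binning per dimension. The sketch assumes 256 bins and hand-picked per-dimension bounds. In practice the bounds are typically estimated from dataset statistics. The function names are illustrative.

```python
N_BINS = 256  # assumed number of bins per action dimension


def discretize(action, low, high, n_bins=N_BINS):
    """Map each continuous action dimension to an integer bin in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)  # clip to the supported range
        frac = (a - lo) / (hi - lo)
        tokens.append(min(int(frac * n_bins), n_bins - 1))
    return tokens


def undiscretize(tokens, low, high, n_bins=N_BINS):
    """Map bin indices back to the center of each bin."""
    return [lo + (t + 0.5) * (hi - lo) / n_bins for t, lo, hi in zip(tokens, low, high)]


# 7-DOF action: xyz delta, rpy delta, gripper (bounds are made up for illustration).
low = [-0.05] * 3 + [-0.25] * 3 + [0.0]
high = [0.05] * 3 + [0.25] * 3 + [1.0]
action = [0.01, -0.02, 0.0, 0.1, 0.0, -0.25, 1.0]

tokens = discretize(action, low, high)
recovered = undiscretize(tokens, low, high)
print(tokens)
```

The round trip is lossy by design: each recovered value lies within one bin width of the original, so wider bounds or fewer bins mean coarser actions.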

When to use

When you need an open VLA pretraining baseline or a cross-embodiment fine-tuning starting point.

Video + action dynamics

Robot World Models

Predict future video frames conditioned on actions or latent actions. Used for planning, evaluation, and pretraining on human video.

Export format: Video + Parquet
Control frequency: 10–30 Hz
Action representation: Continuous or learned latent actions

Required data

  • Continuous video clips (10 s or longer)
  • Action sequences
  • State sequences
  • Optional: latent action labels
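
One way to pair video with actions for dynamics training is to align each action to the most recent video frame and then form (frame, action, next frame) transitions. This is a minimal sketch assuming timestamped streams. The frame rate, action rate, and helper names are illustrative.

```python
from bisect import bisect_right


def align_actions_to_frames(frame_times, action_times):
    """For each action, return the index of the latest frame at or before it."""
    indices = []
    for t in action_times:
        i = bisect_right(frame_times, t) - 1
        indices.append(max(i, 0))
    return indices


def make_transitions(frame_idx_per_action):
    """Build (frame_index, action_index, next_frame_index) training triples."""
    transitions = []
    for k in range(len(frame_idx_per_action) - 1):
        transitions.append((frame_idx_per_action[k], k, frame_idx_per_action[k + 1]))
    return transitions


# 30 fps video and 10 Hz actions over one second (timestamps in milliseconds).
frame_times = [round(i * 1000 / 30) for i in range(30)]
action_times = [i * 100 for i in range(10)]

frame_idx = align_actions_to_frames(frame_times, action_times)
transitions = make_transitions(frame_idx)
print(transitions[:3])
```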

When to use

When you want to scale beyond available robot actions by leveraging human-video corpora for planning and dynamics priors.

Whole-body control

Humanoid Whole-Body Policy

Policies that control full-body humanoids — locomotion, balance, manipulation posture, and retargeted motion priors.

Export format: Custom LeRobot / HDF5 with body-channel mapping
Control frequency: 50–500 Hz (low-level)
Action representation: Joint torques or target positions per body chain

Required data

  • Full-body proprioception
  • Retargeted human motion data
  • Tactile / contact signals
  • Multi-view RGB

When to use

Whole-body humanoid control combining retargeted motion priors with target-robot teleop.

Have a target model? We'll match the dataset.

Tell us which model recipe you're building toward (embodiment, action space, control frequency, language conditioning) and we'll scope a collection run whose data drops straight into your training pipeline.