Model Recipes — Train GR00T, openpi, ACT, OpenVLA on SignIQ data

Humanoid VLA stack

GR00T-style Humanoid Fine-tuning

Foundation models for generalist humanoid robots. Fine-tune from a LeRobot v2 dataset plus a modality.json that maps state, action, and video keys to the policy's body-part expectations.

Export format

GR00T-LeRobot

Control freq.

20–50 Hz typical

Action repr.

Per body part; the policy handles chunking and tokenization internally

Required data

●LeRobot v2 (meta/info.json, episodes.jsonl, tasks.jsonl)
●Per-camera MP4 video chunks
●Parquet state/action chunks
●meta/modality.json
●Embodiment tag
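As a rough illustration of what meta/modality.json has to express, the sketch below builds a body-part mapping from ordered (name, width) pairs. The start/end slice layout, the "original_key" field, and the body-part and camera names are assumptions made for illustration; check them against the schema your target policy actually expects.

```python
import json


def build_modality(state_parts, action_parts, cameras):
    """Build a modality.json-style dict from ordered (name, width) pairs."""

    def slices(parts):
        out, start = {}, 0
        for name, width in parts:
            # Each body part owns a contiguous slice; "end" is exclusive.
            out[name] = {"start": start, "end": start + width}
            start += width
        return out

    return {
        "state": slices(state_parts),
        "action": slices(action_parts),
        # Map a short camera name to the dataset's video feature key (assumed naming).
        "video": {cam: {"original_key": f"observation.images.{cam}"} for cam in cameras},
    }


# Hypothetical bimanual layout: 7 joints per arm plus 1 gripper value per hand.
parts = [("left_arm", 7), ("left_hand", 1), ("right_arm", 7), ("right_hand", 1)]
modality = build_modality(parts, parts, cameras=["ego_view"])
modality_json = json.dumps(modality, indent=2)
```

Writing modality_json to meta/modality.json next to info.json would complete the dataset layout listed above.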

When to use

Humanoid, bimanual, or semi-humanoid robots where you want a foundation-model VLA baseline with whole-body control.

Recommended SignIQ samples

●SignIQ Humanoid — Warehouse Manipulation
●SignIQ ALOHA Bimanual — Coffee Prep
●SignIQ Collaborative Dual-Arm — Contact-Rich Threading

Generalist VLA stack

openpi (π0 / π0.5) Fine-tuning

Three-step recipe: convert to LeRobot, define training config, run a policy server for inference. Strong real-world transfer for ALOHA bimanual and Franka-class single-arm.

Export format

LeRobot

Control freq.

10–50 Hz depending on action chunking

Action repr.

Action chunks generated by flow matching (π0 and π0.5)

Required data

●LeRobot dataset
●Task config (action space, observation keys)
●Optional: human-video co-training data
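The link between control frequency and action chunking can be sketched with a receding-horizon loop: the policy is queried once per chunk, and only the first few actions of each chunk are executed before re-querying. This is a generic sketch, not the openpi client API; `policy`, `get_obs`, and `send_action` are hypothetical callables that you would replace with your own policy-server client and robot interface.

```python
def policy_query_rate(control_hz, execute_steps):
    """Policy queries per second when each chunk drives `execute_steps` control ticks."""
    if execute_steps < 1:
        raise ValueError("execute_steps must be >= 1")
    return control_hz / execute_steps


def run_receding_horizon(policy, get_obs, send_action, total_steps, execute_steps):
    """Run `total_steps` control ticks, re-querying the policy every `execute_steps`.

    Returns the number of policy queries that were made.
    """
    if execute_steps < 1:
        raise ValueError("execute_steps must be >= 1")
    queries = 0
    step = 0
    while step < total_steps:
        chunk = policy(get_obs())  # one inference call returns a list of actions
        queries += 1
        if not chunk:
            raise ValueError("policy returned an empty action chunk")
        for action in chunk[:execute_steps]:
            if step >= total_steps:
                break
            send_action(action)
            step += 1
    return queries
```

For example, a 50 Hz controller that executes 25 actions per chunk only needs two policy queries per second, which is why the usable control rate depends on the chunk settings.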

When to use

When you want a flow-policy baseline with strong real-world transfer (DROID-class single-arm or ALOHA bimanual).

Recommended SignIQ samples

●SignIQ ALOHA Bimanual — Coffee Prep
●SignIQ Collaborative Single-Arm — Pick & Place

Bimanual imitation learning

ACT — Action Chunking Transformer

Action-chunked transformer policy designed for low-cost bimanual hardware. Trains directly on HDF5 episodes with multi-camera RGB and joint targets.

Export format

HDF5

Control freq.

30–50 Hz

Action repr.

Action chunks of 16–100 future steps

Required data

●Multi-camera RGB (front, left wrist, right wrist)
●Joint qpos
●Action chunks
●Episode-level task name
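A minimal sketch of how per-step joint targets can be turned into fixed-length action chunks. It assumes zero-padding at the end of the episode with a boolean mask that lets the loss ignore padded steps; `make_action_chunks` is a hypothetical helper, and plain Python lists stand in for arrays to keep the example dependency-free.

```python
def make_action_chunks(actions, chunk_len):
    """Return (chunks, is_pad) for a list of per-step action vectors.

    chunks[t] holds the `chunk_len` actions starting at step t, zero-padded
    past the end of the episode; is_pad[t][k] is True for padded slots.
    """
    if chunk_len < 1:
        raise ValueError("chunk_len must be >= 1")
    dim = len(actions[0]) if actions else 0
    chunks, is_pad = [], []
    for t in range(len(actions)):
        real = actions[t:t + chunk_len]
        n_pad = chunk_len - len(real)
        chunks.append([list(a) for a in real] + [[0.0] * dim for _ in range(n_pad)])
        is_pad.append([False] * len(real) + [True] * n_pad)
    return chunks, is_pad


# Toy 5-step episode with 2-dimensional actions.
episode = [[float(t), float(t) + 0.5] for t in range(5)]
chunks, is_pad = make_action_chunks(episode, chunk_len=3)
```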

When to use

Bimanual ALOHA-class manipulation with 50–500 demos per task. A strong baseline to try before reaching for VLAs.

Recommended SignIQ samples

●SignIQ ALOHA Bimanual — Coffee Prep
●SignIQ Collaborative Dual-Arm — Contact-Rich Threading

Multimodal action distributions

Diffusion Policy

Multimodal action-distribution policies via denoising diffusion. Excels when actions are non-unique (multiple valid grasps, contact strategies).

Export format

HDF5 / Zarr / LeRobot

Control freq.

10–30 Hz

Action repr.

Predicted action chunk via diffusion sampler

Required data

●Observation history (image stack or state)
●Action chunks
●Optional: language instruction
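The observation-history requirement can be sketched as a sliding window over the episode. The helper below is hypothetical and assumes that steps before the start of the episode are filled by repeating the first observation, which is one common padding choice rather than a requirement of any particular codebase.

```python
def obs_history(observations, t, n_obs_steps):
    """Return the `n_obs_steps` observations ending at index t, oldest first.

    Indices before the start of the episode are clamped to 0, so the first
    observation is repeated as padding.
    """
    if not 0 <= t < len(observations):
        raise IndexError("t is outside the episode")
    if n_obs_steps < 1:
        raise ValueError("n_obs_steps must be >= 1")
    idxs = [max(0, i) for i in range(t - n_obs_steps + 1, t + 1)]
    return [observations[i] for i in idxs]


# Strings stand in for image frames or state vectors.
frames = ["o0", "o1", "o2", "o3"]
window = obs_history(frames, t=3, n_obs_steps=2)
```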

When to use

Single-arm or bimanual manipulation where demonstrations show multimodal action distributions, such as several valid grasps for the same object.

Recommended SignIQ samples

●SignIQ Collaborative Single-Arm — Pick & Place
●SignIQ ALOHA Bimanual — Coffee Prep

Open VLA pretraining

OpenVLA-style VLA

Tokenized-action VLA pretrained on cross-embodiment manipulation. It maps an image and a language instruction to a discretized 7-DOF action, which allows drop-in fine-tuning on language-conditioned data.

Export format

RLDS / LeRobot (OXE-style)

Control freq.

5–15 Hz

Action repr.

Discretized 7-DOF action tokens (6-DOF EEF delta + gripper)

Required data

●Image observation
●Language instruction
●Discretized 7-DOF action (xyz, rpy, gripper)
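A minimal sketch of uniform action discretization for a 7-DOF action (xyz, rpy, gripper). The 256-bin count and the per-dimension bounds are assumptions for illustration; in practice the bounds would come from dataset statistics, and the token-to-vocabulary mapping is model-specific and not shown here.

```python
def discretize(action, low, high, n_bins=256):
    """Map each continuous action dimension to an integer bin in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)  # clip to the assumed bounds
        frac = (a - lo) / (hi - lo)
        tokens.append(min(int(frac * n_bins), n_bins - 1))
    return tokens


def undiscretize(tokens, low, high, n_bins=256):
    """Map bin indices back to continuous values at the bin centers."""
    return [lo + (tok + 0.5) * (hi - lo) / n_bins
            for tok, lo, hi in zip(tokens, low, high)]


# Assumed bounds: EEF deltas in [-1, 1], gripper in [0, 1].
LOW = [-1.0] * 6 + [0.0]
HIGH = [1.0] * 7
action = [0.0, -1.0, 1.0, 0.5, -0.5, 2.0, 1.0]  # the 2.0 is out of range and gets clipped
tokens = discretize(action, LOW, HIGH)
recovered = undiscretize(tokens, LOW, HIGH)
```

Decoding at bin centers keeps the round-trip error for in-range values to about half a bin width per dimension.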

When to use

When you need an open VLA pretraining baseline or a cross-embodiment fine-tuning starting point.

Recommended SignIQ samples

●SignIQ Collaborative Single-Arm — Pick & Place

Video + action dynamics

Robot World Models

Predict future video frames conditioned on actions or latent actions. Used for planning, evaluation, and pretraining via human video.

Export format

Video + Parquet

Control freq.

10–30 Hz

Action repr.

Continuous or learned latent actions

Required data

●Continuous video (10 s or longer)
●Action sequences
●State sequences
●Optional: latent action labels
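One way to turn long recordings into training clips that meet the 10-second minimum is to cut fixed-length frame windows. The sketch below is hypothetical and assumes actions and states are logged once per video frame, so the same (start, end) indices can slice all three streams.

```python
def clip_windows(n_frames, fps, min_seconds=10.0, stride_seconds=None):
    """Return (start, end) frame index pairs for clips of `min_seconds` each.

    `end` is exclusive. Recordings shorter than one clip yield no windows.
    With no stride given, clips do not overlap.
    """
    clip_len = int(round(min_seconds * fps))
    stride = clip_len if stride_seconds is None else int(round(stride_seconds * fps))
    if clip_len < 1 or stride < 1:
        raise ValueError("clip length and stride must be at least one frame")
    return [(s, s + clip_len) for s in range(0, n_frames - clip_len + 1, stride)]


# A 30-second recording at 30 fps gives three non-overlapping 10-second clips.
windows = clip_windows(n_frames=900, fps=30)
```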

When to use

When you want to scale beyond available robot actions by leveraging human-video corpora for planning and dynamics priors.

Recommended SignIQ samples

●SignIQ ALOHA Bimanual — Coffee Prep
●SignIQ Humanoid — Warehouse Manipulation

Whole-body control

Humanoid Whole-Body Policy

Policies that control full-body humanoids — locomotion, balance, manipulation posture, and retargeted motion priors.

Export format

Custom LeRobot / HDF5 with body-channel mapping

Control freq.

50–500 Hz (low-level)

Action repr.

Joint torques or target positions per body chain

Required data

●Full-body proprioception
●Retargeted human motion data
●Tactile / contact signals
●Multi-view RGB
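Because the control frequency above spans 50–500 Hz, a policy that emits target positions at the low end often has to feed a faster low-level loop. The sketch below shows one simple option, linear interpolation between consecutive targets. It is an illustrative assumption rather than a prescribed method, it applies to target positions and not to torques, and `upsample_targets` is a hypothetical helper.

```python
def upsample_targets(targets, policy_hz, control_hz):
    """Linearly interpolate per-joint target positions from policy_hz to control_hz.

    `targets` is a list of joint-position vectors sampled at policy_hz.
    control_hz must be an integer multiple of policy_hz.
    """
    if not targets:
        return []
    if control_hz % policy_hz != 0:
        raise ValueError("control_hz must be an integer multiple of policy_hz")
    ratio = control_hz // policy_hz
    out = []
    for prev, nxt in zip(targets, targets[1:]):
        for k in range(ratio):
            alpha = k / ratio  # 0 at the previous target, approaching 1 at the next
            out.append([p + alpha * (n - p) for p, n in zip(prev, nxt)])
    out.append(list(targets[-1]))  # finish exactly on the last target
    return out


# Two 50 Hz targets for a single joint, expanded for a 500 Hz low-level loop.
dense = upsample_targets([[0.0], [1.0]], policy_hz=50, control_hz=500)
```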

When to use

Whole-body humanoid control that combines retargeted motion priors with teleoperation data from the target robot.

Recommended SignIQ samples

●SignIQ Humanoid — Warehouse Manipulation

From dataset to deployed policy.

Have a target model? We'll match the dataset.