Step 1: Convert Your Dataset to LeRobot Format
python convert_k1_to_lerobot.py \
--input-dir ./recordings/session_001/ \
--output-dir ./dataset/k1-pick-place/ \
--repo-id your-username/k1-pick-place \
--success-only # filters to episodes labeled success=true
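The effect of --success-only can be sketched in a few lines. This is a hypothetical illustration, assuming each recorded episode directory carries a meta.json with a boolean "success" field; adjust the layout and key names to match what your recorder actually writes.

```python
import json
from pathlib import Path

def filter_successful_episodes(input_dir):
    """Keep only episodes whose metadata marks success=true.

    Illustrative sketch: assumes episode_*/meta.json files with a
    boolean "success" field -- not the converter's actual internals.
    """
    kept = []
    for meta_path in sorted(Path(input_dir).glob("episode_*/meta.json")):
        meta = json.loads(meta_path.read_text())
        if meta.get("success", False):
            kept.append(meta_path.parent)
    return kept
```

Filtering out failed demonstrations before conversion keeps the policy from imitating mistakes.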
Step 2: Train with Diffusion Policy
Diffusion Policy works well for whole-body tasks because it handles multi-modal action distributions and produces smooth trajectories. Training takes 3–6 hours on an NVIDIA GPU (16 GB VRAM recommended).
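Why multi-modality matters can be shown with a toy example (not Diffusion Policy itself): when demonstrations contain two equally valid actions, a mean-squared-error regressor averages the modes into an action neither demonstration contains, while a generative policy samples one mode per rollout.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy bimodal action data: half the demos go left (-1), half go right (+1)
actions = np.concatenate([rng.normal(-1, 0.05, 500), rng.normal(+1, 0.05, 500)])

# An MSE regressor converges to the data mean, which is near 0 here --
# an action that appears in neither mode (the mode-averaging failure).
mse_prediction = float(actions.mean())

# A generative policy (like Diffusion Policy) instead samples from the
# learned distribution, committing to one mode per rollout.
sampled = float(rng.choice(actions))
```

Here mse_prediction lands near 0 while every sampled action is near -1 or +1, which is the behavior you want from a whole-body policy facing ambiguous scenes.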
python -m lerobot.scripts.train \
--dataset_repo_id=your-username/k1-pick-place \
--policy.type=diffusion \
--policy.obs_as_global_cond=true \
--training.num_epochs=300 \
--training.batch_size=64 \
--output_dir=./checkpoints/k1-diffusion-v1
# Monitor training (open in browser)
tensorboard --logdir=./checkpoints/k1-diffusion-v1/logs/
Watch the training loss and validation loss curves. Training is complete when validation loss has plateaued for at least 20 epochs. Do not stop training early based on wall time alone.
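The plateau rule above can be checked programmatically. A minimal sketch, with illustrative patience and tolerance values you should tune to your own loss curves:

```python
def has_plateaued(val_losses, patience=20, min_delta=1e-4):
    """Return True if validation loss hasn't improved by at least
    min_delta over the last `patience` epochs.

    Illustrative stopping check; thresholds are assumptions, not
    values from the training script.
    """
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta
```

Run this over the logged validation losses after each epoch; stop only once it returns True, regardless of how long training has been running.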
Step 3: Evaluate in MuJoCo Simulation
python eval_policy_sim.py \
--checkpoint=./checkpoints/k1-diffusion-v1/checkpoint_300.pt \
--env=booster_gym/envs/pick_place.py \
--num_episodes=20 \
--render
Target: ≥60% success rate in simulation before deploying to real hardware. If below 60%, collect more demonstrations (return to Unit 4) or check your data quality.
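Keep in mind that 20 episodes gives only a coarse estimate of the true success rate. A quick way to see this is a 95% Wilson score interval (standard statistics, not part of the eval script):

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a success rate.

    With only 20 trials the interval is wide, so treat the 60%
    threshold as a coarse gate, not a precise measurement.
    """
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half
```

For example, 12/20 successes (exactly 60%) yields an interval of roughly 39% to 78%, so a policy right at the threshold may still be weaker than it looks.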
Step 4: Live Deployment on the Real K1
python deploy_policy.py \
--checkpoint=./checkpoints/k1-diffusion-v1/checkpoint_300.pt \
--robot-ip=192.168.10.102 \
--cameras head_cam,external \
--task "pick up the red block" \
--max-episode-duration=30 \
--safety-monitor=true
The --safety-monitor flag enables automatic DAMP fallback if joint velocities or torques exceed safety thresholds. Always enable this during initial deployment.
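The kind of check such a monitor performs can be sketched as follows. The limit values here are illustrative placeholders, not K1 specifications; the real thresholds live in the deployment stack.

```python
# Illustrative joint limits -- placeholders, NOT Booster K1 specs.
MAX_JOINT_VEL = 3.0      # rad/s
MAX_JOINT_TORQUE = 40.0  # N*m

def is_safe(joint_velocities, joint_torques):
    """Return True if all joints are within limits; False means the
    controller should fall back to the damped (DAMP) passive mode.

    Hypothetical sketch of a per-control-cycle safety check.
    """
    if any(abs(v) > MAX_JOINT_VEL for v in joint_velocities):
        return False
    if any(abs(t) > MAX_JOINT_TORQUE for t in joint_torques):
        return False
    return True
```

A check like this runs every control cycle, so a runaway policy is caught within milliseconds rather than after a fall.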
Evaluating Your Policy
Run at least 20 evaluation trials to get a usable estimate of the success rate:
python eval_policy_live.py \
--checkpoint=./checkpoints/k1-diffusion-v1/checkpoint_300.pt \
--robot-ip=192.168.10.102 \
--num-trials=20 \
--log-results=./eval_results/k1-diffusion-v1-eval.json
For each trial, reset the scene to the same starting configuration used in your training demonstrations. Record success/fail for every trial, and note the failure mode for each failed one. Common failure modes:
- Scene variation (lighting, object position) not covered by your demonstrations.
- Domain shift between simulation and the real robot.
- Insufficient training data for the task's variability.
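Tallying failure modes from the results log is straightforward. A sketch, assuming the JSON file holds a list of trial records with "success" and "failure_mode" keys; adjust the key names to whatever your eval script actually writes.

```python
import json
from collections import Counter

def top_failure_modes(results_path, k=3):
    """Return the k most frequent failure modes from an eval log.

    Assumes a JSON list of trial dicts like
    {"success": false, "failure_mode": "grasp slip"} -- an assumed
    schema, not the eval script's documented format.
    """
    with open(results_path) as f:
        trials = json.load(f)
    modes = Counter(t["failure_mode"] for t in trials if not t.get("success"))
    return modes.most_common(k)
```

The output feeds directly into the data flywheel below: the top modes tell you which demonstrations to collect next.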
The Data Flywheel
After your first deployment:
- Identify your top 3 failure modes from evaluation logs.
- Collect targeted demonstrations that cover those failure modes (return to Unit 4).
- Mix new episodes with your original dataset (50/50 or weighted toward failures).
- Retrain and re-evaluate. Repeat until you reach your target success rate.
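The mixing step above can be sketched as weighted sampling over episode lists. This is an illustrative recipe, not a LeRobot API; episodes can be paths, indices, or dataset rows.

```python
import random

def mix_episodes(original, targeted, failure_weight=0.5, n=None, seed=0):
    """Sample a retraining set from original demos and new targeted
    (failure-covering) demos.

    failure_weight=0.5 gives the 50/50 mix; raise it to oversample
    failure coverage. Illustrative sketch, not a library function.
    """
    rng = random.Random(seed)
    n = n or (len(original) + len(targeted))
    mixed = []
    for _ in range(n):
        pool = targeted if rng.random() < failure_weight else original
        mixed.append(rng.choice(pool))
    return mixed
```

Weighting toward failures speeds up fixing the top failure modes, but keep enough of the original data in the mix so the policy does not forget the cases it already handles.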
Unit 5 Complete When...
You have a trained Diffusion Policy or ACT checkpoint achieving ≥60% success rate in simulation. You have deployed it live to the K1 and run at least 10 real-world evaluation trials. You have identified your top failure modes and have a plan for your next data collection session.
Path Complete
You have gone from safe power-on to a working whole-body imitation learning policy on the Booster K1. Share your results in the SVRC Forum and contribute your dataset to the dataset registry.