Step 1: Convert Your Dataset to LeRobot Format
python convert_k1_to_lerobot.py \
--input-dir ./recordings/session_001/ \
--output-dir ./dataset/k1-pick-place/ \
--repo-id your-username/k1-pick-place \
--success-only # filters to episodes labeled success=true
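The effect of --success-only can be sketched in a few lines. This is a hypothetical illustration, assuming each recorded episode directory carries a meta.json with a boolean "success" field; adjust the layout and key names to match what your recorder actually writes.

```python
import json
from pathlib import Path

def filter_successful_episodes(input_dir):
    """Keep only episodes whose metadata marks success=true.

    Illustrative sketch: assumes episode_*/meta.json files with a
    boolean "success" field -- not the converter's actual internals.
    """
    kept = []
    for meta_path in sorted(Path(input_dir).glob("episode_*/meta.json")):
        meta = json.loads(meta_path.read_text())
        if meta.get("success", False):
            kept.append(meta_path.parent)
    return kept
```

Filtering out failed demonstrations before conversion keeps the policy from imitating mistakes.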
Step 2: Train with Diffusion Policy
Diffusion Policy works well for whole-body tasks because it handles multi-modal action distributions and produces smooth trajectories. Training takes 3–6 hours on an NVIDIA GPU (16 GB VRAM recommended).
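Why multi-modality matters can be shown with a toy example (not Diffusion Policy itself): when demonstrations contain two equally valid actions, a mean-squared-error regressor averages the modes into an action neither demonstration contains, while a generative policy samples one mode per rollout.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy bimodal action data: half the demos go left (-1), half go right (+1)
actions = np.concatenate([rng.normal(-1, 0.05, 500), rng.normal(+1, 0.05, 500)])

# An MSE regressor converges to the data mean, which is near 0 here --
# an action that appears in neither mode (the mode-averaging failure).
mse_prediction = float(actions.mean())

# A generative policy (like Diffusion Policy) instead samples from the
# learned distribution, committing to one mode per rollout.
sampled = float(rng.choice(actions))
```

Here mse_prediction lands near 0 while every sampled action is near -1 or +1, which is the behavior you want from a whole-body policy facing ambiguous scenes.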
python -m lerobot.scripts.train \
--dataset_repo_id=your-username/k1-pick-place \
--policy.type=diffusion \
--policy.obs_as_global_cond=true \
--training.num_epochs=300 \
--training.batch_size=64 \
--output_dir=./checkpoints/k1-diffusion-v1
# Monitor training (open in browser)
tensorboard --logdir=./checkpoints/k1-diffusion-v1/logs/
Watch the training loss and validation loss curves. Training is complete when validation loss has plateaued for at least 20 epochs. Do not stop training early based on wall time alone.
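The plateau rule above can be checked programmatically. A minimal sketch, with illustrative patience and tolerance values you should tune to your own loss curves:

```python
def has_plateaued(val_losses, patience=20, min_delta=1e-4):
    """Return True if validation loss hasn't improved by at least
    min_delta over the last `patience` epochs.

    Illustrative stopping check; thresholds are assumptions, not
    values from the training script.
    """
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta
```

Run this over the logged validation losses after each epoch; stop only once it returns True, regardless of how long training has been running.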
Step 3: Evaluate in MuJoCo Simulation
python eval_policy_sim.py \
--checkpoint=./checkpoints/k1-diffusion-v1/checkpoint_300.pt \
--env=booster_gym/envs/pick_place.py \
--num_episodes=20 \
--render
Target: ≥60% success rate in simulation before deploying to real hardware. If below 60%, collect more demonstrations (return to Unit 4) or check your data quality.
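Keep in mind that 20 episodes gives only a coarse estimate of the true success rate. A quick way to see this is a 95% Wilson score interval (standard statistics, not part of the eval script):

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a success rate.

    With only 20 trials the interval is wide, so treat the 60%
    threshold as a coarse gate, not a precise measurement.
    """
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half
```

For example, 12/20 successes (exactly 60%) yields an interval of roughly 39% to 78%, so a policy right at the threshold may still be weaker than it looks.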
Step 4: Live Deployment on the Real K1
python deploy_policy.py \
--checkpoint=./checkpoints/k1-diffusion-v1/checkpoint_300.pt \
--robot-ip=192.168.10.102 \
--cameras head_cam,external \
--task "pick up the red block" \
--max-episode-duration=30 \
--safety-monitor=true
The --safety-monitor flag enables automatic DAMP fallback if joint velocities or torques exceed safety thresholds. Always enable this during initial deployment.
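The kind of check such a monitor performs can be sketched as follows. The limit values here are illustrative placeholders, not K1 specifications; the real thresholds live in the deployment stack.

```python
# Illustrative joint limits -- placeholders, NOT Booster K1 specs.
MAX_JOINT_VEL = 3.0      # rad/s
MAX_JOINT_TORQUE = 40.0  # N*m

def is_safe(joint_velocities, joint_torques):
    """Return True if all joints are within limits; False means the
    controller should fall back to the damped (DAMP) passive mode.

    Hypothetical sketch of a per-control-cycle safety check.
    """
    if any(abs(v) > MAX_JOINT_VEL for v in joint_velocities):
        return False
    if any(abs(t) > MAX_JOINT_TORQUE for t in joint_torques):
        return False
    return True
```

A check like this runs every control cycle, so a runaway policy is caught within milliseconds rather than after a fall.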
Evaluating Your Policy
Run at least 20 evaluation trials to get a usable estimate of the success rate:
python eval_policy_live.py \
--checkpoint=./checkpoints/k1-diffusion-v1/checkpoint_300.pt \
--robot-ip=192.168.10.102 \
--num-trials=20 \
--log-results=./eval_results/k1-diffusion-v1-eval.json
For each trial, reset the scene to the same starting configuration used in your training demonstrations. Record success/fail for every trial, and note the failure mode for each failed one. Common failure modes:
- Scene variation (lighting, object position) not covered by your demonstrations.
- Domain shift between simulation and the real robot.
- Insufficient training data for the task's variability.
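Tallying failure modes from the results log is straightforward. A sketch, assuming the JSON file holds a list of trial records with "success" and "failure_mode" keys; adjust the key names to whatever your eval script actually writes.

```python
import json
from collections import Counter

def top_failure_modes(results_path, k=3):
    """Return the k most frequent failure modes from an eval log.

    Assumes a JSON list of trial dicts like
    {"success": false, "failure_mode": "grasp slip"} -- an assumed
    schema, not the eval script's documented format.
    """
    with open(results_path) as f:
        trials = json.load(f)
    modes = Counter(t["failure_mode"] for t in trials if not t.get("success"))
    return modes.most_common(k)
```

The output feeds directly into the data flywheel below: the top modes tell you which demonstrations to collect next.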
The Data Flywheel
After your first deployment:
- Identify your top 3 failure modes from evaluation logs.
- Collect targeted demonstrations that cover those failure modes (return to Unit 4).
- Mix new episodes with your original dataset (50/50 or weighted toward failures).
- Retrain and re-evaluate. Repeat until you reach your target success rate.
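The mixing step above can be sketched as weighted sampling over episode lists. This is an illustrative recipe, not a LeRobot API; episodes can be paths, indices, or dataset rows.

```python
import random

def mix_episodes(original, targeted, failure_weight=0.5, n=None, seed=0):
    """Sample a retraining set from original demos and new targeted
    (failure-covering) demos.

    failure_weight=0.5 gives the 50/50 mix; raise it to oversample
    failure coverage. Illustrative sketch, not a library function.
    """
    rng = random.Random(seed)
    n = n or (len(original) + len(targeted))
    mixed = []
    for _ in range(n):
        pool = targeted if rng.random() < failure_weight else original
        mixed.append(rng.choice(pool))
    return mixed
```

Weighting toward failures speeds up fixing the top failure modes, but keep enough of the original data in the mix so the policy does not forget the cases it already handles.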
Unit 5 Complete When...
You have a trained Diffusion Policy or ACT checkpoint achieving ≥60% success rate in simulation. You have deployed it live to the K1 and run at least 10 real-world evaluation trials. You have identified your top failure modes and have a plan for your next data collection session.
Path Complete
You have gone from safe power-on to a working whole-body imitation learning policy on the Booster K1. Share your results in the SVRC Forum and contribute your dataset to the dataset registry.