Training Your First Model

A step-by-step walkthrough of preparing data, launching a training run, interpreting results, and promoting a model from staging to active.


This guide uses a manipulator arm example throughout. The same steps apply to all robot types and tasks.

1 Prepare Your Dataset

Good training data is the most important factor in model quality. Before uploading, review your recordings:

  • Aim for at least 500 complete task demonstrations for manipulation tasks
  • Include recordings that cover edge cases; avoid a dataset made up only of perfect runs
  • Keep input topics consistent: same sensor, same QoS, same frequency
  • Remove recordings where the robot failed catastrophically (E-stop triggered, etc.)

Upload your prepared dataset:

bash
kairo data upload ./arm_demos.bag \
  --robot rob_a1b2c3d4 \
  --label "arm-pick-place-v1"

# Uploading: 100% ████████████████████ 2.1 GB
# Dataset ID: ds_xyz123
# Status: preprocessing

Wait for preprocessing to complete (usually 2–5 minutes):

bash
kairo data status --dataset ds_xyz123
# Status: ready  |  Samples: 12,480  |  Size: 2.1 GB
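
If you script the upload, you may prefer to poll for readiness instead of checking by hand. The following is a minimal sketch, not an official workflow. It assumes that `kairo data status` prints a line containing "Status: ready" once preprocessing finishes, as in the sample output above. The function name `wait_for_dataset`, the polling interval, and the retry limit are hypothetical choices made for illustration.

```shell
# Poll the dataset status until it reports "ready", or give up after max_tries.
# Assumption: the status output contains "Status: ready" when preprocessing is done.
wait_for_dataset() {
  local dataset="$1"
  local interval="${2:-15}"    # seconds between polls (illustrative default)
  local max_tries="${3:-40}"   # 40 polls x 15 s = 10 minutes (illustrative default)
  local try=1

  while [ "$try" -le "$max_tries" ]; do
    if kairo data status --dataset "$dataset" | grep -q 'Status: ready'; then
      echo "Dataset $dataset is ready"
      return 0
    fi
    sleep "$interval"
    try=$((try + 1))
  done

  echo "Timed out waiting for dataset $dataset" >&2
  return 1
}

# Usage: wait_for_dataset ds_xyz123
```

The defaults leave headroom over the usual 2–5 minute preprocessing time; adjust them for larger datasets.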

2 Launch a Training Run

Start training with the dataset you just uploaded:

bash
kairo train start \
  --dataset ds_xyz123 \
  --robot rob_a1b2c3d4 \
  --task manipulation \
  --name "arm-model-v1"

# Training run started: run_789abc
# Estimated duration: ~4 minutes
# View logs: kairo train logs --run run_789abc --follow

Give your runs descriptive names. They show up in the dashboard history and make it easy to compare results across experiments.
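
When launching runs from a script, it can help to capture the run ID so later commands do not hard-code it. The sketch below assumes the start command prints a line of the form "Training run started: <run-id>", as in the sample output above. The helper name `extract_run_id` is hypothetical.

```shell
# Read command output on stdin and print the first run ID found.
# Assumption: the output contains "Training run started: <run-id>".
extract_run_id() {
  sed -n 's/.*Training run started: \([A-Za-z0-9_]*\).*/\1/p' | head -n 1
}

# Usage (hypothetical wiring):
#   run_id="$(kairo train start --dataset ds_xyz123 --robot rob_a1b2c3d4 \
#               --task manipulation --name "arm-model-v1" | extract_run_id)"
#   kairo train logs --run "$run_id" --follow
```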

3 Monitor Training Progress

Stream live logs from the training run:

bash
kairo train logs --run run_789abc --follow

[00:00] Initialising training environment...
[00:12] Preprocessing complete — 12,480 samples loaded
[00:45] Epoch  1/20 — loss: 0.512  val_loss: 0.498
[01:30] Epoch  5/20 — loss: 0.241  val_loss: 0.259
[02:15] Epoch 10/20 — loss: 0.148  val_loss: 0.162
[03:00] Epoch 15/20 — loss: 0.098  val_loss: 0.119
[03:45] Epoch 20/20 — loss: 0.072  val_loss: 0.091
[03:52] Training complete.
        Accuracy: 94.3%  |  Val accuracy: 93.1%
        Model size: 48 MB (quantised: 12 MB)
[03:55] Packaging as ROS 2 node...
[03:58] Model ready: mdl_abc123 (v1)
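
To compare training and validation loss outside the dashboard, you can reduce the log stream to one row per epoch. This is a rough sketch that assumes epoch lines keep the format shown in the sample log above ("Epoch N/M", "loss:", "val_loss:"). The function name `epoch_losses` is hypothetical.

```shell
# Read training logs on stdin and print "epoch train_loss val_loss" per epoch line.
# Assumption: epoch lines look like
#   "[00:45] Epoch  1/20 ... loss: 0.512  val_loss: 0.498".
epoch_losses() {
  awk '/Epoch/ {
    epoch = ""; loss = ""; val = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Epoch")     { split($(i + 1), parts, "/"); epoch = parts[1] }
      if ($i == "loss:")     { loss = $(i + 1) }
      if ($i == "val_loss:") { val = $(i + 1) }
    }
    # Skip lines that mention "Epoch" but lack any of the three values.
    if (epoch != "" && loss != "" && val != "") print epoch, loss, val
  }'
}

# Usage: kairo train logs --run run_789abc --follow | epoch_losses
```

A validation loss that stops falling while training loss keeps dropping is an early hint of overfitting.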

You can also view training metrics in real time from the dashboard under Models → Runs. The dashboard shows loss curves, accuracy over epochs, and a comparison against previous runs.

4 Evaluate Results

Once training completes, inspect the model's metrics:

bash
kairo model info --model mdl_abc123

# Name:           arm-model-v1
# Task:           manipulation
# Architecture:   transformer_v2 (auto-selected)
# Training time:  3m 58s
# Accuracy:       94.3%
# Val accuracy:   93.1%
# Inference latency (p50): 12 ms
# Inference latency (p99): 28 ms
# Model size:     48 MB  |  Quantised: 12 MB
# Status:         staging

If training accuracy is high but validation accuracy is significantly lower (a gap of more than 5 percentage points), the model is likely overfitting. Try collecting more diverse data, or use the --augment flag to enable data augmentation.
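
The overfitting check can be automated as a simple gate. The sketch below assumes the "Accuracy:" and "Val accuracy:" lines keep the format shown in the `kairo model info` sample output. The function name `accuracy_gap` and its exit codes are hypothetical conventions chosen for this example.

```shell
# Read `kairo model info` output on stdin and report the train/validation gap.
# Exit codes (illustrative): 0 = gap within limit, 1 = gap exceeds limit,
# 2 = accuracy lines not found.
# Assumption: lines end in values such as "94.3%".
accuracy_gap() {
  awk -v limit="${1:-5}" '
    /Val accuracy:/ { v = $NF; sub(/%/, "", v); val = v; next }
    /Accuracy:/     { t = $NF; sub(/%/, "", t); train = t }
    END {
      if (train == "" || val == "") { print "accuracy lines not found"; exit 2 }
      gap = train - val
      printf "gap: %.1f points\n", gap
      if (gap > limit) { print "possible overfitting"; exit 1 }
    }'
}

# Usage: kairo model info --model mdl_abc123 | accuracy_gap 5
```

With the sample values above (94.3% and 93.1%), the gap is 1.2 points, well inside the 5-point limit.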

5 Promote to Active

If you're satisfied with the results, you can optionally test the model in simulation, then promote it to active and deploy it to your robot:

bash
# Optional: test in simulation before promoting
kairo simulate --model mdl_abc123 --env gazebo

# Promote v1 to active
kairo model promote --model mdl_abc123 --version v1

# Deploy to your robot
kairo deploy push --model mdl_abc123 --robot rob_a1b2c3d4
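
If you want the simulation step to act as a gate rather than an optional check, the three commands can be chained so that promotion and deployment only happen after a passing simulation. This sketch assumes `kairo simulate` exits with a non-zero status when the simulated run fails, which the guide does not confirm. The function name `promote_if_sim_passes` is hypothetical.

```shell
# Simulate first; promote and deploy only if the simulation succeeds.
# Assumption: `kairo simulate` returns a non-zero exit status on failure.
promote_if_sim_passes() {
  local model="$1" version="$2" robot="$3"

  if ! kairo simulate --model "$model" --env gazebo; then
    echo "Simulation failed; not promoting $model" >&2
    return 1
  fi

  # Deploy only if promotion itself succeeded.
  kairo model promote --model "$model" --version "$version" &&
    kairo deploy push --model "$model" --robot "$robot"
}

# Usage: promote_if_sim_passes mdl_abc123 v1 rob_a1b2c3d4
```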