Our Approach — Manufacturing Training Data for Robotics

THE GAP

Why lab data breaks in production

A manipulation policy trained on lab demonstrations often degrades when transferred to a real manufacturing cell. The reasons are specific and measurable:

Machine-state dependencies go uncaptured. A CNC load/unload task isn't just pick-and-place — it requires awareness of door state, chuck state, cycle timing, coolant status, and alarm conditions. Lab data has none of this.

Fixture and part variance disappears. In a lab, parts are uniform and fixtures are ideal. On a shop floor, tray positions shift, parts have burrs and chips, fixtures wear, and operators develop workarounds that a training set needs to include.

Exceptions are the actual deployment blocker. The nominal cycle works. The misload, the jam, the incomplete seat, the alarm recovery — these are what cause intervention rates to stay high. Lab datasets capture the happy path. Factory data captures the real path.

Transfer between cells fails silently. Two cells running the same part on the same machine model will have different lighting, different fixture geometry, and different operator habits. Without multi-cell variance in training data, policies look good in eval and fail in deployment.

CNC machine operators performing precision manufacturing tasks

Real operators. Real machines. Real exceptions.

Lab Data

✗ No machine state context

✗ Uniform parts, ideal conditions

✗ Happy path only

✗ Single environment

✗ No operator variance

TalosHub Factory Data

✓ Machine state aligned (door, chuck, cycle, HMI)

✓ Real parts with variance and wear

✓ 2–8 exception classes per pack

✓ Multi-cell, multi-facility capture

✓ Expert operator demonstrations

DATA ARCHITECTURE

A data contract, not a data dump

Every episode follows a canonical schema designed for downstream training pipelines.

episode_schema.json

{
  "episode_id": "ep_042",
  "task_family": "cnc_machine_tending",
  "machine": {
    "type": "haas_vf2",
    "cell_id": "cell_A3",
    "facility": "facility_midwest_01"
  },
  "streams": {
    "video": ["cam_front.mp4", "cam_side.mp4", "cam_overhead.mp4"],
    "machine_state": "machine_events.jsonl",
    "operator_actions": "actions.jsonl"
  },
  "phases": [
    {"id": 1, "label": "approach",               "start_ms": 0,     "end_ms": 3200},
    {"id": 2, "label": "door_open",              "start_ms": 3200,  "end_ms": 4800},
    {"id": 3, "label": "part_extract",           "start_ms": 4800,  "end_ms": 8400},
    {"id": 4, "label": "part_load",              "start_ms": 8400,  "end_ms": 13200},
    {"id": 5, "label": "door_close_cycle_start", "start_ms": 13200, "end_ms": 15600}
  ],
  "exceptions": [
    {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}
  ],
  "outcome": "success",
  "labels": {
    "split": "train",
    "quality_score": 0.94
  }
}
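
A consumer of this schema might sanity-check an episode before training on it. The sketch below is one minimal way to do that, not the shipped loader: `check_episode` is a hypothetical helper, and the rule that each phase starts exactly where the previous one ends is an assumption inferred from the example episode, not a documented guarantee.

```python
import json

# Trimmed copy of the example episode (only the fields this check reads).
EPISODE_JSON = """
{
  "episode_id": "ep_042",
  "phases": [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800},
    {"id": 3, "label": "part_extract", "start_ms": 4800, "end_ms": 8400},
    {"id": 4, "label": "part_load", "start_ms": 8400, "end_ms": 13200},
    {"id": 5, "label": "door_close_cycle_start", "start_ms": 13200, "end_ms": 15600}
  ],
  "exceptions": [
    {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}
  ]
}
"""


def check_episode(episode):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    phases = episode.get("phases", [])

    for phase in phases:
        if phase["end_ms"] <= phase["start_ms"]:
            problems.append(f"phase {phase['id']} has non-positive duration")

    # Assumption: consecutive phases share a boundary (no gaps, no overlaps).
    for prev, cur in zip(phases, phases[1:]):
        if cur["start_ms"] != prev["end_ms"]:
            problems.append(f"phases {prev['id']} and {cur['id']} are not contiguous")

    # Every exception should point at a phase id that exists in this episode.
    phase_ids = {phase["id"] for phase in phases}
    for exc in episode.get("exceptions", []):
        if exc["phase"] not in phase_ids:
            problems.append(
                f"exception {exc['type']} references unknown phase {exc['phase']}"
            )

    return problems


episode = json.loads(EPISODE_JSON)
print(check_episode(episode))  # [] for the example episode
```

Returning a list of problems instead of raising on the first one lets a pipeline log everything wrong with an episode in a single pass.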

Multi-stream capture

Every episode includes synchronized video from 2–4 camera angles, timestamped machine events from the CNC controller, and operator action logs. Streams are aligned to a common clock.
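
Because the streams share a clock, a machine event can be mapped to the video frame on screen when it fired. The sketch below illustrates the idea under stated assumptions: a constant 30 fps frame rate, timestamps in milliseconds from episode start, and event field names (`t_ms`, `signal`, `value`) that are hypothetical and may not match the delivered files.

```python
def frame_index(t_ms, fps=30.0):
    """Index of the frame on screen at t_ms, with frame 0 starting at t = 0."""
    if t_ms < 0:
        raise ValueError("timestamp precedes the start of the episode")
    return int(t_ms * fps // 1000)


# Hypothetical machine events; field names are illustrative, not the shipped format.
events = [
    {"t_ms": 3200, "signal": "door", "value": "open"},
    {"t_ms": 4850, "signal": "chuck", "value": "unclamped"},
    {"t_ms": 13200, "signal": "door", "value": "closed"},
]

frames = [frame_index(e["t_ms"]) for e in events]
print(frames)  # [96, 145, 396]
```

For variable-frame-rate video, a per-frame timestamp table would replace the constant `fps` arithmetic.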

Phase segmentation

Each episode is decomposed into discrete manipulation phases, for example approach, grasp, transport, load, and verify, with millisecond-precision boundaries. Your pipeline can train on full episodes or individual phases.
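
As an illustration of training on individual phases, the sketch below groups timestamped records by the phase that contains them. The phase boundaries are copied from the example episode; the helper names, the action records, and the choice of half-open intervals `[start_ms, end_ms)` are assumptions made for this example.

```python
# Phase boundaries taken from the example episode.
PHASES = [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800},
    {"id": 3, "label": "part_extract", "start_ms": 4800, "end_ms": 8400},
    {"id": 4, "label": "part_load", "start_ms": 8400, "end_ms": 13200},
    {"id": 5, "label": "door_close_cycle_start", "start_ms": 13200, "end_ms": 15600},
]


def phase_at(phases, t_ms):
    """Label of the phase whose interval [start_ms, end_ms) contains t_ms, else None."""
    for phase in phases:
        if phase["start_ms"] <= t_ms < phase["end_ms"]:
            return phase["label"]
    return None


def slice_by_phase(phases, records):
    """Group records by phase label; records outside every phase are dropped."""
    slices = {phase["label"]: [] for phase in phases}
    for record in records:
        label = phase_at(phases, record["t_ms"])
        if label is not None:
            slices[label].append(record)
    return slices


# Hypothetical operator actions, invented for this example.
actions = [
    {"t_ms": 100, "action": "move"},
    {"t_ms": 3200, "action": "press_door_button"},
    {"t_ms": 5000, "action": "grasp"},
    {"t_ms": 9000, "action": "place"},
    {"t_ms": 16000, "action": "idle"},
]

slices = slice_by_phase(PHASES, actions)
print({label: len(items) for label, items in slices.items()})
```

Half-open intervals mean a record that lands exactly on a shared boundary, such as 3200 ms, belongs to the later phase only.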

Exception taxonomy

We don't just capture the nominal cycle. Each task pack includes 2–8 exception classes: misloads, retries, jams, incomplete seating, alarm responses, and operator recovery patterns — labeled with the specific recovery action taken.
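
To see how exception labels might be used downstream, the sketch below tallies exception types across a handful of episodes and counts nominal ones. The episode records are invented for illustration; only `misload` and `reposition_and_retry` appear in the example schema, and the other type and recovery names are hypothetical.

```python
from collections import Counter

# Invented episodes for illustration; not real pack contents.
episodes = [
    {"episode_id": "ep_001", "exceptions": []},
    {"episode_id": "ep_002", "exceptions": [
        {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}]},
    {"episode_id": "ep_003", "exceptions": [
        {"type": "jam", "phase": 3, "recovery": "clear_and_retry"}]},
    {"episode_id": "ep_004", "exceptions": [
        {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}]},
]


def exception_counts(episodes):
    """Count how many times each exception type occurs across episodes."""
    return Counter(exc["type"] for ep in episodes for exc in ep["exceptions"])


counts = exception_counts(episodes)
nominal = sum(1 for ep in episodes if not ep["exceptions"])

print(dict(counts))  # {'misload': 2, 'jam': 1}
print(nominal)       # 1
```

A tally like this is a quick way to spot exception classes that are too rare to learn from before training starts.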

COMPLEXITY

Manufacturing data is a different engineering problem

Machine-State Coupling

A robot's next valid action depends on states it can't observe from vision alone: Is the chuck clamped? Is the door interlock engaged? Is the cycle timer running? We capture these signals directly from the machine controller and align them to the video timeline.
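
One way to use controller signals is to reconstruct the machine state at any moment and check whether an action is valid. The sketch below replays events onto an initial state. The event fields, state values, and the gating rule are all hypothetical; this illustrates the data dependency and is not a safety implementation.

```python
from bisect import bisect_right


def state_at(events, t_ms, initial):
    """Replay all events with timestamp <= t_ms on top of the initial state.

    Events are assumed to be sorted by t_ms.
    """
    state = dict(initial)
    times = [e["t_ms"] for e in events]
    for event in events[:bisect_right(times, t_ms)]:
        state[event["signal"]] = event["value"]
    return state


def may_reach_into_machine(state):
    """Hypothetical gating rule: door open and no cycle running."""
    return state["door"] == "open" and state["cycle"] == "idle"


# Invented signals for illustration; real controller output will differ.
initial = {"door": "closed", "chuck": "clamped", "cycle": "running"}
events = [
    {"t_ms": 2900, "signal": "cycle", "value": "idle"},
    {"t_ms": 3200, "signal": "door", "value": "open"},
    {"t_ms": 5100, "signal": "chuck", "value": "unclamped"},
]

print(may_reach_into_machine(state_at(events, 3000, initial)))  # False
print(may_reach_into_machine(state_at(events, 4000, initial)))  # True
```

At 3000 ms the cycle has stopped but the door is still closed, which a camera might not show reliably. The controller log does.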

Non-Stationary Environments

Factory conditions drift between shifts. Lighting changes, coolant levels drop, fixtures wear, and new part batches arrive with different surface characteristics. We capture across these variations deliberately rather than controlling for them.

Recovery is the Product

The nominal cycle is the easy part. What happens when a part doesn't seat, a gripper slips, or an alarm fires? These recovery demonstrations are what separate a policy that works in eval from one that survives deployment. We prioritize exception capture.

Cell-to-Cell Transfer

Two "identical" cells are never identical. Different cable routing, different fixture wear, different operator habits, different ambient conditions. We capture across cells so your model learns the invariant task structure, not the specifics of one workstation.
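
A simple way to test whether a policy has learned the task and not just one workstation is to hold out an entire cell for evaluation. The sketch below assumes each episode carries `machine.cell_id` as in the example schema. The helper name and the episode list are hypothetical, and this is not necessarily how the shipped splits are built.

```python
def leave_one_cell_out(episodes, held_out_cell):
    """Split episode ids into (train, held_out) by cell, holding out one cell."""
    train = [ep["episode_id"] for ep in episodes
             if ep["machine"]["cell_id"] != held_out_cell]
    held_out = [ep["episode_id"] for ep in episodes
                if ep["machine"]["cell_id"] == held_out_cell]
    if not held_out:
        raise ValueError(f"no episodes found for cell {held_out_cell!r}")
    return train, held_out


# Invented episodes; cell ids other than cell_A3 are hypothetical.
episodes = [
    {"episode_id": "ep_001", "machine": {"cell_id": "cell_A3"}},
    {"episode_id": "ep_002", "machine": {"cell_id": "cell_A3"}},
    {"episode_id": "ep_003", "machine": {"cell_id": "cell_B1"}},
    {"episode_id": "ep_004", "machine": {"cell_id": "cell_B1"}},
    {"episode_id": "ep_005", "machine": {"cell_id": "cell_C2"}},
]

train, held_out = leave_one_cell_out(episodes, "cell_B1")
print(train)     # ['ep_001', 'ep_002', 'ep_005']
print(held_out)  # ['ep_003', 'ep_004']
```

Repeating the split with each cell held out in turn gives a per-cell view of how well the policy transfers.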

COVERAGE

Manufacturing-native, machine-centered

One schema, any machine-centered workflow.

CNC Machine Tending

Load/unload, part presentation, cycle-start sequencing

Haas, Mazak, DMG MORI, Fanuc. Vertical and horizontal machining centers. We capture the full operator interaction: door management, chuck operations, part seating verification, cycle initiation, and in-process monitoring.

Press & Brake Operations

Infeed, forming, extraction, safety-sequenced handling

Hydraulic and servo press brakes. Multi-stage forming sequences with safety-interlock awareness, force-context transitions, and part-flow tracking across stations.

Metrology & Inspection

Measurement handoff, orientation, result-linked decisions

CMM loading, optical inspection placement, gauge-based verification. Machine results feed back into the episode labels — measurement outcome is linked to the manipulation that preceded it.

Finishing & Surface Work

Tool use, force modulation, quality-outcome labeling

Grinding, deburring, polishing. Force-sensitive phases where the manipulation strategy depends on real-time surface feedback. Quality outcomes are labeled per episode.

DELIVERABLE

What ships in a task pack

~/task-packs/seahawk-cnc-tending-v1

seahawk-cnc-tending-v1/
├── dataset/
│   ├── episodes/
│   │   ├── ep_001/   (cam_front.mp4, cam_side.mp4, machine.jsonl, actions.jsonl)
│   │   ├── ep_002/
│   │   └── ...       (50 episodes)
│   └── manifest.json
├── labels/
│   ├── phases.csv
│   ├── exceptions.csv
│   └── outcomes.csv
├── splits/
│   ├── train.txt   (35 episodes)
│   ├── val.txt     (8 episodes)
│   └── test.txt    (7 episodes)
├── docs/
│   ├── schema.md
│   ├── taxonomy.md
│   └── capture_notes.md
├── sample_loader.py
├── README.md
└── DELIVERY_NOTES.md
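
On receipt, a team might confirm that the three split files partition the 50 episodes with no overlap. The sketch below works on in-memory lists so it runs standalone. In a real pack the ids would come from `splits/*.txt`, whose exact format is not shown here. The helper name and the assignment of episodes to splits are invented; only the 35/8/7 sizes come from the listing.

```python
def check_splits(all_ids, splits):
    """Verify splits are disjoint and together cover all_ids; return split sizes."""
    owner = {}
    for name, ids in splits.items():
        for ep_id in ids:
            if ep_id in owner:
                raise ValueError(f"{ep_id} appears in both {owner[ep_id]} and {name}")
            owner[ep_id] = name

    missing = set(all_ids) - set(owner)
    unknown = set(owner) - set(all_ids)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")

    return {name: len(ids) for name, ids in splits.items()}


# ep_001 .. ep_050; which episode lands in which split is made up for this example.
all_ids = [f"ep_{i:03d}" for i in range(1, 51)]
splits = {
    "train": all_ids[:35],
    "val": all_ids[35:43],
    "test": all_ids[43:],
}

sizes = check_splits(all_ids, splits)
print(sizes)  # {'train': 35, 'val': 8, 'test': 7}
```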

50

Episodes

Multi-view, machine-state synchronized, quality-scored. Train/val/test splits included.

ms

Phase Labels

Millisecond-precision segmentation across 5–12 phases per task. Train on full episodes or slices.

2–8

Exception Classes

Labeled exception types with recovery action annotations. Not just the nominal cycle.

↓

Ready to Ingest

Sample loader, schema docs, and a README that explains every field. No ambiguity on handoff.

Bringing AI to the factory floor

Your model needs factory data. We have the factories.
