OUR APPROACH

Bringing AI to the factory floor

The biggest gap in robotics training data isn't volume — it's context. Lab demonstrations don't carry machine state, cycle timing, fixture variance, or the exception patterns that define real manufacturing work. We close that gap.

Why lab data breaks in production

A manipulation policy trained on lab demonstrations often degrades when transferred to a real manufacturing cell. The reasons are specific and measurable:

Machine-state dependencies go uncaptured. A CNC load/unload task isn't just pick-and-place — it requires awareness of door state, chuck state, cycle timing, coolant status, and alarm conditions. Lab data has none of this.

Fixture and part variance disappears. In a lab, parts are uniform and fixtures are ideal. On a shop floor, tray positions shift, parts have burrs and chips, fixtures wear, and operators develop workarounds that a training set needs to include.

Exceptions are the actual deployment blocker. The nominal cycle works. The misload, the jam, the incomplete seat, the alarm recovery: these are what keep intervention rates high. Lab datasets capture the happy path. Factory data captures the real path.

Transfer between cells fails silently. Two cells running the same part on the same machine model will have different lighting, different fixture geometry, and different operator habits. Without multi-cell variance in training data, policies look good in eval and fail in deployment.

Real operators. Real machines. Real exceptions.

Lab Data
No machine state context
Uniform parts, ideal conditions
Happy path only
Single environment
No operator variance

TalosHub Factory Data
Machine state aligned (door, chuck, cycle, HMI)
Real parts with variance and wear
2–8 exception classes per pack
Multi-cell, multi-facility capture
Expert operator demonstrations

A data contract, not a data dump

Every episode follows a canonical schema designed for downstream training pipelines.

episode_schema.json
{
  "episode_id": "ep_042",
  "task_family": "cnc_machine_tending",
  "machine": {
    "type": "haas_vf2",
    "cell_id": "cell_A3",
    "facility": "facility_midwest_01"
  },
  "streams": {
    "video": ["cam_front.mp4", "cam_side.mp4", "cam_overhead.mp4"],
    "machine_state": "machine_events.jsonl",
    "operator_actions": "actions.jsonl"
  },
  "phases": [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800},
    {"id": 3, "label": "part_extract", "start_ms": 4800, "end_ms": 8400},
    {"id": 4, "label": "part_load", "start_ms": 8400, "end_ms": 13200},
    {"id": 5, "label": "door_close_cycle_start", "start_ms": 13200, "end_ms": 15600}
  ],
  "exceptions": [
    {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}
  ],
  "outcome": "success",
  "labels": {
    "split": "train",
    "quality_score": 0.94
  }
}

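A pipeline consuming this schema might sanity-check each episode before training. The sketch below is one possible check, not the shipped loader: the function name `validate_episode` and the two rules it enforces (phases are contiguous and non-empty, every exception points at an existing phase id) are assumptions inferred from the example episode, which is reproduced here in trimmed form.

```python
import json

# Trimmed copy of the example episode above (machine, streams, labels omitted).
EXAMPLE = json.loads("""
{
  "episode_id": "ep_042",
  "phases": [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800},
    {"id": 3, "label": "part_extract", "start_ms": 4800, "end_ms": 8400},
    {"id": 4, "label": "part_load", "start_ms": 8400, "end_ms": 13200},
    {"id": 5, "label": "door_close_cycle_start", "start_ms": 13200, "end_ms": 15600}
  ],
  "exceptions": [
    {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}
  ],
  "outcome": "success"
}
""")


def validate_episode(episode: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    phases = episode.get("phases", [])

    # Assumed rule: each phase has positive duration.
    for p in phases:
        if p["end_ms"] <= p["start_ms"]:
            problems.append(f"phase {p['id']} has non-positive duration")

    # Assumed rule: each phase starts exactly where the previous one ended.
    for prev, cur in zip(phases, phases[1:]):
        if cur["start_ms"] != prev["end_ms"]:
            problems.append(f"gap or overlap between phase {prev['id']} and {cur['id']}")

    # Assumed rule: an exception's "phase" refers to a phase "id" in the same episode.
    phase_ids = {p["id"] for p in phases}
    for exc in episode.get("exceptions", []):
        if exc["phase"] not in phase_ids:
            problems.append(f"exception '{exc['type']}' references unknown phase {exc['phase']}")

    return problems


print(validate_episode(EXAMPLE))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every issue in an episode at once.
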
Multi-stream capture

Every episode includes synchronized video from 2–4 camera angles, timestamped machine events from the CNC controller, and operator action logs. Streams are aligned to a common clock.
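With streams on a common clock, each video frame can be paired with the most recent machine event. The following is a minimal sketch of that lookup using a binary search. The event field names (`t_ms`, `door`) and the 30 fps frame rate are hypothetical; the actual layout of `machine_events.jsonl` is not shown on this page.

```python
from bisect import bisect_right
from typing import Optional


def frame_time_ms(frame_index: int, fps: float) -> float:
    """Timestamp of a frame on the shared clock, assuming frame 0 is at t = 0."""
    return frame_index * 1000.0 / fps


def state_at(events: list, t_ms: float) -> Optional[dict]:
    """Return the latest event at or before t_ms, or None if none has occurred yet.

    Assumes `events` is sorted by its "t_ms" field.
    """
    times = [e["t_ms"] for e in events]
    i = bisect_right(times, t_ms)
    return events[i - 1] if i > 0 else None


# Hypothetical machine events for illustration only.
events = [
    {"t_ms": 0, "door": "closed"},
    {"t_ms": 3200, "door": "opening"},
    {"t_ms": 4800, "door": "open"},
]

t = frame_time_ms(120, fps=30.0)  # frame 120 at 30 fps is 4000.0 ms
print(state_at(events, t))
```

For long episodes it would be cheaper to build the `times` list once and reuse it across frames; it is rebuilt per call here only to keep the sketch short.
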

Phase segmentation

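Each episode is decomposed into discrete manipulation phases, such as approach, grasp, transport, load, and verify, with millisecond-precision boundaries. Phase labels are task-specific: the CNC tending example above uses approach, door_open, part_extract, part_load, and door_close_cycle_start. Your pipeline can train on full episodes or individual phases.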
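Training on individual phases comes down to mapping timestamps to phase labels. Here is a sketch using the phase boundaries from the example episode. Treating each phase as a half-open interval (start inclusive, end exclusive) is an assumption, chosen because adjacent phases in the example share a boundary timestamp; the helper names are hypothetical.

```python
PHASES = [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800},
    {"id": 3, "label": "part_extract", "start_ms": 4800, "end_ms": 8400},
    {"id": 4, "label": "part_load", "start_ms": 8400, "end_ms": 13200},
    {"id": 5, "label": "door_close_cycle_start", "start_ms": 13200, "end_ms": 15600},
]


def phase_at(phases, t_ms):
    """Return the label of the phase containing t_ms, or None if outside all phases."""
    for p in phases:
        if p["start_ms"] <= t_ms < p["end_ms"]:
            return p["label"]
    return None


def slice_by_phase(samples, phases):
    """Group (t_ms, value) samples by phase label, dropping samples outside any phase."""
    groups = {p["label"]: [] for p in phases}
    for t_ms, value in samples:
        label = phase_at(phases, t_ms)
        if label is not None:
            groups[label].append((t_ms, value))
    return groups


# Placeholder samples; in practice these would be frames or action records.
samples = [(100, "a"), (3300, "b"), (5000, "c"), (20000, "late")]
print(slice_by_phase(samples, PHASES)["door_open"])
```
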

Exception taxonomy

We don't just capture the nominal cycle. Each task pack includes 2–8 exception classes: misloads, retries, jams, incomplete seating, alarm responses, and operator recovery patterns — labeled with the specific recovery action taken.
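Because each exception carries both a type and a recovery action, a quick tally shows how well a pack covers each class. The sketch below relies only on the `exceptions` fields from the example episode. The episodes themselves are invented for illustration, as are the `jam` entry's recovery name `clear_and_restart` and the function name.

```python
from collections import Counter, defaultdict


def summarize_exceptions(episodes):
    """Count exceptions per type, and recovery actions per exception type."""
    by_type = Counter()
    recoveries = defaultdict(Counter)
    for ep in episodes:
        for exc in ep.get("exceptions", []):
            by_type[exc["type"]] += 1
            recoveries[exc["type"]][exc["recovery"]] += 1
    return by_type, recoveries


# Invented episodes for illustration; only the field names come from the schema.
episodes = [
    {"episode_id": "ep_042",
     "exceptions": [{"type": "misload", "phase": 4, "recovery": "reposition_and_retry"}]},
    {"episode_id": "ep_043", "exceptions": []},
    {"episode_id": "ep_044",
     "exceptions": [
         {"type": "jam", "phase": 3, "recovery": "clear_and_restart"},
         {"type": "misload", "phase": 4, "recovery": "reposition_and_retry"},
     ]},
]

by_type, recoveries = summarize_exceptions(episodes)
print(by_type.most_common())
print(dict(recoveries["misload"]))
```

`len(by_type)` gives the number of distinct exception classes present, which can be compared against the 2–8 classes a pack is expected to contain.
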

Manufacturing data is a different engineering problem

Machine-State Coupling

A robot's next valid action depends on states it can't observe from vision alone — is the chuck clamped? Is the door interlock engaged? Is the cycle timer running? We capture these signals directly from the machine controller and align them to the video timeline.
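One way to use controller signals downstream is to gate candidate actions on machine state. The sketch below is purely illustrative: the action names, signal names, and precondition table are assumptions made for this example, not a controller API or a rule set from the dataset.

```python
# Hypothetical preconditions: signal values that must hold before an action is valid.
PRECONDITIONS = {
    "load_part": {"door": "open", "cycle_running": False},
    "start_cycle": {"door": "closed", "chuck": "clamped", "cycle_running": False},
}


def unmet_preconditions(action, machine_state):
    """Return the required signals that the current machine state does not satisfy.

    An empty dict means the action is allowed under the assumed rule table.
    Actions with no entry in the table are treated as unconstrained.
    """
    required = PRECONDITIONS.get(action, {})
    return {
        signal: expected
        for signal, expected in required.items()
        if machine_state.get(signal) != expected
    }


# Illustrative machine state: door open, chuck clamped, no cycle running.
state = {"door": "open", "chuck": "clamped", "cycle_running": False}

print(unmet_preconditions("load_part", state))    # nothing blocks loading
print(unmet_preconditions("start_cycle", state))  # the door must be closed first
```

None of these signals is reliably visible to a camera, which is why the check reads from machine state rather than from vision.
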

Non-Stationary Environments

Factory conditions drift between shifts. Lighting changes, coolant levels drop, fixtures wear, and new part batches arrive with different surface characteristics. We capture across these variations deliberately rather than controlling for them.

Recovery is the Product

The nominal cycle is the easy part. What happens when a part doesn't seat, a gripper slips, or an alarm fires? These recovery demonstrations are what separate a policy that works in eval from one that survives deployment. We prioritize exception capture.

Cell-to-Cell Transfer

Two "identical" cells are never identical. Different cable routing, different fixture wear, different operator habits, different ambient conditions. We capture across cells so your model learns the invariant task structure, not the specifics of one workstation.
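A simple way to measure cell-to-cell transfer is to hold out whole cells at evaluation time, so that no test episode comes from a workstation the model has seen. The sketch below groups on the `machine.cell_id` field from the episode schema. The function name is hypothetical, and every cell id other than `cell_A3` is invented for illustration.

```python
def split_by_cell(episodes, held_out_cells):
    """Split episodes so that held-out cells appear only in the test set."""
    held_out = set(held_out_cells)
    train = [ep for ep in episodes if ep["machine"]["cell_id"] not in held_out]
    test = [ep for ep in episodes if ep["machine"]["cell_id"] in held_out]
    return train, test


# Invented episodes; only the field layout follows the schema.
episodes = [
    {"episode_id": "ep_001", "machine": {"cell_id": "cell_A3"}},
    {"episode_id": "ep_002", "machine": {"cell_id": "cell_A3"}},
    {"episode_id": "ep_003", "machine": {"cell_id": "cell_B1"}},
    {"episode_id": "ep_004", "machine": {"cell_id": "cell_C2"}},
]

train, test = split_by_cell(episodes, held_out_cells={"cell_B1"})
print([ep["episode_id"] for ep in train])
print([ep["episode_id"] for ep in test])
```

A random per-episode split would place episodes from the same cell on both sides and could overstate how well a policy transfers.
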

Manufacturing-native, machine-centered

One schema, any machine-centered workflow.

CNC Machine Tending
Load/unload, part presentation, cycle-start sequencing

Haas, Mazak, DMG MORI, Fanuc. Vertical and horizontal machining centers. We capture the full operator interaction: door management, chuck operations, part seating verification, cycle initiation, and in-process monitoring.

Press & Brake Operations
Infeed, forming, extraction, safety-sequenced handling

Hydraulic and servo press brakes. Multi-stage forming sequences with safety-interlock awareness, force-context transitions, and part-flow tracking across stations.

Metrology & Inspection
Measurement handoff, orientation, result-linked decisions

CMM loading, optical inspection placement, gauge-based verification. Machine results feed back into the episode labels — measurement outcome is linked to the manipulation that preceded it.

Finishing & Surface Work
Tool use, force modulation, quality-outcome labeling

Grinding, deburring, polishing. Force-sensitive phases where the manipulation strategy depends on real-time surface feedback. Quality outcomes are labeled per episode.

What ships in a task pack

~/task-packs/seahawk-cnc-tending-v1
seahawk-cnc-tending-v1/
├── dataset/
│   ├── episodes/
│   │   ├── ep_001/  (cam_front.mp4, cam_side.mp4, machine.jsonl, actions.jsonl)
│   │   ├── ep_002/
│   │   └── ...  (50 episodes)
│   └── manifest.json
├── labels/
│   ├── phases.csv
│   ├── exceptions.csv
│   └── outcomes.csv
├── splits/
│   ├── train.txt  (35 episodes)
│   ├── val.txt  (8 episodes)
│   └── test.txt  (7 episodes)
├── docs/
│   ├── schema.md
│   ├── taxonomy.md
│   └── capture_notes.md
├── sample_loader.py
├── README.md
└── DELIVERY_NOTES.md

50
Episodes

Multi-view, machine-state synchronized, quality-scored. Train/val/test splits included.

ms
Phase Labels

Millisecond-precision segmentation across 5–12 phases per task. Train on full episodes or slices.

2–8
Exception Classes

Labeled exception types with recovery action annotations. Not just the nominal cycle.

Ready to Ingest

Sample loader, schema docs, and a README that explains every field. No ambiguity on handoff.
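As a rough idea of what ingestion could look like, the sketch below reads the three files under `splits/` and confirms that no episode appears twice. It is not the shipped `sample_loader.py`. The one-episode-id-per-line file format and both function names are assumptions.

```python
from pathlib import Path


def read_splits(pack_root):
    """Read splits/train.txt, val.txt, and test.txt into lists of episode ids.

    Assumes one episode id per line; blank lines are ignored.
    """
    splits = {}
    for name in ("train", "val", "test"):
        path = Path(pack_root) / "splits" / f"{name}.txt"
        lines = path.read_text(encoding="utf-8").splitlines()
        splits[name] = [line.strip() for line in lines if line.strip()]
    return splits


def check_disjoint(splits):
    """Raise ValueError if an episode id is listed more than once; return the id count."""
    seen = {}
    for name, ids in splits.items():
        for ep_id in ids:
            if ep_id in seen:
                raise ValueError(
                    f"{ep_id} is listed more than once ({seen[ep_id]} and {name})"
                )
            seen[ep_id] = name
    return len(seen)


if __name__ == "__main__":
    # Path taken from the listing above; adjust to wherever the pack is unpacked.
    root = Path.home() / "task-packs" / "seahawk-cnc-tending-v1"
    if root.exists():
        splits = read_splits(root)
        print({name: len(ids) for name, ids in splits.items()})
        print("total episodes:", check_disjoint(splits))
```

For the pack shown above, the expected counts are 35, 8, and 7, for a total of 50 episodes.
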

Your model needs factory data.
We have the factories.

Tell us the workflow. We'll scope a task pack.

Get in Touch
hello@taloshub.io