Dataset Card v1.0

Manufacturing Manipulation Dataset

Name: TalosHub Manufacturing Manipulation Dataset
Creator: TalosHub
Published: 2026-04
License: https://taloshub.io/terms.html

Technical specification for buyers and researchers.

Issued April 2026 | Maintainer: TalosHub | hello@taloshub.io

01 — Overview

What this dataset is

The TalosHub Manufacturing Manipulation Dataset is a collection of structured, real-world episodes captured from human operators performing industrial manipulation tasks in active manufacturing facilities. Each episode consists of multi-view video synchronized with machine-state events from real CNC controllers, segmented into phases at millisecond precision, and annotated with exception classes and recovery actions. Episodes are delivered as scoped task packs, each targeting a specific workflow: CNC machine tending, manual deburring, mixed-fastener assembly, welding, or inspection.

02 — Episode Schema

Canonical episode schema

Every episode is a directory containing synchronized streams plus a manifest.

{
  "episode_id": "ep_042",
  "schema_version": "1.0",
  "task_family": "cnc_machine_tending",
  "machine": {
    "type": "haas_vf2",
    "controller": "fanuc_0i_mf",
    "cell_id": "cell_A3",
    "facility": "facility_001"
  },
  "streams": {
    "video": ["cam_overhead.mp4", "cam_front.mp4", "cam_side.mp4",
              "cam_wrist_r.mp4", "cam_wrist_l.mp4"],
    "machine_state": "machine_events.jsonl",
    "operator_actions": "actions.jsonl"
  },
  "phases": [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800}
  ],
  "exceptions": [
    {"type": "misload", "phase_id": 4, "severity": "moderate",
     "recovery_action": "reposition_and_retry",
     "recovery_success": true, "recovery_duration_ms": 1850}
  ],
  "outcome": "success",
  "quality_score": 0.94,
  "labels": {"split": "train"},
  "capture": {
    "session_id": "sess_012",
    "operator_pseudonym": "OP-A",
    "operator_skill_level": "expert",
    "timestamp": "2026-04-10T06:14:22.847Z"
  }
}
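A minimal sketch of reading such a manifest and running two basic consistency checks, using only the Python standard library. The helper names `stream_files` and `dangling_phase_refs` are hypothetical and not part of the TalosHub sample loader; the embedded JSON is a trimmed copy of the manifest shown above.

```python
import json

# Trimmed copy of the manifest above; only the fields this sketch inspects.
MANIFEST_JSON = """
{
  "episode_id": "ep_042",
  "schema_version": "1.0",
  "streams": {
    "video": ["cam_overhead.mp4", "cam_front.mp4", "cam_side.mp4",
              "cam_wrist_r.mp4", "cam_wrist_l.mp4"],
    "machine_state": "machine_events.jsonl",
    "operator_actions": "actions.jsonl"
  },
  "phases": [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800}
  ],
  "exceptions": [
    {"type": "misload", "phase_id": 4, "severity": "moderate"}
  ]
}
"""


def stream_files(manifest: dict) -> list[str]:
    """Return every file name referenced under "streams" (hypothetical helper)."""
    streams = manifest["streams"]
    return [*streams["video"], streams["machine_state"], streams["operator_actions"]]


def dangling_phase_refs(manifest: dict) -> list[int]:
    """Return phase_ids used by exceptions that are not listed under "phases"."""
    known_ids = {phase["id"] for phase in manifest["phases"]}
    return [exc["phase_id"] for exc in manifest["exceptions"]
            if exc["phase_id"] not in known_ids]


manifest = json.loads(MANIFEST_JSON)
files = stream_files(manifest)
missing = dangling_phase_refs(manifest)
```

On the sample manifest as printed, which lists only phases 1 and 2, `missing` contains the exception's `phase_id` of 4. A check of this kind is one way to catch manifests whose phase list and exception list have fallen out of step.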

03 — Capture Pipeline

How we capture

Cameras

Each episode includes 3 to 5 synchronized cameras: overhead (4K/30fps), front operator-facing (4K/30fps), side profile (4K/30fps), and, on the highest-value packs, two wrist-mounted close-ups (1080p/30fps). All cameras are frame-synchronized via an LED flash at session start; multi-camera extrinsics are calibrated with a ChArUco target before each session.

Machine state

Where supported, machine events are captured live from the CNC controller via OPC-UA (Siemens, Heidenhain, modern Fanuc), MTConnect (Haas, Mazak, DMG MORI, Okuma), or FOCAS (older Fanuc). Where live capture is not available, machine state is reconstructed from a fixed camera trained on the HMI display, augmented with external sensors. All events are output as JSONL with millisecond-precision timestamps.
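A minimal sketch of reading a JSONL event stream of this kind. The field names `t_ms`, `signal`, and `value` are assumptions made for illustration; the actual event schema is defined in the pack's schema documentation, not here.

```python
import json

# Hypothetical sample lines. The field names "t_ms", "signal", and "value"
# are assumptions for illustration, not the published event schema.
SAMPLE_JSONL = """\
{"t_ms": 3200, "signal": "door", "value": "open"}
{"t_ms": 4800, "signal": "chuck", "value": "unclamped"}

{"t_ms": 9100, "signal": "door", "value": "closed"}
"""


def parse_events(text: str) -> list[dict]:
    """Parse JSONL text into a list of event dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def events_for(events: list[dict], signal: str) -> list[dict]:
    """Return the events for one signal, ordered by timestamp."""
    matching = (event for event in events if event["signal"] == signal)
    return sorted(matching, key=lambda event: event["t_ms"])


events = parse_events(SAMPLE_JSONL)
door_events = events_for(events, "door")
```

Because JSONL stores one JSON object per line, a stream like this can be read line by line without loading a whole file into memory; the sketch reads from a string only to stay self-contained.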

Operators

Each pack includes data from a minimum of 3 operators of mixed skill level (junior, mid, expert) to capture behavioral variance. All operators sign consent forms before recording. Real names never appear in deliverables; operators are identified by pseudonyms (OP-A, OP-B, OP-C).

Sync precision

All streams within an episode are aligned to a common reference clock (machine state where available, otherwise overhead camera). Maximum permitted drift between any two streams is 50ms; episodes exceeding this are flagged for manual realignment or rejection.
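The drift rule can be sketched as a check on per-stream offsets from the reference clock. This is an illustration under stated assumptions: the offsets are hypothetical values, and how offsets are measured in the actual pipeline is not specified here.

```python
MAX_DRIFT_MS = 50.0  # maximum permitted drift between any two streams


def max_pairwise_drift(offsets_ms: dict[str, float]) -> float:
    """Largest drift between any two streams.

    The largest pairwise difference always equals max minus min,
    so there is no need to compare every pair explicitly.
    """
    if not offsets_ms:
        return 0.0
    values = offsets_ms.values()
    return max(values) - min(values)


def sync_ok(offsets_ms: dict[str, float]) -> bool:
    """True when the episode satisfies the 50 ms drift limit."""
    return max_pairwise_drift(offsets_ms) <= MAX_DRIFT_MS


# Hypothetical offsets (ms) of each stream relative to the reference clock.
offsets = {
    "machine_state": 0.0,
    "cam_overhead": 12.0,
    "cam_front": -8.0,
    "cam_side": 31.0,
}
drift = max_pairwise_drift(offsets)
```

Note that two streams can each sit within 50 ms of the reference and still drift more than 50 ms from each other (for example +45 ms and -10 ms), which is why the check compares streams against one another rather than against the reference alone.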

04 — Annotation Protocol

How we label

Phases

Each episode is decomposed into discrete phases drawn from a fixed per-task-family vocabulary. Phase boundaries are placed at millisecond precision; phases must cover 100% of episode duration with no gaps or overlaps.
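The coverage rule (no gaps, no overlaps, full duration) can be sketched as a single pass over phases sorted by start time. The function name `coverage_problems` and the `duration_ms` argument are assumptions for illustration; the manifest shown earlier does not carry an explicit duration field.

```python
def coverage_problems(phases: list[dict], duration_ms: int) -> list[str]:
    """Return human-readable coverage violations; an empty list means full coverage."""
    problems = []
    cursor = 0  # end of the time span covered so far
    for phase in sorted(phases, key=lambda p: p["start_ms"]):
        if phase["start_ms"] > cursor:
            problems.append(f"gap {cursor}-{phase['start_ms']} ms")
        elif phase["start_ms"] < cursor:
            problems.append(f"overlap at phase {phase['id']}")
        cursor = max(cursor, phase["end_ms"])
    if cursor < duration_ms:
        problems.append(f"gap {cursor}-{duration_ms} ms")
    elif cursor > duration_ms:
        problems.append("phases extend past episode end")
    return problems


# The two phases from the sample manifest, treated as a 4800 ms episode.
sample_phases = [
    {"id": 1, "label": "approach", "start_ms": 0, "end_ms": 3200},
    {"id": 2, "label": "door_open", "start_ms": 3200, "end_ms": 4800},
]
problems = coverage_problems(sample_phases, duration_ms=4800)
```

The sketch treats phases as half-open intervals, so a phase ending at 3200 ms and the next starting at 3200 ms count as contiguous rather than overlapping, which matches the sample manifest.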

Exceptions

Episodes contain natural and induced exceptions drawn from a per-task-family taxonomy (for CNC tending: misload, jam, incomplete_seat, chip_buildup, alarm_response, gripper_slip). Each exception is tagged with severity (minor / moderate / critical), recovery action, recovery success (boolean), and recovery duration (ms).
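A minimal sketch of validating one exception record against the CNC tending taxonomy and the severity levels listed above. The function name `exception_errors` is hypothetical, and the check that durations are non-negative is an assumption rather than a documented rule.

```python
CNC_TENDING_TYPES = {
    "misload", "jam", "incomplete_seat",
    "chip_buildup", "alarm_response", "gripper_slip",
}
SEVERITIES = {"minor", "moderate", "critical"}


def exception_errors(exc: dict) -> list[str]:
    """Return a list of problems with one exception record (empty if valid)."""
    errors = []
    if exc.get("type") not in CNC_TENDING_TYPES:
        errors.append("unknown type")
    if exc.get("severity") not in SEVERITIES:
        errors.append("unknown severity")
    action = exc.get("recovery_action")
    if not (isinstance(action, str) and action):
        errors.append("missing recovery_action")
    if not isinstance(exc.get("recovery_success"), bool):
        errors.append("recovery_success must be a boolean")
    duration = exc.get("recovery_duration_ms")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    # Requiring a non-negative duration is an assumption of this sketch.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)) or duration < 0:
        errors.append("recovery_duration_ms must be a non-negative number")
    return errors


# The exception record from the sample manifest.
sample_exception = {
    "type": "misload", "phase_id": 4, "severity": "moderate",
    "recovery_action": "reposition_and_retry",
    "recovery_success": True, "recovery_duration_ms": 1850,
}
errors = exception_errors(sample_exception)
```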

Inter-rater agreement

Phase boundary placement is validated by a second reviewer for 10% of episodes. Mean inter-rater drift on phase boundaries is reported per pack (target under 200ms).
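One way to compute such a drift figure is sketched below. It assumes that "drift" means the mean absolute difference between corresponding boundaries placed by the primary annotator and the second reviewer; the exact definition used in pack reports is not given here, and the boundary values are hypothetical.

```python
def mean_boundary_drift_ms(primary: list[int], reviewer: list[int]) -> float:
    """Mean absolute difference (ms) between corresponding phase boundaries."""
    if len(primary) != len(reviewer):
        raise ValueError("annotators must place the same number of boundaries")
    if not primary:
        raise ValueError("at least one boundary is required")
    total = sum(abs(a - b) for a, b in zip(primary, reviewer))
    return total / len(primary)


# Hypothetical boundary placements (ms) for one episode.
primary_boundaries = [3200, 4800, 9100]
reviewer_boundaries = [3150, 4900, 9130]
drift_ms = mean_boundary_drift_ms(primary_boundaries, reviewer_boundaries)
within_target = drift_ms < 200
```

Here the per-boundary differences are 50, 100, and 30 ms, giving a mean of 60 ms, which is within the 200 ms target.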

05 — Quality Gates

Quality threshold per episode

Stream sync drift ≤ 50ms across all cameras and machine-state stream
Phase coverage = 100% of episode duration
Exception annotations include all required fields (type, severity, recovery_action, recovery_success)
Schema validates against canonical Pydantic schema
Quality score ≥ 0.85, computed as: 0.3 × video_quality + 0.3 × label_accuracy + 0.2 × sync_precision + 0.2 × exception_completeness
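The weighted score above can be worked through as follows. The component values are hypothetical, and the sketch assumes each component is already normalized to the range 0 to 1, which the card does not state explicitly.

```python
WEIGHTS = {
    "video_quality": 0.3,
    "label_accuracy": 0.3,
    "sync_precision": 0.2,
    "exception_completeness": 0.2,
}
MIN_QUALITY_SCORE = 0.85


def quality_score(components: dict[str, float]) -> float:
    """Weighted sum of the four components, using the weights from the card."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


# Hypothetical component values, assumed to lie in [0, 1].
components = {
    "video_quality": 0.95,
    "label_accuracy": 0.90,
    "sync_precision": 1.00,
    "exception_completeness": 0.90,
}
score = quality_score(components)  # 0.285 + 0.27 + 0.2 + 0.18, about 0.935
passes = score >= MIN_QUALITY_SCORE
```

Because the weights sum to 1, the score stays in the same range as its components, so a threshold of 0.85 can be read directly against them.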

06 — Coverage Summary

Current and planned coverage

Task Family	Episodes	Status
CNC Machine Tending	50	Ready (Apr 2026)
Manual Deburring & Finishing	50	In capture (Apr-May 2026)
Precision Quality Inspection	50	In capture (Apr-May 2026)
Mixed-Fastener Sub-Assembly	50	Scheduled (May-Jun 2026)
CNC Setup & Changeover	50	Scheduled (May-Jun 2026)
Manual Welding (MIG/TIG)	50	Scheduled (Jun 2026)

07 — Splits & Format

Delivery format

Each pack is split 70/15/15 (train/val/test) by default, stratified by exception type so each split contains proportional exception coverage. Custom split ratios available on request. Episodes ship as a zipped archive (~150-200MB per episode) with a manifest, sample loader (Python, RLDS-compatible), schema documentation, and capture notes.
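A stratified 70/15/15 split can be sketched as below. This is an illustration, not the TalosHub split procedure: it assumes each episode is assigned a single exception type for stratification (real episodes may carry several exceptions or none), and the episode list is synthetic.

```python
import random
from collections import defaultdict


def stratified_split(episodes, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (episode_id, exception_type) pairs so each type is spread proportionally."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_type = defaultdict(list)
    for episode_id, exc_type in episodes:
        by_type[exc_type].append(episode_id)

    splits = {"train": [], "val": [], "test": []}
    for exc_type in sorted(by_type):  # sorted for deterministic ordering
        ids = by_type[exc_type]
        rng.shuffle(ids)
        n_train = round(ratios[0] * len(ids))
        n_val = round(ratios[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits


# Synthetic example: 20 episodes for each of three exception types.
types = ["misload"] * 20 + ["jam"] * 20 + ["gripper_slip"] * 20
episodes = [(f"ep_{i:03d}", exc_type) for i, exc_type in enumerate(types)]
splits = stratified_split(episodes)
```

With 20 episodes per type, each type contributes 14 train, 3 val, and 3 test episodes. For small strata, rounding can leave a split empty, so very rare exception types may need manual placement.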

08 — Limitations

What this dataset is NOT

This dataset is human demonstration data captured for training manipulation policies. It is NOT teleoperation data — robot joint angles, end-effector trajectories, and force/torque sensor data from a robot platform are not included. Buyers integrating with their own robot platform must pair this data with their robot-specific demonstrations or use it for vision/action grounding only.

Episodes are captured from a limited set of facilities (currently 1, expanding to 3 by Q3 2026). Cell-to-cell variance is limited until additional facilities come online. Lighting variance is limited to standard factory conditions; dramatically different lighting environments (outdoor, low-light, high-glare) are not currently represented.

Glossary

Key terms

Episode: A complete captured workflow from start to finish, including all synchronized streams (video, machine state, operator actions) and metadata.
Phase segmentation: The decomposition of an episode into discrete phases at millisecond precision, with each phase tagged with tools used, decisions made, and outcomes.
Machine state events: Real-time events captured from CNC controllers via OPC-UA, MTConnect, or FOCAS protocols. Includes door state, chuck state, cycle timing, alarms.
Exception annotation: Tagging of natural and induced exceptions in captured workflows. Each exception includes type, severity, recovery action, recovery success boolean, and recovery duration.
Quality score: A composite score per episode computed across video quality (30%), label accuracy (30%), sync precision (20%), and exception completeness (20%). Minimum threshold for delivery is 0.85.

09 — Licensing

Licensing terms

Tier 01

Standard

Perpetual license to use the data for model training, evaluation, and deployment. Resale and redistribution restricted.

Tier 02

Exclusive

TalosHub commits not to sell the scoped pack to competitors for an agreed exclusivity window. Premium pricing (typically 2-3x base).

Tier 03

Research

Reduced pricing for academic labs with citation requirements. Open-friendly terms negotiable per project.

10 — Citation

How to cite

TalosHub, Inc. (2026). Manufacturing Manipulation Dataset v1.0, [task family pack name]. https://taloshub.io/dataset-card.html

11 — Get Started

Ready to evaluate?

For methodology questions, schema details, or custom scoping, request a 20-minute call. We share the full sample loader code and 2–3 demonstration episodes after the scoping conversation.

Email: hello@taloshub.io