Manufacturing data for robotics

Manufacturing training data for robots that have to work in factories

TalosHub captures real manufacturing workflows and turns them into structured, machine-state-aligned task packs: multi-view video, phase labels, exception and recovery annotations, schema documentation, sample loaders, and train/val/test splits.

Captured directly from operating manufacturing facilities. Real operators, real machines, real production environments — not laboratory demonstrations.

Why now

Robotics is moving from demos to deployment

The open datasets that helped general manipulation research (Open X-Embodiment, Bridge, DROID) are not enough for factory deployment. They usually lack machine-state context, fixture variance, operator workarounds, and recovery behavior. That is the data gap TalosHub fills.

Industrial robot installations reached a record US$16.7B in global market value, according to the most recent IFR data, and humanoid programs are widely understood to be bottlenecked on real-world manipulation data with task-specific context. The teams that win the next 18 months will be the ones with structured, high-density manufacturing data. Teams that build capture programs themselves lose 6–12 months to facility access and operator recruitment.

The open data gap

What Open X-Embodiment can't give you

Public datasets cover lab manipulation. They lack machine-state context, fixture variance, operator workarounds, and recovery behavior — the signals that determine whether a policy survives factory deployment.

Time cost

6–12 months

Teams building manufacturing capture programs in-house lose 6–12 months to facility access negotiations, operator recruitment, capture rig engineering, and annotation workflow setup.

The humanoid push

Bottlenecked at deployment

Humanoid programs across the field are widely understood to be bottlenecked on real-world manipulation data with task-specific context — not on compute, not on architecture, but on data.

Source: International Federation of Robotics, World Robotics 2026 trends report.

Inside the factory

Factory data your robot cannot get from a lab

Most robotics training data comes from controlled labs. Factories are different: CNC doors open and close, chucks clamp and release, alarms fire, chips build up, fixtures drift, operators improvise, and recovery behavior matters. TalosHub captures those real workflows inside active manufacturing facilities and packages them into structured episodes your team can inspect, load, and train against.

Not raw footage. Not generic datasets. Scoped, labeled, segmented, documented, and white-labeled under your brand.

EVENT STREAM · LAST 5 MIN LIVE
14:32:08 CHUCK clamp engaged · 4250 N
14:33:21 DOOR closed · cycle start
14:35:47 CHIP buildup detected · CAM-A
14:38:15 DOOR opened · part inspect
14:41:02 ALARM tool wear threshold · op recovery

Complementary capability

Bring your own factory video

For organizations with existing factory video assets, TalosHub can process source material into the same canonical structured format we deliver from our own captures. Same schema. Same annotations. Same quality gates. Different data source. Common engagement pattern for manufacturers with established operator video archives who want to convert that data into trainable assets.

Deliverables

What a task pack includes

01 · SCOPE

Scope

Task family, episode count, exception coverage, machine families. Defined before capture.

family deburring
episodes 50
exceptions 4
02 · CAPTURE

Multi-view capture

Up to five synchronized cameras per workflow. Three fixed positions (overhead, front, side) plus optional wrist-mounted cameras for fine-motion detail. Synchronized streams with ≤50 ms maximum drift.
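
One way to check the drift figure on a sample episode is sketched below. It is a minimal illustration, not the TalosHub loader: the per-camera timestamp lists, the camera names, and the assumption that frame i of every camera belongs to the same capture instant are all hypothetical.

```python
from typing import Dict, List

MAX_DRIFT_MS = 50.0  # tolerance quoted in the capture description


def max_drift_ms(timestamps_ms: Dict[str, List[float]]) -> float:
    """Largest spread between cameras' timestamps at the same frame index.

    Assumes (hypothetically) that frame i of every camera is meant to
    represent the same instant.
    """
    streams = list(timestamps_ms.values())
    if not streams:
        raise ValueError("no camera streams given")
    n_frames = min(len(s) for s in streams)
    worst = 0.0
    for i in range(n_frames):
        stamps = [s[i] for s in streams]
        worst = max(worst, max(stamps) - min(stamps))
    return worst


def within_tolerance(timestamps_ms: Dict[str, List[float]],
                     limit_ms: float = MAX_DRIFT_MS) -> bool:
    """True when no frame index exceeds the drift limit."""
    return max_drift_ms(timestamps_ms) <= limit_ms
```

An episode whose worst per-frame spread exceeds the limit would fail this check and could be flagged for review.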

03 · STATE

Machine state

Live events from CNC controllers via OPC-UA, MTConnect, or FOCAS. Door, chuck, alarms, cycle timing — millisecond-timestamped machine events aligned to video within the episode sync tolerance.
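
Aligning a machine event to video can be as simple as finding the frame whose timestamp is closest to the event timestamp. The sketch below assumes a sorted list of frame timestamps in milliseconds on the same clock as the machine events; the function name and data layout are illustrative and not taken from the TalosHub schema.

```python
from bisect import bisect_left
from typing import List


def nearest_frame(frame_times_ms: List[float], event_ms: float) -> int:
    """Return the index of the frame closest in time to a machine event.

    frame_times_ms must be sorted ascending. Ties go to the earlier frame.
    """
    if not frame_times_ms:
        raise ValueError("no frames to align against")
    pos = bisect_left(frame_times_ms, event_ms)
    if pos == 0:
        return 0  # event precedes the first frame
    if pos == len(frame_times_ms):
        return len(frame_times_ms) - 1  # event follows the last frame
    before = frame_times_ms[pos - 1]
    after = frame_times_ms[pos]
    return pos if (after - event_ms) < (event_ms - before) else pos - 1
```

For example, a door-closed event at 140 ms against frames stamped every 100 ms would map to the frame at 100 ms.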

04 · SEGMENT

Phase segments

Workflow decomposed into millisecond-precision phases. 100% coverage, validated by inter-rater agreement.
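
If "100% coverage" is read as every millisecond of an episode belonging to some phase (an assumption; the page does not define it), a coverage check might look like the sketch below. The tuple layout and phase labels are hypothetical.

```python
from typing import List, Tuple

# (label, start_ms, end_ms) with an exclusive end; layout is an assumption
Phase = Tuple[str, int, int]


def coverage_gaps(phases: List[Phase],
                  episode_start_ms: int,
                  episode_end_ms: int) -> List[Tuple[int, int]]:
    """Return the unlabeled time ranges inside an episode."""
    gaps = []
    cursor = episode_start_ms
    for _, start, end in sorted(phases, key=lambda p: p[1]):
        if start > cursor:
            gaps.append((cursor, start))  # nothing labeled in this range
        cursor = max(cursor, end)
    if cursor < episode_end_ms:
        gaps.append((cursor, episode_end_ms))
    return gaps
```

An empty result means the phases cover the whole episode under this reading.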

05 · EXCEPTIONS

Exceptions

Each exception tagged with severity, recovery action, and success state. Per-task-family taxonomy.

MISSED_BURR → rework
TOOL_WEAR → tool_swap
PART_DRIFT → refixture
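
An exception annotation with the three attributes named above could be modeled roughly as follows. This is a sketch under stated assumptions: the field names, the severity levels, and the one-to-one mapping from exception code to recovery action are illustrative, and only the three code/recovery pairs come from the example above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Severity levels are hypothetical; the page does not list them.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Code-to-recovery pairs shown in the example above.
RECOVERY_BY_CODE = {
    "MISSED_BURR": "rework",
    "TOOL_WEAR": "tool_swap",
    "PART_DRIFT": "refixture",
}


@dataclass(frozen=True)
class ExceptionTag:
    code: str
    severity: Severity
    recovery_action: str
    recovered: bool  # success state of the recovery


def validate(tag: ExceptionTag) -> None:
    """Raise ValueError if the tag does not match the example taxonomy."""
    expected = RECOVERY_BY_CODE.get(tag.code)
    if expected is None:
        raise ValueError(f"unknown exception code: {tag.code}")
    if tag.recovery_action != expected:
        raise ValueError(
            f"{tag.code} expects recovery {expected!r}, "
            f"got {tag.recovery_action!r}"
        )
```

A real per-task-family taxonomy may allow several recovery actions per code, in which case the mapping would hold sets rather than single strings.
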
06 · QA

QA gates

Per-episode quality scores across video, labels, sync, and exceptions. Gates at 0.85 minimum.

Example episode score: 0.94 on a 0.0 to 1.0 scale (threshold 0.85).
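
A quality gate of this kind can be sketched in a few lines. The version below assumes the 0.85 minimum applies to each of the four component scores separately; the page does not say whether the gate is per component or on an aggregate, so treat that choice, and the dictionary layout, as assumptions.

```python
from typing import Dict

GATE = 0.85  # minimum score quoted above
COMPONENTS = ("video", "labels", "sync", "exceptions")


def passes_gate(scores: Dict[str, float], gate: float = GATE) -> bool:
    """True when every component score meets the minimum.

    Raises KeyError if a component score is missing, so incomplete
    episodes cannot pass silently.
    """
    return all(scores[name] >= gate for name in COMPONENTS)
```

Under this reading, an episode with strong video and label scores but a sync score of 0.84 would still be held back.
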
07 · SPLITS

Splits

Pre-stratified by exception type so each split contains proportional coverage. 70/15/15 default.

TRAIN 70% · VAL 15% · TEST 15%
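
Stratifying by exception type can be sketched as splitting each exception group on its own and then merging the results. The example below assumes one exception label per episode and uses hypothetical episode IDs; it illustrates the idea and is not the procedure TalosHub runs.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(episodes: List[Tuple[str, str]],
                     ratios: Tuple[float, float, float] = (0.70, 0.15, 0.15),
                     seed: int = 0) -> Dict[str, List[str]]:
    """Split (episode_id, exception_type) pairs into train/val/test.

    Each exception type is shuffled and split separately so every split
    gets a proportional share of it. The test split takes the remainder.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for episode_id, exception_type in episodes:
        by_type[exception_type].append(episode_id)

    splits = {"train": [], "val": [], "test": []}
    for exception_type in sorted(by_type):  # sorted for reproducibility
        ids = sorted(by_type[exception_type])
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits
```

With very small groups, rounding can leave a split without any episodes of a given exception type, so group sizes are worth checking at scoping time.
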
08 · DOCS

Docs & loader

Schema reference, taxonomy, capture notes, quality assessment, and a Python loader for PyTorch and TensorFlow.

📁 task_pack_v1/
📄 README.md
📄 schema.json
📄 taxonomy.json
📁 episodes/
📁 splits/
📄 loader.py
09 · HANDOFF

White-label

Pack delivered under your branding for customer-facing artifacts. Internal provenance metadata and licensing terms preserved per agreement.

your_company_
brand_assets/
task_pack.zip

How it works

From scoped workflow to usable dataset in 4 weeks

Every task pack follows the same operating model: scope the workflow, capture synchronized episodes, structure and validate labels, package the dataset, then hand off with a review and gap map. The output is not a folder of video. It is a dataset your team can evaluate immediately.

01

Scope

Define task family, episode count, exception coverage, and machine families. We work with your team to lock the scope before any capture begins.

WEEK 1
02

Capture

Up to five synchronized cameras running inside an active manufacturing facility. Millisecond-timestamped machine-state events aligned to video within the episode sync tolerance.

WEEK 1–2
03

Structure

Phase segmentation, exception annotation, behavioral metadata extraction, and inter-rater validation on a sample of episodes.

WEEK 2–3
04

Package

Canonical episode JSON, train/val/test splits, sample loader, schema documentation, and per-episode quality scores. Quality gates at 0.85 minimum.

WEEK 3–4
05

Deliver

White-labeled task pack delivered to your team. Walkthrough session covering schema, loader integration, and a gap map for any future capture.

WEEK 4

Built for the teams making robots work in factories

Three buyer paths. Three different evaluation criteria. Same canonical task pack.

For Robot Learning Engineers

Validate before you commit

Load the sample manifest, inspect phase labels, validate sync precision, and test the loader before committing to a full pack. Every TalosHub pack ships with the same RLDS-compatible schema and reference loader you can run locally on a sample episode.

Request a sample task pack →

Schema, sample manifest, and loader code shared before scoping.

For VP / Head of AI

Close the factory data gap

Close the factory-data gap without spending months building capture operations, facility access, and annotation workflows in-house. TalosHub captures inside operating manufacturing facilities and delivers structured task packs in 4 weeks — not the 6 to 12 months an in-house program takes.

See the engagement model →

See how a 4-week task pack engagement actually works.

For CEO / VP Product / BD

White-labeled task packs for customer pilots

Support a manufacturing pilot with a white-labeled task pack scoped to the customer workflow you need to prove. The data ships under your brand, follows your customer's task definition, and lets your team start training without operational lift on the customer side.

Book a 20-min scoping call →

20-minute call. Workflow scoping, not pitch.

Data handling and operator consent

Client-commissioned datasets are tenant-isolated and delivered only under the terms of the client agreement. Reference datasets and partner-facility datasets are governed by separate consent, facility, and licensing agreements. Operator consent is recorded for every capture session, and no real operator names are included in client deliverables.

Operator consent recorded per session
Pseudonymization in all client deliverables
Facility-controlled data segregation

Common questions

What kind of robotics training data does TalosHub provide?
TalosHub provides manufacturing-native training data for robotics companies. We capture real operator workflows from CNC machine tending, press/brake operations, metrology, and finishing processes. Every dataset includes multi-view video, machine-state alignment, phase segmentation, exception coverage, and success/failure labels — structured and ready for VLA model training, imitation learning, and robot manipulation research.
Is this teleoperation data?
By default, no. TalosHub captures human demonstration data from real manufacturing workflows. It includes multi-view video, machine-state timelines, operator actions, phases, exceptions, outcomes, and QA metadata. Robot joint angles, end-effector trajectories, and force/torque streams are only included when a client-specific capture program collects them directly from a robot platform.
How do we evaluate quality before committing?
We share a sample schema, sample manifest, loader code, and a small demo episode set before engagement. Each delivered episode must pass sync precision, phase coverage, annotation completeness, schema validation, and quality-score gates. Episodes scoring below 0.85 are flagged for rework or excluded from delivery. See the dataset card →
How is TalosHub different from other robotics data providers?
Most robotics data comes from lab environments or broad humanoid demonstrations. TalosHub is manufacturing-native — we capture data directly from factory floors with real machines, real operators, and real exceptions. Every task pack includes machine-state context (door state, chuck state, cycle timing, alarms) that lab data simply doesn't have. We deliver white-labeled packages, not raw footage.
What industries and machines does TalosHub cover?
We currently capture CNC machine tending on Haas, Mazak, DMG MORI, and Fanuc verticals and horizontals; press and brake operations on hydraulic and servo press brakes; metrology workflows, including CMM loading and optical inspection; and finishing operations, including grinding, deburring, and polishing. Our capture schema extends to any machine-centered manufacturing workflow, and new coverage areas are added with each client engagement.
What does a typical engagement timeline look like?
Starter Pack (25–50 episodes): 4 weeks from scoping to delivery. Week 1 — workflow scope, capture plan, facility alignment. Week 2 — on-site capture sessions. Week 3 — data processing, labeling, QA. Week 4 — packaging, delivery review, gap map. Growth and Enterprise packs extend this timeline based on scope.
Who uses TalosHub data?
Our clients include robotics companies building industrial manipulation systems, VLA model builders expanding their training coverage, robot OEMs and integrators reducing deployment friction, and industrial AI research teams that need real-world manufacturing benchmarks.
What is included in a TalosHub task pack?
Each task pack includes 25–150 usable episodes with multi-view capture, machine-state aligned metadata, phase segmentation labels, 2–8 exception classes with recovery patterns, train/validation/test splits, a sample data loader, schema documentation, and a README — all white-labeled under your brand.
Next steps

Tell us the workflow you need to prove

We will respond within 24 hours, share the relevant schema and sample format, and schedule a 20-minute scoping call if there is a fit.

What happens after you reach out
01

We respond within 24 hours

Confirmation of scope and any clarifying questions.

02

We share the schema and sample

Relevant dataset card section, sample manifest, and loader code reference.

03

20-minute scoping call

Workflow definition, exception priorities, and timeline.

04

Proposal within 48 hours of the call

Task pack scope, deliverables, lead time, and commercial terms.

We share schema and sample format before scoping. No sales pressure.