Better agents learn from real operations.

Optonomous runs autonomous agents inside real Shopify businesses — and turns that work into first-party, consented, outcome-labeled datasets that AI labs can't get anywhere else.

Explore datasets
1. Mission

We are an agentic commerce data research company. Our mission is to bring AI into the real economy through autonomous agents that operate real businesses, because their work is the richest signal of how commerce actually gets done.

2. Process

We build agentic datasets with the same rigor researchers bring to models.

i.

Hypothesize

Identify an agentic-commerce capability worth unlocking, such as dispute resolution, reorder timing, or support deflection.

ii.

Design

Architect the trajectory and label schema needed to teach a model that capability.

iii.

Capture

Capture the trajectory first-party as our agents run real merchant operations, recording context, actions, results, and verified outcomes. See our methodology →

iv.

De-identify

Redact PII at source, score re-identification risk against k-anonymity thresholds, and exclude all PCI/payment data.

v.

Evaluate & Iterate

Measure data quality and tune collection until it yields a small, high-signal set.

vi.

Release

Publish the dataset and refresh it continuously as our agents keep operating.

Our datasets are built for AI labs and platforms training autonomous agents for customer support, payments, fulfillment, and commerce operations.

3. Featured Datasets

The datasets only an autonomous-ops platform can produce

We lead with what nobody else has: agentic data and evals built from our agents' verified operational outcomes. Train on Trace and Dialogue; evaluate on Crucible. Purchase and reference data form a secondary tier, listed below.

Flagship · RL-ready

Trace

Agent Trajectory Dataset

De-identified records of Optonomous agents performing real e-commerce operations, captured end-to-end as structured trajectories: context → reasoning → tool_call → tool_result → outcome → reward_signal.

Every trajectory carries a verified business outcome and reward signal — the outcome-labeled, RL-ready agentic data only a platform that runs the operations can produce.
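To make the trajectory shape concrete, here is a minimal Python sketch of how one Trace record could be represented and checked. The class names, field names, the `chargeback_dispute` task label, and the `get_fulfillment` tool are all hypothetical; only the stage order comes from the description above, and letting an agent loop through reasoning, tool calls, and tool results before reaching an outcome is our assumption.

```python
from dataclasses import dataclass, field
from typing import Any

# Allowed stage-to-stage transitions. ASSUMPTION: an agent may loop through
# reasoning -> tool_call -> tool_result several times before reaching an outcome.
TRANSITIONS: dict[str, set[str]] = {
    "context": {"reasoning"},
    "reasoning": {"tool_call"},
    "tool_call": {"tool_result"},
    "tool_result": {"reasoning", "tool_call", "outcome"},
    "outcome": {"reward_signal"},
    "reward_signal": set(),  # terminal stage
}


@dataclass
class Step:
    stage: str               # one of the TRANSITIONS keys
    payload: dict[str, Any]  # stage-specific content (hypothetical shape)


@dataclass
class Trajectory:
    trajectory_id: str       # hypothetical identifier
    task: str                # illustrative task label
    steps: list[Step] = field(default_factory=list)

    def is_well_formed(self) -> bool:
        """True if the trajectory starts at context, ends at reward_signal,
        and every consecutive pair of stages is an allowed transition."""
        stages = [s.stage for s in self.steps]
        if not stages or stages[0] != "context" or stages[-1] != "reward_signal":
            return False
        return all(nxt in TRANSITIONS.get(cur, set())
                   for cur, nxt in zip(stages, stages[1:]))


example = Trajectory(
    trajectory_id="t-0001",
    task="chargeback_dispute",
    steps=[
        Step("context", {"ticket": "Customer disputes order <order_id>"}),
        Step("reasoning", {"text": "Check fulfillment status first."}),
        Step("tool_call", {"tool": "get_fulfillment", "args": {"order": "<order_id>"}}),
        Step("tool_result", {"status": "delivered"}),
        Step("outcome", {"dispute": "won"}),
        Step("reward_signal", {"reward": 1.0}),
    ],
)
print(example.is_well_formed())  # True
```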

Request a sample →

Eval · recurring

Crucible

Agentic Commerce Benchmark

A held-out, outcome-labeled evaluation set built from our verified operational outcomes — with a fresh, uncontaminated split each cycle for benchmarking every model generation.
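One way to keep each cycle's held-out split uncontaminated, sketched in Python under stated assumptions: draw only from records captured on or after the cycle's start date whose IDs have never appeared in any released dataset, then assign them deterministically by hashing the cycle name with the record ID. The function names, the `captured_on` and `id` fields, and the 20% default fraction are hypothetical, not the benchmark's actual construction.

```python
import hashlib
from datetime import date


def in_eval_split(record_id: str, cycle: str, eval_fraction: float) -> bool:
    """Deterministic per-cycle assignment: hash (cycle, record_id) into a 64-bit bucket."""
    digest = hashlib.sha256(f"{cycle}:{record_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # Integer comparison avoids float rounding at the top of the 64-bit range.
    return bucket < int(eval_fraction * 2**64)


def build_eval_split(records, cycle: str, cycle_start: date,
                     released_ids: set, eval_fraction: float = 0.2):
    """Select held-out records for one benchmark cycle.

    A record is eligible only if it was captured on or after cycle_start and
    its id has never appeared in any previously released dataset.
    """
    return [
        r for r in records
        if r["captured_on"] >= cycle_start
        and r["id"] not in released_ids
        and in_eval_split(r["id"], cycle, eval_fraction)
    ]
```

Because the hash is salted with the cycle name, each cycle draws a different split from the eligible pool; adding every released ID to `released_ids` afterward keeps later cycles from reusing it.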

Request the benchmark spec →

Dialogue

Support Conversation Corpus

Multi-turn, intent-labeled customer-support conversations, fully PII-scrubbed. Built for training and evaluating customer-service agents.

Request sample →

Purchase & reference data

Supporting datasets, secondary to our agentic products.

Tally

Basket-Level Purchase Records

Itemized, basket-level purchase records with U.S. core coverage; cross-market coverage is added only where we hold a documented consent basis. All PCI/payment data is excluded at source.

Request sample →

Cohort

Aggregated Purchase Cohorts

k-anonymous, aggregated purchase cohorts for retention and LTV modeling. Records describe grouped segments that meet a minimum k-anonymity threshold — no individual- or household-level tracking.
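A minimal Python sketch of the cohort rule described above, assuming purchase rows carry a pseudonymous `customer_token` and an `amount`, and treating the k-threshold as a minimum number of distinct customers per cohort. The field names and the default `k=10` are hypothetical; the actual threshold is not stated here.

```python
from collections import defaultdict

K_MIN = 10  # hypothetical minimum cohort size; the real threshold is not stated


def build_cohorts(purchases, segment_keys, k=K_MIN):
    """Aggregate purchase rows into cohorts keyed by segment_keys.

    Any cohort with fewer than k distinct customers is suppressed, and the
    output carries only counts and totals, never customer identifiers.
    """
    members = defaultdict(set)    # segment -> distinct pseudonymous customer tokens
    revenue = defaultdict(float)  # segment -> summed order value
    for row in purchases:
        seg = tuple(row[key] for key in segment_keys)
        members[seg].add(row["customer_token"])
        revenue[seg] += row["amount"]

    cohorts = []
    for seg, customers in members.items():
        if len(customers) < k:
            continue  # too small to release
        cohort = dict(zip(segment_keys, seg))
        cohort["customers"] = len(customers)
        cohort["revenue"] = round(revenue[seg], 2)
        cohorts.append(cohort)
    return cohorts
```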

Request sample →

Registry

Product & Pricing Graph

A structured product graph of SKUs, attributes, images, and pricing across domains and retailers. Product-level reference data — no personal data.

Request sample →

Browse more datasets or design one with us

We offer additional proprietary commerce datasets not listed here. Contact us to request a sample, explore more options, or collaborate on a new dataset.

Contact us

How to access our datasets

1. Request samples

We'll set up a quick call to understand your use case, then send you relevant data samples.

2. Purchase access

Enter a data license agreement for the datasets and use cases your team needs.

3. Receive data

For off-the-shelf datasets, we'll grant your team access within one to two days.

Bonus: Experiment with us

We frequently partner with research teams to design new shapes of commerce data. Contact us for more.

4. Provenance & Methodology

First-party data, built on a platform we operate end-to-end

Our agentic data is generated inside Optonomous as our own agents run real merchant operations. Our purchase and reference data flows in from merchants' connected commerce platforms. Both pass through a fixed de-identification pipeline before anything can be licensed, and we never use brokered third-party panels.

i. — our core advantage

First-party origin (agentic datasets)

Our agentic datasets — Trace, Crucible, and Dialogue — are generated on infrastructure we own and operate. Our agents do the work — resolving tickets, disputing chargebacks, reordering inventory — and the trajectory is the byproduct. Nothing here is scraped, purchased, or assembled from brokered panels, so we control that pipeline end to end and can attest to where every record came from. (Our purchase and reference datasets have a different origin — see Data rights.)

ii.

Consent

Data is collected only from merchants who explicitly enroll in our opt-in revenue-share program. Merchants share in the revenue from datasets derived from their operations, which aligns incentives and keeps consent active rather than buried in terms.

iii.

Data rights & platform sources

Our purchase and reference datasets (Tally, Cohort, Registry) don't originate on our own infrastructure — they flow from merchants' commerce platforms (e.g. the Shopify Admin API) through merchant-authorized connections. Onward licensing of these derivatives is governed by each merchant's authorization and the applicable platform partner terms: [ platform-source licensing basis — pending counsel ].

iv.

De-identification

A fixed pipeline runs before any data becomes licensable:

  • PII redaction before data leaves its source system
  • Re-identification risk scoring against k-anonymity thresholds
  • Hard exclusion of all PCI / payment data at source — it never enters the sellable pipeline
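
The three pipeline stages can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not a description of the production pipeline: the regexes, the `PCI_FIELDS` names, the default `k=5`, and the choice of quasi-identifiers are all hypothetical, and real PII detection needs much broader coverage than two patterns. Re-identification risk is scored here as the share of records whose quasi-identifier combination appears fewer than k times.

```python
import re
from collections import Counter

# ASSUMPTION: illustrative patterns only. Real PII coverage (names, addresses,
# free text in many languages) needs far more than two regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
PCI_FIELDS = {"card_number", "cvv", "card_expiry"}  # hypothetical field names


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to tell likely card numbers from other long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def has_card_number(text: str) -> bool:
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


def reid_risk(records, quasi_ids, k):
    """Share of records whose quasi-identifier combination occurs fewer than k times."""
    if not records:
        return 0.0
    keys = [tuple(r.get(q) for q in quasi_ids) for r in records]
    counts = Counter(keys)
    return sum(counts[key] < k for key in keys) / len(records)


def deidentify(records, text_fields, quasi_ids, k=5):
    """Return (clean_records, risk) after PCI exclusion, PII redaction, and risk scoring."""
    clean = []
    for rec in records:
        # 1. Hard PCI exclusion: drop payment fields, and drop the whole record
        #    if a Luhn-valid card number appears anywhere in its free text.
        rec = {f: v for f, v in rec.items() if f not in PCI_FIELDS}
        if any(has_card_number(str(rec.get(f, ""))) for f in text_fields):
            continue
        # 2. PII redaction on free-text fields (emails first, then phone-like runs).
        for f in text_fields:
            if f in rec:
                rec[f] = PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", str(rec[f])))
        clean.append(rec)
    # 3. Score re-identification risk; the caller decides whether it clears the threshold.
    return clean, reid_risk(clean, quasi_ids, k)
```

The PCI check deliberately runs before PII redaction, so a card number in free text causes the whole record to be excluded rather than being masked as a phone-like string and kept.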
v.

Opt-out & deletion

Merchants can opt out at any time, which halts new collection immediately. Deletion requests propagate to derived datasets in line with our retention and re-release policy, and downstream licensees are notified per contract.

  • No PII in sellable data
  • No PCI / payment data, ever
  • First-party only; no brokered panels
  • Consented and revenue-shared
  • k-anonymity risk scoring

5. Careers

Join us to shape the future of commerce AI

We're hiring across research, engineering, and operations.

See open roles