Reinforcement learning environments that train your models.

We build the environments where AI models do real work, graded against ground truth as they go. Run the same task and you get the same number, whichever lab is running it.

Request access · Browse the environments

The corpus · success rate across 18 tests, by area · average 0.74

01 Environments

What every environment is, before the score.

Verifiable tasks

A real engineering task with a checkable outcome.

Dense reward

Every step scored against ground truth.

Real rollouts

Grounded in production work, not invented.

02 Method

One process, from a capability to a graded environment. The same five steps every time.

Perceive

Map a capability and its failure modes until the reward is well defined.

Capability

Represent

Formalize it into a task distribution with a verifiable rubric.

Rubric

Build

Stand up environments that separate cleanly from eval and resist contamination.

No contamination

Scale

Mass-produce variants across the distribution. Early environments become training data.

Distribution

Choose

Score pass@k per model. Point the next environment at what the models fail.

pass@k
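
For readers new to the metric: pass@k is the probability that at least one of k sampled attempts at a task passes. Below is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n attempts of which c passed. The function name, model names, and counts are hypothetical illustrations, not Idler's scoring code or data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which passed.

    Uses the standard unbiased estimator 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every draw of k attempts contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-model results on one task: (attempts, passes).
results = {"model-a": (20, 5), "model-b": (20, 0)}

# Score each model at k=5, then list the ones that never pass.
scores = {name: pass_at_k(n, c, k=5) for name, (n, c) in results.items()}
failing = [name for name, score in scores.items() if score == 0.0]
```

In this sketch, a task where `failing` is non-empty is a candidate target for the next environment.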

03 Domains

Where the method is pointed. In priority order, by stakes.

Safety

Alignment and oversight. The first call on everything.

Priority

Defense

High-stakes capability and red-team work.

High-stakes

Science

Bio, pharma, research automation.

Research

Commerce

Agentic work on real company operations. Live today.

Live

04 Why Idler

Grounded, broad, frontier.

Grounded

Environments from real production data, not invented. Less reward hacking, better transfer.

Broad

Coverage across coding, tool use, long-horizon tasks, and error recovery.

Frontier

Built for the models clearing the hardest evals, on the work they fail next.

05 Contact

Name the capability your models miss. We build the environment, graded against ground truth.

Request access