Introducing Openverse
Openverse is building the infrastructure for evaluating and improving AI agents at scale. We turn web, mobile, and internal tools into reusable, interactive environments where agents can be trained, tested, and improved before deployment.
Openverse was born from a simple observation: AI agents are getting powerful, but most teams still lack an easy way to build, train, and evaluate them on the real workflows they care about. Today, only a few well‑resourced companies can afford teams to build simulation environments, training code, and evaluation pipelines; smaller teams usually can't.
In this landscape, the core bottleneck is the environment. An environment is a new kind of training data for agents: instead of static text, it is an interactive world where an agent can act, make decisions, receive feedback, and improve through trial‑and‑error. Although we have massive text corpora for pre‑training language models, there are only a few dozen reusable, high‑quality public environments for post‑training, and this imbalance limits how robust agents can become.
This is not just a hypothesis. Work like Salesforce’s eVerse (Introducing eVerse: Enterprise Simulation Environments to Train AI Agents) shows that training agents in realistic enterprise environments can raise success on complex workflows from ~20% to ~90%, while research platforms like WebShop, WebArena, AppWorld, and CodeGym show that agents trained in well‑designed interactive environments transfer better to real websites, tools, and APIs. High‑quality environments now matter as much as model size for building reliable agents.
Openverse closes this gap. Teams can describe their real tasks in plain language or development files and instantly get reusable, trainable environments that mirror their workflows—mobile apps, MCP tool calls, e‑commerce flows, and more—and plug them directly into their training and evaluation pipelines. These environments support prompt and context tuning, imitation learning, and reinforcement learning, enabling teams of any size to turn general‑purpose models into agents that handle their specific tasks well.
What We Provide
Tools like LangChain, AutoGen, and Vertex Agent Builder help you define and run agents, and services like LangSmith, DeepEval, and Phoenix help you trace and evaluate them. But all of these assume you already have realistic tasks and environments to run the agents in. Openverse fills that missing layer between agents and real work. It provides the environments themselves—letting teams describe everyday digital workflows (web, mobile, tool/MCP) in plain language and instantly get a reusable, agent-ready environment with a simple interface their existing agents can call.
We support companies that want to use agents in production across four parts of the workflow:
Build: Turn your workflows into agent-ready environments
- Environment Design and Integration: We help you capture your real workflows (for example, mobile apps, e‑commerce sites, CRM dashboards, internal tools) and turn them into reusable, simulated environments that are close enough to production to safely train and test agents.
- Chat‑to‑Create Authoring: Product and domain experts can describe tasks and constraints in normal language and let Openverse turn them into structured environment definitions (state, actions, tools, success conditions), without writing simulation code themselves.
- Standard Interfaces: Each environment is exposed through a simple API (for example, an RL‑style observation/step/reset interface), so your existing agents and infrastructure can plug in with minimal changes; the sketch after this list illustrates the idea.
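To make the observation/step/reset idea concrete, here is a minimal, self-contained Python sketch of such an interface and the agent loop around it. Everything in it (the ToyCheckoutEnv class, the StepResult fields, the "advance" action) is an illustrative assumption, not the actual Openverse API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepResult:
    """One transition: what the agent sees and earns after acting."""
    observation: dict[str, Any]  # e.g., page state, tool outputs
    reward: float                # progress toward the success condition
    done: bool                   # True once the task succeeds or fails
    info: dict[str, Any] = field(default_factory=dict)


class ToyCheckoutEnv:
    """Toy stand-in for a simulated e-commerce checkout workflow."""

    STAGES = ["cart", "shipping", "payment", "confirmed"]

    def reset(self) -> dict[str, Any]:
        """Start a fresh episode and return the initial observation."""
        self.stage = 0
        return {"stage": self.STAGES[self.stage]}

    def step(self, action: dict[str, Any]) -> StepResult:
        """Apply one agent action and report the new state and reward."""
        if action.get("type") == "advance" and self.stage < len(self.STAGES) - 1:
            self.stage += 1
        done = self.stage == len(self.STAGES) - 1  # success condition reached
        return StepResult(
            observation={"stage": self.STAGES[self.stage]},
            reward=1.0 if done else 0.0,
            done=done,
        )


# An existing agent loop plugs in with minimal changes:
env = ToyCheckoutEnv()
obs, done = env.reset(), False
while not done:
    action = {"type": "advance"}  # a real agent would choose this from obs
    result = env.step(action)
    obs, done = result.observation, result.done
```

Because the whole contract is reset plus step, any agent framework that can emit actions can drive an environment like this without modification.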
Evaluate: Test agent behavior safely before production
- Test Before Deployment: Before you let an agent touch real users or systems, you can run it inside environments that copy your critical workflows and edge cases, the same way you use a staging environment for normal software.
- Observe and Capture Every Step: Every step in an environment run is logged (LLM calls, tool calls, UI actions, state changes) so you can see exactly how an agent behaved, where it failed, and how different versions compare over time.
- Clear Metrics: We compute task‑ and step‑level metrics such as success rate, number of steps, cost, and rule violations, so you can evaluate agent improvements with data and compare agents, models, or configurations; the sketch after this list shows how such metrics could be aggregated.
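As an illustration of how these metrics could be rolled up from exported run logs, here is a small Python sketch. The per-run record fields (success, steps, cost_usd, violations) are assumed for the example and are not the actual export schema.

```python
from statistics import mean

# Hypothetical per-run records, as they might come out of an export:
runs = [
    {"success": True,  "steps": 12, "cost_usd": 0.04, "violations": 0},
    {"success": False, "steps": 30, "cost_usd": 0.11, "violations": 2},
    {"success": True,  "steps": 9,  "cost_usd": 0.03, "violations": 0},
]

metrics = {
    "success_rate":   mean(r["success"] for r in runs),
    "avg_steps":      mean(r["steps"] for r in runs),
    "avg_cost_usd":   mean(r["cost_usd"] for r in runs),
    "violation_rate": mean(r["violations"] > 0 for r in runs),
}
print(metrics)  # track these per agent version to compare over time
```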
Train: Improve agents with feedback from your environments
- Context and Prompt Improvement: Once you define a realistic environment in Openverse for the tasks your agents run, feedback from that environment is used to refine prompts and system policies.
- Model Improvement: Run agents in your environments with built-in rewards, collect trajectories, and use them for supervised fine-tuning or direct RL so models learn to use your tools, data, and UIs the way your team does (see the sketch after this list).
- Help with Loop Design: If you want, we can help design the environments, rewards, and metrics, and provide a small post‑training library so you can keep this improvement loop running instead of treating it as a one‑time effort.
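A minimal sketch of that trajectory-collection loop, assuming an environment with the reset/step interface from the Build section and a policy callable standing in for your agent; the function name and dataset shape are illustrative assumptions.

```python
def collect_trajectories(env, policy, episodes: int = 100) -> list[list[dict]]:
    """Roll out a policy, recording (observation, action, reward) per step."""
    dataset = []
    for _ in range(episodes):
        trajectory, done = [], False
        obs = env.reset()
        while not done:
            action = policy(obs)  # your agent's decision, given the observation
            result = env.step(action)
            trajectory.append({"obs": obs, "action": action, "reward": result.reward})
            obs, done = result.observation, result.done
        dataset.append(trajectory)
    return dataset


# Keep only successful episodes as demonstrations for supervised fine-tuning;
# the full reward-annotated set can feed direct RL instead:
# sft_data = [t for t in dataset if sum(step["reward"] for step in t) > 0]
```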
Data: Generate high-quality trajectories for fine-tuning and debugging
- Generate Interaction Data: By running agents (or simple scripted policies) in Openverse environments, you can generate large volumes of useful interaction data—multi‑step trajectories with actions, states, and rewards—that previously required expensive human annotation.
- Debug and Regression Test: These trajectories can be filtered, tagged, and replayed to debug specific failures, build regression suites, or bootstrap new agents with realistic examples from your own workflows.
- Use it in Your Stack: All data and logs can be exported in standard formats so you can feed them into your own training pipelines, analytics systems, or external evaluation tools; a minimal export sketch follows this list.
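For example, JSONL (one JSON record per episode) is the kind of standard format such an export could take. The schema below is an assumption for illustration, not Openverse's actual export format, and reuses the collection helper sketched in the Train section.

```python
import json


def export_jsonl(dataset: list[list[dict]], path: str) -> None:
    """Write one JSON record per episode so downstream tools can stream it."""
    with open(path, "w") as f:
        for episode_id, trajectory in enumerate(dataset):
            f.write(json.dumps({"episode": episode_id, "steps": trajectory}) + "\n")


# e.g., export_jsonl(collect_trajectories(env, agent_policy), "runs.jsonl")
```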
How Teams Adopt and Pay for Openverse
Teams can start for free, then upgrade as they build more environments or move toward production. Pricing is subscription-based, with clear limits on environment creation and modification, so costs scale with stage and usage.
| Tier | Target audience | Key features | Price |
|---|---|---|---|
| Hobbyists | Experimental users | | $0 / month |
| Prototyper | Individual developers | | $20 / month |
| Team | Startups and small companies | | $100 / month |
| Enterprise | Large organizations | | Contact us |
Why We Exist
Most companies need agents that reliably follow their own workflows, tools, and policies — something general-purpose models can’t provide on their own. The best results in the industry, from Salesforce’s eVerse to Scale’s RL-tuned agents, all follow the same pattern: train and test agents inside environments that look like your production systems, then improve them with the resulting traces.
The problem is that building these environments, metrics, and training loops is expensive and only practical for large vendors. Openverse exists so every team can access the same capabilities without building a simulation team or maintaining complex infrastructure:
- Cheaper Than Building In‑House: Hiring people to build and maintain bespoke simulators, logging, and training code is slow and expensive. Openverse lets a small team define environments themselves and use a shared service for the hard infrastructure.
- Faster Time to Safe Deployment: A new workflow can move from “idea” to “simulated environment with metrics” in days instead of quarters, so teams can reach their first safe deployment earlier.
- Repeatable Improvement Loop: Because environments, tasks, and trajectories live in one place, teams can reuse them as a regression suite when they change prompts, swap models, or add new tools.
- Your Processes are Unique: No general model has seen your exact tools, data, and rules. Training and evaluating agents in an environment that mirrors your own workflows is the only reliable way to get the accuracy, safety, and consistency your business needs.
- Simulation is Cheaper than Production Mistakes: Running thousands of episodes in a cloned environment surfaces failure modes (wrong tools, bad decisions, policy breaks) before they hit customers, just as a staging environment catches bugs before a deploy.
- Environment Data Becomes Your Edge: Traces and trajectories from your environments become training data that lets even smaller models match or beat frontier models on your tasks, the way Scale and Salesforce have shown in practice.
As agents become the primary interface to software, the companies that win will be the ones with an environment-driven improvement loop: the ability to train, test, and evolve agents safely as their workflows change. Openverse makes that accessible to every team.