The awkward question after most agent pilots is not, “Did it work?” It is, “What exactly did we just connect?”
A team gives an AI agent access to a ticket queue, a CRM, a document store, a codebase, a browser, a database, or an inbox. The demo feels impressive. The agent resolves a few tasks, summarizes a few records, opens a pull request, or drafts a customer response. Then the pilot expands. Another department copies the pattern. A vendor adds an MCP connector. A workflow starts running on a schedule. Someone grants persistent permission because approving the same action every morning is annoying.
At that point, the organization no longer has one agent pilot. It has an unmanaged execution surface.
The next serious layer of enterprise AI is not a bigger prompt, a more charismatic demo, or a new spreadsheet of use cases. It is an agent inventory: a living registry of which agents exist, what they can touch, which systems they can change, what model stack they depend on, what safety evidence exists, and who can revoke their access.
The market is moving faster than the disclosure layer
The [2025 AI Agent Index](https://aiagentindex.mit.edu/) is useful because it treats deployed agents as systems worth documenting, not just products worth marketing. The index tracks 30 prominent agentic systems across 45 fields, including design, capabilities, ecosystem dependencies, autonomy, and safety disclosures.
The headline is not simply that agents are getting more capable. It is that capability is becoming easier to observe than safety.
According to the index, 24 of the 30 tracked agents launched or received major agentic updates during 2024–2025. Among the 13 agents exhibiting frontier levels of autonomy, only four disclosed agentic safety evaluations. Across the full set, 227 of 1,350 fields (about 17%) had no public information, and the gaps were concentrated around ecosystem interaction and safety. The index also notes that many agents depend on a small set of frontier model families, while 20 of 30 support MCP for tool integration.
For buyers, builders, and operators, this points to a practical problem: the public market does not yet provide enough standardized information to make agent risk legible. If your organization adopts agents anyway, you need your own internal index.
That internal index should not be a procurement afterthought. It should be part of the deployment architecture.
Agent risk lives in paths, not permissions alone
Traditional access control asks whether a system can perform an action. Agent governance has to ask a harder question: whether this action is acceptable after the actions that already happened, for this user, in this workflow, against this data, at this moment.
That is the core argument of [Runtime Governance for AI Agents: Policies on Paths](https://arxiv.org/abs/2603.16586). The paper argues that agent behavior is path-dependent. A single database read may be acceptable. A database read followed by an external email may be an exfiltration pattern. A support-ticket lookup may be routine. A support-ticket lookup followed by a refund, a CRM update, and a message to a customer may require a different control.
Static permissions still matter. Prompt instructions still matter. But neither is enough when the risk emerges from a sequence.
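To make the path idea concrete, here is a minimal sketch of a check that looks at what has already happened before deciding on the next tool call. The tool names, the rule table, and the three outcomes are illustrative assumptions, not the paper's formalism.

```python
# Hypothetical path rules: (earlier tool, proposed tool, decision).
# Tool names and decisions are assumptions made for illustration.
PATH_RULES = [
    ("db.read", "email.send_external", "block"),
    ("ticket.lookup", "payment.refund", "require_approval"),
]


def evaluate(history: list[str], proposed: str) -> str:
    """Decide on a proposed tool call given the tools already used in this run."""
    decision = "allow"
    for earlier, later, outcome in PATH_RULES:
        if proposed == later and earlier in history:
            if outcome == "block":
                return "block"  # a block always wins over weaker outcomes
            decision = outcome
    return decision


# The same action gets a different answer depending on what came before it.
print(evaluate([], "email.send_external"))
print(evaluate(["db.read"], "email.send_external"))
print(evaluate(["ticket.lookup"], "payment.refund"))
```

A static permission check would answer the first two calls identically, because the agent either may or may not send email. The path check separates them.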
This is where the inventory layer becomes operational infrastructure. Runtime policy cannot evaluate “the agent” in the abstract. It needs to know which agent is acting, who owns it, which workflow it belongs to, what tools it is allowed to use, what data classes it may encounter, which actions require approval, which paths are blocked, and where evidence is recorded.
Without that registry, every policy engine starts from memory and vibes.
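A rough sketch of that dependency: before any rule runs, the policy layer resolves the acting agent against the registry and refuses to reason about agents it cannot find. The registry shape, agent ID, tool names, and return strings are all hypothetical.

```python
# Hypothetical in-memory registry; a real one would sit behind a change process.
REGISTRY = {
    "support-triage": {
        "owner": "support-ops@example.com",
        "allowed_tools": {"ticket.lookup", "crm.update", "payment.refund"},
        "approval_required": {"payment.refund"},
    },
}


def authorize(agent_id: str, tool: str) -> str:
    """Resolve the agent in the registry, then check scope and approval gates."""
    record = REGISTRY.get(agent_id)
    if record is None:
        return "deny: agent not in inventory"
    if tool not in record["allowed_tools"]:
        return "deny: tool not in scope"
    if tool in record["approval_required"]:
        return "escalate: " + record["owner"]
    return "allow"


print(authorize("support-triage", "ticket.lookup"))
print(authorize("support-triage", "payment.refund"))
print(authorize("shadow-agent", "ticket.lookup"))
```

The design choice worth noting is the first branch: an agent with no inventory record is denied outright rather than given defaults.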
Compliance starts with knowing what exists
The same pattern shows up in legal and regulatory analysis. [AI Agents Under EU Law](https://arxiv.org/abs/2604.04604) maps agent deployments across the EU AI Act, GDPR, Cyber Resilience Act, NIS2, Digital Services Act, Data Act, Data Governance Act, sector-specific rules, and product liability concerns. The paper’s most practical point is not a prediction about which law will dominate. It is the operational requirement underneath all of them: providers need an exhaustive inventory of an agent’s external actions, data flows, connected systems, and affected persons.
That sentence should be printed and taped above every enterprise agent roadmap.
You cannot classify a risk if you do not know which systems the agent touches. You cannot revoke access cleanly if no one knows which credential, connector, service account, or delegated permission the workflow is using.
An agent inventory is not legal advice. It is the substrate that makes review possible.
What belongs in the inventory
A useful agent inventory does not have to start as a huge platform. It can start as a controlled table backed by a change process. The key is to record fields that change decisions.
At minimum, each agent should have:
- Name, owner, business process, and deployment status. If nobody owns the agent, it is not production-ready.
- Trigger mode and autonomy level. Human-invoked, scheduled, event-driven, called by another agent, or always-on; plus what it can decide without approval.
- Model and provider dependencies. Primary model, fallback model, hosted environment, and material vendor dependencies.
- Tools, connectors, and permission scopes. APIs, databases, SaaS systems, browsers, code repositories, messaging systems, file stores, read/write permissions, and code execution rights.
- Data classes and external actions. PII, PHI, customer data, source code, financial data, emails, tickets, web forms, payments, record changes, or customer messages.
- Approval gates and revocation path. Which actions require confirmation, who can approve them, whether approvals persist, and how to stop the agent now.
- Evaluation evidence. Task evals, safety evals, red-team findings, known failure modes, and unresolved issues.
- Runtime policy hooks and evidence location. Blocked action sequences, escalation conditions, monitoring rules, logs, traces, tool calls, approvals, and retention rules.
- Compliance flags. Employment, healthcare, finance, critical infrastructure, consumer communication, children’s data, or cross-border data transfer.
The inventory should also include a field for missing information. Unknowns are not embarrassing; hidden unknowns are.
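One way to start is a plain record type that mirrors the fields above. This is a sketch under stated assumptions: the field names, the example values, and the `is_production_ready` helper are illustrative, and the readiness check encodes only the ownership rule from the list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRecord:
    # Identity, ownership, and operating mode
    name: str
    owner: Optional[str]
    business_process: str
    status: str          # e.g. "pilot" or "production"
    trigger_mode: str    # e.g. "human-invoked", "scheduled", "event-driven"
    autonomy_level: str  # what it can decide without approval
    # Dependencies and reach
    models: list[str] = field(default_factory=list)
    tools: dict[str, str] = field(default_factory=dict)  # tool -> "read" or "write"
    data_classes: list[str] = field(default_factory=list)
    external_actions: list[str] = field(default_factory=list)
    # Controls and evidence
    approval_gates: list[str] = field(default_factory=list)
    revocation_path: Optional[str] = None
    evaluation_evidence: list[str] = field(default_factory=list)
    policy_hooks: list[str] = field(default_factory=list)
    evidence_location: Optional[str] = None
    compliance_flags: list[str] = field(default_factory=list)
    # Explicit list of what is not yet known
    unknowns: list[str] = field(default_factory=list)

    def is_production_ready(self) -> bool:
        """Only the ownership rule is encoded here; other gates are left out."""
        return bool(self.owner)


pilot = AgentRecord(
    name="support-triage",
    owner=None,
    business_process="customer support",
    status="pilot",
    trigger_mode="event-driven",
    autonomy_level="drafts replies; refunds need approval",
    tools={"tickets": "read", "crm": "write"},
    data_classes=["PII"],
    unknowns=["no safety evals on file", "fallback model not documented"],
)
print(pilot.is_production_ready(), pilot.unknowns)
```

Keeping `unknowns` as a first-class field means a gap shows up in every export and review instead of living in someone's head.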
The inventory changes behavior
Once the inventory exists, it becomes easier to run the organization differently.
Procurement can compare vendors by the disclosure fields that matter, not by the best demo video. Security can ask which agents have write access to customer systems. Compliance can filter for workflows touching regulated data. Engineering can find all agents dependent on a model family, API connector, or MCP server before making a change. Incident response can identify which workflows used a compromised credential. Product leaders can see whether pilots are accumulating into a coherent platform or a pile of exceptions.
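Several of those questions reduce to simple filters once the records exist. The sketch below assumes a list of dictionaries with hypothetical agent names, systems, MCP servers, and credentials; it shows the shape of the queries, not a recommended storage format.

```python
# Hypothetical inventory rows; every name and value is illustrative.
AGENTS = [
    {"name": "support-triage", "tools": {"crm": "write", "tickets": "read"},
     "mcp_servers": ["crm-mcp"], "credentials": ["svc-support"]},
    {"name": "code-reviewer", "tools": {"repo": "write"},
     "mcp_servers": [], "credentials": ["svc-ci"]},
    {"name": "report-bot", "tools": {"crm": "read"},
     "mcp_servers": ["crm-mcp"], "credentials": ["svc-support"]},
]


def with_write_access(agents, system):
    """Security: which agents can change this system?"""
    return [a["name"] for a in agents if a["tools"].get(system) == "write"]


def using_mcp_server(agents, server):
    """Engineering: which agents break if this MCP server changes?"""
    return [a["name"] for a in agents if server in a["mcp_servers"]]


def using_credential(agents, credential):
    """Incident response: which workflows used a compromised credential?"""
    return [a["name"] for a in agents if credential in a["credentials"]]


print(with_write_access(AGENTS, "crm"))
print(using_mcp_server(AGENTS, "crm-mcp"))
print(using_credential(AGENTS, "svc-support"))
```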
This is also how human oversight becomes more precise. Anthropic’s framework for [safe and trustworthy agents](https://www.anthropic.com/news/our-framework-for-developing-safe-and-trustworthy-agents) emphasizes human control, transparency, privacy, and secure interactions. Those principles are hard to operationalize if oversight is configured one agent at a time and forgotten. They become easier when each agent has an explicit record of what it can do, which actions are high-stakes, and where a human must remain in the loop.
The point is not to slow every agent down. The point is to stop treating every agent as a special case.
Start before the platform is perfect
The best time to build this layer is before the tenth pilot. The second-best time is before the first incident.
Start with the agents already in motion. Document the tools they touch, the data they read, the actions they can take, the approvals they require, and the person who can shut them off. Attach inventory updates to deployment reviews, connector changes, permission upgrades, and vendor onboarding. Review persistent permissions on a schedule. Make missing safety evidence visible. Require owners to update the record when an agent moves from pilot to production.
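The scheduled review of persistent permissions can begin as a small script. In this sketch the 90-day interval, the record shape, and the agent and action names are assumptions chosen for illustration.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # assumed cadence; pick one that fits your risk

# Hypothetical approval records pulled from the inventory.
APPROVALS = [
    {"agent": "support-triage", "action": "payment.refund",
     "persistent": True, "last_reviewed": date(2025, 1, 1)},
    {"agent": "report-bot", "action": "crm.export",
     "persistent": True, "last_reviewed": date(2025, 5, 1)},
    {"agent": "code-reviewer", "action": "repo.merge",
     "persistent": False, "last_reviewed": date(2024, 6, 1)},
]


def due_for_review(approvals, today):
    """Return persistent approvals whose last review is older than the interval."""
    return [
        a["agent"] + ":" + a["action"]
        for a in approvals
        if a["persistent"] and today - a["last_reviewed"] > REVIEW_INTERVAL
    ]


print(due_for_review(APPROVALS, today=date(2025, 6, 1)))
```

Passing `today` in explicitly keeps the check easy to test and to run from whatever scheduler is already in place.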
Do not wait for the perfect governance suite. A disciplined inventory in a simple system is better than an elegant architecture nobody maintains.
Pilots prove that an agent can do work. An inventory proves the organization understands the work it has authorized. That distinction is going to matter more as agents move from impressive demos into the daily machinery of the business.
Sources
- https://aiagentindex.mit.edu/
- https://arxiv.org/abs/2602.17753
- https://arxiv.org/abs/2603.16586
- https://arxiv.org/abs/2604.04604
- https://www.anthropic.com/news/our-framework-for-developing-safe-and-trustworthy-agents