Not all AI agent deployments are created equal. A company using ChatGPT for customer support is not at the same maturity as a company running 14 autonomous agents that coordinate via a shared message bus. The first is a tool. The second is an organization. They face entirely different problems, require entirely different architectures, and fail in entirely different ways.
But until recently, there was no standard way to measure the difference. No common vocabulary. No framework that let you look at two AI deployments and say, objectively, which one has more sophisticated coordination. You either had agents or you didn't. That was the extent of the conversation.
The 8 Levels of Agentic Engineering by Bassim Eledath changed that. It gave the industry a clear, hierarchical framework for measuring how advanced an AI agent deployment actually is. Not how smart the model is. How well the agents coordinate. OTP adopted it as a core measurement dimension. Every published OOS on the platform receives an agentic level score based on this framework.
The 8 Levels
Each level builds on the one below it. Weaknesses at lower levels cap your score regardless of what you have built higher up. You cannot be L7 if your L3 tool integration is unreliable.
Level 1: Autocomplete
Basic autocomplete. GitHub Copilot suggestions. The AI predicts the next token in your workflow. No autonomy. No persistent action. The human does all the thinking. The AI finishes the sentence.
Level 2: Chat
Interactive Q&A. ChatGPT, Claude, Gemini. Single-turn or multi-turn conversations, but no persistent action beyond the chat window. The AI answers questions. It does not do work on your behalf. Close the tab and everything resets.
Level 3: Tool Use
The agent can call external tools via MCP or function calling. Read databases. Send emails. Make API calls. Pull data from Slack. Post to a CRM. This is where the AI starts touching real systems. The risk profile changes here because mistakes now have consequences outside the chat window.
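The mechanics of tool calling are simple to sketch: the model emits a structured call (a tool name plus arguments), and a dispatcher routes it to a registered function. A minimal illustration, where the tool names and the example call are hypothetical, not a real integration:

```python
# Minimal L3-style tool dispatch: the model emits a structured tool call,
# and the dispatcher executes the matching registered function.
# Tool names and the example call below are illustrative only.

def send_email(to: str, subject: str) -> str:
    return f"email queued for {to}: {subject}"

def query_crm(account: str) -> str:
    return f"CRM record for {account}"

TOOLS = {"send_email": send_email, "query_crm": query_crm}

def dispatch(tool_call: dict) -> str:
    """Route a structured tool call to the registered function."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# A call shaped the way function-calling APIs typically emit them:
result = dispatch({"name": "query_crm", "arguments": {"account": "Acme"}})
```

The risk lives in that registry: every function in `TOOLS` is something the model can now trigger against a real system.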
Level 4: Multi-Step Execution
Multi-step task execution with planning. The agent can break a goal into subtasks and execute them sequentially. "Compile a morning briefing" becomes: scan Slack, pull calendar events, check email, triage by priority, write to a daily note. Ten steps. One command. The agent plans the work, not just executes a single action.
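The plan-then-execute loop can be sketched in a few lines: one goal maps to an ordered plan, and each step runs in sequence. The step names mirror the briefing example above and are illustrative placeholders, not real integrations:

```python
# Sketch of L4 planning: a goal is decomposed into an ordered plan,
# then each step is executed in sequence. Step names are illustrative.

def plan(goal: str) -> list[str]:
    plans = {
        "morning briefing": [
            "scan_slack",
            "pull_calendar",
            "check_email",
            "triage_by_priority",
            "write_daily_note",
        ],
    }
    return plans[goal]

def execute(goal: str) -> list[str]:
    completed = []
    for step in plan(goal):
        completed.append(step)  # a real agent would invoke a tool here
    return completed

steps = execute("morning briefing")
```

In production the plan would come from the model rather than a lookup table, but the shape is the same: one command in, an ordered sequence of tool invocations out.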
Level 5: Autonomous Domain Agent
A domain-expert agent that operates independently within a defined scope. It handles its own errors. It knows what it owns and what it does not own. A call center manager agent that reads performance data, drafts coaching messages, and flags problems without being asked. The human sets the mission. The agent runs it.
Level 6: Multiple Agents
Multiple specialized agents in the same environment. Each has a role. An analytics agent, a project manager agent, an email agent, a pipeline agent. They coexist. But coordination is ad-hoc. They share a codebase or a runtime, not a protocol. When they collide, the human resolves it.
Level 7: Coordinated Fleet
A coordinated agent fleet with shared state files, explicit ownership boundaries, pre-computed data handoffs, and escalation protocols. Every agent knows what it owns, what it does not own, and where to send problems it cannot solve. This is where most serious AI-native companies are today. The coordination is deliberate, documented, and testable.
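One common L7 convention is worth making concrete: each agent owns exactly one state file, anyone may read it, and only the owner may write it. A minimal sketch, where the file names and ownership table are assumptions for illustration:

```python
# Sketch of an L7 shared-state convention: every state file has exactly
# one owning agent; reads are open, writes are owner-only.
# File names and the ownership table are illustrative.
import json
import pathlib
import tempfile

OWNERSHIP = {
    "analytics_state.json": "analytics_agent",
    "pipeline_state.json": "pipeline_agent",
}

def write_state(agent: str, path: pathlib.Path, data: dict) -> None:
    """Refuse the write unless the calling agent owns this file."""
    if OWNERSHIP.get(path.name) != agent:
        raise PermissionError(f"{agent} does not own {path.name}")
    path.write_text(json.dumps(data))

def read_state(path: pathlib.Path) -> dict:
    return json.loads(path.read_text())

workdir = pathlib.Path(tempfile.mkdtemp())
state_file = workdir / "analytics_state.json"
write_state("analytics_agent", state_file, {"overspend_alert": True})
```

The point of the convention is that "two agents write to the same file" becomes a raised error at write time instead of silent corruption discovered later.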
Level 8: Agent-to-Agent Coordination
Agents coordinate with each other without human mediation. An agent-to-agent message bus carries structured messages: requests, proposals, challenges, responses. One agent detects an overspend anomaly, sends a diagnosis request to the analytics agent, receives the root cause, and escalates to the operations agent with a recommended action. The human sets strategy. The agents execute and self-correct. This is the frontier.
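The overspend workflow above can be sketched with a structured message type and a simple in-memory bus. The message kinds mirror the ones named in the text; the agent names and the bus itself are illustrative assumptions:

```python
# Sketch of L8 structured messaging: typed messages on a minimal
# in-memory bus. Agent names and the bus design are illustrative.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Message:
    kind: str       # REQUEST | INFORM | PROPOSAL | RESPONSE | CHALLENGE
    sender: str
    recipient: str
    body: dict

class Bus:
    def __init__(self) -> None:
        self.inboxes: dict[str, list[Message]] = defaultdict(list)

    def send(self, msg: Message) -> None:
        self.inboxes[msg.recipient].append(msg)

bus = Bus()
# The anomaly-to-escalation flow described above, no human in the loop:
bus.send(Message("REQUEST", "finance_agent", "analytics_agent",
                 {"question": "root cause of overspend?"}))
bus.send(Message("RESPONSE", "analytics_agent", "finance_agent",
                 {"root_cause": "duplicate campaign"}))
bus.send(Message("PROPOSAL", "finance_agent", "ops_agent",
                 {"action": "pause duplicate campaign"}))
```

What separates this from L6 coexistence is that the message is a typed artifact: it can be logged, replayed, and audited, which is exactly what a scoring framework needs to verify.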
How OTP Measures It
Every published OOS on OTP receives an agentic level score. The score is not self-reported. It is calculated based on the coordination patterns, authority boundaries, and failure modes documented in the claims.
An OOS that describes shared state files, explicit agent ownership boundaries, and escalation protocols will score L7. One that also describes an agent message bus with structured message types (REQUEST, INFORM, PROPOSAL, RESPONSE, CHALLENGE) and documented agent-to-agent workflows will score L8.
The scoring looks at what you actually do, not what you aspire to. If your OOS describes an agent message bus but your claims show zero transactions through it, the score reflects the gap. Dead infrastructure does not count.
The agentic level score is visible on every published OOS. When you browse the registry, you can filter and sort by level. This makes it easy to find organizations operating at your level or the level you want to reach.
Where Most Organizations Are
Most organizations using AI today are between L2 and L4. They have chat assistants. Maybe a few tool-using agents that pull data from APIs or send automated emails. Some have built workflow agents that chain multiple steps together. That is the current center of gravity.
The jump from L4 to L5 is significant. It requires an agent that owns a domain, handles its own errors, and operates without being prompted for every action. Most teams underestimate how much specification that takes. You have to define what the agent owns, what it does not own, how it handles edge cases, and what it does when it is wrong.
The jump from L5 to L7 is where coordination intelligence becomes essential. You cannot get six agents to work together without explicit rules about shared state, ownership boundaries, data handoffs, and escalation paths. Ad-hoc coordination breaks down fast. Two agents write to the same file. One agent optimizes a metric that another agent depends on. An escalation fires but nobody picks it up. These are not model failures. They are organizational failures. And they require organizational solutions.
That is exactly what the OOS captures.
How to Level Up
The path from your current level to the next is not theoretical. It is documented in the OOS files already published on OTP. Browse them. Compare yours. See what L7 and L8 organizations do differently. The patterns are visible.
Here is what we have observed so far across published OOS files:
- L4 to L5: Requires explicit ownership boundaries. Each agent needs a job description that includes what it does NOT own. Without this, agents drift into each other's lanes.
- L5 to L6: Requires shared state. Agents need to read each other's output without direct coupling. Pre-computed files that one agent writes and another reads.
- L6 to L7: Requires orchestration protocols. A chief of staff agent or morning briefing routine that reads all shared state files and compiles a single view. Escalation paths that are tested, not just documented.
- L7 to L8: Requires an agent message bus. Agents that can send structured messages to each other. Proposals, challenges, requests. The human sets strategy. The agents coordinate execution.
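The L6-to-L7 requirement that escalation paths be "tested, not just documented" can itself be automated. A minimal sketch of such a readiness check, where the agent names and routing table are hypothetical:

```python
# Sketch of a testable escalation protocol: every chain of handoffs
# must terminate at a human. Agent names and routes are illustrative.

ESCALATION = {
    "email_agent": "ops_agent",
    "analytics_agent": "ops_agent",
    "ops_agent": "human",
}

def escalate(agent: str) -> str:
    """Follow the escalation chain until it reaches a terminal handler."""
    seen = set()
    while agent in ESCALATION:
        if agent in seen:
            raise ValueError(f"escalation loop at {agent}")
        seen.add(agent)
        agent = ESCALATION[agent]
    return agent

# An L7 readiness check: no dead ends, no loops.
assert all(escalate(a) == "human" for a in ESCALATION)
```

Running a check like this in CI is the difference between an escalation path that exists on paper and one that fires when an agent actually gets stuck.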
Each transition has a characteristic failure mode. L5 agents fail when their boundaries are ambiguous. L6 agents fail when shared state gets corrupted. L7 agents fail when escalations are documented but never tested. L8 agents fail when the message bus exists but no workflow actually uses it.
The OOS makes these failure modes explicit and comparable. That is the point.
Your Score Is Not a Vanity Metric
Your agentic maturity level determines whether your AI team scales or collapses. At L4, adding a fifth agent is straightforward. At L7, adding a fifteenth agent without breaking the coordination layer requires serious architectural discipline. The level tells you what class of problems you are solving and what class of problems are about to hit you.
It also determines what you can learn from other organizations. An L3 organization publishing an OOS will not get much from comparing against an L8 deployment. The problems are too different. But two L7 organizations comparing their coordination patterns will find immediately actionable intelligence. OTP makes that comparison possible.
- See where you score on the 8 levels
- Find organizations at your maturity
- How we integrated the framework into OTP
Publish your OOS. See where you stand. The framework is clear. The measurement is automated. The only question is whether you are willing to be honest about where you are today.
Founder of OTP and CEO of Sneeze It, a digital marketing agency running 14 AI agents in production.
dsteel@sneeze.it