A protocol that lets AI agents talk directly to each other. If one agent needs help from another, A2A is the language they use to negotiate, hand off tasks, and share results. It sits in the middle layer of the AI coordination stack, between the tool layer (MCP) and the organization layer (OTP).
Why it matters —
Without a shared protocol for agent-to-agent communication, every integration becomes custom glue code that breaks when anything changes.
The EOS replacement for an org chart. Shows seats (roles), responsibilities (3-5 per seat), and the person filling each seat. Different from an org chart because seats can outlast the people in them — and because two people can never own the same seat.
Why it matters —
Most "org chart" disputes are actually accountability chart disputes. People do not know who owns what, so they fight or duck.
A team of specialized AI agents that work together inside one organization. Each agent has a clear job, clear tools, and clear boundaries. The "army" part is not about quantity. It is about structure. A well-built agent army has no overlap, no gaps, and every agent knows exactly what it owns.
Why it matters —
Throwing more agents at a problem without structure creates chaos. An agent army turns chaos into a system.
When one AI agent passes a task, a piece of context, or a decision to another agent. A good handoff includes everything the receiving agent needs to continue without asking follow-up questions. A bad handoff loses context, duplicates work, or drops the task entirely.
Why it matters —
Most multi-agent failures happen at the handoff. The work inside each agent is usually fine. It is the space between agents that breaks.
An 8-level framework measuring how sophisticated an organization's AI agent coordination is, from L1 Tab Complete to L8 Autonomous Agent Teams. Created by Bassim Eledath. Lower-level weaknesses cap the score regardless of higher-level capabilities.
Why it matters —
You cannot improve what you cannot measure. This framework gives you a score and a roadmap for what to build next.
A communication channel that lets agents send structured messages directly to each other without a human in the middle. Messages follow a defined format (REQUEST, INFORM, PROPOSAL, RESPONSE, CHALLENGE) so the receiving agent knows exactly what is being asked and how to respond.
Why it matters —
If every agent-to-agent communication has to go through a human, the human becomes the bottleneck. A message bus lets agents coordinate at machine speed.
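A minimal in-memory sketch of such a bus, using the five message types listed above. The class names, field names, and agent names are hypothetical, and a production bus would add persistence and delivery guarantees that this sketch omits.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MessageType(Enum):
    REQUEST = "REQUEST"
    INFORM = "INFORM"
    PROPOSAL = "PROPOSAL"
    RESPONSE = "RESPONSE"
    CHALLENGE = "CHALLENGE"


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    type: MessageType
    body: str


class MessageBus:
    """In-memory bus with one first-in, first-out inbox per agent name."""

    def __init__(self) -> None:
        self._inboxes = defaultdict(deque)

    def send(self, message: Message) -> None:
        self._inboxes[message.recipient].append(message)

    def receive(self, agent: str) -> Optional[Message]:
        """Return the oldest unread message for the agent, or None if the inbox is empty."""
        inbox = self._inboxes[agent]
        return inbox.popleft() if inbox else None
```

Because the type is an enum rather than free text, a receiving agent can branch on `message.type` instead of guessing intent from the body.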
The process of coordinating multiple AI agents so they work as a team instead of a crowd. Orchestration decides who runs when, who gets what information, and how results flow from one agent to the next.
Why it matters —
Individual agents are only as useful as their coordination. Orchestration is what turns a collection of tools into a functioning team.
A software program powered by AI that can take actions on its own. Unlike a chatbot that just answers questions, an agent can use tools, read files, call APIs, make decisions, and complete multi-step tasks. Give it a goal, and it figures out the steps.
Why it matters —
Agents are the building blocks of every AI team. Understanding what they can and cannot do is the starting point for everything else.
AI safety company founded in 2021 by former OpenAI researchers, creator of the Claude family of models and the Model Context Protocol (MCP). Backed by Google and Amazon.
A set of rules that lets two pieces of software talk to each other. APIs are how AI agents connect to the outside world — fetching data, sending messages, triggering workflows.
Why it matters —
An agent without API access is a brain in a jar. APIs are how agents do anything useful.
A secret token that identifies and authenticates an API caller. Simpler than OAuth — appropriate for server-to-server integration, not for end-user delegation. Treat API keys like passwords: rotate them, scope them, never commit them to git.
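One way to keep a key out of git is to read it from the environment at startup. The sketch below assumes a hypothetical variable name, `EXAMPLE_API_KEY`, and a hypothetical `redact` helper for logging; neither comes from any specific provider.

```python
import os


def load_api_key(var_name: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from the environment so it never lives in source control."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; refusing to start without a key")
    return key


def redact(key: str) -> str:
    """Mask all but the last 4 characters so logs never contain the full key."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```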
OKRs deliberately set so high that hitting 70% is considered a win. Used to push teams beyond incremental thinking. Google distinguishes aspirational OKRs from committed OKRs and grades them differently.
A clear line that defines what an AI agent is allowed to do and what it must not do. Includes tool access, decision rights, dollar limits, contact lists, and human approval requirements. Authority boundaries should be encoded in code or configuration, not just in prompts.
Why it matters —
Without an authority boundary, "autonomous" means "unbounded." That is how agents make $4,000 mistakes at 3 AM.
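A sketch of what "encoded in code or configuration" could look like. The field names, tool names, and the three return values are illustrative assumptions, not an OTP schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityBoundary:
    allowed_tools: frozenset       # tools the agent may call at all
    max_spend_usd: float           # hard dollar limit per action
    approval_required: frozenset   # allowed tools that still need a human sign-off


def check_action(boundary: AuthorityBoundary, tool: str, cost_usd: float = 0.0) -> str:
    """Return 'deny', 'needs_approval', or 'allow' for a proposed action."""
    if tool not in boundary.allowed_tools:
        return "deny"
    if cost_usd > boundary.max_spend_usd:
        return "deny"
    if tool in boundary.approval_required:
        return "needs_approval"
    return "allow"
```

The check runs outside the model, so a persuasive prompt cannot talk the agent past its limit.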
An OTP tool that automatically repairs common issues in an Organizational Operating System before publishing — missing fields, duplicate IDs, broken references, malformed YAML.
Two modes an AI agent can operate in. Autonomous agents act without approval inside their authority boundary. Semi-autonomous agents recommend and wait for a human to confirm. Most real-world agents are semi-autonomous for high-stakes actions and autonomous for low-stakes ones.
Why it matters —
Mismatched mode is the most common cause of agent disasters. Autonomous on a high-stakes action means runaway damage. Semi-autonomous on a low-stakes action means the human becomes the bottleneck.
The 4DX discipline of holding a brief weekly WIG meeting — review last week's commitments, look at the scoreboard, make new commitments to move the lead measures. The drumbeat that converts good intentions into execution.
Scaling Up's framework for shortening the cash conversion cycle. Look at every step from sale to cash collected and find ways to compress it — improve sales cycle, make/production cycle, delivery cycle, billing/payment cycle.
An AI assistant built by OpenAI based on the GPT family of language models. Available as a consumer product and through the OpenAI API. One of several major AI platforms used to power agents.
A team of Roles in Holacracy that shares a purpose and operates with explicit authority. Circles can contain sub-circles, replacing the manager hierarchy with a hierarchy of purposes.
A pattern that stops calling a failing service after a threshold of errors, waits a cooldown, then probes to see if it recovered. Prevents cascading failures when one downstream agent or API breaks. Standard pattern in resilient agent orchestration.
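The threshold, cooldown, and probe steps described above can be sketched as follows. This is a single-threaded illustration; the class name, parameter names, and defaults are assumptions, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpenError("cooldown in progress; call skipped")
            # Cooldown elapsed: fall through and let one probe call run.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            probing = self._opened_at is not None
            if probing or self._failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed probe.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```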
The origin story of a knowledge claim: where it came from, when it was created, who authored it, and how it changed over time. Every claim in OTP is provenance-tracked.
Why it matters —
Without provenance, you cannot tell which claims are battle-tested and which are someone's untested guess. Provenance is what makes coordination intelligence trustworthy.
Standard categories within an OOS that organize claims by domain: core_operating_rules, agent_roles_and_authority, coordination_patterns, operational_heuristics, failure_patterns, and human_ai_boundary_conditions.
A score measuring how closely two knowledge claims from different organizations match in meaning. High similarity across many organizations is a signal that you have found a coordination pattern, not just a local rule.
An AI assistant built by Anthropic, designed with a focus on safety and helpfulness. Used as the backbone for many AI agent architectures, including Claude Code (the CLI tool that powers most of OTP's agent army).
A configuration file that gives Claude instructions about how to behave in a specific project or organization. Loaded automatically at the start of every Claude Code session. The simplest form of an Organizational Operating System — every CLAUDE.md is an OOS in miniature.
Why it matters —
Most teams already have a CLAUDE.md. They just do not realize it is the seed of an organizational operating system.
An authentication and user management service that handles sign-up, sign-in, and session management for web applications. OTP uses Clerk for user accounts.
A way to interact with a computer by typing text commands instead of clicking buttons. Most AI agent tooling — including Claude Code, OpenAI Codex, and the OTP MCP server — runs through CLIs.
OKRs the team is expected to fully achieve. Failing to hit 100% is a problem that warrants a postmortem. Contrast with aspirational OKRs, where 70% is the target.
The 4DX discipline of keeping a player's scoreboard — visible to the team, simple, showing both lead and lag measures, and answering at a glance "are we winning or losing?" Coach's scoreboards (dashboards no player checks) do not work.
How certain an organization is about a knowledge claim. Every claim must declare HIGH, MEDIUM, or LOW confidence. HIGH means measured and reproducible. MEDIUM means observed multiple times. LOW means tried once or inferred.
Why it matters —
Without confidence levels, every claim looks equally authoritative. With them, downstream agents can weight rules by how trustworthy they are.
The amount of text an AI model can see at one time, measured in tokens. Everything the model reads — system prompt, conversation history, tool results, files — must fit inside the context window.
Why it matters —
When an agent runs out of context window, it forgets what it was doing. Coordination intelligence is the discipline of putting only the right tokens in the window.
When agents fail not because they are bad at their individual jobs, but because they cannot work together properly. Wrong handoff, missing shared state, duplicated work, dropped tasks. The work inside each agent is fine — the space between agents is broken.
Why it matters —
Most "AI failures" in production are coordination failures. The model is fine. The orchestration is not.
The collective, structured knowledge of how AI agents within and across organizations should coordinate. Captured in operational rules, documented failure modes, evidence-backed patterns, and shared OOS files.
Why it matters —
Every organization is solving the same coordination problems alone. Coordination intelligence is what changes when that knowledge starts to network.
An AI coding assistant built by GitHub that suggests code as you type. Represents the embedded-assistant model of AI integration — the agent lives inside the IDE rather than running as an autonomous process.
In Scaling Up, the precise definition of the customer the business is built to serve. Specific enough that you can name companies or people who fit. Vague core customer definitions correlate with vague positioning and stalled growth.
A numerical representation of text (or images, audio, etc.) as a vector — a long list of numbers — capturing semantic meaning. Used for similarity search, clustering, and retrieval in RAG systems. Two pieces of text with similar meaning have embeddings that are close together.
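"Close together" is commonly measured with cosine similarity. The sketch below uses hand-written toy vectors; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings", invented for illustration only.
refund_policy = [0.9, 0.1, 0.0]
return_policy = [0.8, 0.2, 0.1]
lunch_menu = [0.0, 0.1, 0.9]
```

With these toy values, `cosine_similarity(refund_policy, return_policy)` is much higher than `cosine_similarity(refund_policy, lunch_menu)`, which is the property similarity search relies on.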
A business management framework created by Gino Wickman built on six components: Vision, People, Data, Issues, Process, Traction. Operationalized through L10 meetings, 90-day Rocks, weekly Scorecards, the Accountability Chart, and IDS problem-solving. Many EOS concepts map directly to AI agent coordination.
Why it matters —
EOS gave operators a vocabulary for human team coordination. AI agent teams need the same vocabulary, and most of it transfers directly.
A trained, certified facilitator who runs EOS for client companies — leading quarterly and annual sessions, coaching the leadership team, and helping the company run on EOS. Hundreds of EOS Implementers operate worldwide.
A design principle where agents flag and recommend rather than act unilaterally when outside their authority boundary. The default is "ask the human." Autonomy is earned per-action through validated successful runs.
Why it matters —
The opposite — autonomy by default — is how you wake up to discover the agent did something irreversible.
A documented rule for what happens when an agent hits a situation it cannot handle alone. Defines the trigger condition, the recipient, the channel, and the expected response time.
How a knowledge claim was established: MEASURED_RESULT (quantified data), OBSERVED_REPEATEDLY (seen multiple times), OBSERVED_ONCE (seen once), HUMAN_DEFINED_RULE (declared, not derived), INFERENCE (reasoned from other claims), or SPECULATION (untested guess).
Why it matters —
Evidence type tells you how seriously to take a claim. A MEASURED_RESULT outranks INFERENCE every time.
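A sketch of how an agent might compare claims by evidence type. The source states only that MEASURED_RESULT outranks INFERENCE; the full ordering below follows the order in which the types are listed and is an assumption, as are the function name and the tuple shape.

```python
from enum import IntEnum


class EvidenceType(IntEnum):
    # Higher value means stronger evidence. Ordering assumed from the listing order.
    SPECULATION = 1
    INFERENCE = 2
    HUMAN_DEFINED_RULE = 3
    OBSERVED_ONCE = 4
    OBSERVED_REPEATEDLY = 5
    MEASURED_RESULT = 6


def stronger_claim(a, b):
    """Given two (text, EvidenceType) pairs, return the better-evidenced one; ties keep a."""
    return a if a[1] >= b[1] else b
```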
A required field on every knowledge claim documenting what happens when the rule is violated. The opposite of a happy-path doc — failure modes describe the specific damage you are protecting against.
Why it matters —
Rules without documented failure modes are easy to override under pressure. Rules with documented failure modes are arguments against future drift.
A web framework for Node.js built for speed and low overhead. OTP's platform is built on Fastify for low-latency API responses.
The process of training a pre-built AI model on specific data so it gets better at a particular task. Different from prompt engineering (which works with the off-the-shelf model) and RAG (which gives the model real-time access to data).
One of the first 50 organizations to publish an OOS on the OTP platform. Permanent badge that cannot be earned later.
Scaling Up's tool for clarifying who owns each function in the business. Lists the function, the person accountable, the leading and lagging indicators that prove they're doing the job, and the results expected.
The capability of an AI model to output structured calls to named functions with typed arguments — instead of free-text responses — so the surrounding code can execute the call deterministically. The mechanism behind tool use.
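A sketch of the "surrounding code" half of function calling. The JSON shape (`name` plus `arguments`) is a simplified, hypothetical format rather than any vendor's wire format, and `get_weather` is a stub.

```python
import json


def get_weather(city: str) -> str:
    """Stub tool; a real implementation would call a weather service."""
    return f"(stub) weather for {city}"


# Registry of functions the model is allowed to call, keyed by name.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(raw: str):
    """Parse a structured call emitted by the model and run the matching function."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

Because only registered names are dispatched, the registry doubles as a code-level guardrail: the model cannot invoke a function that is not in `TOOLS`.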
Google's family of multimodal AI models, available through Google Cloud and consumer products like the Gemini app and Search AI Overviews.
A Holacracy meeting that processes structural tensions — changing Roles, Accountabilities, Domains, and Policies. Output is amendments to the Circle's structure, captured in a tool like GlassFrog. Separate from the Tactical Meeting.
Connecting an AI model's responses to real, verifiable information instead of letting it generate from training data alone. Done through tool calls, RAG retrieval, file reads, or citations to known sources.
Why it matters —
Grounding is the primary defense against hallucination. Ungrounded models confidently invent things.
Rules and checks that prevent an AI agent from doing things it should not do. Built into prompts, code, or review processes. Prompt-level guardrails are the weakest. Code-level guardrails (the agent literally cannot call the function) are the strongest.
The EOS test for whether the right person is in the right seat. Get it: do they understand the job? Want it: do they want to do it? Capacity: do they have the skill, time, and emotional resources? All three must be yes.
Why it matters —
Hiring failures almost always show up as a No on one of these three. Naming which one makes the conversation fixable.
A property of an operation: running it multiple times has the same effect as running it once. Critical for agent systems because retries are common — a non-idempotent "send email" call run twice sends two emails; an idempotent one does not.
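One common way to make a "send email" call idempotent is an idempotency key: the caller supplies a stable identifier per logical send, and repeats with the same key are ignored. The class below is an in-memory sketch with hypothetical names; a real system would persist the seen keys.

```python
class EmailSender:
    def __init__(self):
        self.sent = []           # stands in for the real outbound mail system
        self._seen_keys = set()  # keys already handled

    def send(self, idempotency_key, to, body):
        """Return True if the email was sent now, False if this key was already handled."""
        if idempotency_key in self._seen_keys:
            return False
        self._seen_keys.add(idempotency_key)
        self.sent.append((to, body))
        return True
```

A retrying agent reuses the same key for the same logical email, so a timeout followed by a retry still results in one delivery.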
A problem-solving method from EOS. Identify the real issue (not the symptom), discuss it openly with the team, solve it with a clear action item and owner. Most teams skip Identify and end up solving the wrong problem.
Why it matters —
Multi-agent systems do the same thing — they treat symptoms instead of root causes. IDS is a counter-pattern that works for both humans and agent teams.
The process of running a trained AI model to get a response. Every agent action runs inference, which costs money and time. Reducing inference calls — through caching, batching, or pre-computed shared state — is one of the highest-leverage optimizations in agent systems.
In EOS, the seat that runs day-to-day operations and harmonizes the leadership team. The Integrator translates the Visionary's big ideas into execution and keeps the major functions (Sales, Marketing, Operations, Finance) integrated.
A network visualization showing how coordination patterns connect across published OOS files. Each node is a claim or organization, each edge is a similarity, conflict, or citation.
A feed of relevant coordination intelligence discoveries delivered to an OTP publisher when new patterns emerge in their industry, framework, or claim domain.
In EOS, the running list of problems, opportunities, and decisions that need to be addressed. Lives at the bottom of the V/TO and at the heart of every L10. The team works through it using IDS in priority order.
The 90-minute weekly leadership meeting at the heart of EOS. Same agenda every week: Segue, Scorecard review, Rock review, Customer/Employee headlines, To-Do list review, IDS for the rest of the time. Ends with a 1-10 rating of the meeting itself — a 10 is a meeting that made everyone better.
Why it matters —
Most leadership meetings drift between status, strategy, and venting. The L10 fixes the cadence so the team always knows what to expect.
OTP's weekly 90-minute leadership meeting. Same agenda shape as the EOS L10 — scorecard review, rock updates, IDS — but pointed at agentic maturity rather than just business rhythm. The cadence designed to advance an organization toward Level 8 (Autonomous Agent Teams) on the 8 Levels of Agentic Engineering.
Why it matters —
Most teams run L10s for the human side and have no equivalent for the agent side. The L8 fills that gap.
In 4DX, the outcome measures that track the WIG itself — revenue, churn, NPS. Lag measures tell you whether you won, but by the time they move, it is too late to change them. Use them to define the goal; act on lead measures.
The time between asking an AI model a question and getting a response. In multi-agent systems, latency compounds: if each agent in a sequential 5-agent chain takes 2 seconds, the user waits 10 seconds.
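The arithmetic behind the 5-agent example, with one contrast added for illustration: when agents are independent and can run concurrently, the wait is set by the slowest agent rather than the sum. Function names are hypothetical, and both expect a non-empty list of per-agent delays.

```python
def chain_latency(step_seconds):
    """Total wait when agents run one after another: the delays add up."""
    return sum(step_seconds)


def fan_out_latency(step_seconds):
    """Total wait when independent agents run concurrently: the slowest one dominates."""
    return max(step_seconds)


five_agents = [2.0, 2.0, 2.0, 2.0, 2.0]
print(chain_latency(five_agents))    # 10.0
print(fan_out_latency(five_agents))  # 2.0
```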
In Holacracy, the Role with authority to assign people to Roles within a Circle, allocate resources, and define priorities. Different from a manager — the Lead Link does not direct the work, only the structure.
In 4DX, the predictive activities the team controls that drive the WIG. Lead measures are influenceable and predictive — if you do them, the WIG will move. Contrast with lag measures, which only tell you what already happened.
A file at the root of a website (like robots.txt) that tells AI language models what the site is about and how to interact with it. Emerging standard for AI-first content discovery.
An open standard for delegated authorization. Lets one application access another on a user's behalf without seeing the password. The standard pattern for letting AI agents act on a user's accounts (Gmail, Slack, GitHub).
In OKRs, the qualitative goal — what you want to accomplish. Should be inspirational, time-bound, and clearly directional. Bad: "Improve sales." Good: "Become the default choice for mid-market fitness chains in the Southeast."
A goal-setting framework developed by Andy Grove at Intel and popularized at Google by John Doerr. An Objective is a qualitative direction. Key Results are 3-5 quantitative outcomes that prove the objective was achieved. Set quarterly or annually, scored at the end of the period.
Scaling Up's flagship planning tool — the entire company strategy on one page across columns for the BHAG, 3-5 year goals, 1-year plan, quarterly priorities, and weekly metrics. The Scaling Up equivalent of the EOS V/TO.
A design principle where every responsibility in an agent system is owned by exactly one agent. No overlap, no gaps. Borrowed directly from EOS's Accountability Chart.
Why it matters —
Two agents that both think they own a job will either fight or both drop it. One seat, one owner is the fix.
Structured formats for different organizational models supported by OTP: Agent Army (multi-agent specialist teams), Value Chain (sequential workflows), and Org Chart (hierarchical management).
AI research and deployment company founded in 2015, creator of GPT, ChatGPT, the OpenAI API, and Codex. The company that put large language models into mainstream use.
AI models whose code and weights are publicly available. Examples include Meta's Llama family and Mistral AI's models. Different tradeoffs from API-only frontier models: they tend to trail the frontier in capability, but they can be cheaper at scale and can be hosted locally.
A structured artifact that encodes how AI agents in an organization coordinate. Uses YAML frontmatter and Markdown-structured claims with confidence ratings, evidence types, and failure modes. The unit of publication on OTP.
The protocol and platform for publishing, comparing, and learning from organizational coordination intelligence. Operates above MCP (tools) and A2A (agents) in the AI coordination stack.
The power to overrule an AI agent's decision or action. Defined in advance with clear conditions and escalation paths. Override authority should be explicit ("the founder can pause any outreach campaign") rather than implicit.
EOS tool for evaluating whether a person fits both the Core Values (3-strikes-and-out test) and is GWC for their seat. Run for every team member at least quarterly.
An OTP tool that checks your OOS for personally identifiable information before publishing — names, emails, phone numbers, addresses — and flags them so private context does not leak into the public coordination network.
A powerful, open source relational database. OTP uses Postgres to store published OOS files, claims, publisher accounts, and the intelligence graph.
A pattern where data sources write results to files on a schedule, and agents read those files instead of querying sources directly. Decouples scanners from consumers, prevents API rate limits, and makes shared state inspectable.
Why it matters —
When every agent queries every source, you hit rate limits and burn tokens. Pre-computed shared state is how you scale.
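A sketch of the pattern with one scanner writing a JSON snapshot and agents reading it. The file layout, the `written_at` field, and the staleness check are assumptions for illustration. The write goes to a temporary file first and is then renamed, so a reader never sees a half-written snapshot.

```python
import json
import os
import tempfile
import time


def write_snapshot(path, data):
    """Scanner side: write atomically via a temp file in the same directory, then rename."""
    payload = {"written_at": time.time(), "data": data}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, path)


def read_snapshot(path, max_age_seconds):
    """Agent side: read the shared file instead of querying the source directly."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    if time.time() - payload["written_at"] > max_age_seconds:
        raise RuntimeError("snapshot is stale; the scanner may have stopped")
    return payload["data"]
```

Any number of agents can call `read_snapshot` without adding load to the upstream source, and the file itself can be opened and inspected when something looks wrong.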
The skill of writing instructions that get an AI model to do what you actually want. Includes role framing, few-shot examples, output format specification, and constraint declarations. The highest-leverage skill in AI agent development — a 10% better prompt often beats a 10x bigger model.
Quality tiers (Founding, Platinum, Gold, Silver, Bronze) assigned to organizations based on OOS completeness, evidence quality, and coordination intelligence contribution.
When two agents try to do the same thing at the same time and the result depends on which one finishes first. Classic multi-agent failure mode — both agents see the same task as unclaimed, both pick it up, both deliver it.
Why it matters —
Race conditions are why "obvious" coordination patterns break in production. The shared state was not as shared as you thought.
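The "both agents see the task as unclaimed" failure comes from checking and claiming in two separate steps. A sketch of one fix, for agents running as threads in a single process: do the check and the write under one lock. Class and method names are hypothetical; agents in separate processes would need an equivalent atomic operation in the shared store.

```python
import threading


class TaskBoard:
    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, task_id, agent):
        """Check and set under one lock so only one agent can win the claim."""
        with self._lock:
            if task_id in self._owners:
                return False
            self._owners[task_id] = agent
            return True

    def owner(self, task_id):
        with self._lock:
            return self._owners.get(task_id)
```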
A technique where an AI model looks up relevant information from a database before generating its answer. Combines a search index (vector database, full-text, or hybrid) with a language model. The most common pattern for grounding agents in private data.
A cloud platform for deploying web applications and databases. OTP is deployed on Railway.
A cap on how many API requests can be made in a window of time. Hit it and the API starts rejecting requests, typically with an HTTP 429 (Too Many Requests) status. Multi-agent systems hit rate limits constantly because every agent queries the same source; pre-computed shared state is the standard fix.
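When a 429 does arrive, a common client-side response is to retry with exponential backoff. The sketch below is transport-agnostic: `request` is any zero-argument callable returning a `(status_code, body)` pair, which is an assumption made so the example runs without a network. The `sleep` parameter is injectable for testing.

```python
import time


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next try.
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Backoff softens the symptom; it does not remove the cause when many agents poll the same source.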
A common style for building APIs that uses standard HTTP methods (GET, POST, PUT, DELETE) to manage data. OTP exposes a REST API at /api/v1 alongside the MCP server.
A set of 10 management habits Verne Harnish abstracted from John D. Rockefeller's playbook — priorities, metrics, daily/weekly meeting rhythm, alignment to a Top 5/Top 1 priority. Predecessor to the modern Scaling Up framework.
A 90-day priority goal in the EOS framework. Each team member picks 1 to 3 Rocks per quarter — concrete, measurable outcomes that move the business forward. The same shape works for AI agent goals.
A named function within a Circle with a Purpose, one or more Domains (things it controls), and Accountabilities (ongoing responsibilities). Different from a job — one person can fill multiple Roles, and Roles outlast the person filling them.
Verne Harnish's growth framework, evolved from the Rockefeller Habits. Built around 4 Decisions: People, Strategy, Execution, Cash. Operationalized through the One-Page Strategic Plan, the Function Accountability Chart, daily/weekly/monthly/quarterly/annual rhythms, and Cash Acceleration Strategies.
A vocabulary of tags from Schema.org added to HTML to help search engines and AI systems understand content types. JSON-LD is the most common format for delivering schema markup today.
A weekly tracking sheet from EOS showing 5 to 15 key business numbers, each with an owner and target. Reviewed at the L10. The same pattern works for agent KPIs — every agent has a small number of measurable outputs.
An OTP feature that monitors the Intelligence Graph for new patterns and insights relevant to your published OOS — emerging coordination claims, framework conflicts, and adoption trends in your industry.
Information that multiple agents need access to, stored where one agent writes and others read. Files, databases, message buses. Keeping shared state consistent is a core coordination challenge — and a core source of failure.
A step-by-step document describing how to complete a specific task consistently. Traditionally human-readable. An OOS is essentially a collection of machine-readable SOPs with structured metadata.
Goals set deliberately beyond what the team thinks is achievable, on the theory that aiming for 10x changes the strategy in ways aiming for 10% never would. Used in OKRs and in many growth-stage operating systems.
Hidden instructions given to an AI model before the conversation starts. Defines role, personality, boundaries, and behavior. The first place coordination rules live before they get extracted into an OOS.
A weekly Holacracy meeting that processes operational tensions — checklists, metrics, projects, and tactical "triggers." Strict format with each agenda item resolved before moving on. The Holacracy equivalent of the EOS L10.
In Holacracy, the gap between what is and what could be that any Role-holder senses. Tensions drive the entire system — every meeting agenda item is someone processing a tension. Reframes "complaint" as productive signal.
Scaling Up's organizing frame: every growth company must get four things right — People (have we got the right people doing the right things?), Strategy (do we have a unique strategy that drives sustainable growth?), Execution (are we executing without drama?), Cash (do we have consistent sources of cash to fuel growth?).
AI coordination operates at three layers: Tool (MCP — how agents call tools), Agent (A2A — how agents talk to each other), and Organization (OTP — how organizations share coordination intelligence). Each layer solves a different problem.
The basic unit of text AI models work with, roughly 3/4 of a word in English. Providers bill by tokens, and models process text and limit context by tokens. Tokens are the new currency of agent operations.
A metric measuring whether an operational rule saves more tokens than it costs to load into an agent's context. A ratio above 1.0 means the rule pays for itself. Below 1.0 means you are paying tokens to make the agent slower and more confused.
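A sketch of the ratio. The source gives the break-even threshold (1.0) but not an exact formula, so dividing tokens saved by tokens spent loading the rule is an assumption, as are the function and parameter names.

```python
def token_roi(tokens_saved, tokens_to_load):
    """Tokens a rule saves divided by the tokens it costs to load into context."""
    if tokens_to_load <= 0:
        raise ValueError("tokens_to_load must be positive")
    return tokens_saved / tokens_to_load


def pays_for_itself(tokens_saved, tokens_to_load):
    """True when the ratio is above the 1.0 break-even point."""
    return token_roi(tokens_saved, tokens_to_load) > 1.0
```

For example, a rule that costs 200 tokens to load and saves 300 tokens of rework has a ratio of 1.5 and pays for itself; one that saves only 100 has a ratio of 0.5 and does not.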
When an AI agent calls a function, API, or external tool to get information or take action — instead of relying only on its trained knowledge. The capability that turns a chatbot into an agent. MCP standardizes how tools are exposed to models.
A database optimized for storing and searching embeddings. Lets agents find "the most semantically similar document" in milliseconds across millions of items. Examples include Pinecone, Weaviate, Chroma, and pgvector.
In EOS, one of the two seats at the top of the Accountability Chart. The Visionary owns big-picture ideas, key relationships, R&D, and culture. Usually the founder. Pairs with an Integrator who runs operations.
EOS's two-page strategic plan. Page 1 captures the long-term vision (Core Values, Core Focus, 10-Year Target, Marketing Strategy, 3-Year Picture). Page 2 captures traction (1-Year Plan, Quarterly Rocks, and Issues List).