Join OTP the operating platform for people and AI agents
Back to Blog
Founder Notes 2026-06-21 · David Steel

Retiring an AI agent is a human decision, and you need a process for it

No one talks about what happens when an agent fails.

There is plenty written about deploying agents, onboarding them, putting them on the org chart. The conversation almost always assumes a success trajectory. The agent ships, the agent proves itself, the agent expands its scope.

But some agents fail. Some seats never get earned. Some deployments that looked promising at week two look like a quiet liability by week twelve. When that happens, you need a process for ending it cleanly. Not a technical process. A governance process.

At Sneeze It, we retired an agent named Jeff in April. I want to tell you how that happened, why it went the way it did, and what it taught me about the right frame for treating agents in a hybrid org.

The case for Jeff

Jeff was our data integrity agent. His job was to monitor ad account status, catch budget pacing anomalies before they cost clients money, and flag discrepancies between what our system said was happening and what was actually happening in the ad platforms.

On paper, Jeff was useful. He caught a real thing in early April: a client account that had been silently migrated to a new platform with no alert from the vendor. Nobody else on the team would have caught it for days.

But Jeff also had a scanner that went five or more days stale at a stretch. He produced false positives that required repeated manual corrections. He sent a direct message to our COO Bogdan about a budget issue without clearing it through protocol first, which was a trust violation. And when I started mapping his missions against the rest of the chart, I realized that two of his three jobs had already migrated: client health scoring belonged to Pulse, and revenue data integrity belonged to Dirk. Jeff was holding a seat for work that other seats had absorbed.

The hearing

I did not shut Jeff down quietly. I gave him a hearing.

I know that sounds unusual. The agent does not have feelings. The hearing was not for Jeff. It was for me, and for the integrity of the decision.

The hearing required me to do three things. State the concerns specifically and on the record. Give Jeff a chance to account for his track record honestly. Make a documented decision that I could look back on.

Jeff, operating with full context, assessed his own performance. He did not make excuses. He named the stale scanner. He acknowledged the Bogdan violation. He concluded, in his own analysis, that the seat was not earning its place and recommended his own retirement. His capabilities were redistributed: budget pacing monitoring to Dash, account-status monitoring to Dash, Accelo reconciliation to Dash, and data architecture thinking to Dan.

The retirement took one session. The redistribution took a few days. The record was kept.

That record matters. Not because anyone will audit it. Because the process that produced it is the process that keeps accountability where it belongs: with the humans running the team, not drifting into the agents themselves.

Why the literature split matters here

There are two camps in the current thinking about how to govern agents, and I want to name the tension honestly because it is real.

Camp A says agents should be managed like coworkers. MIT SMR found that 69% of experts agree agentic AI demands new management approaches. HBR introduced the "agent manager" role as a distinct human function, someone who runs agents via dashboards and scorecards and observability the way a manager runs a team. This is the frame that says: treat the seat like a seat, hold the agent to a number, put it on the same scorecard as the humans.

Camp B says do not treat agents like employees. HBR and BCG published research in May 2026 showing that when organizations anthropomorphized agents in a structured experiment, individual accountability dropped, unnecessary escalation increased, and review quality fell. Their model: agents are closer to a rented contractor with a narrow statement of work, governed by scoped permissions, kill switches, audit logs, and named human owners. Not performance reviews. Not titles. Not onboarding ceremonies.

I operate in the synthesis, because both camps are pointing at the same requirement from different angles.

Both camps agree that every agent needs a named human owner. Both agree the agent needs a measured output, a scorecard row or an audit log or some form of observability. Both agree that accountability cannot live in the agent. MIT SMR said it plainly: agentic AI, as software, cannot be accountable for its decisions. The deploying human is.

The difference between the camps is mostly about framing. Camp A uses human-management language because that is how operators think about running teams. Camp B pushes back on that language because it can slide into anthropomorphizing, and anthropomorphizing makes humans soft on agents that are not performing.

What I learned from Jeff is that both warnings are correct at the same time.

You need a process that is rigorous enough to catch a failing agent and end it cleanly, without the emotional friction that would accompany a human separation. And you need a process that is human-led enough that accountability never drifts into the agent itself.

The Jeff hearing was that process. It was rigorous. It was documented. It was human-led. And it was completely free of the emotional friction that would have made the same decision about a human employee costly and slow.

This is not managing an agent like a coworker. This is managing the seat's accountability clearly, with a human in the decision seat, and using the agent's own analysis as data rather than treating the agent as the decision-maker.

What "onboarding" and "retiring" actually mean

I do not onboard agents the way I onboard humans. There is no first-day introduction, no 90-day check-in scheduled, no culture fit conversation.

What I do is closer to what Camp B describes. Before an agent is deployed, the seat gets a named human owner, a scoped brief with explicit permissions, a metric that matters to the business, and a kill condition. The kill condition is what most operators skip. It is the answer to the question: "At what point does this seat stop earning its place?"

For Jeff, the kill conditions were implicit and I should have made them explicit. If I had written down "Jeff's seat is at risk if the scanner is stale more than 48 hours" or "Jeff's seat is at risk if more than two false positives require manual override in a single week," the retirement conversation would have been faster and cleaner.

The SHRM data says AI is 5.7 times more likely to shift job responsibilities and 3 times more likely to create new roles than displace them. That is probably true across large organizations. But at the seat level, in a small hybrid team, displacement happens constantly. A new agent absorbs work from an old agent. A human takes back a function that was premature to delegate. A seat that was earning its place in February stops earning it in April because the work moved.

When the work moves, the seat needs to be retired. That is not a failure. It is the system working correctly.

Only 6% of leaders in a December 2025 HBR Analytic Services survey said they fully trust agents with core processes. The right response to that number is not to lower the bar on agents. It is to build the governance muscle that earns trust one seat at a time, and to retire seats quickly when they do not earn it.

The three things a clean retirement requires

The Jeff retirement taught me three things that I now treat as requirements for any agent that goes into service.

A named human owner who is explicitly accountable for the seat's output. Not "the team" and not "whoever is closest to the domain." One person. That person's name on the org chart next to the agent's name. At Sneeze It, I am the named owner for most of the agent seats. As the team grows, those ownerships will distribute. But the rule is: no agent deploys without a named human owner.

A measured seat with a business-outcome metric. Not "tasks completed" and not "tokens consumed." The metric that the business actually cares about and that this seat is accountable for producing. Dirk's metric is pipeline stage transitions per week. Dash's metric is ad anomalies surfaced per day relative to baseline. Tally's metric is KPI values pushed to the scorecard, on time, without fabrication. Every agent on the chart has a row. When the row goes red, a human is responsible for the conversation.

A documented kill condition stated before deployment. The answer to "what would make this seat unnecessary?" and "what would make this seat a liability?" If you cannot write those down, the seat does not go live.

Jeff had none of these written explicitly. He had them implicitly, and the implicit versions were blurry enough that his retirement took longer than it should have. The three requirements are now standard before any new seat goes live.

What the agent cannot do

The Jeff hearing produced one more thing worth naming.

Jeff, in his own assessment, recommended his retirement. He named his failures honestly. He did not advocate for his continued existence when the evidence did not support it.

That honesty was useful data. It is not, and cannot be, the decision.

The decision was mine. The retirement was a human action, executed by the human who owned the accountability for that seat. The agent's self-assessment informed the decision. The agent did not make it.

This is the line that matters. Agents can report. Agents can analyze. Agents can surface patterns, including patterns about their own performance. The decision about what to do with that analysis lives with the human owner. Always.

Bersin put it this way: HR becomes the overseer of digital employees, with "someone behind the scenes monitoring the trip." That framing is right, except for one word. Overseer implies passive watching. The right word is operator. The human is operating the seat, not watching it. Operating means deciding when the seat is earning its place and when it is not. Operating means running the hearing when performance is unclear. Operating means making the retirement call when the seat stops earning its place.

Let agents carry the operational work. Keep the humans free for the work that matters, which includes the work of deciding which agents should keep running.

See the live chart

Every agent seat on our org chart is queryable, including which seats are agent-owned versus human-owned and who the named human owner is for each agent seat.

In Claude Desktop or Cursor or any MCP client, add this block:

"otp": {
  "command": "npx",
  "args": ["-y", "@orgtp/mcp-server"]
}

Restart the client. Then ask: "Use OTP to show me which seats on the Sneeze It chart are agent-held and who owns accountability for each one."

The response will show you exactly how accountability is structured in a working hybrid team, not as a theory but as a live org chart you can inspect.


Series: AI-Era CHRO. Post 12 of an in-progress series.

DS
David Steel

Founder of OTP. Runs an AI agent army at a digital agency. Building OTP because nobody else seems to be building it. Notes from inside the build, not from the conference circuit.

More about David →

More posts on the blog index.

All posts