Join OTP the operating platform for people and AI agents
Back to Blog
Founder Notes 2026-06-21 · David Steel

The CEO's automation sequencing decision is a judgment call, not a to-do list

Most CEOs approach automation the way they approach software purchases: make a list of painful things, rank by frequency, start at the top.

That logic fails with agents. Not because frequency is the wrong input. Because frequency alone tells you what is loud, not what is safe to hand off. Loud and safe are different problems, and conflating them is why so many agent rollouts stall at three pilots and never scale.

The central claim of this post is simple. Before you automate anything, you need to run four questions in sequence. Each question is a gate. If a candidate fails a gate, it either gets redesigned or gets deferred. You do not proceed to the next question until the current one clears. That is what a decision tree looks like in practice. It is not complicated. It is disciplined.

Here is what the tree looks like, and why I built it the way I did.

Gate 1: Is the output verifiable?

The first question has nothing to do with the task itself. It has to do with whether you can tell, without expert review, whether the agent did the task correctly.

An agent scanning for budget pacing anomalies produces an output you can verify in thirty seconds. Either the anomaly is there or it is not. Either the spend matches the cap or it does not. You do not need to reconstruct the agent's reasoning to know if it was right.

An agent making strategy recommendations produces an output that requires context, judgment, and expertise to evaluate. You cannot verify it in thirty seconds. The verification cost is almost as high as the cost of doing the work yourself.

Automate the first category early. Be careful with the second category. Not because agents cannot produce useful output in hard-to-verify territory, but because unverifiable output with no human gate is where agents drift silently. They keep producing. The output looks reasonable. Three months later, you discover the output has been systematically wrong in a way nobody caught because nobody had a fast verification loop.

Verifiability is not a permanent property of a task. It is often something you can engineer. Before passing a candidate to Gate 2, ask whether you can build a simple verification step into the workflow. If yes, the task becomes automatable. If the verification step would itself require an agent, you have compounded the problem, not solved it.

Gate 2: Does a failure here damage a client relationship, your brand, or your cash flow?

Every task that clears Gate 1 goes here. This gate sorts by blast radius.

A task with high blast radius and low verifiability is the danger quadrant. You can automate it eventually, but not first. You need to earn trust in the low-stakes seats before you deploy agents in the high-stakes ones.

I think about this in three tiers. Low blast radius: internal reporting, data aggregation, inbound triage, scheduling logistics. Mistakes are fixable in hours with no external damage. Medium blast radius: client-facing data summaries, proposal follow-up, appointment scheduling. Mistakes are visible to clients but correctable quickly. High blast radius: billing, contract language, client strategy recommendations, anything that moves money or shapes a client's expectations about their results.

We did not give Dirk, our sales agent, the authority to send outbound emails in his first week. He ran research, built prospect lists, wrote draft sequences. David approved before anything left the building. Dirk earned send authority incrementally, after we had verified his output quality across fifty drafts without a false alarm. Now he operates with wider latitude because we have an evidence base for it. Not because we trusted him on day one.

The principle is: start in the tier where a mistake is recoverable. Build the evidence base. Expand the authority.

Gate 3: Is the process stable enough to hand off?

This is the gate most CEOs skip, and it is the gate responsible for the most failed agent deployments.

An agent is not a repair mechanism for a broken process. An agent given a broken process will execute it faster and at greater scale, which means the breakage propagates faster and at greater scale. Accenture, whose work I've read on this, frames it this way: don't make inefficiency run efficiently. That sentence should be tattooed somewhere visible.

Before you automate a process, you need to know three things about it. First, is the process documented? If the only place the process exists is in one person's head, you cannot hand it to an agent without first extracting it from that person's head. That extraction is a prerequisite, not a post-automation fix. Second, does the process produce consistent outputs when different humans run it? If two people run the same process and produce different outputs, the agent will inherit that inconsistency and you will not know which version it is running. Third, has the process been stable for at least ninety days? Processes that are still changing every few weeks are not ready for automation. You will spend more time retraining the agent than you would have spent just doing the work.

At Sneeze It, we spent several weeks documenting and stabilizing our client reporting process before Dash, our analytics agent, took it over. Dash now runs it faster and more consistently than any human we had in that seat. But Dash could not have done that on a process we were still inventing. The stability was the precondition.

Gate 4: Does automating this free a human for something only a human can do?

This is the strategic gate. The first three gates are about whether automation is safe and feasible. This gate is about whether it matters.

The mission underneath all of this is letting agents carry the operational work so people are free for the work that matters. That framing has a direction built into it. Agents take over execution. Humans concentrate on judgment, relationships, and the decisions that require contextual wisdom agents cannot replicate.

If you automate a task and the human who used to do it just... does less, you have not created value. You have created slack. Slack is not a strategy.

The question to answer before deploying any agent is: what will the human in this seat do with the time the agent returns? If you cannot answer that question concretely, the automation is not yet valuable enough to prioritize.

Arin, our call center manager agent, handles daily performance data aggregation and coaching message drafts for our call center team. Before Arin, a human was spending hours pulling and formatting that data. The human now spends that time on calls that require judgment, coaching moments that require reading a person, and decisions about how to structure the team. The automation is valuable because the returned time went somewhere that matters.

If you do not have a clear answer for Gate 4, it does not mean you skip the automation. It means you solve the "returned time" question before you deploy the agent. Because the deployment without that answer is efficiency theater.

How I sequence from here

Once a candidate clears all four gates, I order deployments by two factors: how much time does this seat currently consume, and how confident am I in the evidence base that the output quality will hold.

High time cost, high confidence: deploy now. This is the sweet spot. Radar, our chief-of-staff agent, landed here. Daily briefing compilation is time-intensive and the output quality is easily verified against ground truth. Radar was our first deployment and the ROI justified every subsequent one.

High time cost, lower confidence: run a validation window first. Have the agent produce outputs alongside the human for four to six weeks. Measure output quality. Expand authority when the evidence supports it. This is how Tally, our scorecard agent, was built. The stakes of a wrong KPI push are medium-high, so we ran a parallel validation period before giving Tally autonomous publishing rights.

Lower time cost: defer unless the strategic value is clear. The time savings from automating a small task rarely justify the overhead of building and managing the seat. Your agent deployment bandwidth is finite. Spend it on the seats that return the most human time or the most operational reliability.

The sequence that fell out of this logic at Sneeze It went roughly: internal reporting and data aggregation first (Dash, Radar, Tally), then inbound triage and scheduling support (Pepper, Crystal), then outbound pipeline and prospecting (Dirk, Nick), then client-facing management functions (Arin, Pulse). That sequence was not a plan we wrote down in advance. It was the output of running every candidate through the four gates and letting the gates do the sorting.

What the gates leave out

The decision tree tells you what to automate first. It does not tell you what to automate eventually.

Some tasks that fail Gate 2 or Gate 3 today will pass in six months. Gate 3 failures become eligible when the process stabilizes. Gate 2 failures become eligible when the agent has an evidence base in lower-stakes territory. The gates are not permanent rejections. They are current readings.

Deloitte's 2026 study of over three thousand enterprises found that only 21 percent have a mature governance model for agentic AI. That number implies a lot of organizations deploying agents without a sequencing discipline. The absence of a decision tree is not a sign of speed. It is a sign of exposure.

The sequence matters because trust compounds. An agent that earns its authority in a low-stakes seat before moving to a higher-stakes one carries a track record. That track record is what makes the higher-stakes deployment defensible, to yourself, to your team, and to your clients.

MIT CISR's research on enterprise AI maturity shows that Stage 4 firms, the ones that have built sustainable AI-driven operating models, outperform their industries by 13.9 percentage points in revenue growth. The gap between Stage 1 and Stage 4 is not a tools gap. It is an operating discipline gap. Sequencing is operating discipline.

See the live chart

From the OTP MCP, you can query any seat on our chart and see where it sits in our deployment timeline, what gates it cleared, and what human time it returned.

In Claude Desktop or Cursor or any MCP client, add this block:

"otp": {
  "command": "npx",
  "args": ["-y", "@orgtp/mcp-server"]
}

Restart the client. Then ask: "Use OTP to show me the agent seats on the Sneeze It org chart and what operational function each one owns."

You will see the full seat structure in one query. That structure did not emerge from a wish list. It emerged from running every seat through four gates, in order, until the sequencing made sense.

DS
David Steel

Founder of OTP. Runs an AI agent army at a digital agency. Building OTP because nobody else seems to be building it. Notes from inside the build, not from the conference circuit.

More about David →

More posts on the blog index.

All posts