Founder Notes 2026-06-21 · David Steel

A unified scorecard for humans and agents is not an HR experiment. It is the only scorecard that works.

The literature on AI agents in the workforce splits into two camps and they are both right about something important.

Camp A says manage agents more like coworkers than like traditional tools. MIT SMR surveyed experts in 2025 and found 69% agree that agentic AI demands new management approaches. HBR researchers named a new role, the "agent manager," responsible for running agents through dashboards, scorecards, and observability metrics. If you do not measure agents with the same discipline you measure people, they drift.

Camp B says do not treat agents like employees. HBR and BCG ran a large experiment in 2026 and found that anthropomorphizing AI agents reduced individual accountability, increased unnecessary escalation, and lowered review quality. When people started relating to agents as if they had a stake in outcomes, people stopped owning the outcomes themselves. The model that works is scoped permissions, kill switches, audit logs, and named human owners.

Here is what strikes me about this disagreement: both camps are pointing at the same answer from different angles.

Every agent needs a named human owner. Every agent needs a measured seat. Accountability never moves to the agent. The human who deployed it is accountable for what it produces. MIT SMR made this explicit: agentic AI cannot be accountable for its decisions. The deploying human is.

That is not anthropomorphizing. That is accountability architecture. And building that architecture, in a hybrid workforce that runs humans and agents on the same org chart, is one of the most important decisions a CHRO makes right now. Korn Ferry found that 42% of CHROs are prioritizing AI investment for HR but only 5% feel fully prepared. The gap is not technical. It is structural.

Here are the five decisions that close it.

1. Decide what a seat is before you decide what fills it

A seat is a unit of accountable work. It has a role. It has a metric. It has a named human owner. Whether the seat is filled by a person or an agent is a downstream question, not the first question.

At Sneeze It, every seat on our scorecard was defined the same way before we decided whether a human or an agent would hold it. Radar holds the chief-of-staff seat. The seat has a metric: briefing completeness and cadence. I own accountability for it. Dirk holds the sales pipeline seat. Metric: qualified meetings booked and pipeline stage transitions per week. I own it. Bogdan holds the COO seat. He owns it. Janine holds the accounting seat. She owns it.

If you define the seat by who fills it rather than what it produces, you end up with agent seats described in technical terms and human seats described by outcomes. That asymmetry is where unified scorecards collapse. Define every seat by its outputs first.
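A minimal sketch of that ordering, assuming a seat can be modeled as a small record. The `Seat` class, the `fill` helper, and the `holder_kind` field are hypothetical names for illustration, not OTP's actual schema. The point is only that role, metric, and owner exist before a holder does.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Seat:
    """A unit of accountable work, defined before anyone fills it."""
    role: str
    metric: str                        # what the seat produces
    human_owner: str                   # named human accountable for results
    holder: Optional[str] = None       # person or agent, decided last
    holder_kind: Optional[str] = None  # "human" or "agent" (hypothetical field)


def fill(seat: Seat, holder: str, holder_kind: str) -> Seat:
    """Assign a holder without touching role, metric, or owner."""
    if holder_kind not in ("human", "agent"):
        raise ValueError(f"unknown holder kind: {holder_kind!r}")
    return replace(seat, holder=holder, holder_kind=holder_kind)


# Step 1: define the seat by its outputs.
chief_of_staff = Seat(
    role="Chief of staff",
    metric="Briefing completeness and cadence",
    human_owner="David",
)

# Step 2: only now decide what fills it.
radar_seat = fill(chief_of_staff, holder="Radar", holder_kind="agent")
```

Because `fill` returns a copy, the seat definition stays the same whether a person or an agent ends up holding it.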

2. Write every agent's metric in business-outcome language before the agent goes live

This is where most deployments fail. The agent is deployed. The agent produces activity metrics. "Tasks completed." "Messages processed." "Tokens consumed." Twelve weeks later, the business outcome the agent was supposed to move is exactly where it was before the deployment.

The HBR Analytic Services survey of 603 leaders in late 2025 found that only 6% fully trust agents with core processes. The trust deficit is real, but it is partly self-inflicted. When you cannot connect an agent's output to a business metric, of course nobody trusts it with core processes. There is no shared language for what it is doing.

Tally pushes KPI values to our scorecard on cadence. The metric is not "pushes executed." The metric is "KPI chart accuracy." Nick drafts cold outreach to qualified health and wellness prospects. The metric is not "drafts produced." The metric is "validated named-individual drafts per day." Dash reads every ad account we manage and surfaces anomalies. The metric is not "reports generated." The metric is "alerts that led to an account manager conversation."

Write the metric before you deploy. If you cannot write it in business-outcome language, the seat does not have a clear role and you should fix the role before you deploy the agent.
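One way to make that rule mechanical is a pre-deployment check. This is a rough sketch under a loud assumption: the blocklist below is just the activity metrics named in this post, and a string match is a crude stand-in for a human reading the metric. The names `ACTIVITY_METRICS`, `is_activity_metric`, and `ready_to_deploy` are hypothetical.

```python
from typing import Optional

# Activity metrics called out in this post. A real list would be longer,
# and a person should still review anything that passes.
ACTIVITY_METRICS = {
    "tasks completed",
    "messages processed",
    "tokens consumed",
    "pushes executed",
    "drafts produced",
    "reports generated",
}


def is_activity_metric(metric: str) -> bool:
    """True if the metric is one of the known activity counts."""
    return metric.strip().lower() in ACTIVITY_METRICS


def ready_to_deploy(metric: Optional[str]) -> bool:
    """An agent is deployable only if its metric exists and is not an activity count."""
    has_metric = bool(metric and metric.strip())
    return has_metric and not is_activity_metric(metric)
```

Under this check, "KPI chart accuracy" passes and "Tokens consumed" does not. A missing metric fails too, which is the signal to fix the role first.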

3. Name the human owner before the agent touches any process

This is the step Camp B got right and most deployments skip.

When agents produce outputs without a clear human owner on the other side, accountability diffuses. It does not disappear entirely. It just becomes unclear enough that the Monday conversation about a missed number turns into a conversation about the agent's model version or the prompt quality rather than a conversation about the seat's results.

Every agent seat at Sneeze It has a human owner named before the agent runs its first task. Radar reports to me. Dirk reports to me. Arin, which coaches our call center team through daily Slack messages, reports to me. Pepper, which triages client email and drafts responses, reports to me. Crystal, which tracks project delivery in Accelo, reports to me. Tally, which pushes our KPI values to the chart, reports to me. Nick, which runs cold prospecting for health and wellness clients, reports to me.

The human owner is not a formality. The human owner is the person who diagnoses a dropped metric, rewrites the brief, decides whether to escalate, and makes the call on whether the seat is still earning its place. That is not what an agent can do for itself. That is what a human does, and naming the human before the deployment is the structural decision that keeps accountability clean.
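As a sketch, the rule can be enforced as a gate in front of the agent's first task. Everything here is hypothetical (`AgentSeat`, `MissingOwnerError`, `authorize_first_task`). It illustrates the ordering, not how any particular platform implements it.

```python
from dataclasses import dataclass
from typing import Optional


class MissingOwnerError(Exception):
    """Raised when an agent seat has no named human owner."""


@dataclass
class AgentSeat:
    agent: str
    duty: str
    human_owner: Optional[str] = None


def authorize_first_task(seat: AgentSeat) -> str:
    """Refuse to run any agent whose seat lacks a named human owner."""
    if not seat.human_owner or not seat.human_owner.strip():
        raise MissingOwnerError(
            f"{seat.agent} has no named human owner; refusing to run"
        )
    return f"{seat.agent} cleared to run; {seat.human_owner} is accountable"


pepper = AgentSeat(
    agent="Pepper",
    duty="Triage client email and draft responses",
    human_owner="David",
)
clearance = authorize_first_task(pepper)
```

The gate fails closed. An agent with no owner never runs, so there is never a result that nobody is accountable for.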

SHRM's 2026 state of AI in HR report found that 49% of organizations have AI-use policies but only 25% call them clear. The clarity gap almost always lives here: who owns accountability for what the agent does.

4. Put the agent row on the same dashboard as the human rows, without labeling which is which

This is the discipline that makes the unified scorecard actually unified.

Our Monday meeting at Sneeze It runs one dashboard. Each row has a name, a metric, a target, a current number, and a trend arrow. Bogdan's rows sit next to Dirk's rows. Janine's rows sit next to Tally's rows. The dashboard is not labeled "humans" and "agents." It is labeled by seat. If you looked at our chart without knowing who was who, you could not tell from the layout which rows are held by people and which by agents. By design.

The reason for this discipline is not aesthetics. It is that the conversation at the Monday meeting should be the same regardless of which row has a number below target. What is the gap? What was the cause? What is the fix? The fix lands on the seat's human owner. That conversation works the same way whether the seat is Bogdan or Dash or Arin or Kristen.

When the agent rows are segregated on a separate surface, a parallel conversation happens in a different room with different language. The agent is "not working right." The model needs to be "updated." The prompt needs to be "tuned." These are technical conversations, and they may sometimes be correct, but they are not the same as the business conversation. The business conversation is: this seat is below its metric. That conversation belongs on the same dashboard as every other seat. Agents carry the operational work. That frees the people on the dashboard to do the work that matters. Neither contribution reads as secondary when both rows are on the same chart.
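A small sketch of a dashboard built this way. The seat names and numbers are placeholders, not our real scorecard, and the `Row`, `render`, and `below_target` names are hypothetical. It also assumes higher is better for every metric. The holder kind is stored but never rendered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Row:
    seat: str
    metric: str
    target: float
    current: float
    previous: float
    human_owner: str
    holder_kind: str  # "human" or "agent"; kept on record, never displayed


def trend_arrow(row: Row) -> str:
    """Compare the current number with the previous period."""
    if row.current > row.previous:
        return "↑"
    if row.current < row.previous:
        return "↓"
    return "→"


def render(rows: List[Row]) -> List[str]:
    """One line per seat, with the same columns for every row."""
    return [
        f"{r.seat} | {r.metric} | target {r.target:g} | "
        f"current {r.current:g} | {trend_arrow(r)}"
        for r in rows
    ]


def below_target(rows: List[Row]) -> List[Tuple[str, str]]:
    """Seats that need the Monday conversation, with the owner who gets the fix."""
    return [(r.seat, r.human_owner) for r in rows if r.current < r.target]


# Placeholder rows for illustration only.
rows = [
    Row("Seat A", "placeholder metric 1", 10, 8, 9, "Owner 1", "agent"),
    Row("Seat B", "placeholder metric 2", 5, 6, 6, "Owner 2", "human"),
]
lines = render(rows)
followups = below_target(rows)
```

`below_target` returns the seat and its human owner, never the holder. A missed number routes to a person whichever kind of row it sits on.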

5. Make agent offboarding a human decision with a clear record

The accountability architecture is only real if it holds at the endpoint.

Jeff was one of our agents. His seat covered data integrity and budget monitoring across ad accounts. In April, after five months, I retired him. Reliability issues. False positives. A trust violation. And, per Jeff's own honest assessment when I asked him to defend his continued existence, the seat was never really earned because the capabilities had been absorbed by other agents.

Jeff did not decide to leave. I decided to retire the seat. The hearing happened. The record was kept. Jeff's capabilities were redistributed explicitly: ad pacing to Dash, account-level status monitoring to Dash, data architecture to Dan. Every redistribution was named and owned.

This is what Camp B's "scoped permissions and kill switches" means in practice at a human scale. The kill switch is not a technical safeguard sitting in the background. It is a human decision, made by the named owner, with a documented record, after an honest accounting of whether the seat earned its place.
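Here is a hedged sketch of what a documented record could look like as data. The `RetirementRecord` shape and the `retire_seat` checks are hypothetical, not the format we actually used for Jeff. The reasons and redistribution values are the ones described above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RetirementRecord:
    seat: str
    holder: str
    decided_by: str
    reasons: Tuple[str, ...]
    redistribution: Dict[str, str]  # capability -> seat that now carries it


def retire_seat(seat: str, holder: str, human_owner: str, decided_by: str,
                reasons: Tuple[str, ...],
                redistribution: Dict[str, str]) -> RetirementRecord:
    """Retire a seat only by its named owner, with reasons and named handoffs."""
    if decided_by != human_owner:
        raise PermissionError("only the seat's named human owner can retire it")
    if not reasons:
        raise ValueError("a retirement needs at least one recorded reason")
    unassigned = [cap for cap, new_seat in redistribution.items() if not new_seat]
    if unassigned:
        raise ValueError(f"capabilities with no new owner: {unassigned}")
    return RetirementRecord(seat, holder, decided_by, tuple(reasons),
                            dict(redistribution))


record = retire_seat(
    seat="Data integrity and budget monitoring",
    holder="Jeff",
    human_owner="David",
    decided_by="David",
    reasons=("Reliability issues", "False positives", "Trust violation"),
    redistribution={
        "ad pacing": "Dash",
        "account-level status monitoring": "Dash",
        "data architecture": "Dan",
    },
)
```

The checks mirror the three things that made the decision clean: the named owner decides, the reasons are written down, and no capability is left without a home.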

Bersin's research frames HR's core imperative in the agent era as how to redesign, reskill, and redeploy. The same imperative applies to agent seats. When Bersin writes that "for each dollar spent on machine learning technology, companies may need to spend nine dollars on intangible human capital," he is pointing at the cost of getting this architecture wrong. The machine runs for a dollar. The human judgment about what the machine should do, and when it should stop doing it, costs nine.

The CHRO owns that judgment. The unified scorecard is where that judgment becomes visible every Monday.

See the live chart

The Sneeze It org chart, including which seats are agent-owned and which are human-owned, can be queried through the OTP MCP server.

In Claude Desktop or Cursor, add this block inside the "mcpServers" object of the client's MCP configuration file (other MCP clients may use a different key):

"otp": {
  "command": "npx",
  "args": ["-y", "@orgtp/mcp-server"]
}

Restart the client. Then ask: "Use OTP to show me the Sneeze It org chart and identify which seats have agent owners versus human owners."

The response gives you the accountability structure directly, including seat names, metric assignments, and ownership. That is the scorecard architecture this post describes, live.


Series: AI-Era CHRO. Post 41 of an in-progress series. Previous post: what-hr-does-when-half-the-workforce-is-agents

David Steel

Founder of OTP. Runs an AI agent army at a digital agency. Building OTP because nobody else seems to be building it. Notes from inside the build, not from the conference circuit.
