Founder Notes 2026-05-22 · David Steel

Annual planning with an AI red team: stress-testing the V/TO™

The Annual session is two days. The leadership team rebuilds the V/TO™. The 3-Year Picture™ gets revisited. The 1-Year Plan gets reset. Major Rocks get set for Q1 of the new year. Real money rides on these decisions.

The historical weakness of the Annual session is that the team is in the room with itself. The same people who proposed the goals approve the goals. Healthy skepticism dries up. Optimism inflates. The biggest blind spots in the V/TO™ tend to be the ones the team has agreed to ignore for years.

This is exactly the work an AI red team can do. Not the work of writing the V/TO™. The work of attacking it.

What a red team is for

In security, a red team is an internal or external group whose job is to attack the system the way an adversary would. The red team is not malicious. The red team is loyal. The red team's purpose is to find weaknesses before a real adversary finds them.

For an Annual session, a red team's job is similar. Loyal to the company. Charged with attacking the proposed V/TO™ the way a competitor, a market shift, or an internal blind spot would. Surface every weakness. Force the leadership team to either fix it or consciously accept it.

Most companies never run a red team on their V/TO™. The session ends with the team feeling good about the year ahead. Six months later they discover the weakness the room missed because no one in the room was paid to find it.

An AI model is unusually well-suited to this work. It is not in the room. It does not need to maintain harmony. Given the prior year of company data, it can spot the recurring patterns. It can be asked to argue against any line item without political cost.

How to set up the AI red team

A practical pattern.

Pre-session prep. A few days before the Annual, the Integrator hands the model the full set of artifacts: current V/TO™, current 1-Year Plan, current 3-Year Picture™, last 12 months of Scorecards, last 12 months of Issues Lists, Quarterly Rock retrospectives, Customer Headlines themes, Employee Headlines themes, win/loss notes from sales, and major external events of the year.

The red team prompt. Hand the model a prompt like: "You are a loyal critic of this company's leadership team. Your only job is to find the weaknesses in the proposed V/TO™ and the 1-Year Plan. Be specific. Cite evidence from the prior year. Propose three to five weaknesses ranked by likelihood of mattering."
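To make the prep step concrete, here is a minimal Python sketch. It assumes the artifacts have already been exported as plain text. The function name `build_red_team_prompt` and the section labels are hypothetical. Sending the prompt to a model depends on your provider and is left out.

```python
# Hypothetical helper: bundle the pre-session artifacts with the red team
# instruction into one prompt string. Artifact names are illustrative only.

RED_TEAM_INSTRUCTION = (
    "You are a loyal critic of this company's leadership team. "
    "Your only job is to find the weaknesses in the proposed V/TO™ and the "
    "1-Year Plan. Be specific. Cite evidence from the prior year. "
    "Propose three to five weaknesses ranked by likelihood of mattering."
)


def build_red_team_prompt(artifacts: dict[str, str]) -> str:
    """Put each artifact under a labeled section, then append the instruction."""
    if not artifacts:
        raise ValueError("the red team needs at least one artifact to read")
    sections = []
    for name, body in artifacts.items():
        # Labeling every artifact lets the model cite where its evidence came from.
        sections.append(f"=== {name} ===\n{body.strip()}")
    return "\n\n".join(sections) + "\n\n=== Instruction ===\n" + RED_TEAM_INSTRUCTION
```

A call would look like `build_red_team_prompt({"Current V/TO™": vto_text, "Last 12 months of Scorecards": scorecard_text})`. The instruction goes last, after the evidence, so it is the final thing the model reads.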

The model's output. Three to five weaknesses, ranked. Each one with evidence. Each one with a proposed counter-question for the leadership team to consider.
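You can also ask the model to return its findings as JSON. That is an assumption beyond the prompt described above. If you do, the Integrator can check the output's shape before the session. The sketch below rejects any weakness that arrives without evidence or without a counter-question. The `Weakness` fields and the JSON keys are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Weakness:
    rank: int  # 1 = most likely to matter, taken from the list order
    weakness: str
    evidence: str
    counter_question: str


# Hypothetical JSON keys the model is asked to use for each weakness.
REQUIRED_KEYS = ("weakness", "evidence", "counter_question")


def parse_red_team_output(raw: str) -> list[Weakness]:
    """Parse the model's JSON reply: three to five items, each with evidence and a question."""
    items = json.loads(raw)
    if not isinstance(items, list) or not 3 <= len(items) <= 5:
        raise ValueError("expected a JSON list of three to five weaknesses")
    parsed = []
    for rank, item in enumerate(items, start=1):
        if not isinstance(item, dict):
            raise ValueError(f"weakness #{rank} is not a JSON object")
        for key in REQUIRED_KEYS:
            # A weakness with no evidence, or no question for the team, is not
            # useful in the session, so reject it before the Annual.
            if not str(item.get(key, "")).strip():
                raise ValueError(f"weakness #{rank} is missing '{key}'")
        parsed.append(Weakness(rank, *(str(item[k]).strip() for k in REQUIRED_KEYS)))
    return parsed
```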

Annual session usage. The leadership team reviews the red team output on the morning of day one. The team agrees on which weaknesses are real and worth addressing in the new V/TO™, and which the team consciously decides to accept.

That is the whole protocol. About four hours of prep work for the Integrator and about 45 minutes of session time to review.

Why this works

Three reasons.

One, the model has no political stake. A human red team inside the company has to navigate relationships. The model does not. The model can say "your retention numbers contradict your 1-Year Plan growth assumption" without worrying about who proposed the 1-Year Plan.

Two, the model reads more data than a human can. A year of Scorecards is hundreds of weekly entries. A year of Issues is dozens. A year of Headlines runs into the thousands. No human can hold all of it in mind during the Annual. The model can.

Three, the model surfaces contradictions humans miss. The leadership team's optimism for the new year often contradicts evidence in the prior year. The model catches the contradictions directly. "Your 1-Year Plan assumes 30% lead growth. Your Issues List shows a recurring theme of marketing capacity constraints for the past three quarters. What changes in the new year to enable the assumption?"

The leadership team can decide to defend the assumption. The point is the team is forced to defend it consciously.

What to ask the red team to attack specifically

A standard battery of questions works well.

  • Where does the new V/TO™ contradict last year's evidence?
  • Which 1-Year Plan goals have the weakest dependency analysis?
  • Which Rocks proposed for Q1 are the most likely to slip given the team's historical Rock completion rate?
  • Which Core Values have shown stress in the prior year's Issues List?
  • Which Scorecard rows are the weakest predictors of the 1-Year Plan goals they claim to support?
  • Which key customer segments have shown leading indicators of decline that are not reflected in the new V/TO™?
  • Which competitors have made moves in the prior year that the new V/TO™ does not address?
  • Which seats on the Accountability Chart are most at risk in the coming year?

Eight questions. The model returns eight focused analyses with evidence. The leadership team has eight pre-staged conversations they would not have otherwise had.
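One way to get eight focused analyses is to send each question as its own prompt over the same shared context. The sketch below assumes that approach. The function `build_battery_prompts` is hypothetical. So is the `context` string, which could be the artifact bundle from the prep step. Sending the prompts to a model is left out.

```python
from collections.abc import Sequence

# The eight battery questions, worded as in the list above.
BATTERY: tuple[str, ...] = (
    "Where does the new V/TO™ contradict last year's evidence?",
    "Which 1-Year Plan goals have the weakest dependency analysis?",
    "Which Rocks proposed for Q1 are the most likely to slip given the team's historical Rock completion rate?",
    "Which Core Values have shown stress in the prior year's Issues List?",
    "Which Scorecard rows are the weakest predictors of the 1-Year Plan goals they claim to support?",
    "Which key customer segments have shown leading indicators of decline that are not reflected in the new V/TO™?",
    "Which competitors have made moves in the prior year that the new V/TO™ does not address?",
    "Which seats on the Accountability Chart are most at risk in the coming year?",
)


def build_battery_prompts(context: str, questions: Sequence[str] = BATTERY) -> list[str]:
    """Return one prompt per question, each carrying the full shared context."""
    total = len(questions)
    return [
        f"{context}\n\n=== Red team question {i} of {total} ===\n{q}\n"
        "Answer with specific evidence from the prior year."
        for i, q in enumerate(questions, start=1)
    ]
```

Separate prompts keep each answer focused and easy to review in the session. The trade-off is eight calls over the same long context. Batching all eight questions into one prompt is a reasonable alternative.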

What the red team should not be

Three things to avoid.

Avoid letting the model propose the new V/TO™. The model's job is to attack the team's proposal. The team's job is to propose. Asking the model to write the V/TO™ collapses the value of the red team, because the model ends up attacking its own draft instead of the team's thinking.

Avoid using the model to score the team. "How would you rate this V/TO™ on a 1-10 scale" is a poor use of the model. Numerical scores create false precision. Specific weakness analyses create useful conversation.

Avoid letting the model's read become the team's read. The model is one voice. The team has many. If the model says "this is the biggest weakness" and the team disagrees, the team is probably right. The model's gift is the surface area of analysis, not the final judgment.

What this does to the room

Two effects worth naming.

The leadership team gets sharper. Knowing the red team is going to attack the V/TO™ before approval makes the team write a sharper V/TO™. Soft language gets cut. Hopeful assumptions get cited or removed. The proposal that goes to the red team is already better than it would have been if no red team existed.

The room becomes less hierarchical. Junior leaders are more comfortable challenging the V/TO™ when they can point to the red team's evidence rather than challenging the Visionary directly. The Visionary's natural defensiveness drops because the criticism is coming from outside the room.

Both effects make the Annual session better. The session output is a sturdier V/TO™.

What we do at Sneeze It

We run our Annual with this red team protocol. We use Claude Opus for its long context window. ChatGPT or another high-end model would also work. The choice is less important than the discipline.

The hardest part is being honest about what the model surfaces. The first year we ran this, the model identified three weaknesses I had been quietly avoiding. Confronting them changed the V/TO™. Confronting them also changed how I run the company.

This is the unexpected benefit of an AI red team. The team gets sharpened by an outside voice that has no ego and no agenda. That kind of feedback is rare. The model can provide it as long as the team is willing to hear it.

FAQ

Should the red team output be shared with the whole team? Leadership team only, during the Annual session. After the session, the Integrator can share the red team's findings with the broader team as part of the new year's framing.

What if the red team is wrong? Often it will be. The model does not understand the company the way the team does. The point is not that the model is right. The point is that the model raises questions the team must answer.

Can we run a red team for Quarterlies too? Yes, lighter version. The Quarterly red team focuses on Rock proposals and Issues themes. Same protocol, smaller scope.

Should the EOS® Implementer® see the red team output? Yes if they are facilitating the Annual. They can use it as input for their own facilitation questions.

EOS®, Entrepreneurial Operating System®, V/TO™, Vision/Traction Organizer™, Level 10 Meeting®, L10®, Rocks™, Scorecard, Issues List, Customer Headlines, Employee Headlines, 3-Year Picture™, 1-Year Plan, Core Values, Accountability Chart, Annual, Quarterly, and EOS® Implementer® are concepts and trademarks of EOS Worldwide, LLC. This article is an independent practitioner perspective and is not affiliated with or endorsed by EOS Worldwide.

David Steel

Founder of OTP. Runs an AI agent army at a digital agency. Building OTP because nobody else seems to be building it. Notes from inside the build, not from the conference circuit.
