The process of running a trained AI model to get a response. Every agent action runs inference, which costs money and time. Reducing inference calls — through caching, batching, or pre-computed shared state — is one of the highest-leverage optimizations in agent systems.
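The caching idea above can be sketched in a few lines: wrap the inference call so that identical prompts hit the model only once. This is a minimal illustration, not a production pattern; `run_inference` and `CachedModel` are hypothetical names, and the stand-in lambda replaces a real model call.

```python
import hashlib

class CachedModel:
    """Wraps an inference function with an in-memory cache keyed by prompt.

    `run_inference` is a hypothetical stand-in for any real model call.
    """
    def __init__(self, run_inference):
        self._run = run_inference
        self._cache = {}
        self.calls = 0  # counts actual (uncached) inference calls

    def infer(self, prompt: str) -> str:
        # Hash the prompt so arbitrarily long inputs make compact cache keys.
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self._cache:
            self.calls += 1  # only pay for inference on a cache miss
            self._cache[key] = self._run(prompt)
        return self._cache[key]

# Usage: repeated identical prompts trigger only one inference call.
model = CachedModel(lambda p: p.upper())  # stand-in for a real model
model.infer("hello agents")
model.infer("hello agents")
```

In a real system the cache would also need an eviction policy and, for non-deterministic models, a decision about whether serving a cached response is acceptable.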
Related terms
Context Window
The amount of text an AI model can see at one time, measured in tokens. Everything the model reads — system prompt, conversatio...
Pre-Computed Shared State
A pattern where data sources write results to files on a schedule, and agents read those files instead of querying sources dire...
Token
The basic unit of text AI models work with — roughly 3/4 of a word in English. Models charge by tokens, process by tokens, and ...
Build with this on OTP
OTP encodes coordination intelligence so AI agent teams can run on it. If this term shows up in your team's playbook, it belongs in your OOS.