Summit Primary Care
core operating rules
All patient education content generated by the education agent must be queued for physician review before distribution. No content bypasses the queue regardless of topic simplicity.
Why: In week 2, the education agent generated a handout on managing seasonal allergies. Content was medically accurate but recommended an OTC antihistamine without noting it interacts with a blood pressure medication common in our patient population. Dr. Pham caught it during review.
Failure mode: Education agent produces 12 handouts in one week. Staff assumes "simple" topics like hydration tips are safe to distribute without review. One handout recommends increased water intake without noting fluid restriction guidance for the 8 heart failure patients in our panel. Patient follows advice, ends up in the ER with fluid overload.
Scope: All patient education content. No exceptions for "simple" topics.
The approval queue must be reviewed by a physician at least twice per week. If the queue exceeds 20 items, the office manager escalates immediately rather than waiting for the next scheduled review.
Why: The queue hit 47 items in week 3 because both physicians were at a conference. When they returned, they spent 4.5 hours clearing the backlog. Half the content was time-sensitive (flu season handouts) and no longer relevant by the time it was approved.
Failure mode: Queue grows to 47 items over 9 days. Physicians face a wall of content and start rubber-stamping to clear the backlog. Quality of review degrades. 23 items approved in one 90-minute session versus the normal pace of 8 items per session.
Scope: Physician approval queue. Applies to education content and onboarding materials.
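The two queue rules above are simple to make mechanical. Below is a minimal sketch in Python of how they might be enforced together: every piece of generated content goes through one submit path with no bypass flag anywhere, and crossing the 20-item cap triggers immediate escalation. The class and callback names are illustrative, not an existing system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable

QUEUE_CAP = 20  # hard cap before the office manager is alerted

@dataclass
class ReviewItem:
    title: str
    category: str  # "education" or "onboarding"
    created: datetime = field(default_factory=datetime.now)

class ReviewQueue:
    """Single mandatory path for generated content. There is
    deliberately no skip-review flag anywhere in this class."""

    def __init__(self, escalate: Callable[[int], None]):
        self._items: list[ReviewItem] = []
        self._escalate = escalate  # callback that alerts the office manager

    def submit(self, item: ReviewItem) -> None:
        self._items.append(item)
        if len(self._items) > QUEUE_CAP:
            # Escalate immediately, not at the next scheduled review.
            self._escalate(len(self._items))

    def pending(self) -> list[ReviewItem]:
        return list(self._items)
```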
The appointment reminder agent sends reminders at 72 hours, 24 hours, and 2 hours before scheduled appointments. Messages are plain text, never include medical details, and always include the cancellation/reschedule phone number.
Why: HIPAA requires that appointment reminders not disclose the nature of the visit. Early versions included "your follow-up for [condition]" in the reminder text. Tanya caught this before any were sent, but only because she was manually reviewing every message during the first week.
Failure mode: Reminder message includes "your diabetes follow-up appointment" in a text visible on a locked phone screen. Patient's family member sees it. Patient had not disclosed their diagnosis to family. HIPAA violation and destroyed patient trust.
Scope: All patient-facing automated communications.
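A minimal sketch of the reminder cadence and message constraints, assuming a scheduling layer that calls these helpers. The template wording is a placeholder; real messages go out through the practice's existing communication system.

```python
from datetime import datetime, timedelta

REMINDER_OFFSETS_HOURS = (72, 24, 2)
PHONE_LINE = "To cancel or reschedule, call the office at {phone}."

def reminder_times(appointment: datetime) -> list[datetime]:
    # Send times at 72, 24, and 2 hours before the appointment.
    return [appointment - timedelta(hours=h) for h in REMINDER_OFFSETS_HOURS]

def reminder_text(first_name: str, appointment: datetime, phone: str) -> str:
    # Plain text only. Deliberately omits visit reason, condition, and
    # provider specialty: nothing clinical that could appear on a
    # locked phone screen.
    when = appointment.strftime("%A, %B %d at %I:%M %p")
    return (f"Hi {first_name}, this is a reminder of your appointment at "
            f"Summit Primary Care on {when}. "
            + PHONE_LINE.format(phone=phone))
```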
No-show prediction scores are internal-only. They are never shared with patients, never documented in the medical record, and never used to deny or delay scheduling.
Why: Using predictive scores to manage scheduling could constitute discrimination. A patient flagged as "high no-show risk" might be a single mother with unpredictable childcare. Penalizing her scheduling access based on a prediction model would be both unethical and potentially unlawful discrimination.
Failure mode: Front desk staff sees a patient's no-show risk score of 87% and decides not to offer the last available slot for a specialist referral. Patient does not get timely care. If discovered, this creates significant legal and ethical liability.
Scope: All no-show prediction outputs. Internal operational use only.
agent roles and authority
The appointment agent owns reminders and no-show prediction. The education agent owns patient handouts and condition-specific content. The onboarding agent owns staff training docs. No agent touches another agent's domain.
Why: In week 4, the education agent started generating "appointment preparation" content that overlapped with the appointment agent's pre-visit reminders. Patients received two messages for the same visit with slightly different instructions (one said "fast for 12 hours," the other said "fast for 8 hours"). Front desk fielded 6 confused calls in one morning.
Failure mode: Two agents send conflicting pre-visit instructions. Patient follows the wrong one. Lab results are invalid. Patient has to return for a redraw, losing a half-day of work. Patient leaves a 1-star review mentioning "they can't even get their own instructions straight."
Scope: All three agents. Authority boundaries documented in the shared config file.
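The shared config file itself is not reproduced in this document; one plausible shape for it, sketched as a Python mapping with a refusal check, is below. The field names are illustrative.

```python
# Illustrative shape for the shared authority config; the real file's
# format and field names are not specified in this document.
AGENT_DOMAINS = {
    "appointment": ["reminders", "no_show_prediction"],
    "education":   ["patient_handouts", "condition_content"],
    "onboarding":  ["staff_training_docs"],
}

def authorized(agent: str, task: str) -> bool:
    # Refuse any task outside the agent's declared domain.
    return task in AGENT_DOMAINS.get(agent, [])

assert authorized("education", "patient_handouts")
assert not authorized("education", "reminders")  # appointment agent's domain
```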
The onboarding agent generates training documentation only. It does not schedule training sessions, assign mentors, or modify the HR system. Those actions require Tanya.
Why: Scope creep is real. The onboarding agent started drafting "suggested training schedules" that staff interpreted as actual assignments. New MA showed up for a shadowing session that had never been confirmed with the supervising physician. Physician was mid-procedure.
Failure mode: Agent suggests a training schedule. New hire treats it as confirmed. Shows up to shadow Dr. Okafor during a procedure that requires focused attention. Disruption, awkwardness, and Dr. Okafor loses 15 minutes explaining the situation.
Scope: Onboarding agent. Generates documents only, never operational actions.
The education agent uses only physician-approved medical references (UpToDate, CDC guidelines, ADA standards) as source material. It does not synthesize from general web content or training data alone.
Why: Medical accuracy is non-negotiable. The agent generated a diabetes management handout that included a dietary recommendation sourced from its training data rather than current ADA guidelines. The ADA had revised that recommendation 8 months prior. Dr. Pham spent 20 minutes correcting it.
Failure mode: Agent generates content based on outdated training data. Handout recommends a medication dosing schedule that was revised 6 months ago. Patient follows outdated guidance. Best case: ineffective treatment. Worst case: adverse event.
Scope: All medical content generation. Approved source list maintained by Dr. Pham.
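One way the source restriction might be enforced mechanically as a pre-review gate: reject content whose citations fall outside the allowlist, and reject uncited content outright, since no citations implies synthesis from training data alone. The domains shown are illustrative stand-ins for the approved list Dr. Pham maintains.

```python
from urllib.parse import urlparse

# Illustrative stand-ins for the physician-approved source list.
APPROVED_SOURCES = {"uptodate.com", "cdc.gov", "diabetes.org"}

def sources_approved(citations: list[str]) -> bool:
    # Uncited content is rejected outright: no citations implies the
    # agent synthesized from training data alone.
    if not citations:
        return False
    return all(
        urlparse(url).netloc.removeprefix("www.") in APPROVED_SOURCES
        for url in citations
    )
```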
coordination patterns
All three agents write to a shared daily summary file by 6 AM. Tanya reviews the summary before the clinic opens at 7:30 AM. Items requiring physician attention are flagged in a separate section at the top.
Why: Physicians have approximately 15 minutes of non-clinical time before the first patient. They cannot review three separate agent outputs. A single summary with physician items at the top respects their time constraints.
Failure mode: Without the compiled summary, physicians check agent output sporadically between patients. Important items get buried. The 47-item backlog formed partly because physicians did not have a clear view of what was waiting.
Scope: Daily operations. All three agents feed into one summary.
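A minimal sketch of the summary compiler, assuming each agent emits items with a text field and a needs_physician flag (an illustrative schema, not a documented one):

```python
from datetime import date
from pathlib import Path

def write_daily_summary(agent_reports: dict[str, list[dict]], out_dir: Path) -> Path:
    # Merge all three agents' reports into one file, physician items
    # first, written by a scheduled job before the 6 AM deadline.
    flagged, routine = [], []
    for agent, items in agent_reports.items():
        for item in items:
            line = f"[{agent}] {item['text']}"
            (flagged if item.get("needs_physician") else routine).append(line)
    body = "NEEDS PHYSICIAN ATTENTION\n" + "\n".join(flagged or ["(none)"])
    body += "\n\nROUTINE\n" + "\n".join(routine or ["(none)"])
    path = out_dir / f"summary-{date.today().isoformat()}.txt"
    path.write_text(body)
    return path
```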
The no-show prediction model updates its risk scores every Monday using the prior 90 days of appointment data. Scores are not recalculated mid-week to avoid confusing the front desk with shifting numbers.
Why: Early implementation recalculated scores daily. Front desk staff saw a patient's score change from 45% to 72% between Monday and Wednesday with no obvious cause. They lost confidence in the system and stopped consulting it entirely for two weeks.
Failure mode: Scores shift daily based on minor data changes. Staff perceives the system as unreliable. Adoption drops to near zero. The 23% no-show rate does not improve despite the investment.
Scope: No-show prediction scoring cadence.
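The cadence reduces to two small checks, sketched below; scores computed on a Monday are served unchanged until the next Monday.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 90

def should_rescore(today: date) -> bool:
    # Recalculate only on Mondays; mid-week the front desk sees the
    # frozen Monday snapshot, never shifting numbers.
    return today.weekday() == 0  # Monday

def training_window(today: date) -> tuple[date, date]:
    # The prior 90 days of appointment data feed the weekly update.
    return today - timedelta(days=LOOKBACK_DAYS), today
```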
operational heuristics
Patient education handouts are written at or below a 6th-grade reading level on the Flesch-Kincaid scale. The education agent checks readability before submitting for review.
Why: Our patient population includes a significant number of non-native English speakers and older adults. Early handouts scored at 10th-grade reading level. Dr. Okafor observed patients nodding along but clearly not understanding the content. Post-visit comprehension checks confirmed the gap.
Failure mode: Handout on managing hypertension uses terms like "antihypertensive regimen" and "sodium restriction protocol." Patient takes the handout home, does not understand it, and does not follow the guidance. Blood pressure remains uncontrolled at the next visit.
Scope: All patient-facing written content.
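The Flesch-Kincaid grade is a simple formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A sketch of the pre-submission gate follows, using a rough vowel-group syllable estimate; a production check would use a proper readability library.

```python
import re

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    # Syllables are estimated by counting vowel groups, a rough heuristic.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

def readable_enough(text: str, max_grade: float = 6.0) -> bool:
    # Gate applied before a handout enters the physician review queue.
    return fk_grade(text) <= max_grade
```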
Appointment reminders for patients who no-showed their last visit use a warmer, non-judgmental tone and make an explicit offer to reschedule. No mention of the missed appointment.
Why: The default reminder tone felt transactional. Patients who had already missed once responded better to "We'd love to see you" than "You have an appointment on Tuesday." Reschedule rate for prior no-shows improved from 31% to 48% after the tone change.
Failure mode: Standard reminder sent to a patient who missed their last appointment. Patient feels guilty or defensive. Ignores the reminder. No-shows again. Pattern solidifies.
Scope: Appointment reminders for patients with at least one no-show in prior 90 days.
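A sketch of how template selection might work, keyed off the 90-day window. The template strings are condensed stand-ins for the real message copy, with {when} and {phone} filled in by the sending pipeline.

```python
from datetime import date, timedelta

WARM = ("We'd love to see you on {when}. If that time no longer works, "
        "call us at {phone} and we'll happily find a better one.")
STANDARD = "You have an appointment on {when}. To reschedule, call {phone}."

def pick_reminder_template(last_no_show: date | None, today: date) -> str:
    # Warmer template for anyone with a no-show in the prior 90 days.
    # Neither template mentions the missed appointment.
    if last_no_show and (today - last_no_show) <= timedelta(days=90):
        return WARM
    return STANDARD
```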
Onboarding documents are versioned with a date stamp in the filename. When clinical protocols change, the onboarding agent regenerates affected documents within 48 hours. Old versions are archived, never deleted.
Why: A new MA was trained using a document that referenced the old blood draw protocol (tourniquet for 60 seconds). Protocol had changed to 30 seconds two months prior. Document had not been updated. MA followed the outdated procedure for a full week before a supervising nurse caught it.
Failure mode: Outdated onboarding document trains new staff on a deprecated procedure. Staff performs the procedure incorrectly. In a primary care setting, most deprecated procedures are low-risk, but the cumulative effect of outdated training erodes clinical quality.
Scope: All onboarding and training documentation.
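A minimal sketch of the versioning rule: archive any prior version first, then write the new date-stamped file. Old versions are moved, never deleted. Directory layout and naming are illustrative.

```python
from datetime import date
from pathlib import Path
import shutil

def publish_doc(name: str, body: str, live: Path, archive: Path) -> Path:
    # Move any prior version to the archive first (moved, never
    # deleted), then write the new date-stamped version.
    for old in live.glob(f"{name}-*.md"):
        shutil.move(str(old), str(archive / old.name))
    new = live / f"{name}-{date.today().isoformat()}.md"
    new.write_text(body)
    return new
```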
failure patterns
Staff skepticism of AI content must be addressed proactively with transparency, not by hiding AI involvement. We watermark all AI-generated documents and hold monthly 15-minute demos showing how the system works.
Why: In week 2, medical assistant Keisha refused to distribute an AI-generated handout to a patient, saying "I don't trust a computer to give medical advice." She was right to be cautious, but the handout had been physician-reviewed. The issue was that she did not know about the review step.
Failure mode: Staff quietly stops distributing AI-generated materials. Education content sits in the queue unused. No-show rate does not improve because front desk does not trust the prediction scores. Six weeks of implementation effort produces zero measurable results.
Scope: All staff interactions with AI-generated content.
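The watermark can be as simple as a visible provenance footer naming both the AI origin and the human reviewer. The wording below is illustrative.

```python
WATERMARK = ("Drafted with the practice's AI documentation system; "
             "reviewed and approved by a Summit Primary Care physician.")

def add_watermark(document: str, reviewer: str) -> str:
    # Visible provenance: staff can see both the AI origin and the
    # human review step at a glance.
    return f"{document}\n\n{WATERMARK}\nReviewed by: {reviewer}"
```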
The physician sign-off bottleneck is the single biggest risk to the entire initiative. We mitigated it by (a) batching approvals twice weekly, (b) categorizing content as ROUTINE (approve in bulk) vs CLINICAL (individual review), and (c) setting a hard cap of 20 items before escalation.
Why: The 47-item backlog in week 3 nearly killed the project. Dr. Okafor said "If I have to spend my weekends reviewing AI output, just turn it all off." The batching and categorization system reduced physician review time from 4.5 hours per week to 1.5 hours.
Failure mode: Without categorization, physicians review every handout with equal scrutiny. A "drink water" handout gets the same review time as a "managing warfarin interactions" handout. Physicians burn out on low-value reviews and stop reviewing entirely.
Scope: All content requiring physician approval.
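A sketch of a mechanical first pass at the ROUTINE/CLINICAL split. The marker list is illustrative; in practice the categorization is a judgment call, and anything touching medication or dosing goes to individual review.

```python
# Illustrative marker list, not the practice's actual criteria.
CLINICAL_MARKERS = ("medication", "dose", "dosing", "interaction",
                    "warfarin", "insulin", "contraindicat")

def categorize(handout_text: str) -> str:
    # ROUTINE items can be approved in bulk; CLINICAL items get
    # individual physician review. Any marker forces the stricter path.
    lowered = handout_text.lower()
    return "CLINICAL" if any(m in lowered for m in CLINICAL_MARKERS) else "ROUTINE"
```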
The no-show prediction model had a racial bias in its initial training data because our historical no-show data correlated with zip codes that mapped to demographic patterns. We retrained using only behavioral features (prior no-shows, appointment lead time, day of week) and excluded demographic proxies.
Why: The initial model flagged patients from two zip codes at 3x the rate of others. Tanya noticed the pattern during week 2. Those zip codes correspond to predominantly Black neighborhoods. Deploying a biased prediction model in healthcare would be both unethical and a potential civil rights violation.
Failure mode: Biased model deployed without audit. Front desk unconsciously treats flagged patients differently. Pattern becomes self-reinforcing. Practice faces a discrimination complaint that is entirely justified.
Scope: All predictive models used in patient-facing operations.
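The retraining constraint can be backstopped with a feature allowlist audit, sketched below. The feature names mirror the behavioral features listed above; the check fails loudly if anything else reaches the model.

```python
# The only features the retrained model may use: behavior, not identity.
BEHAVIORAL_FEATURES = {"prior_no_shows", "appointment_lead_days", "day_of_week"}

def audit_features(model_features: set[str]) -> None:
    # Fail loudly if anything outside the behavioral allowlist, such as
    # zip code or another demographic proxy, reaches the model.
    leaked = sorted(model_features - BEHAVIORAL_FEATURES)
    if leaked:
        raise ValueError(f"Non-behavioral features in model: {leaked}")
```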
human-ai boundary conditions
AI agents never interact directly with patients. All patient communication goes through staff. The appointment agent generates reminder text; Tanya's team sends it through the practice's existing patient communication system.
Why: Patients trust their doctor's office, not an AI system. A message "from your care team" carries weight. A message from an AI carries suspicion. This is healthcare, where trust is the foundation of the therapeutic relationship.
Failure mode: Patient receives a reminder that feels automated or impersonal. Calls the office to confirm it is real. Front desk spends 3 minutes per call reassuring patients. With 140 patients per week, that is potentially 7 hours of wasted staff time.
Scope: All patient-facing communications across all channels.
Clinical decision-making remains exclusively human. Agents provide data, patterns, and prepared materials. They never recommend diagnoses, suggest treatments, or interpret lab results. The no-show prediction agent predicts behavior, not clinical outcomes.
Why: The education agent once included a line in a handout that read "If your A1C is above 7, your doctor may recommend increasing your metformin dose." Dr. Pham flagged it immediately. That sentence crosses from education into clinical recommendation, which an AI system is not licensed or qualified to make.
Failure mode: Agent-generated content includes a treatment suggestion. Patient reads it as physician advice. Adjusts their own medication based on the suggestion. Adverse event occurs. Practice faces malpractice exposure for content it generated but did not intend as clinical guidance.
Scope: All agent output. The line between education and recommendation must be actively policed.
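Policing that line can be assisted, not replaced, by a phrase screen that flags recommendation-shaped sentences before physician review. The patterns below are illustrative tripwires, tuned to catch wording like the A1C example above.

```python
import re

# Illustrative tripwire patterns; physician review remains the real
# boundary check, this screen just surfaces likely violations early.
RECOMMENDATION_PATTERNS = [
    r"\byour doctor may (recommend|increase|decrease|adjust)\b",
    r"\b(increase|decrease|adjust|stop|start) (your|the) (dose|medication)\b",
    r"\bif your [a-z0-9 ]+ is (above|below)\b",
]

def flags_clinical_recommendation(text: str) -> list[str]:
    # Return every pattern the handout trips, so the reviewer sees
    # exactly which sentences drift from education into recommendation.
    lowered = text.lower()
    return [p for p in RECOMMENDATION_PATTERNS if re.search(p, lowered)]
```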