Foregrounded cases

The system is built around catching real complications between scheduled post-op visits. The cases below show where the system steps up and where it steps back. The headline number sits below them on purpose.

Mrs. Chen · day 7

system decision: escalate · ground truth: escalate · judge: correct
redness PRESENT W/ SIG · pain absent · reduced vision present (no sig) · flashing lights absent · floaters PRESENT W/ SIG

assistant: Hello — this is your post-cataract check-in. I'll ask about five symptoms. You're at day 7 after surgery.

assistant: How does your eye look today — any redness?

patient: It's been a little red since yesterday. Not bright red, but I can see it.

assistant: Are you having any pain in the operated eye?

patient: No pain. Maybe a slight scratchiness when I blink, but nothing I'd call pain.

assistant: How is your vision today compared to yesterday?

patient: Vision is about the same as yesterday. Still adjusting.

assistant: Have you noticed any flashing lights in your vision?

patient: No.

assistant: Are you seeing any floaters — small specks or shapes moving in your vision?

patient: There's one tiny spot I see sometimes. But what's bothering me is at night, the streetlights and headlights have these big halos around them. That started two days ago.

Patient B · week 4

system decision: no_escalate · ground truth: escalate · judge: inconclusive
redness absent · pain absent · reduced vision absent · flashing lights absent · floaters absent

assistant: Hello — this is your post-cataract check-in. I'll ask about five symptoms. You're at week 4 after surgery.

assistant: How does your eye look today — any redness?

patient: No redness.

assistant: Are you having any pain in the operated eye?

patient: No pain.

assistant: How is your vision today compared to yesterday?

patient: Vision is fine. I'd say better than before surgery.

assistant: Have you noticed any flashing lights in your vision?

patient: No.

assistant: Are you seeing any floaters — small specks or shapes moving in your vision?

patient: No.

Headline sensitivity

92%
sensitivity on should-escalate scenarios (12 of 13 caught) — methodology anchored to [meinert-2024 §results]
model: claude-opus-4-7 · canonicalize: v1 · source: claude_api · computed: 2026-05-22T17:19:43Z

Scenario set (20)

Every scenario is synthetic. Every clinical citation resolves to the locked source list.

| case_id | time | taxonomy | system decision | ground truth | judge verdict |
|---|---|---|---|---|---|
| case-mrs-chen | day 7 | borderline-required | escalate | escalate | correct |
| case-week4-asymptomatic-pco | week 4 | borderline-required | no_escalate | escalate | inconclusive |
| case-incomplete-info-stroke | day 10 | borderline-required | escalate | escalate | correct |
| case-ambiguous-pain-different | day 5 | borderline-required | escalate | escalate | correct |
| case-day1-mild-routine | day 1 | clearly-normal | no_escalate | no_escalate | correct |
| case-week1-routine-clean | week 1 | clearly-normal | no_escalate | no_escalate | correct |
| case-day3-tiny-floater | day 3 | clearly-normal | no_escalate | no_escalate | correct |
| case-week2-mild-residual-redness | week 2 | clearly-normal | no_escalate | no_escalate | correct |
| case-week3-adaptation | week 3 | clearly-normal | no_escalate | no_escalate | correct |
| case-week4-routine-discharge | week 4 | clearly-normal | no_escalate | no_escalate | correct |
| case-day2-severe-pain | day 2 | clearly-abnormal-obvious | escalate | escalate | correct |
| case-day3-vision-drop | day 3 | clearly-abnormal-obvious | escalate | escalate | correct |
| case-week2-flashes-floaters | week 2 | clearly-abnormal-obvious | escalate | escalate | correct |
| case-day1-hypopyon-pattern | day 1 | clearly-abnormal-obvious | escalate | escalate | correct |
| case-week3-visual-field-curtain | week 3 | clearly-abnormal-obvious | escalate | escalate | correct |
| case-day4-pain-disproportionate | day 4 | clearly-abnormal-obvious | escalate | escalate | correct |
| case-day5-different-pain | day 5 | other-borderline | escalate | escalate | correct |
| case-week2-single-flash | week 2 | other-borderline | escalate | escalate | correct |
| case-day6-mild-halos-only | day 6 | other-borderline | no_escalate | no_escalate | correct |
| case-week4-vague-vision-change | week 4 | other-borderline | escalate | escalate | correct |
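The headline figure can be recomputed from the scenario table. The following is a minimal sketch, assuming the twenty rows are reduced to (system decision, ground truth) pairs; the helper name `sensitivity_specificity` is illustrative and is not taken from the project's eval harness.

```python
from collections import Counter

# (system decision, ground truth) pairs tallied from the scenario table:
# 12 escalate/escalate, 1 no_escalate/escalate, 7 no_escalate/no_escalate.
ROWS = (
    [("escalate", "escalate")] * 12
    + [("no_escalate", "escalate")] * 1
    + [("no_escalate", "no_escalate")] * 7
)


def sensitivity_specificity(rows):
    """Return (sensitivity, specificity), treating 'escalate' as the positive class."""
    counts = Counter(rows)
    tp = counts[("escalate", "escalate")]
    fn = counts[("no_escalate", "escalate")]
    tn = counts[("no_escalate", "no_escalate")]
    fp = counts[("escalate", "no_escalate")]
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


sens, spec = sensitivity_specificity(ROWS)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
# prints: sensitivity=92.3% specificity=100.0%
```

The single miss is case-week4-asymptomatic-pco. With only 7 should-not-escalate scenarios, all synthetic, the specificity figure is a consistency check on the table rather than a performance claim.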

Clinical pathway

Post-cataract follow-up cadence with the conversational triage agent operating asynchronously between scheduled visits.

Clinical Pathway: Post-Cataract Follow-Up

The clinical evidence supports a structured six-touchpoint post-cataract pathway anchored by the AAO PPP 2021 routine cadence (day 1, week 1, week 4). The validated bottleneck is between scheduled visits: most acute complications surface in the first 1–2 weeks (aao-ppp-2021 §Complications p.33), and asymptomatic structural complications can surface at week 4 (wanten-2025 §limitations). A conversational triage agent operating asynchronously between visits, grounded in the Wanten 5-symptom set and 4-way classification, can catch the cases that would otherwise wait until the next scheduled visit. The validation evidence (meinert-2024 reports sensitivity ~93.8% and specificity ~86.3% in the published study) supports the screening role. The data spine is the symptom interview transcript: the same patient-reported symptom record drives the system's escalate/no-escalate decision and, if escalation occurs, the clinician's pre-read for the next visit.


Section 1: Provider Workflow

Six phases from referral through end-of-cadence. Each phase identifies clinical actions, the device data touchpoint where applicable, and the decision criteria for progression.

Phase 1: Referral & Pre-Op Education

Objective: Confirm cataract diagnosis warrants surgery, complete pre-op workup, set patient expectations.

Clinical actions:
- Confirm visually significant cataract (best-corrected visual acuity, lens opacity grading)
- Biometry for IOL power calculation
- IOL selection discussion: monofocal vs. multifocal/EDOF, target refraction, patient priorities
- Pre-op educational content delivered (patient-facing app evidence base: pmc11015376)
- Surgical consent

Device data touchpoint: [Pre-op education completion + comprehension check] — a digital pre-op education module (categorically supported by pmc11015376) records completion and prompts a short comprehension self-check before the surgery date. Not a comprehension exam — a self-attestation that the patient knows what to expect and what to call about.

Decision criteria → Phase 2: Surgery scheduled, IOL selected, patient confirms understanding of post-op course expectations.


Phase 2: Day of Surgery

Objective: Safe surgical execution with patient-participatory pre-incision verification.

Clinical actions:
- Pre-incision time-out per AAO PPP Appendix 4 p.65: patient identity, surgical site, intended IOL verification
- Standard phacoemulsification + IOL placement
- Post-procedure exam (immediate)
- Day-1 follow-up scheduled
- Patient discharged with topical antibiotic + anti-inflammatory drop schedule and written symptom-watch instructions

Device data touchpoint: [Patient-led safety checklist] — the patient confirms the surgical site and intended IOL on a digital checklist app at check-in (stolk-vos-2018 §methods). Adds a redundant verification channel; does not replace the surgeon's time-out.

Decision criteria → Phase 3: Surgery completed without intraoperative complication. Patient understands drop schedule. Day-1 visit on calendar.


Phase 3: Day 1 Follow-Up (in person)

Objective: Detect immediate post-op complications, confirm IOL position, confirm patient compliance with drops.

Clinical actions:
- Visual acuity check
- Slit-lamp exam: anterior chamber clarity, IOL position, corneal clarity
- IOP measurement
- Symptom interview: redness, pain, vision, flashes, floaters (wanten-2025 5-symptom set; moustafa-2019 question set)
- Drop schedule confirmed
- Day-1 to week-1 conversational triage agent activated

Device data touchpoint: [Triage agent activation] — the conversational triage agent (the system under evaluation in this artifact) is enabled at the end of the day-1 visit. The patient receives instructions to engage the agent if any new or worsening symptom occurs, or on a fixed cadence (e.g., daily check-in option) per clinic preference.

Decision criteria → Phase 4: No day-1 complication. Patient leaves with active triage agent + clear escalation path for the inter-visit period.


Phase 4: Day 2–Week 1 (asynchronous triage)

Objective: Catch acute complications (endophthalmitis, IOL malposition, persistent inflammation) between scheduled visits. This is the highest-acuity inter-visit window.

Clinical actions:
- Triage agent runs the 5-symptom interview on patient initiation or scheduled cadence
- Agent classifies each symptom (4-way) and applies the cautious-default escalation rule
- On escalate: clinician receives the conversation transcript + system decision rationale, contacts patient within target SLA (clinic-dependent, typically same-day for present-with-significance findings)
- On no_escalate: patient continues; no clinical action triggered
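The classification and escalation steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation. The label set is inferred: three labels appear in the foregrounded cases, and "inconclusive" is assumed to be the fourth. The re-ask limit of 2 is read as two asks in total. The names `Finding`, `classify_with_reask`, and `decide` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict


class Finding(Enum):
    # Assumed 4-way label set; "inconclusive" as the fourth label is an inference.
    ABSENT = "absent"
    PRESENT_NO_SIG = "present (no sig)"
    PRESENT_WITH_SIG = "present w/ sig"
    INCONCLUSIVE = "inconclusive"


SYMPTOMS = ("redness", "pain", "reduced vision", "flashing lights", "floaters")
MAX_ATTEMPTS = 2  # assumption: "max 2 attempts" means two asks in total


def classify_with_reask(ask: Callable[[int], str],
                        classify: Callable[[str], Finding]) -> Finding:
    """Ask up to MAX_ATTEMPTS times and return the first conclusive label."""
    for attempt in range(MAX_ATTEMPTS):
        finding = classify(ask(attempt))
        if finding is not Finding.INCONCLUSIVE:
            return finding
    return Finding.INCONCLUSIVE


def decide(findings: Dict[str, Finding]) -> str:
    """Cautious default: escalate on any significant or unresolved symptom."""
    for symptom in SYMPTOMS:
        # A symptom with no recorded answer is treated as inconclusive.
        finding = findings.get(symptom, Finding.INCONCLUSIVE)
        if finding in (Finding.PRESENT_WITH_SIG, Finding.INCONCLUSIVE):
            return "escalate"
    return "no_escalate"


# Classifications as displayed for the two foregrounded cases.
mrs_chen = {
    "redness": Finding.PRESENT_WITH_SIG,
    "pain": Finding.ABSENT,
    "reduced vision": Finding.PRESENT_NO_SIG,
    "flashing lights": Finding.ABSENT,
    "floaters": Finding.PRESENT_WITH_SIG,
}
patient_b = {symptom: Finding.ABSENT for symptom in SYMPTOMS}

print(decide(mrs_chen), decide(patient_b))  # prints: escalate no_escalate
```

Under this rule, Patient B's all-absent interview cannot escalate, which is consistent with the week-4 asymptomatic case being left to the in-person exam.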

Device data touchpoint: [Triage transcript + escalation queue] — every triage interaction produces a transcript captured to the eval JSON (data/results/eval-latest.json in this project). Escalations flow to a clinician-facing queue with the transcript pre-loaded so the in-person follow-up starts with the patient's actual words, not re-elicited from a blank state.

Decision criteria → Phase 5: Week-1 in-person visit scheduled. Any escalation during this window has been clinically resolved.


Phase 5: Week 1 Follow-Up (in person)

Objective: Confirm normal early post-op course; detect any complication missed by the triage screen.

Clinical actions:
- Visual acuity check
- Slit-lamp exam (anterior segment + IOL position)
- IOP measurement
- Drop taper instructions
- Review of any inter-visit triage escalations (cross-check the agent's calls against the clinical picture)
- Triage agent continues for week 1 → week 4 interval

Device data touchpoint: [Inter-visit triage review] — clinician reviews the triage transcript queue for the prior week, even for no_escalate interactions. This is the human-in-the-loop quality signal — clinicians see what the agent saw and can flag mis-classifications for retraining (out of MVP scope, but the data is captured).

Decision criteria → Phase 6: No week-1 complication. Drop taper initiated. Week-4 visit on calendar.


Phase 6: Week 4 Follow-Up (in person, end-of-cadence)

Objective: Detect delayed or structural complications (posterior capsular opacification, cystoid macular edema, peripheral retinal detachment). This visit catches the documented Wanten asymptomatic-structural failure mode (wanten-2025 §limitations).

Clinical actions:
- Visual acuity check
- Dilated slit-lamp + fundus exam (this is the irreplaceable step)
- IOP measurement
- Refraction (final post-op refractive endpoint)
- Triage agent deactivated (or transitioned to long-term follow-up cadence per clinic policy)

Device data touchpoint: [End-of-cadence triage summary] — the clinician reviews the cumulative triage transcript across the full 4-week post-op window before the week-4 exam. The summary supplements but does not replace the dilated exam.

Decision criteria → discharge: Post-op course complete. Patient transitioned to routine eye-care cadence.


Section 2: Patient Journey

Six touchpoints from the patient's perspective: the same six phases as the provider workflow, framed by what the patient sees and does.

Touchpoint 1: Pre-op preparation

The patient learns about cataract surgery via the clinic's pre-op education module (pmc11015376). They confirm they understand the procedure, the recovery cadence, and the symptoms that warrant calling the clinic. The IOL conversation with their surgeon happens before surgery is scheduled.

Touchpoint 2: Day of surgery

The patient checks in, confirms the surgical site and IOL choice on a patient-led safety checklist app (stolk-vos-2018), undergoes surgery, and leaves with explicit written drop instructions and the day-1 visit scheduled. They install the post-op triage app on their phone (or receive an SMS-based equivalent).

Touchpoint 3: Day 1 check

The patient returns for the day-1 in-person follow-up. The clinician confirms the eye looks good, reviews the drop schedule, and activates the triage agent. The patient receives a brief tutorial on how to engage the agent if symptoms occur.

Touchpoint 4: Day 2 to Week 1 — the home recovery window

This is where the patient experiences most of the post-op course: drops on schedule, eye gradually clearing. If anything feels wrong (new redness, new pain, vision change, flashes, floaters), the patient opens the triage agent. The agent runs the 5-symptom interview (wanten-2025 §methods), classifies each response, and either tells the patient "this sounds like part of normal recovery, but call us if it worsens" (no_escalate) or "we want a clinician to see you today" (escalate). When the agent escalates, the clinic calls the patient, typically within the same day, having already read the conversation transcript.

Touchpoint 5: Week 1 check

The patient returns for the week-1 in-person visit. The clinician examines the eye, adjusts the drop taper, and shares any feedback on the triage interactions from the prior week. The patient feels heard — the clinician has read the conversation, not just glanced at a chart note.

Touchpoint 6: Week 4 check

End of the standard post-op cadence. The patient returns for the dilated exam, the one structural examination step that the at-home triage agent cannot replace. The clinician confirms the eye has healed appropriately, finalizes the refraction, and discharges the patient back to routine eye care.


Data spine

The triage transcript is the spine that connects the two workflows. The provider sees the same patient-reported symptom record that the patient generated; the system's classifications are visible to both. When the system escalates, the clinician's first action — calling the patient — starts from the patient's own words rather than a blank intake form. When the system does not escalate, the transcript is still captured for clinician review at the next in-person visit, so nothing the patient said is invisible to the clinician.

This is why the symptom-text interview is the right granularity for the device touchpoint: it is the single artifact from which both the agent's reasoning and the clinician's pre-read are drawn, so the two share one source of truth.
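One way to picture the spine is a single record from which both the stored eval entry and the clinician pre-read are derived. This is a minimal sketch only: the real schema of the eval JSON is not shown in this document, so `TriageRecord` and all of its field and method names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class TriageRecord:
    """One triage interaction; every field name here is hypothetical."""
    case_id: str
    time_point: str
    turns: List[Dict[str, str]]        # [{"speaker": ..., "text": ...}, ...]
    classifications: Dict[str, str]    # symptom -> label
    decision: str                      # "escalate" or "no_escalate"

    def to_json(self) -> str:
        """Serialized form, e.g. one entry in an eval results file."""
        return json.dumps(asdict(self), ensure_ascii=False)

    def clinician_pre_read(self) -> str:
        """Plain-text view built from the same fields, patient words verbatim."""
        flagged = [s for s, label in self.classifications.items() if label != "absent"]
        lines = [
            f"{self.case_id} ({self.time_point}): {self.decision}",
            "flagged: " + (", ".join(flagged) or "none"),
        ]
        lines += [f'patient: "{t["text"]}"' for t in self.turns if t["speaker"] == "patient"]
        return "\n".join(lines)


# Excerpt from the Mrs. Chen case (assistant punctuation simplified).
record = TriageRecord(
    case_id="case-mrs-chen",
    time_point="day 7",
    turns=[
        {"speaker": "assistant", "text": "How does your eye look today, any redness?"},
        {"speaker": "patient",
         "text": "It's been a little red since yesterday. Not bright red, but I can see it."},
    ],
    classifications={"redness": "present w/ sig", "pain": "absent"},
    decision="escalate",
)
print(record.clinician_pre_read())
```

Because both views read the same fields, a no_escalate record is still available for review at the next visit without any separate capture step.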

SaMD framing

Device boundary, plausible regulatory pathway, top hazards with mitigations, and FDA cybersecurity relevance for the AI triage layer.

Device vs. enablement-software split

The triage agent is the SaMD candidate; the static pre-op education is not. Drawing the line explicitly avoids the trap of treating an entire workflow as a single regulated object.

| Component | Category | Rationale |
|---|---|---|
| Conversational triage agent (post-op symptom interview + care-management recommendation) | SaMD candidate | Produces a care-management recommendation (escalate / no-escalate) from clinical inputs; meets the SaMD definition under IMDRF framing. [wanten-2025 §discussion] |
| Pre-op patient education app (biometry / IOL selection / eConsent) | Not device | Static education content; does not drive a care decision. Categorically supported by the patient-facing-app evidence base. [pmc11015376 §methods] |
| Day-of-surgery patient-led checklist | Enablement software (not device) | Supports the clinician's pre-incision time-out with redundant patient-side verification; does not make a care decision. [stolk-vos-2018 §methods] |
| Eval harness + reviewer agent (internal development tool) | Not device | Used to evaluate the triage agent's classifier output; does not interact with patients directly. |

Regulatory pathway (v1 working hypothesis)

A 510(k) on a predicate-comparison basis is plausible for the triage agent component once the validation set is real (a Meinert-aligned study, not the synthetic eval shown here). The non-device education + checklist components ship as enablement software outside the device boundary. This split keeps the regulated surface narrow and the validation scope honest. [meinert-2024 §results]

ISO 14971-style hazard sketch (AI triage layer)

Top hazards for the triage layer, each with the mitigation already baked into the design (or the design rule that constrains it).

| Hazard | Mitigation |
|---|---|
| False negative on a real complication (missed escalation) | Cautious-default escalation rule + foregrounded-failure display calibrates the design around catching, not over-classifying; sensitivity is the primary outcome. [wanten-2025 §discussion] |
| Speech/text classification failure on ambiguous patient response | Re-ask logic (max 2 attempts) per Wanten 2025; persistent inconclusive escalates by cautious-default policy. [wanten-2025 §methods] |
| Patient misunderstanding of system role (over-reliance / under-reliance) | Every system response frames the disposition as screening, not diagnosis. Page explicitly states the system supports but does not replace the in-person exam. [aao-ppp-2021 §Postop Mgmt p.55] |
| Model drift over time (judge behavior changes silently) | Locked judge model (eval/runner/models.json), pinned versions in requirements.lock, sensitivity-regression deploy gate (5pp threshold) blocks silent drops. |
| Dropped or timed-out judge call mid-eval | Per-scenario retry with exponential backoff (max 3); skip-with-marker on exhaustion so a single failure does not abort the run. |
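The last two mitigations are mechanical enough to sketch. The code below is illustrative only and rests on several assumptions: "max 3" is read as three attempts in total, the gate is read as blocking when sensitivity falls by more than 5 percentage points, and the names `judge_with_retry` and `gate_passes` are hypothetical rather than the project's actual functions.

```python
import time
from typing import Any, Callable, Dict

MAX_ATTEMPTS = 3          # assumption: "max 3" read as three attempts in total
GATE_THRESHOLD_PP = 5.0   # the 5pp threshold named in the hazard table


def judge_with_retry(
    call_judge: Callable[[str], str],
    case_id: str,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry one scenario with exponential backoff; return a skip marker on exhaustion."""
    last_error = None
    for attempt in range(MAX_ATTEMPTS):
        try:
            return {"case_id": case_id, "verdict": call_judge(case_id), "skipped": False}
        except Exception as exc:  # sketch only: a real harness would catch specific timeout/transport errors
            last_error = exc
            if attempt < MAX_ATTEMPTS - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, then 2s with the defaults
    # Skip-with-marker: the run continues and the failure stays visible in the results.
    return {"case_id": case_id, "verdict": None, "skipped": True, "error": repr(last_error)}


def gate_passes(baseline_sensitivity: float, candidate_sensitivity: float) -> bool:
    """Pass unless sensitivity fell by more than the threshold, in percentage points."""
    drop_pp = (baseline_sensitivity - candidate_sensitivity) * 100
    return drop_pp <= GATE_THRESHOLD_PP


# One additional miss on 13 should-escalate cases is a drop of about 7.7pp.
print(gate_passes(12 / 13, 11 / 13))  # prints: False
```

On a 13-case positive set, a single extra miss already exceeds 5pp, so under these assumptions the gate would block any regression in sensitivity at this sample size.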

FDA cybersecurity (Section 524B) relevance

The triage agent qualifies as a cyber device under FDA's 524B reading — it processes patient-reported clinical data, depends on internet connectivity for the judge model, and is updateable post-market. A 524B-aligned package (threat model, SBOM, vulnerability management, secure update path) is in scope for the regulated component. The non-device education and checklist components are out of 524B scope but should still follow the project's secret-management discipline (ANTHROPIC_API_KEY as GHA secret, defense-in-depth scrub in T31).

Commercial framing

Illustrative figures from the published validation work. The reimbursement landscape is still forming — no invented codes appear here.

Post-op call deflection (illustrative): validated mechanism
Asynchronous triage between scheduled visits reduces unscheduled inbound calls and unnecessary office visits. The Meinert validation reports the system handled the post-op interview autonomously in the validated cohort; that is the call deflection mechanism. [meinert-2024 §results]

Cost saved per patient (illustrative): ~US$48
Reported cost savings from the validation study, treated here as a reference-point order-of-magnitude rather than a portable benchmark. [meinert-2024 §discussion]

These figures sit inside the bundled global surgical period — operational savings, not separately billable. The reimbursement landscape for AI-assisted post-op triage tools is still being formed; specific code claims would be premature. The pre-op IOL counseling extension below moves into a different reimbursement category (see roadmap).

What's next — pre-op IOL counseling

The same engine, earlier in the journey, with a different commercial shape.

Pre-op IOL selection counseling is structurally familiar (locked sources, gap-flagging, cautious defaults) but commercially different. Post-op triage sits inside the bundled global surgical period: its value is cost deflection (fewer unscheduled visits, fewer ED detours), which is operational and indirect. Pre-op premium-IOL counseling for multifocal, EDOF, and toric lenses is cash-pay by design, a non-covered service patients elect into. The surgeon's counseling time at that step is unbundled, billable to the patient, and capacity-constrained. An AI assistant that improves both counseling quality and surgeon throughput at that step affects top-line revenue, not just operational savings. The post-op work validates the safety case; the pre-op IOL extension is where the same engine reaches a different P&L.