A working portfolio demo. Independent work. Synthetic data, labeled.
01
Eval
Turning between-visit patient symptoms into same-day clinical escalations.
Graded by LLM judges.
A 20-case eval of post-cataract follow-up triage. A conversational agent runs the 5-symptom interview Wanten 2025 validated, classifies each response, and on escalate hands the clinician a same-day call-back with the patient's own transcript pre-loaded, not a blank intake form.
A judge model grades each scenario against a ground-truth escalation decision. The headline number isn't accuracy. It's sensitivity: 12 of 13 should-escalate cases caught (92%), paired with the one we missed, a week-4 asymptomatic PCO (posterior capsule opacification), which a symptom-based screen has no signal to detect. The miss is counted in the 92%, and it sits next to the headline on purpose.
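The arithmetic behind the headline is small enough to show. This is a minimal sketch, not the project's eval code: the `Tally` type and `sensitivity` function are hypothetical names, and the only inputs taken from the page are the 12 caught cases and the 1 miss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Counts over the should-escalate cases only (hypothetical helper type)."""
    true_positives: int   # should-escalate cases the system escalated
    false_negatives: int  # should-escalate cases the system did not escalate


def sensitivity(tally: Tally) -> float:
    """Share of should-escalate cases that were caught: TP / (TP + FN)."""
    positives = tally.true_positives + tally.false_negatives
    if positives == 0:
        raise ValueError("no should-escalate cases to score")
    return tally.true_positives / positives


# Figures from the page: 12 caught, 1 missed (the week-4 asymptomatic PCO).
headline = Tally(true_positives=12, false_negatives=1)
print(f"{sensitivity(headline):.1%}")  # prints 92.3%
```

The remaining 7 of the 20 scenarios do not enter this number; sensitivity is computed over the 13 should-escalate cases alone.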
Cataract is the first care path. The same pattern (interview, classify, cautious-default escalation, transcript pre-loaded) extends to any post-op recovery with a defined symptom set.
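The pattern (interview, classify, cautious-default escalation) can be sketched as follows. Everything here is illustrative: the `Finding` labels mirror the per-symptom tags shown on the case cards, but the decision rule (escalate on any significant or still-inconclusive finding) and the reading of "max 2 attempts" as two re-asks after the first question are assumptions, not the project's confirmed logic.

```python
from enum import Enum
from typing import Callable


class Finding(Enum):
    ABSENT = "absent"
    PRESENT = "present (no sig)"
    PRESENT_SIG = "present w/ sig"
    INCONCLUSIVE = "inconclusive"


SYMPTOMS = ("redness", "pain", "reduced vision", "flashing lights", "floaters")
MAX_REASKS = 2  # assumption: "max 2 attempts" means two re-asks per symptom


def interview(classify_answer: Callable[[str, int], Finding]) -> dict[str, Finding]:
    """Ask about each symptom; re-ask while the classified answer is inconclusive.

    `classify_answer(symptom, attempt)` is a hypothetical stand-in for
    asking the question and classifying the patient's reply.
    """
    findings: dict[str, Finding] = {}
    for symptom in SYMPTOMS:
        finding = classify_answer(symptom, 0)
        reasks = 0
        while finding is Finding.INCONCLUSIVE and reasks < MAX_REASKS:
            reasks += 1
            finding = classify_answer(symptom, reasks)
        findings[symptom] = finding
    return findings


def decide(findings: dict[str, Finding]) -> str:
    """Cautious default (assumed rule): significant or unresolved means escalate."""
    for finding in findings.values():
        if finding in (Finding.PRESENT_SIG, Finding.INCONCLUSIVE):
            return "escalate"
    return "no_escalate"


# The per-symptom tags shown on the Mrs. Chen card.
mrs_chen = {
    "redness": Finding.PRESENT_SIG,
    "pain": Finding.ABSENT,
    "reduced vision": Finding.PRESENT,
    "flashing lights": Finding.ABSENT,
    "floaters": Finding.PRESENT_SIG,
}
print(decide(mrs_chen))  # prints escalate
```

Under this assumed rule, an interview where every symptom is absent returns `no_escalate`, which is exactly why an asymptomatic complication produces no signal.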
Two cases under the headline. Mrs. Chen is the kind of complication the system is built to catch. The week-4 PCO is the one this class of tool can't catch; that's what the methodology above accounts for.
Mrs. Chen · day 7
system decision: escalate · ground truth: escalate · judge: correct
redness: PRESENT W/ SIG · pain: absent · reduced vision: present (no sig) · flashing lights: absent · floaters: PRESENT W/ SIG
assistant: Hello, this is your post-cataract check-in. I'll ask about five symptoms. You're at day 7 after surgery.
assistant: How does your eye look today? Any redness?
patient: It's been a little red since yesterday. Not bright red, but I can see it.
assistant: Are you having any pain in the operated eye?
patient: No pain. Maybe a slight scratchiness when I blink, but nothing I'd call pain.
assistant: How is your vision today compared to yesterday?
patient: Vision is about the same as yesterday. Still adjusting.
assistant: Have you noticed any flashing lights in your vision?
patient: No.
assistant: Are you seeing any floaters (small specks or shapes moving in your vision)?
patient: There's one tiny spot I see sometimes. But what's bothering me is at night, the streetlights and headlights have these big halos around them. That started two days ago.
Patient B · week 4
system decision: no_escalate · ground truth: escalate · judge: inconclusive
assistant: Hello, this is your post-cataract check-in. I'll ask about five symptoms. You're at week 4 after surgery.
assistant: How does your eye look today? Any redness?
patient: No redness.
assistant: Are you having any pain in the operated eye?
patient: No pain.
assistant: How is your vision today compared to yesterday?
patient: Vision is fine. I'd say better than before surgery.
assistant: Have you noticed any flashing lights in your vision?
patient: No.
assistant: Are you seeing any floaters (small specks or shapes moving in your vision)?
patient: No.
03
Clinical pathway
Post-cataract follow-up cadence with the conversational triage agent operating asynchronously between scheduled visits. The same six touchpoints, viewed from both sides.
Clinician journey
Where 92% lives: The 92% sensitivity headline is measured in this window. 12 of 13 should-escalate cases are caught here; the 1 FN (false negative) is structurally invisible to a symptom-based screen.
Where the miss is caught: The dilated week-4 exam catches the asymptomatic-PCO miss featured in the cases above. The system stepping back here is the system honoring its scope.
[Timeline figure: Pre-op (before) · Day of surgery (day 0) · Day 1 check (day 1) · Week 1 check (week 1, day 7) · Week 4 check (week 4, day 28), with a "Device-led window" band.]
Scale: cube-root of days from surgery. Compresses the week 1 → week 4 tail so the early-window touchpoints get adequate visual room while preserving honest temporal ordering. Pre-op is off-scale (variable lead time), rendered behind a discontinuity break.
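A minimal sketch of such a cube-root axis, assuming positions are normalized so that day 28 lands at 1.0. The function name `axis_position` and the normalization are illustrative choices, not taken from the page's rendering code; the off-scale pre-op point is deliberately left out.

```python
def axis_position(day: float, last_day: float = 28.0) -> float:
    """Map days-from-surgery onto [0, 1] using a cube-root scale.

    Assumption: the axis is normalized against the final touchpoint
    (day 28). Pre-op is off-scale and is not handled here.
    """
    if day < 0 or last_day <= 0:
        raise ValueError("day must be >= 0 and last_day must be > 0")
    return (day / last_day) ** (1 / 3)


for day in (0, 1, 7, 28):
    linear = day / 28
    print(f"day {day:>2}: linear {linear:.2f}  cube-root {axis_position(day):.2f}")
```

On a linear axis day 1 sits at about 0.04 and day 7 at 0.25; under the cube root they move to about 0.33 and 0.63, so the early touchpoints spread out while their order is unchanged.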
Safe surgical execution with patient-participatory pre-incision verification.
aao-ppp-2021 · stolk-vos-2018
day 1
Day 1 check
triage agent activated
Detect immediate post-op complications, confirm IOL position, confirm drop compliance.
wanten-2025 · moustafa-2019
days 2–7
Day 2 – Week 1
ACTIVE: triage agent runs the 5-symptom interview between visits
Catch acute complications (endophthalmitis, IOL malposition, persistent inflammation) between scheduled visits. The highest-acuity inter-visit window.
wanten-2025 · aao-ppp-2021 · meinert-2024
week 1
Week 1 check
inter-visit transcript review
Confirm normal early post-op course; detect any complication missed by the triage screen.
aao-ppp-2021
week 4
Week 4 check
end-of-cadence summary, then deactivation
Detect delayed or structural complications (PCO, cystoid macular edema, peripheral retinal detachment). This visit catches the documented asymptomatic-structural failure mode.
aao-ppp-2021 · wanten-2025
Patient journey
What the patient experiences here: If something feels wrong, the patient opens the agent. It runs the 5-symptom interview and either reassures or escalates. On escalate, the clinic calls within the day, with the transcript pre-read.
What the system can't do alone: The dilated exam is the structural-imaging step the at-home agent cannot replace. The patient returns to the clinic; the system honors its scope.
[Timeline figure: Pre-op (before) · Surgery (day 0) · First check-up (day 1) · Week 1 check-up (week 1, day 7) · Final check-up (week 4, day 28), with a "Home recovery · agent on call" band.]
before day 0
Pre-op
Learns about cataract surgery via the clinic's pre-op education module. Confirms they understand the procedure, the recovery cadence, and the symptoms that warrant calling the clinic. The IOL conversation with their surgeon happens before surgery is scheduled.
day 0
Surgery
Checks in, confirms the surgical site and IOL choice on a patient-led safety checklist app, undergoes surgery, and leaves with explicit written drop instructions and the day-1 visit scheduled. Installs the post-op triage app on their phone (or receives an SMS-based equivalent).
day 1
First check-up
Returns for the day-1 in-person follow-up. The clinician confirms the eye looks good, reviews the drop schedule, and activates the triage agent. The patient receives a brief tutorial on how to engage the agent if symptoms occur.
days 2–7
Day 2 – Week 1
The home recovery window. Drops on schedule, eye gradually clearing. If anything feels wrong (new redness, pain, vision change, flashes, floaters), the patient opens the triage agent. The agent runs the 5-symptom interview, classifies, and either reassures or escalates. On escalate, the clinic calls within the same day with the transcript pre-read.
week 1
Week 1 check-up
Returns for the week-1 in-person visit. The clinician examines the eye, adjusts the drop taper, and shares any feedback on the triage interactions from the prior week. The patient feels heard: the clinician has read the conversation, not just glanced at a chart note.
week 4
Final check-up
End of the standard post-op cadence. The patient returns for the dilated exam, the one structural-imaging step the at-home triage agent could not replace. The clinician confirms the eye has healed appropriately, finalizes the refraction, and discharges the patient back to routine eye care.
04
SaMD framing
Device boundary, plausible regulatory pathway, top hazards with mitigations, and FDA cybersecurity relevance for the AI triage layer.
Device vs. enablement-software split
The triage agent is the SaMD candidate; the static pre-op education is not. Drawing the line explicitly avoids the trap of treating an entire workflow as a single regulated object.
Triage agent
Device (SaMD candidate)
Produces a care-management recommendation (escalate / no-escalate) from clinical inputs; meets the SaMD definition under IMDRF framing. [wanten-2025 discussion]
Pre-op education
Enablement software (not device)
Static education content; does not drive a care decision. Categorically supported by the patient-facing-app evidence base. [pmc11015376 methods]
Day-of-surgery patient-led checklist
Enablement software (not device)
Supports the clinician's pre-incision time-out with redundant patient-side verification; does not make a care decision. [stolk-vos-2018 methods]
Eval harness + reviewer agent (internal development tool)
Not device
Used to evaluate the triage agent's classifier output; does not interact with patients directly.
Regulatory pathway (v1 working hypothesis)
Plausible 510(k) for the triage agent component on a predicate-comparison basis once the validation set is real (a Meinert-aligned study, not the synthetic eval shown here). The non-device education + checklist components ship as enablement software outside the device boundary. This split keeps the regulated surface narrow and the validation scope honest. [meinert-2024 results]
ISO 14971-style hazard sketch (AI triage layer)
Top hazards for the triage layer, each with the mitigation already baked into the design (or the design rule that constrains it).
Hazard: False negative on a real complication (missed escalation)
Mitigation: Cautious-default escalation rule + foregrounded-failure display calibrate the design around catching, not over-classifying; sensitivity is the primary outcome. [wanten-2025 discussion]
Hazard: Speech/text classification failure on ambiguous patient response
Mitigation: Re-ask logic (max 2 attempts) per Wanten 2025; a persistent inconclusive escalates by cautious-default policy. [wanten-2025 methods]
Hazard: Patient misunderstanding of system role (over-reliance / under-reliance)
Mitigation: Every system response frames the disposition as screening, not diagnosis. The page explicitly states the system supports but does not replace the in-person exam. [aao-ppp-2021 Postop Mgmt p.55]
Hazard: Model drift over time (judge behavior changes silently)
Mitigation: Locked judge model (eval/runner/models.json) and pinned versions in requirements.lock; a sensitivity-regression deploy gate (5pp threshold) blocks silent drops.
Hazard: Dropped or timed-out judge call mid-eval
Mitigation: Per-scenario retry with exponential backoff (max 3); skip-with-marker on exhaustion so a single failure does not abort the run.
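The last two mitigations are mechanical enough to sketch. This is an illustration under stated assumptions, not the project's runner: the function names, the `SKIPPED` marker string, the one-second base delay, and the reading of "max 3" as three retries after the first call are all hypothetical, as is treating a drop of exactly 5pp as a failure.

```python
import time
from typing import Callable

REGRESSION_THRESHOLD_PP = 5.0  # from the table: 5pp threshold
MAX_RETRIES = 3                # assumption: three retries after the first call
SKIPPED = "skipped_judge_unavailable"  # hypothetical skip marker


def gate_passes(baseline_sensitivity: float, candidate_sensitivity: float) -> bool:
    """Block a deploy when sensitivity drops by the threshold or more.

    Both inputs are fractions in [0, 1]; the drop is compared in
    percentage points. Inclusive blocking at exactly 5pp is an assumption.
    """
    drop_pp = (baseline_sensitivity - candidate_sensitivity) * 100
    return drop_pp < REGRESSION_THRESHOLD_PP


def judge_with_retry(
    call_judge: Callable[[], str],
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call the judge; on timeout or connection loss, back off and retry.

    Delays double each time (base, 2x base, 4x base). When retries are
    exhausted the scenario is marked as skipped instead of raising, so
    one failing call does not abort the whole run.
    """
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call_judge()
        except (TimeoutError, ConnectionError):
            if attempt == MAX_RETRIES:
                break
            sleep(base_delay_s * 2 ** attempt)
    return SKIPPED
```

Injecting `sleep` keeps the backoff testable without real waiting. A skipped scenario should be reported separately from a scored one, so that exhaustion never silently shrinks the denominator behind the sensitivity figure.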
FDA cybersecurity (Section 524B) relevance
The triage agent qualifies as a cyber device under FDA's 524B reading: it processes patient-reported clinical data, depends on internet connectivity for the judge model, and is updateable post-market. A 524B-aligned package (threat model, SBOM, vulnerability management, secure update path) is in scope for the regulated component. The non-device education and checklist components are out of 524B scope but should still follow the project's secret-management discipline (ANTHROPIC_API_KEY as GHA secret, defense-in-depth scrub in T31).
04
Commercial framing
Illustrative figures from the published validation work. The reimbursement landscape is still forming; no invented codes appear here.
validated mechanism
Post-op call deflection (illustrative)
Asynchronous triage between scheduled visits reduces unscheduled inbound calls and unnecessary office visits. The Meinert validation reports the system handled the post-op interview autonomously in the validated cohort; that's the call-deflection mechanism. [meinert-2024 results]
~US$48
Cost saved per patient (illustrative)
Reported cost savings from the validation study, treated here as a reference-point order-of-magnitude rather than a portable benchmark. [meinert-2024 discussion]
These figures sit inside the bundled global surgical period: operational savings, not separately billable. The reimbursement landscape for AI-assisted post-op triage tools is still being formed; specific code claims would be premature. The pre-op IOL counseling extension below moves into a different reimbursement category (see roadmap).
05
What's next: pre-op IOL counseling
The same engine, earlier in the journey, with a different commercial shape.
The same engine extends to pre-op IOL selection counseling: earlier in the journey, structurally familiar (locked sources, gap-flagging, cautious defaults), commercially different. Post-op triage sits inside the bundled global surgical period: its value is cost deflection (fewer unscheduled visits, fewer ED detours), operational and indirect. Pre-op premium-IOL counseling for multifocal, EDOF, and toric lenses is cash-pay by design, a non-covered service patients elect into. The surgeon's counseling time at that step is unbundled, billable to the patient, and capacity-constrained. An AI assistant that improves counseling quality AND surgeon throughput at that step affects top-line revenue, not just operational savings. The post-op work validates the safety case; the pre-op IOL extension is where the same engine reaches a different P&L.
APPX
Scenario register (20)
The full scored case set, including the inconclusive verdict on the week-4 FN that anchors the methodology callout at the top. Every clinical citation resolves to the locked source list.