CareCaller Auto-Flagger
Automatically detect which AI healthcare calls need human review
The Problem
CareCaller uses AI agents to call patients for medication refill check-ins, asking 14 health questions on each call. About 9% of calls have issues, and this tool automatically flags those calls for human review. Typical issues include:
Speech-to-text drops digits from health numbers, recording dangerous values that look plausible.
The AI marks the call as complete but only asked a fraction of the required health questions.
The AI oversteps its role and offers medical guidance it is not qualified to give.
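Two of the issues above lend themselves to simple rule checks. The sketch below is illustrative only: the function names, the idea of comparing against a previous reading, and the 30% threshold are assumptions for the example, not CareCaller's actual rules. Only the count of 14 required questions comes from the description above.

```python
REQUIRED_QUESTIONS = 14  # each check-in asks 14 health questions


def is_incomplete(questions_asked: int, marked_complete: bool) -> bool:
    """Flag a call that was marked complete but skipped required questions."""
    return marked_complete and questions_asked < REQUIRED_QUESTIONS


def large_jump(previous: float, current: float, max_rel_change: float = 0.3) -> bool:
    """Flag a value that changed sharply versus the last known reading.

    A dropped digit (for example 180 recorded as 80) can still look
    plausible on its own, so this compares against a prior value instead
    of a fixed range. The 0.3 threshold is a hypothetical default.
    """
    if previous <= 0:
        return False  # no usable baseline to compare against
    return abs(current - previous) / previous > max_rel_change


print(is_incomplete(questions_asked=5, marked_complete=True))
print(large_jump(previous=180, current=80))
```

A check like `large_jump` only works when an earlier reading exists for the patient; without one, the value would have to be judged by other signals.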
How It Works
Two models work together: LightGBM finds patterns in numbers, DeBERTa catches contradictions in text. A meta-learner combines both predictions.
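As a rough illustration of how a meta-learner can combine two base predictions, the sketch below passes a weighted sum of the two probabilities through a logistic function. The weights, bias, and function names are hand-set assumptions for the example. In a real stacking setup they would be fitted on held-out predictions from the base models, and nothing here reflects the project's actual meta-learner.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def combine(p_tabular: float, p_text: float,
            w_tabular: float = 2.0, w_text: float = 2.0,
            bias: float = -2.0) -> float:
    """Logistic combination of two base-model probabilities.

    p_tabular stands in for the LightGBM score and p_text for the
    DeBERTa score. All weights are illustrative, not learned.
    """
    return sigmoid(bias + w_tabular * p_tabular + w_text * p_text)


def should_flag(p_tabular: float, p_text: float, threshold: float = 0.5) -> bool:
    """Flag the call when the combined score crosses a hypothetical threshold."""
    return combine(p_tabular, p_text) >= threshold


print(should_flag(0.9, 0.9))
print(should_flag(0.1, 0.1))
```

The point of the second stage is that neither base model has to be right alone: a strong signal from one can be weighed against a weak signal from the other.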
The Journey
[Chart: F1 score on the hidden test set]
Results
F1 = 1.000 on the hidden test set: every problematic call was caught (recall 1.0) with zero false alarms (precision 1.0).
Out of 159 test calls, 18 were flagged for human review (11.3%). The rest were confirmed clean.
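The figures above can be checked with a few lines of arithmetic. The confusion counts below are inferred from the reported numbers (an F1 of 1.000 with 18 flagged calls implies 18 true positives and no false positives or false negatives); the helper name is just for illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


total_calls = 159
flagged = 18
tp, fp, fn = 18, 0, 0  # implied by F1 = 1.000 with 18 flagged calls

flag_rate = flagged / total_calls
print(f"F1 = {f1_score(tp, fp, fn):.3f}")   # F1 = 1.000
print(f"flag rate = {flag_rate:.1%}")       # flag rate = 11.3%
```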
5-fold cross-validation F1 = 0.974. Performance is consistent across five random splits of the data, so the result is not just a lucky draw on one test set.
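For readers unfamiliar with k-fold cross-validation, here is a minimal sketch of the splitting and averaging logic. `score_fold` is a hypothetical callback that would train on the training indices and return an F1 score on the validation indices; the project's real training code and data size are not shown here.

```python
import random
from typing import Callable, List


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mean(score_fold: Callable[[List[int], List[int]], float],
                   n: int, k: int = 5, seed: int = 0) -> float:
    """Average the score over k folds, each used once as validation data."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, val in enumerate(folds):
        # Training set is every index that is not in the current fold.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(score_fold(train, val))
    return sum(scores) / len(scores)
```

Because every example is held out exactly once, the averaged score is less sensitive to one favorable split than a single train/test evaluation.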
Example: What a Flagged Call Looks Like
NLI contradiction score: 100%. 15% of recorded responses were not found in the transcript. 4 heuristic rules triggered.
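One plausible reading of the "responses not in transcript" signal is the share of recorded answers that cannot be located in the call transcript. The sketch below assumes that interpretation and uses plain case-insensitive substring matching. The function name and matching rule are hypothetical, and the actual feature may be computed differently.

```python
from typing import List


def unsupported_fraction(recorded_responses: List[str], transcript: str) -> float:
    """Share of recorded responses that do not appear in the transcript."""
    if not recorded_responses:
        return 0.0
    text = transcript.lower()
    missing = [r for r in recorded_responses if r.lower() not in text]
    return len(missing) / len(recorded_responses)


transcript = "I weigh 180 pounds and I have no side effects."
recorded = ["180 pounds", "no side effects", "blood pressure 120"]
print(f"{unsupported_fraction(recorded, transcript):.0%}")
```

Exact substring matching is strict; a production check would likely need to tolerate paraphrases and number formatting differences.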