Week 1 — Document Intake + Extraction + Validation
Problem:
Raw AI extraction output cannot be trusted until it is validated and stored as product state.
Built:
- PDF/email intake
- Gemini extraction
- deterministic validation
- missing-field and conflict detection
- persisted run timeline
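A minimal sketch of what the deterministic validation step could look like. The field names in REQUIRED_FIELDS, the ValidationResult type, and the idea that a conflict means "the PDF and the email disagree on the same field" are assumptions for illustration; the real schema is not spelled out in these notes.

```python
from dataclasses import dataclass, field

# Hypothetical required claim fields; the actual schema may differ.
REQUIRED_FIELDS = ("claimNumber", "policyNumber", "incidentDate", "claimAmount")
EMPTY = (None, "")


@dataclass
class ValidationResult:
    missing_fields: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Valid only when nothing is missing and nothing conflicts.
        return not self.missing_fields and not self.conflicts


def validate_extractions(pdf_fields: dict, email_fields: dict) -> ValidationResult:
    """Compare fields extracted from two sources for the same claim (assumed setup)."""
    result = ValidationResult()
    for name in REQUIRED_FIELDS:
        pdf_value = pdf_fields.get(name)
        email_value = email_fields.get(name)
        if pdf_value in EMPTY and email_value in EMPTY:
            # Neither source supplied the field.
            result.missing_fields.append(name)
        elif pdf_value not in EMPTY and email_value not in EMPTY and pdf_value != email_value:
            # Both sources supplied the field, but the values disagree.
            result.conflicts.append(name)
    return result
```

The check uses no model calls, so the same inputs always give the same missing/conflict lists, which is what makes the result safe to persist on the run.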
Hard parts:
- schema design
- parsing model output safely
- handling incomplete claim data
- converting AI output into app status
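Two of the hard parts above (parsing model output safely, and converting AI output into app status) can be sketched roughly as follows. UPLOADED and EXTRACTED come from these notes; NEEDS_REVIEW, EXTRACTION_FAILED, the required field names, and both function names are hypothetical placeholders.

```python
import json
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    UPLOADED = "UPLOADED"
    EXTRACTED = "EXTRACTED"
    NEEDS_REVIEW = "NEEDS_REVIEW"            # hypothetical status
    EXTRACTION_FAILED = "EXTRACTION_FAILED"  # hypothetical status


# Hypothetical required fields; the actual schema may differ.
REQUIRED_FIELDS = ("claimNumber", "incidentDate")


def parse_model_output(raw_text: str) -> Optional[dict]:
    """Return the JSON object found in the model's reply, or None if there is none."""
    # Models sometimes add prose or fences around the JSON, so cut from the
    # first "{" to the last "}" before parsing.
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def status_for(raw_text: str) -> RunStatus:
    """Map raw model output to an app status without ever raising."""
    data = parse_model_output(raw_text)
    if data is None:
        return RunStatus.EXTRACTION_FAILED
    missing = [name for name in REQUIRED_FIELDS if data.get(name) in (None, "")]
    return RunStatus.NEEDS_REVIEW if missing else RunStatus.EXTRACTED
```

The point of the design is that every possible model reply, including garbage, lands on exactly one status, so incomplete claim data becomes a reviewable state instead of an exception.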
Demo:
<link>
What this proves:
I can build AI workflows where LLM output is checked, stored, and made reviewable.
https://github.com/aws-samples/sample-multimodal-claims-processing-recommendation-system
Day 1 - DB and Schema Design + Creating Sample Data
Day 2 - Upload & Save PDF / EMAIL_TEXT
Day 3 - Move runs from UPLOADED → EXTRACTED
Day 6 - Soft Deleting Documents
Day 6 - Part 2 (Soft Delete Testing)
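For the Day 6 items, a soft delete and its test might look like the sketch below. It uses SQLite from the Python standard library purely for illustration; the table name, the deleted_at column, and the function names are assumptions, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; deleted_at is NULL while the document is active.
    conn.execute(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            deleted_at TEXT
        )
        """
    )


def soft_delete_document(conn: sqlite3.Connection, document_id: int) -> bool:
    """Mark a document as deleted; return False if it was missing or already deleted."""
    now = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "UPDATE documents SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, document_id),
    )
    conn.commit()
    return cursor.rowcount == 1


def list_active_documents(conn: sqlite3.Connection) -> list:
    """Return (id, filename) for documents that have not been soft deleted."""
    return conn.execute(
        "SELECT id, filename FROM documents WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
```

A soft delete test then checks three things: the row still exists, it no longer appears in the active list, and deleting it a second time is a no-op.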
The model estimates overallConfidence from how clear, complete, and unambiguous the extraction was.
The synthetic expected-extraction JSON includes overallConfidence because it stands in for the expected Gemini output, not the source document. It exists to test the model's output.
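One way the synthetic expected-extraction JSON could be used in a test is sketched below. It assumes overallConfidence is a number between 0 and 1 and that it should be compared with a tolerance rather than exactly, since it is the model's own estimate; the function name and the default tolerance are hypothetical.

```python
def compare_extraction(expected: dict, actual: dict, confidence_tolerance: float = 0.15) -> list:
    """Return a list of mismatch messages; an empty list means the output passed."""
    mismatches = []
    for key, expected_value in expected.items():
        actual_value = actual.get(key)
        if key == "overallConfidence":
            # Assumption: confidence is a float in [0, 1] and only needs to be close.
            is_number = isinstance(actual_value, (int, float)) and not isinstance(actual_value, bool)
            if not is_number:
                mismatches.append("overallConfidence: not a number")
            elif not 0.0 <= actual_value <= 1.0:
                mismatches.append("overallConfidence: outside [0, 1]")
            elif abs(actual_value - expected_value) > confidence_tolerance:
                mismatches.append(
                    f"overallConfidence: expected about {expected_value}, got {actual_value}"
                )
        elif actual_value != expected_value:
            # All other fields come from the document and must match exactly.
            mismatches.append(f"{key}: expected {expected_value!r}, got {actual_value!r}")
    return mismatches
```

Fields that come from the source document are compared exactly, while overallConfidence is only range-checked and loosely compared, which reflects the distinction drawn above between model output and source data.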
Architectural Decisions through Label Studio & A2I