Week 1 — Document Intake + Extraction + Validation

Problem:
AI extraction output cannot be trusted until it is validated and stored as product state.

Built:
- PDF/email intake
- Gemini extraction
- deterministic validation
- missing-field and conflict detection
- persisted run timeline
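
A minimal sketch of what the deterministic validation step could look like. Everything named here is an assumption for illustration: the required field names, the ValidationResult shape, and the idea that a "conflict" means the PDF and the email disagree on the same field. The real schema may define these differently.

```python
from dataclasses import dataclass, field

# Hypothetical required claim fields; the real schema may differ.
REQUIRED_FIELDS = ("claimNumber", "policyNumber", "incidentDate", "claimAmount")


@dataclass
class ValidationResult:
    missing_fields: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)

    @property
    def is_valid(self):
        return not self.missing_fields and not self.conflicts


def _is_blank(value):
    # Treat None and empty strings as "not provided"; 0 is a real value.
    return value is None or value == ""


def validate_extraction(pdf_fields, email_fields):
    """Report fields absent from both sources and fields where the sources disagree."""
    result = ValidationResult()
    for name in REQUIRED_FIELDS:
        pdf_value = pdf_fields.get(name)
        email_value = email_fields.get(name)
        if _is_blank(pdf_value) and _is_blank(email_value):
            result.missing_fields.append(name)
        elif (
            not _is_blank(pdf_value)
            and not _is_blank(email_value)
            and pdf_value != email_value
        ):
            result.conflicts.append(name)
    return result
```

The check is pure Python with no model call, so the same input always produces the same missing-field and conflict lists.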

Hard parts:
- schema design
- parsing model output safely
- handling incomplete claim data
- converting AI output into app status
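
A rough sketch of the "parse safely, then map to a status" idea, using only the standard library. EXTRACTED is the status named in these notes; EXTRACTION_FAILED and NEEDS_REVIEW, the function names, and the single required field are hypothetical placeholders.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker that models sometimes wrap JSON in


def parse_model_output(raw_text):
    """Return (data, error). Never raises on malformed model output."""
    text = raw_text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    return data, None


def status_from_parse(data, error, required=("claimNumber",)):
    """Map a parse result to an app status (status names partly hypothetical)."""
    if error is not None:
        return "EXTRACTION_FAILED"
    missing = [name for name in required if data.get(name) in (None, "")]
    return "NEEDS_REVIEW" if missing else "EXTRACTED"
```

Returning an error string instead of raising keeps the failure as data, so it can be stored on the run and shown to a reviewer.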

Demo:
<link>

What this proves:
I can build AI workflows where LLM output is checked, stored, and made reviewable.

Primary Sample Data

https://github.com/aws-samples/sample-multimodal-claims-processing-recommendation-system

Day 1 - DB and Schema Design + Creating Sample Data

Day 2 - Upload & Save PDF / EMAIL_TEXT

Day 3 - Move runs from UPLOADED → EXTRACTED
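
One way the UPLOADED → EXTRACTED move could be guarded, sketched with an in-memory dict standing in for the run record. UPLOADED and EXTRACTED come from these notes; the other statuses, the run shape, and the timeline entry fields are assumptions.

```python
from datetime import datetime, timezone

# UPLOADED and EXTRACTED are from the notes; the other statuses are hypothetical.
ALLOWED_TRANSITIONS = {
    "UPLOADED": {"EXTRACTED", "EXTRACTION_FAILED"},
    "EXTRACTED": {"VALIDATED", "NEEDS_REVIEW"},
}


class InvalidTransition(Exception):
    pass


def advance_run(run, new_status, note=""):
    """Move a run to new_status and append a timeline entry, or raise."""
    current = run["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {new_status} is not allowed")
    run["status"] = new_status
    run["timeline"].append(
        {
            "from": current,
            "to": new_status,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return run
```

Rejecting unknown transitions before touching the run means a failed attempt leaves both the status and the timeline unchanged.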

Day 4 - Validating JSON

Day 5 - Review Queue Page

Day 6 - Soft Deleting Document
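
A small sketch of soft deletion using the standard-library sqlite3 module. The documents table, its columns, and the deleted_at timestamp approach are assumptions for illustration, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn):
    # deleted_at is NULL while the document is live.
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " deleted_at TEXT"
        ")"
    )


def soft_delete_document(conn, document_id):
    """Mark a live document as deleted. Returns False if it was missing or already deleted."""
    cursor = conn.execute(
        "UPDATE documents SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), document_id),
    )
    conn.commit()
    return cursor.rowcount == 1


def list_active_documents(conn):
    """Return (id, filename) for documents that have not been soft deleted."""
    rows = conn.execute(
        "SELECT id, filename FROM documents WHERE deleted_at IS NULL ORDER BY id"
    )
    return rows.fetchall()
```

The row stays in the table, so runs and timeline entries that reference the document remain intact; only the default listing hides it.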

Day 6 - Part 2 (Soft Delete Testing)

The model estimates overallConfidence from how clear, complete, or ambiguous the extraction was.

The synthetic expected-extraction JSON includes overallConfidence because it stands in for the expected Gemini output, not the source document (it exists to test the model's output).
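
A sketch of how a test could compare real model output against the synthetic expected-extraction JSON. Since overallConfidence is the model's own estimate rather than a fact from the document, this version skips it in the field comparison and only checks that it is a number in range. The 0 to 1 scale, the function name, and the sample field names are assumptions.

```python
def compare_extraction(expected, actual):
    """Return a list of human-readable mismatches between expected and actual output."""
    problems = []
    for key, expected_value in expected.items():
        if key == "overallConfidence":
            continue  # model self-estimate, not compared value-for-value
        if actual.get(key) != expected_value:
            problems.append(
                f"{key}: expected {expected_value!r}, got {actual.get(key)!r}"
            )
    confidence = actual.get("overallConfidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    # Assumption: confidence is reported on a 0 to 1 scale.
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("overallConfidence: expected a number between 0 and 1")
    return problems
```

An empty list means the extracted fields match the expected JSON and the confidence value is well formed, even if the two confidence numbers differ.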

Architectural Decisions through Label Studio & A2I

How should my review system be built, inspired by A2I?

Detailed Review System Architecture - Week 2