ASK KNOX
beta
LESSON 128

Validation-Retry Loops, Structured Output, and Multi-Pass Review

The CCA doesn't test whether you can write a prompt. It tests whether you can design a validation pipeline that catches what the model gets wrong — and retries intelligently.

13 min read · CCA Certification Prep

The extraction pipeline ran on 2,400 invoices overnight. Morning report: 97.3% accuracy. The team shipped it to production.

Two weeks later, a finance audit surfaced the problem. The 2.7% failure rate was not random. It was concentrated entirely on invoices with non-standard line item formats — the exact format their largest vendor used. Sixty-four invoices, every one wrong, every one processed with high confidence by the model. The pipeline had no validation layer. No retry logic. No way to catch a structurally correct extraction that was semantically wrong.

Structured Output via Tool Use

The CCA tests your understanding of tool_use as an output schema mechanism, not just a function-calling interface. When you define a tool with a JSON schema and set tool_choice to force that tool, you are using the tool system as a structured output enforcer.

Three modes, three guarantees. The exam expects you to know when each applies:

tool_choice: "auto" — the model decides whether to use a tool at all. Use this for general conversation where structured output is optional. The risk: the model may skip the tool entirely and return plain text.

tool_choice: "any" — the model must call a tool, but chooses which one. Use this when you have multiple extraction schemas and the document type is unknown. The model classifies the document by choosing the appropriate tool.

tool_choice: {"type": "tool", "name": "extract_metadata"} — the model must call this specific tool on the first turn. Use this to guarantee a specific extraction runs before any enrichment steps.
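
The three modes can be sketched as the literal request values (illustrative only; "extract_metadata" is the placeholder tool name from the example above):

```python
# The three tool_choice values as they appear in the request body.
choice_auto = {"type": "auto"}   # model may skip tools and answer in plain text
choice_any = {"type": "any"}     # model must call one of the provided tools
choice_forced = {"type": "tool", "name": "extract_metadata"}  # must call this tool
```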

JSON Schema Design That Prevents Hallucination

Schema design is where semantic accuracy lives. Three patterns the CCA tests directly:

Nullable fields. When a field is required and the information is not in the source document, the model fabricates a value to satisfy the schema. Making the field nullable gives the model a legitimate output for "not found" — it returns null instead of hallucinating. This is not optional. Every field that might be absent from some source documents must be nullable.

Enum + "other" pattern. Hard-coded enums force the model to pick the closest match even when none fit. Adding "other" plus a detail string field lets the model categorize novel types without forcing a bad fit. The CCA exam guide explicitly lists this as a tested pattern.

Required vs optional fields. Core fields — vendor name, total amount — are required. Supplementary fields — tax breakdown, payment terms — are optional. This lets the extraction succeed with partial data rather than failing entirely when supplementary information is missing.

const extractInvoiceTool = {
  name: "extract_invoice",
  description: "Extract structured data from an invoice document",
  input_schema: {
    type: "object",
    required: ["vendor_name", "total_amount", "invoice_date"],
    properties: {
      vendor_name: { type: "string" },
      total_amount: { type: "number" },
      invoice_date: { type: "string", format: "date" },
      tax_amount: { type: ["number", "null"] },
      currency: {
        enum: ["USD", "EUR", "GBP", "JPY", "other"],
      },
      currency_detail: { type: ["string", "null"] },
      line_items: {
        type: ["array", "null"],
        items: {
          type: "object",
          properties: {
            description: { type: "string" },
            quantity: { type: ["number", "null"] },
            unit_price: { type: ["number", "null"] },
            total: { type: "number" },
          },
          required: ["description", "total"],
        },
      },
    },
  },
};

Validation-Retry Loops

When the extraction comes back and fails validation — Pydantic rejects it, line items do not sum to total, a date is in the future — you have a decision: fail or retry.

The retry payload is specific. You send three things:

  1. The original document — the model needs the source material again
  2. The failed extraction — what it produced last time
  3. The specific validation error — not "try again" but "line_items sum to $4,230 but total_amount is $4,530 — there is a $300 discrepancy, re-examine the document"

def extract_with_retry(document: str, max_retries: int = 2) -> dict:
    messages = [
        {"role": "user", "content": f"Extract invoice data:\n\n{document}"}
    ]

    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            tools=[extract_invoice_tool],
            tool_choice={"type": "tool", "name": "extract_invoice"},
            messages=messages,
        )

        # With a forced tool_choice the response contains a tool_use block;
        # find it explicitly rather than assuming it is content[0].
        tool_block = next(b for b in response.content if b.type == "tool_use")
        extraction = tool_block.input
        errors = validate_extraction(extraction)

        if not errors:
            return {"status": "success", "data": extraction, "attempts": attempt + 1}

        if not any(e["retryable"] for e in errors):
            return {"status": "failed", "data": extraction, "errors": errors,
                    "reason": "non-retryable errors — information absent from source"}

        # Build retry context with specific errors
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": (
                f"The extraction failed validation. Errors:\n"
                + "\n".join(f"- {e['field']}: {e['message']}" for e in errors)
                + "\n\nPlease re-examine the document and correct these specific issues."
            ),
        })

    return {"status": "exhausted", "data": extraction, "errors": errors}
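
The loop above leans on a validate_extraction helper that the snippet does not define. A minimal sketch, assuming the invoice schema from earlier and a retryable flag on each error, might look like:

```python
from datetime import date

def validate_extraction(extraction: dict) -> list[dict]:
    """Return a list of validation errors; an empty list means the
    extraction passed. Sketch only -- a real pipeline would use
    Pydantic or Zod, as the drill below suggests."""
    errors = []

    # Semantic check: line items should sum to the stated total.
    items = extraction.get("line_items") or []
    if items:
        calculated = round(sum(i["total"] for i in items), 2)
        stated = extraction.get("total_amount")
        if stated is not None and abs(calculated - stated) > 0.01:
            errors.append({
                "field": "total_amount",
                "message": (f"line_items sum to {calculated} but "
                            f"total_amount is {stated}"),
                "retryable": True,  # the model may have misread a value
            })

    # Sanity check: invoice dates should not be in the future.
    raw = extraction.get("invoice_date")
    if raw:
        try:
            if date.fromisoformat(raw) > date.today():
                errors.append({
                    "field": "invoice_date",
                    "message": f"date {raw} is in the future",
                    "retryable": True,
                })
        except ValueError:
            errors.append({
                "field": "invoice_date",
                "message": f"{raw!r} is not a valid ISO date",
                "retryable": True,
            })

    # Missing vendor name is non-retryable: the information is absent
    # from the source, so retrying would only invite fabrication.
    if extraction.get("vendor_name") in (None, ""):
        errors.append({
            "field": "vendor_name",
            "message": "vendor name missing from extraction",
            "retryable": False,
        })

    return errors
```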

Your extraction plan will encounter documents that do not match your schema's assumptions. The validation-retry loop is your adaptation mechanism. But adaptation has limits — when the information is absent, the correct response is graceful degradation, not fabrication.

Few-Shot Prompting for Ambiguous Scenarios

The CCA tests few-shot prompting specifically for ambiguous cases — not as a general technique, but as a targeted tool for the scenarios where zero-shot instructions produce inconsistent results.

2-4 examples. Targeted at the specific ambiguity. Each example shows the reasoning for why one extraction was chosen over a plausible alternative.

const fewShotExamples = `
Example 1 — Informal measurement:
Document: "about 3 and a half tons of steel"
Correct extraction: { "quantity": 3.5, "unit": "tons", "precision": "approximate" }
Why: Informal language converted to numeric. Precision marked as approximate.

Example 2 — Conflicting values:
Document: "Total: $1,200" but line items sum to $1,150
Correct extraction: { "stated_total": 1200, "calculated_total": 1150,
  "conflict_detected": true }
Why: Preserve BOTH values. Flag the conflict. Do not silently pick one.

Example 3 — Missing field:
Document: "Invoice from Acme Corp" (no date anywhere)
Correct extraction: { "vendor_name": "Acme Corp", "invoice_date": null }
Why: Return null for genuinely missing data. Do not infer or fabricate a date.
`;

Multi-Pass Review Architecture

Large code reviews break when you dump everything into a single pass. The model's attention dilutes across files, findings contradict each other, and cross-file issues get missed because the model is focused on local patterns.

The CCA-tested architecture splits review into passes:

Pass 1 — Local per-file analysis. Each file reviewed independently. Looking for: style violations, type errors, local bugs, individual function complexity. This pass scales linearly — you can parallelize it across files.

Pass 2 — Cross-file integration. Takes the Pass 1 findings plus the full file set. Looking for: architectural consistency, duplicated logic across files, broken data flow between modules, integration-level bugs that are invisible in any single file.

Pass 3 — Summary synthesis. Aggregates findings from both passes. Deduplicates. Prioritizes. Produces the final review.
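
The three passes can be sketched as a small orchestrator. This is a structural skeleton only — the stub review functions stand in for real model calls, and the names (local_review, cross_file_review) are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def local_review(path: str, source: str) -> list[dict]:
    # Pass 1 stub: a real implementation sends one file per model request.
    findings = []
    if "TODO" in source:
        findings.append({"file": path, "severity": "low", "note": "unresolved TODO"})
    return findings

def cross_file_review(files: dict[str, str], pass1: list[dict]) -> list[dict]:
    # Pass 2 stub: issues visible only across the full file set,
    # e.g. duplicated logic in two files.
    findings = []
    bodies = list(files.values())
    if len(bodies) != len(set(bodies)):
        findings.append({"file": "*", "severity": "medium",
                         "note": "duplicated file contents"})
    return findings

def multi_pass_review(files: dict[str, str]) -> list[dict]:
    # Pass 1: per-file analysis, parallelized since files are independent.
    with ThreadPoolExecutor() as pool:
        per_file = pool.map(lambda item: local_review(*item), files.items())
        pass1 = [f for findings in per_file for f in findings]

    # Pass 2: integration review over the full file set plus pass-1 output.
    pass2 = cross_file_review(files, pass1)

    # Pass 3: synthesis -- deduplicate, then prioritize by severity.
    order = {"high": 0, "medium": 1, "low": 2}
    merged = {(f["file"], f["note"]): f for f in pass1 + pass2}
    return sorted(merged.values(), key=lambda f: order[f["severity"]])
```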

Lesson 128 Drill

Build a validation-retry extraction pipeline against a document set you control:

  1. Define a JSON schema with at least 3 required fields, 2 nullable fields, and 1 enum with "other"
  2. Create 10 test documents: 7 clean, 2 with missing data, 1 with conflicting values
  3. Implement the extraction with forced tool_choice
  4. Add Pydantic or Zod validation with specific error messages
  5. Implement retry logic that distinguishes retryable vs non-retryable errors
  6. Measure: what percentage of failures does retry actually fix?
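
For step 6, the measurement can be computed from the dicts that extract_with_retry returns — a "fix" is a success that needed more than one attempt (a sketch under that assumption):

```python
def retry_fix_rate(results: list[dict]) -> float:
    # results are the dicts returned by extract_with_retry above.
    # A failure that retry fixed is a success with more than one attempt.
    fixed = sum(1 for r in results
                if r["status"] == "success" and r["attempts"] > 1)
    failed_at_least_once = fixed + sum(1 for r in results
                                       if r["status"] in ("failed", "exhausted"))
    return fixed / failed_at_least_once if failed_at_least_once else 0.0
```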

The metric that matters is not extraction accuracy on clean documents. It is extraction behavior on dirty documents — the ones with missing fields, informal measurements, and conflicting values. That is what the CCA tests.

Bottom Line

Structured output via tool_use solves the syntax problem. Schema design with nullable fields, enum + "other" patterns, and required/optional separation solves the hallucination problem. Validation-retry loops with specific error feedback solve the correction problem. Multi-pass review with focused, independent passes solves the attention-dilution problem.

The CCA does not test these as separate concepts. It tests them as an integrated pipeline: the extraction that produces structured output, the validation that catches semantic errors, the retry that feeds specific errors back for correction, and the review architecture that catches what the pipeline misses. Build the pipeline, not the parts.