Message Batches API and CI/CD Integration Patterns
Batch processing cuts costs by 50%. CI/CD integration turns Claude Code into an automated reviewer. The CCA tests both, and the judgment calls between them.
Your team reviews 80 pull requests per week. Each review takes a developer 30 minutes. That is 40 hours of engineering time — one entire person — spent reading diffs and writing comments. Half of those comments are about the same five patterns: missing error handling, inconsistent naming, untested edge cases, stale comments, and type safety gaps.
This is the problem the CCA tests from two angles: batch processing for cost-efficient bulk analysis, and CI/CD integration for automated per-PR review. Both reduce the human review burden. Neither is appropriate for all workloads. The exam tests whether you know which is which.
The Message Batches API
The Batches API is the cost optimization layer for latency-tolerant workloads. Instead of sending requests one at a time and paying full price, you submit a batch of requests and Anthropic processes them within a 24-hour window at 50% cost savings.
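To make the discount concrete, here is a back-of-envelope cost comparison. The per-million-token prices below are illustrative placeholders, not current Anthropic rates; only the 50% discount factor comes from the Batches API itself.

```typescript
// Rough cost comparison: standard vs. batch pricing.
// PRICE_PER_MTOK values are assumed for illustration, not real rates.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 }; // USD per million tokens
const BATCH_DISCOUNT = 0.5; // Batches API: 50% off

function jobCost(
  requests: number,
  inTokensEach: number,
  outTokensEach: number,
  discount = 0
): number {
  const perRequest =
    (inTokensEach / 1_000_000) * PRICE_PER_MTOK.input +
    (outTokensEach / 1_000_000) * PRICE_PER_MTOK.output;
  return requests * perRequest * (1 - discount);
}

// 500 compliance reviews, ~10k tokens in / ~1k tokens out each
const standard = jobCost(500, 10_000, 1_000);
const batched = jobCost(500, 10_000, 1_000, BATCH_DISCOUNT);
// batched is exactly half of standard, whatever the underlying rates are
```

The shape of the calculation matters more than the numbers: the discount applies uniformly, so whatever your batch-eligible workload costs today, batching halves it.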
The mechanics are straightforward. You construct an array of message requests, each tagged with a custom_id that you define. You submit the batch. You poll for completion. When the batch finishes, you retrieve results and match each response to its request using the custom_id.
// Submit a batch of 100 document reviews
const batch = await anthropic.messages.batches.create({
  requests: documents.map((doc) => ({
    custom_id: `doc-review-${doc.id}`,
    params: {
      model: "claude-sonnet-4-6",
      max_tokens: 2048,
      messages: [{
        role: "user",
        content: `Review this document for compliance issues:\n\n${doc.content}`
      }]
    }
  }))
});

// Poll for completion
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
let status = await anthropic.messages.batches.retrieve(batch.id);
while (status.processing_status !== "ended") {
  await sleep(60_000); // Poll every minute
  status = await anthropic.messages.batches.retrieve(batch.id);
}

// Retrieve results — correlate by custom_id
for await (const result of anthropic.messages.batches.results(batch.id)) {
  const docId = result.custom_id; // "doc-review-123"
  if (result.result.type === "succeeded") {
    await saveReview(docId, result.result.message);
  } else {
    await markForResubmission(docId, result.result.error);
  }
}
When to Batch — and When Not To
This is the judgment call the CCA tests hardest.
Use batches for: Reviewing 100 PRs overnight. Processing 500 documents for compliance. Generating test suites for an entire codebase. Nightly code quality reports. Weekly security audits. Anything where you submit work, go home, and retrieve results in the morning.
Do not use batches for: Pre-merge code review that blocks PR merging — developers cannot wait 24 hours. Interactive debugging sessions. Real-time customer support. Any workflow where a human is waiting for the response.
Cannot use batches for: Any workflow requiring agentic tool use. Each batch request is a single Messages API call — Claude can request a tool call in its response, but there is no way to return the tool result and continue the turn within the batch. If your review needs to call tools — reading files, running tests, checking documentation — it cannot be a batch request. This constraint eliminates most agentic workflows from batch eligibility.
The batch constraint on tool calling is why CI/CD integration uses Claude Code directly rather than the Batches API. CI/CD review needs to read files, trace imports, and understand context — all of which require tool calls. Batches give you 50% savings but zero tool access. The exam expects you to make this tradeoff explicitly.
Handling Batch Failures
Not every request in a batch succeeds. The CCA tests whether you handle failures correctly.
When a batch completes, each result has a type: succeeded, errored, canceled, or expired. The correct pattern is to identify failed requests by their custom_id, determine why they failed, and resubmit only those requests with appropriate modifications.
// Resubmit only failed requests
const failures = results.filter((r) => r.result.type !== "succeeded");
const resubmitBatch = await anthropic.messages.batches.create({
  requests: failures.map((f) => {
    const original = getOriginalRequest(f.custom_id);
    // If it failed because the document blew the context window, chunk it.
    // (isContextLengthError inspects the error payload; implementation omitted.)
    if (f.result.type === "errored" && isContextLengthError(f.result.error)) {
      return {
        ...original,
        params: { ...original.params, messages: chunkDocument(original) }
      };
    }
    return original; // Retry unchanged for transient errors
  })
});
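The retry decision above can be made explicit as a small classifier over result types. This is a sketch: the result-type names match the Batches API, but the error-kind names and modification rules are assumptions about your workload.

```typescript
type BatchResultType = "succeeded" | "errored" | "canceled" | "expired";

type RetryAction = "skip" | "retry_unchanged" | "retry_modified";

// Expired and canceled requests were never processed, so they are safe to
// resubmit as-is. Errored requests need inspection: some failures are
// transient, others (e.g. oversized input) need the request modified first.
// The errorKind values here are illustrative, not real API error codes.
function classifyForRetry(
  resultType: BatchResultType,
  errorKind?: string
): RetryAction {
  if (resultType === "succeeded") return "skip";
  if (resultType === "expired" || resultType === "canceled") {
    return "retry_unchanged";
  }
  // errored: split by the kind of error
  return errorKind === "input_too_large" ? "retry_modified" : "retry_unchanged";
}
```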
Claude Code in CI/CD
The second half of this lesson is the operational pattern the CCA tests most in Scenario 5: running Claude Code as an automated code reviewer in your CI/CD pipeline.
The critical flag is -p (or --print). Without it, Claude Code starts an interactive session and waits for input. In a CI pipeline, there is no human to type. The process hangs. The build times out. The -p flag runs Claude Code in non-interactive mode — it processes the prompt and exits.
# .github/workflows/claude-review.yml
name: Claude Code Review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Claude Code Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude -p \
            --output-format json \
            --json-schema '{"type":"object","properties":{"findings":{"type":"array","items":{"type":"object","properties":{"file":{"type":"string"},"line":{"type":"integer"},"severity":{"type":"string","enum":["critical","warning","info"]},"issue":{"type":"string"},"suggestion":{"type":"string"}},"required":["file","severity","issue"]}}}}' \
            "Review the changes in this PR for bugs, security issues, and code quality. Focus on: missing error handling, type safety gaps, and untested edge cases. Do not flag minor style issues." \
            > review-output.json
      - name: Post Review Comments
        run: node scripts/post-review-comments.js review-output.json
The --output-format json flag produces machine-parseable output. The --json-schema flag enforces a specific structure. Together, they let you parse Claude's findings programmatically and post them as inline PR comments.
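A sketch of what scripts/post-review-comments.js might do with that file: the Finding fields mirror the --json-schema above, but the severity formatting and the decision to drop findings without a line anchor are assumptions, and the actual GitHub API call (e.g. via Octokit) is omitted.

```typescript
// Turn Claude's structured findings into inline PR comment payloads.
interface Finding {
  file: string;
  line?: number; // optional in the schema: only "file", "severity", "issue" are required
  severity: "critical" | "warning" | "info";
  issue: string;
  suggestion?: string;
}

interface ReviewComment {
  path: string;
  line: number;
  body: string;
}

function toReviewComments(findings: Finding[]): ReviewComment[] {
  return findings
    .filter((f) => f.line !== undefined) // inline comments need a line anchor
    .map((f) => ({
      path: f.file,
      line: f.line!,
      body:
        `**[${f.severity}]** ${f.issue}` +
        (f.suggestion ? `\n\nSuggestion: ${f.suggestion}` : ""),
    }));
}
```

Findings without a line number still matter; a fuller script would roll them up into a single summary comment rather than dropping them.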
The Self-Review Trap
The CCA tests a subtle but critical anti-pattern: using the same Claude session that generated code to review that code.
Why this fails: the generating session retains its reasoning context. It remembers why it made each decision. When asked to review its own work, it is predisposed to confirm its decisions rather than question them. This is not a hypothetical concern — it is a measured reliability gap that independent review instances close.
The correct pattern: use a separate Claude instance for review. The reviewer has no reasoning context from the generation phase. It evaluates the code on its merits, not on remembered justifications.
This is directly tested in Task 4.6 of the CCA exam guide. The exam will present a scenario where a team's code reviews are missing subtle issues, and the correct answer involves switching from self-review to independent-instance review.
Multi-Pass Review Architecture
Large PRs — 15 files, 800 lines changed — cannot be reviewed effectively in a single pass. Attention dilution causes the model to catch bugs in the first few files and miss them in the later files. The multi-pass architecture solves this.
Pass 1: Per-file local analysis. Each file is reviewed independently for local issues — style violations, type errors, missing error handling, dead code. The scope is narrow. Attention is focused.
Pass 2: Cross-file integration. All changed files are reviewed together for cross-file issues — data flow consistency, API contract alignment, import correctness, architecture violations. These issues are invisible in per-file review.
Pass 3: Summary and severity. All findings are aggregated, deduplicated, and severity-classified. Confidence scores are attached to each finding for review routing — high-confidence findings go directly to the developer, low-confidence findings go to a human reviewer for validation.
// Multi-pass review pipeline
async function reviewPR(changedFiles: string[]) {
  // Pass 1: Per-file analysis
  const localFindings = await Promise.all(
    changedFiles.map((file) =>
      claude("-p", `Review ${file} for bugs, type errors, missing error handling.`)
    )
  );

  // Pass 2: Cross-file integration
  const crossFileFindings = await claude("-p",
    `Given these files and their local findings, identify cross-file issues:
     Data flow inconsistencies, API contract violations, shared state mutations.
     Files: ${changedFiles.join(", ")}
     Local findings: ${JSON.stringify(localFindings)}`
  );

  // Pass 3: Summary with confidence scores
  return await claude("-p", "--output-format json",
    `Aggregate and deduplicate these findings. Assign severity and confidence:
     Local: ${JSON.stringify(localFindings)}
     Cross-file: ${JSON.stringify(crossFileFindings)}`
  );
}
Re-Review After New Commits
When a developer pushes fixes after an initial review, the CCA tests whether you include prior findings in the re-review context. The correct pattern:
const reReview = await claude("-p",
  `Review the new changes in this PR.
   Prior review findings (may be addressed): ${JSON.stringify(priorFindings)}
   Report only NEW issues or issues that are STILL unaddressed.
   Do not duplicate prior findings that have been fixed.`
);
Without this context injection, the re-review produces duplicate comments — the same findings the developer already fixed. This erodes developer trust in the review system.
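Prompt-level deduplication can be backed up with a mechanical filter before posting. Here is one sketch: fingerprinting findings by file plus issue text is an assumed heuristic (it tolerates line drift when unrelated edits shift the code), not a guaranteed match.

```typescript
interface Finding {
  file: string;
  line?: number;
  issue: string;
}

// Fingerprint a finding so re-review output can be compared against prior
// findings. Matching on file + normalized issue text (ignoring line numbers)
// is a heuristic: it survives line drift but can miss rephrased findings.
const fingerprint = (f: Finding) => `${f.file}::${f.issue.toLowerCase().trim()}`;

function dropAlreadyReported(current: Finding[], prior: Finding[]): Finding[] {
  const seen = new Set(prior.map(fingerprint));
  return current.filter((f) => !seen.has(fingerprint(f)));
}
```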
Lesson 127 Drill
Design a review architecture for your team's CI/CD pipeline:
- Identify which of your team's review workloads are latency-tolerant (batch-eligible) vs blocking (require sync)
- For batch-eligible workloads, calculate cost savings: monthly API spend on those workloads × 50%
- For CI/CD review, write the CLAUDE.md section that documents your team's review criteria
- Design a three-pass review pipeline for your largest typical PR size
- Implement re-review context injection to prevent duplicate findings
- Calculate your batch submission frequency given a 30-hour SLA requirement (24h processing + buffer)
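For the last drill item, the cadence arithmetic looks like this. It is a sketch under one assumption: the SLA clock starts when work becomes available, so work arriving just after a submission waits a full interval before it is even submitted.

```typescript
// Worst case: work arrives immediately after a batch is submitted, waits one
// full submission interval, then takes the full processing window. That total
// must fit inside the SLA, which bounds the interval by the slack between them.
function maxSubmissionIntervalHours(
  slaHours: number,
  processingHours: number
): number {
  const slack = slaHours - processingHours;
  if (slack <= 0) {
    throw new Error("SLA is tighter than worst-case processing time");
  }
  return slack;
}

maxSubmissionIntervalHours(30, 24); // → 6: submit a batch at least every 6 hours
```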
The output is not the pipeline itself. It is the judgment about which workload goes where and why. That judgment is what the CCA scores.
Bottom Line
The Batches API saves 50% but cannot call tools and has no latency guarantee. CI/CD integration with -p gives real-time review with full tool access but at full price. Multi-pass review prevents attention dilution on large PRs. Self-review is unreliable — use independent instances. Re-review needs prior findings in context to avoid duplicate comments.
The CCA tests all of these as tradeoff decisions, not feature knowledge. When the exam presents a workflow and asks which API approach is correct, the answer depends on three questions: Does it need tool calling? Is it latency-sensitive? Can it wait 24 hours? Those three questions determine the correct architecture every time.
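Those three questions collapse into a small decision helper. This is a sketch of the lesson's decision logic, not an official rubric; the type and field names are invented for illustration.

```typescript
type Architecture = "batches_api" | "sync_api_or_claude_code";

interface Workload {
  needsToolCalling: boolean; // agentic loops: read files, run tests, check docs
  latencySensitive: boolean; // a human or a merge gate is waiting on the result
  canWait24Hours: boolean;   // overnight turnaround is acceptable
}

// Tool calling and latency each independently rule out batching; only
// latency-tolerant, tool-free work earns the 50% discount.
function chooseArchitecture(w: Workload): Architecture {
  if (w.needsToolCalling) return "sync_api_or_claude_code";
  if (w.latencySensitive || !w.canWait24Hours) return "sync_api_or_claude_code";
  return "batches_api";
}
```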