Scenario Description
You're integrating Claude Code into CI/CD pipelines to automate code reviews, generate missing tests, and provide PR feedback at scale. The challenge is designing Claude Code prompts and configurations that deliver actionable feedback while minimizing false positives. A poorly designed review agent might flag every variable name or nitpick indentation; a well-designed one catches real architectural issues and missing test coverage.
This scenario requires understanding the -p flag for non-interactive mode (essential for pipelines), structured output via --output-format json with --json-schema, and session context isolation to prevent cross-contamination between different reviews. You'll design multi-pass reviews (per-file checks followed by cross-file integration analysis), configure CLAUDE.md with testing standards and fixture expectations, and decide whether to use the synchronous API for blocking checks or the Batch API for overnight reports.
Critical design decisions include: Should the review agent comment on every finding, or only high-confidence issues? How do you avoid duplicate comments when the same file is reviewed in multiple PRs? How do you set review criteria precisely enough that the model understands what to focus on? The answer shapes whether your CI review becomes a trusted gate or noise that developers ignore.
Domains Tested
Key Concepts to Study
Claude Code Configuration & Workflows
- `-p` flag for non-interactive pipeline mode: ensures Claude Code runs without prompting for approval, essential for CI integration
- `--output-format json` with `--json-schema`: generating structured review findings (location, severity, fix suggestion) that can be parsed by CI systems
- Session context isolation: each review runs in its own session to prevent context pollution from prior reviews
- CLAUDE.md for CI context: embedding testing standards, fixture requirements, and review criteria into configuration
- Batch API vs. synchronous execution: trade-off between cost savings (50% cheaper with batches, 24-hour window) and blocking check speed
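The concepts above can be combined into a single pipeline step. The sketch below shows how a CI job might assemble the non-interactive invocation; the flag names follow the descriptions in this scenario, while the prompt text, schema fields, and file handling are illustrative assumptions. Because each job builds a fresh command, session isolation between reviews comes for free.

```python
import json
import tempfile

# Illustrative JSON Schema for a single review finding; the field names
# (file, line_number, severity, ...) are assumptions, not a fixed API.
finding_schema = {
    "type": "object",
    "properties": {
        "file": {"type": "string"},
        "line_number": {"type": "integer"},
        "severity": {"type": "integer", "minimum": 1, "maximum": 3},
        "category": {"type": "string",
                     "enum": ["security", "test_coverage", "architecture"]},
        "message": {"type": "string"},
        "suggested_fix": {"type": "string"},
    },
    "required": ["file", "line_number", "severity", "category", "message"],
}

def build_review_command(prompt: str, schema_path: str) -> list[str]:
    """Assemble a non-interactive Claude Code invocation for CI.

    A new command (and thus a new session) is built per review, which
    provides the context isolation described above.
    """
    return [
        "claude",
        "-p", prompt,                  # non-interactive: no approval prompts
        "--output-format", "json",     # machine-parseable findings
        "--json-schema", schema_path,  # enforce the finding structure
    ]

# Write the schema to a file the CLI can read, then build the command.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(finding_schema, f)
    schema_path = f.name

cmd = build_review_command("Review this diff for missing tests", schema_path)
```

In a real pipeline this command would be passed to the CI runner's shell step; the point here is that the flags, not interactive configuration, carry the whole review contract.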
Prompt Engineering & Structured Output
- Explicit review criteria: instead of "review for quality", specify "check for missing unit tests, security issues in SQL queries, and architectural inconsistencies"
- Few-shot examples in review prompts: showing examples of "good finding" vs. "not worth flagging" helps the model calibrate severity
- Avoiding duplicate comments: tracking prior findings in the prompt so the model doesn't re-flag the same issue across multiple PR updates
- Multi-pass review logic: first pass focuses on per-file issues (style, logic), second pass on cross-file consistency (API contracts, shared state)
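The duplicate-comment problem above reduces to giving each finding a stable identity and feeding known findings back into the prompt. A minimal sketch, assuming findings are dicts in the shape described earlier; the key choice and helper names are illustrative:

```python
def finding_key(f: dict) -> tuple:
    # A stable identity for a finding: same file, category, and message
    # counts as "already flagged", even if the line number drifted
    # between PR updates.
    return (f["file"], f["category"], f["message"])

def new_findings(current: list[dict], prior: list[dict]) -> list[dict]:
    """Drop findings already reported in earlier PR updates."""
    seen = {finding_key(f) for f in prior}
    return [f for f in current if finding_key(f) not in seen]

def render_prior_for_prompt(prior: list[dict]) -> str:
    """Embed known findings in the review prompt so the model can skip
    them, per 'address only new or changed issues'."""
    lines = [f"- {f['file']}: [{f['category']}] {f['message']}" for f in prior]
    return "Previously reported findings (do not re-flag):\n" + "\n".join(lines)

# Example: one finding already reported, one genuinely new.
prior = [{"file": "db.py", "category": "security",
          "message": "unparameterized SQL in get_user()"}]
current = prior + [{"file": "db.py", "category": "test_coverage",
                    "message": "no test for the retry path"}]
fresh = new_findings(current, prior)
```

Filtering on the CI side as well as in the prompt gives defense in depth: even if the model re-flags a known issue, it never reaches the PR as a duplicate comment.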
Study Tips for This Scenario
- Define Review Criteria as a Scoring Rubric: Don't give vague instructions like "check the code quality." Instead, provide a rubric: "Severity 1 (block merge): security vulnerabilities, missing error handling in critical paths. Severity 2 (nice to have): performance improvements, refactoring suggestions. Severity 3 (ignore): style inconsistencies." This guides the model's focus and helps developers prioritize.
- Use `--json-schema` to Enforce Structure: Define a JSON schema for review findings that includes `file`, `line_number`, `severity`, `category` ("security", "test_coverage", "architecture"), `message`, and `suggested_fix`. This schema enforces consistent output that your CI system can reliably parse and convert to PR comments or dashboard entries.
- Implement Prior-Finding Tracking: When Claude Code reviews a file, include a list of previous findings for that file (from earlier PR updates or reviews). Prompt the model to "address only new or changed issues; don't re-flag items already known." This prevents reviewer fatigue from duplicate comments.
- Design a Two-Pass Review for Confidence: Configure Claude Code to run two passes: first, a per-file review focusing on correctness and tests; second, a cross-file review checking consistency and architectural fit. Run both passes in one session (via CLAUDE.md instructions) and merge findings, prioritizing cross-file issues. This catches subtle integration bugs that single-file analysis would miss.
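The severity rubric from the first tip maps directly onto CI behavior: severity 1 fails the check, severity 2 becomes a non-blocking comment, severity 3 is dropped before it reaches the PR. A minimal sketch with illustrative function and field names:

```python
def gate(findings: list[dict]) -> dict:
    """Convert rubric severities into CI actions.

    Severity 1 blocks the merge, severity 2 surfaces as a comment,
    severity 3 is discarded, per the rubric above.
    """
    blocking = [f for f in findings if f["severity"] == 1]
    comments = [f for f in findings if f["severity"] == 2]
    return {"block_merge": bool(blocking),
            "blocking": blocking,
            "comments": comments}

# Example: one blocker, one suggestion, one style nit (dropped).
findings = [
    {"severity": 1, "message": "SQL built via string concatenation"},
    {"severity": 2, "message": "result could be cached per request"},
    {"severity": 3, "message": "inconsistent variable naming"},
]
result = gate(findings)
```

Keeping the gating logic in CI rather than in the prompt means the rubric can be tightened or relaxed per repository without re-tuning the review prompt itself.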
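The two-pass merge described in the last tip can be sketched as follows, assuming both passes emit findings in the same shape; the dedup key and function name are illustrative. Cross-file findings come first, and per-file findings duplicated by the cross-file pass are dropped:

```python
def merge_passes(per_file: list[dict], cross_file: list[dict]) -> list[dict]:
    """Merge findings from both review passes.

    Cross-file (integration) findings are prioritized and listed first;
    per-file findings that the cross-file pass also reported are dropped
    so each issue appears once.
    """
    seen = {(f["file"], f["message"]) for f in cross_file}
    merged = list(cross_file)
    merged += [f for f in per_file if (f["file"], f["message"]) not in seen]
    return merged

# Example: the cross-file pass re-discovers one per-file finding.
per_file = [
    {"file": "api.py", "message": "missing test for error branch"},
    {"file": "api.py", "message": "response shape differs from client"},
]
cross_file = [
    {"file": "api.py", "message": "response shape differs from client"},
]
merged = merge_passes(per_file, cross_file)
```

The merged list is then what the rubric gate and duplicate-comment filter operate on, so both passes feed a single stream of PR feedback.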