AI & Machine Learning

Hands-On Preparation Exercises

lex@lexgaines.com · 13 min read
Four hands-on labs mapping to CCA exam domains: build a multi-tool agent, configure Claude Code for a team, implement a structured extraction pipeline, and design a multi-agent research system.

This guide contains four hands-on exercises designed to reinforce core CCA exam concepts through practical implementation. Each exercise includes an objective, detailed steps, and the exam domains it reinforces.


Exercise 1: Build a Multi-Tool Agent with Escalation Logic

Objective: Practice agentic loop design, tool integration, structured error handling, and escalation patterns. By the end, you'll have a working agent that makes intelligent decisions about when to handle issues autonomously vs. escalate to humans.

Context: You're building a customer support agent that processes refund requests. The agent has access to get_customer, lookup_order, process_refund, and escalate_to_human tools. The challenge is implementing decision logic and error handling that improves first-contact resolution while maintaining safety.

Steps:

  Step 1: Define 3-4 MCP tools with detailed descriptions
    • Create tool definitions for: get_customer (input: customer_name or email; returns: customer_id, account_status, history), lookup_order (input: customer_id + order_id; returns: order details, damage report, purchase date), process_refund (input: customer_id, order_id, reason, amount; returns: confirmation or error), escalate_to_human (input: reason, context; returns: ticket_id)
    • Include in each tool description: input formats (required vs. optional fields), example queries, edge cases, and clear boundaries (when NOT to use it)
    • Example: "lookup_order: Requires customer_id (from get_customer) and order_id. Does NOT accept customer name. Edge case: fails if order_id is malformed. Use only after customer identity is verified."
    • Document what each tool should NOT do (explicit boundaries reduce ambiguity)
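One tool from the list above can be sketched in the Anthropic tool-use format (name, description, input_schema). The field names mirror the exercise text and are illustrative, not a fixed contract:

```python
# Sketch of one tool definition in the Anthropic tool-use format.
# The schema fields mirror the exercise text and are illustrative.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": (
        "Look up an order for a verified customer. Requires customer_id "
        "(obtained from get_customer) and order_id. Does NOT accept a "
        "customer name. Fails if order_id is malformed. Use only after "
        "customer identity is verified."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "ID from get_customer"},
            "order_id": {"type": "string", "description": "Order ID, e.g. ORD-1042"},
        },
        "required": ["customer_id", "order_id"],
    },
}
```

Note how the description itself carries the boundaries ("does NOT accept", "use only after"), which is where the model reads them.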

  Step 2: Implement an agentic loop that checks stop_reason
    • Create a loop that processes user messages and executes tool calls until stop_reason is "end_turn"
    • Implement each tool call with its response handling
    • Log all tool calls and responses so you can understand the agent's behavior
    • Add a maximum iteration limit (e.g., 10 calls) to prevent infinite loops
    • Example structure: while stop_reason != "end_turn" and iterations < 10: call_tool() → check_response() → continue_loop()
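The loop structure above can be sketched as follows. call_model and execute_tool are hypothetical stand-ins for your API client and tool dispatcher; the model is stubbed here so the control flow is runnable on its own:

```python
# Minimal agentic-loop sketch. call_model and execute_tool are hypothetical
# stand-ins for your API client and tool dispatcher; the model is stubbed
# so the control flow can run on its own.
MAX_ITERATIONS = 10

def call_model(messages):
    # Stub: pretend the model asks for one tool call, then finishes.
    if not any(m["role"] == "tool" for m in messages):
        return {"stop_reason": "tool_use",
                "tool_call": {"name": "get_customer", "input": {"email": "a@b.com"}}}
    return {"stop_reason": "end_turn", "text": "Refund request handled."}

def execute_tool(name, tool_input):
    return {"customer_id": "C-1", "account_status": "active"}  # stubbed tool result

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_ITERATIONS):          # hard cap prevents infinite loops
        response = call_model(messages)
        if response["stop_reason"] == "end_turn":
            return response["text"]
        call = response["tool_call"]
        result = execute_tool(call["name"], call["input"])
        print(f"tool={call['name']} input={call['input']} result={result}")  # log every call
        messages.append({"role": "tool", "content": str(result)})
    return "Max iterations reached; escalating to a human."
```

In a real implementation, call_model wraps the Messages API and execute_tool dispatches to your MCP tools; everything else (the stop_reason check, the iteration cap, the logging) stays the same.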

  Step 3: Add structured error responses
    • Define an error response format: {errorCategory: "validation_error" | "timeout" | "business_rule" | "unknown", isRetryable: boolean, message: string, suggestion: string}
    • Implement error handling in tool responses (e.g., if process_refund is called with amount > $500, return errorCategory = "business_rule", isRetryable = false, suggestion = "escalate to supervisor")
    • Test error responses with invalid customer IDs, orders that are not found, and refund amounts outside policy
    • Verify that the agent interprets these structured errors and decides whether to retry or escalate
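A minimal sketch of the error format and the agent-side decision it enables (field names follow the exercise; check_refund is a hypothetical business-rule check inside the process_refund tool):

```python
# Sketch of the structured error format and how the harness interprets it.
def make_error(category, retryable, message, suggestion):
    return {"errorCategory": category, "isRetryable": retryable,
            "message": message, "suggestion": suggestion}

def check_refund(amount):
    """Hypothetical business-rule check run inside the process_refund tool."""
    if amount > 500:
        return make_error("business_rule", False,
                          f"Refund of ${amount} exceeds the $500 limit.",
                          "escalate to supervisor")
    return {"status": "ok", "refunded": amount}

def next_action(result):
    """Decide what the agent should do with a tool result."""
    if "errorCategory" not in result:
        return "continue"
    return "retry" if result["isRetryable"] else "escalate"
```

The point of the structure: isRetryable turns "retry or escalate" into a deterministic branch instead of something the model must infer from free-text error messages.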

  Step 4: Implement a hook for business rule enforcement (block refunds above a threshold)
    • Create a PreToolUse hook that intercepts process_refund calls before they execute (a PostToolUse hook runs after the tool has already executed, which is too late to block it)
    • Add logic: if amount > $500, block the call and return a structured error instead of executing it
    • Ensure that even if the agent tries to call process_refund directly, it cannot bypass the $500 limit without escalation
    • This demonstrates how hooks enforce deterministic business rules that prompts alone cannot guarantee
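In Claude Code, blocking a tool call before it runs is the job of a PreToolUse hook: the hook script receives the tool-call event as JSON on stdin, and a blocking exit code (2 in current Claude Code releases, with the stderr message fed back to the model) rejects the call. A sketch, with the decision isolated in a pure function:

```python
# Sketch of a refund-blocking hook script. The stdin/exit-code contract
# follows Claude Code's hook interface; verify against your version's docs.
import json
import sys

LIMIT = 500

def check(event):
    """Return (allow, message) for a tool-call event."""
    if event.get("tool_name") != "process_refund":
        return True, ""
    amount = event.get("tool_input", {}).get("amount", 0)
    if amount > LIMIT:
        return False, (f"Blocked: refund ${amount} exceeds ${LIMIT}. "
                       "Use escalate_to_human instead.")
    return True, ""

def main():
    # Entry point when Claude Code invokes the hook: event JSON on stdin.
    allow, msg = check(json.load(sys.stdin))
    if not allow:
        print(msg, file=sys.stderr)
        sys.exit(2)  # exit code 2 tells Claude Code to block the tool call
```

Call main() when installing this as an actual hook script; keeping check() pure makes the threshold logic unit-testable without Claude Code in the loop.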

  Step 5: Test with multi-concern messages
    • Create test messages with multiple concerns: "My order arrived damaged AND I want to return it for a refund AND I have questions about my account"
    • Verify the agent calls get_customer first to establish identity
    • Verify it can handle sequential concerns (damage report → lookup_order → process_refund)
    • Test the edge case where the agent tries to call lookup_order before get_customer (this should be blocked by a hook or discouraged by the tool description)
    • Verify the agent escalates when appropriate (e.g., refund > $500, damaged item requiring supervisor approval)

Domains Reinforced: - Agentic Architecture & Orchestration (agentic loops, escalation patterns, tool ordering) - Tool Design & MCP Integration (tool definitions, descriptions, boundaries) - Context Management & Reliability (error handling, structured responses)


Exercise 2: Configure Claude Code for a Team Development Workflow

Objective: Practice setting up a complete Claude Code environment with project-level configuration, path-specific rules, team commands, and MCP server integration. You'll learn how CLAUDE.md hierarchies, rules, and commands scale across a team.

Context: Your team has 3 developers working on a full-stack app with frontend (React), backend (Python Flask), and infrastructure (Terraform). Different areas have different coding conventions, dependencies, and tooling. You want Claude Code to automatically apply the right conventions in each area without developers thinking about it.

Steps:

  Step 1: Create a project-level CLAUDE.md
    • Create CLAUDE.md at the project root with global context: "This is a full-stack app. Frontend: React + Jest. Backend: Python 3.11 + Pytest. Infra: Terraform + AWS. Common conventions: error handling, logging, testing."
    • Add instructions for when to ask for clarification: "Before writing infrastructure code, ask whether it is for dev/staging/prod. Before database migrations, clarify the impact scope."
    • This file applies to all work in the project
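An illustrative CLAUDE.md built from the bullets above (the wording and directory names are examples, not a required format):

```markdown
# CLAUDE.md (project root)

This is a full-stack app.
- Frontend: React + Jest (app/frontend/)
- Backend: Python 3.11 + Pytest (app/backend/)
- Infra: Terraform + AWS (infra/)

Common conventions: consistent error handling, structured logging,
tests for all new code.

Before writing infrastructure code, ask whether it targets dev/staging/prod.
Before database migrations, clarify the impact scope.
```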

  Step 2: Create .claude/rules/ files with YAML frontmatter glob patterns
    • Create /app/frontend/.claude/rules/frontend.yaml with frontmatter path: "frontend/**" and rules for React (components in src/components/, tests in __tests__/, use Jest, import styles as modules)
    • Create /app/backend/.claude/rules/backend.yaml with frontmatter path: "backend/**" and rules for Python (Flask conventions, tests in tests/, use pytest, type hints required)
    • Create /infra/.claude/rules/terraform.yaml with frontmatter path: "infra/**" and rules for Terraform (variables in variables.tf, outputs in outputs.tf, comments required on all resources)
    • Test that the right rules apply automatically based on which file is being edited (e.g., /app/frontend/hooks/useData.js matches frontend/**)
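A sketch of one path-scoped rule file, assuming the YAML-frontmatter-plus-glob shape the exercise describes (the exact rule-file syntax may vary across Claude Code versions, so check your version's documentation):

```markdown
---
path: "frontend/**"
---
- Components live in src/components/
- Tests live in __tests__/ and use Jest
- Import styles as CSS modules
```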

  Step 3: Create a project-scoped skill with context and allowed tools
    • Create /app/.claude/skills/SKILL.md with a skill definition, e.g., a "code-review" skill
    • Set context: fork_session so the skill runs in a new session for isolated code analysis
    • Set allowed-tools: ["Read", "Write", "Bash", "Grep"] to limit what the skill can do (no deletion, no external API calls)
    • Document when developers should invoke the skill: "Use /code-review before opening a PR to catch issues early"
    • This demonstrates how to create safe, repeatable team processes

  Step 4: Configure an MCP server in .mcp.json with env var expansion, plus a personal server
    • Create .mcp.json at the project root defining a stdio GitHub MCP server with env: {"GITHUB_TOKEN": "${GITHUB_TOKEN}", "GITHUB_ORG": "my-org"}
    • Use environment variable expansion so GITHUB_TOKEN is loaded from the developer's shell environment rather than hardcoded
    • Create ~/.claude.json (personal config) with a personal MCP server, e.g., a stdio server running mcp-server-personal-tools
    • Test that both project servers (from .mcp.json) and personal servers (from ~/.claude.json) are available to Claude
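An illustrative .mcp.json for the GitHub server described above. In current Claude Code releases the top-level key is mcpServers (an object keyed by server name); confirm against your version's documentation, and note that ${GITHUB_TOKEN} is expanded from the shell environment rather than committed to the repo:

```json
{
  "mcpServers": {
    "github": {
      "type": "stdio",
      "command": "mcp-server-github",
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}",
        "GITHUB_ORG": "my-org"
      }
    }
  }
}
```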

  Step 5: Compare plan mode vs. direct execution
    • Run a complex task in plan mode: "Refactor the user authentication system to support OAuth 2.0." Observe plan mode exploring the codebase, identifying files, and designing the approach
    • Run the same task in direct execution mode (without planning) and compare the approaches
    • Run a simple task ("Add a console.log statement to App.js") in direct execution and verify it completes quickly without planning
    • Takeaway: plan mode suits architectural decisions; direct execution suits straightforward changes

Domains Reinforced: - Claude Code Configuration & Workflows (CLAUDE.md, .claude/rules/, .claude/commands/, .claude/skills/) - Claude Code Architect (hierarchies, path-specific rules, skill definition) - Tool Design & MCP Integration (MCP server configuration, environment variables)


Exercise 3: Build a Structured Data Extraction Pipeline

Objective: Practice JSON schema design, tool_use in structured output, validation-retry loops, and batch processing at scale. You'll build a pipeline that reliably extracts data from varied document formats and handles failures gracefully.

Context: Your company processes invoices from different vendors. Each invoice has a different format, but you need to extract: vendor_name, invoice_number, total_amount, line_items, tax_amount, payment_terms. Some invoices are scanned PDFs (text-only), others are digital. You need to handle extraction failures and route low-confidence extractions for human review.

Steps:

  Step 1: Define the extraction tool with a JSON schema
    • Required fields: vendor_name (string), invoice_number (string), total_amount (number)
    • Optional fields: line_items (array of {description, quantity, unit_price}), tax_amount (number), payment_terms (string)
    • Enum with "other": document_type: enum ["invoice", "receipt", "quote", "other"], so documents that do not fit a standard type can still be classified
    • Nullable field: po_number (string | null), present in the schema but possibly missing from the document
    • This schema allows flexible handling of varied document formats
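The schema above can be written out as an Anthropic tool definition. The structure follows the exercise's field list; names and descriptions are illustrative:

```python
# Sketch of the extraction tool in the Anthropic tool-use format.
# Fields follow the exercise text; tighten constraints for production.
EXTRACT_INVOICE_TOOL = {
    "name": "extract_invoice",
    "description": "Extract structured fields from an invoice-like document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                },
            },
            "tax_amount": {"type": "number"},
            "payment_terms": {"type": "string"},
            "document_type": {
                "type": "string",
                "enum": ["invoice", "receipt", "quote", "other"],
            },
            # Nullable: key exists in the schema, value may be null.
            "po_number": {"type": ["string", "null"]},
        },
        "required": ["vendor_name", "invoice_number", "total_amount"],
    },
}
```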

  Step 2: Implement a validation-retry loop
    • Send a document to Claude with the extraction tool
    • First attempt: extract data from the document using the JSON schema
    • Validation: check that required fields are present and reasonable (e.g., total_amount > 0)
    • On failure (e.g., missing vendor_name, total_amount = 0): send the document, the failed extraction, and the validation error back to Claude with a retry prompt: "The extraction failed because vendor_name is missing. Re-examine the document and try again. Look for company letterhead, signatures, or 'Bill From' sections."
    • On success: return the extracted data
    • Test with documents where vendor_name appears in unusual locations (footer, watermark, small print)
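The retry loop above, sketched with the model call stubbed out. extract() stands in for the real API call; the stub fails once and then succeeds so the loop is runnable as-is:

```python
# Validation-retry loop sketch. extract() stands in for the real model call;
# stub_extract fails once and then succeeds, so the loop is runnable.
MAX_ATTEMPTS = 3

def validate(data):
    """Return a list of validation errors (empty list = valid)."""
    errors = []
    if not data.get("vendor_name"):
        errors.append("vendor_name is missing")
    if data.get("total_amount", 0) <= 0:
        errors.append("total_amount must be > 0")
    return errors

def extract_with_retry(document, extract):
    feedback = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        data = extract(document, feedback)   # real version: model call with tool
        errors = validate(data)
        if not errors:
            return data
        # Feed the failure back so the model re-examines the document.
        feedback = ("The extraction failed because " + "; ".join(errors) +
                    ". Re-examine the document and try again.")
    raise ValueError(f"Extraction failed after {MAX_ATTEMPTS} attempts: {errors}")

def stub_extract(document, feedback):
    # First pass "misses" the vendor; the retry (with feedback) finds it.
    if not feedback:
        return {"vendor_name": "", "invoice_number": "INV-7", "total_amount": 120.0}
    return {"vendor_name": "Acme Corp", "invoice_number": "INV-7", "total_amount": 120.0}
```

The key design point is that the feedback string carries the specific validation failure back into the retry prompt rather than blindly re-asking.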

  Step 3: Add few-shot examples for varied document formats
    • Create examples showing extraction from: a standard digital invoice with clear fields, a scanned PDF with skewed text and OCR errors, a receipt with minimal information, and an invoice with multiple line items and taxes
    • Each example shows the input (document text), the expected output (JSON), and any special handling (e.g., "Tax was listed as '10% of subtotal'; calculate it from the subtotal amount")
    • Include edge cases: "If vendor_name appears only in a logo, use nearby text; if no clear vendor is found, use 'other'."
    • These examples improve extraction accuracy across document types

  Step 4: Design batch processing with the Message Batches API
    • Create a batch of 100 invoices to process
    • Use custom_id for tracking: invoice_12345 maps back to the invoice ID
    • Chunk oversized inputs: if a document is larger than ~100KB, split it into smaller chunks before batching
    • Plan around the processing SLA: the Batch API completes within 24 hours and costs 50% less, making it suitable for overnight batch processing
    • Handle failures by custom_id: if invoice_12345 fails validation, retry that specific invoice with additional context
    • Example workflow: process 100 invoices in one batch, track failures by custom_id, and queue failed invoices for retry with augmented prompts
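The batch construction and failure tracking above can be sketched without any network call. The request shape (a list of {custom_id, params} objects) follows the Anthropic Message Batches API; the model name and prompt are illustrative:

```python
# Build Message Batches API requests locally (no network call here).
# Request shape follows the Anthropic Message Batches API; the model
# name and prompt text are illustrative.
def build_batch_requests(invoices, model="claude-sonnet-4-5"):
    requests = []
    for inv in invoices:
        requests.append({
            "custom_id": f"invoice_{inv['id']}",   # maps results back to invoices
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user",
                              "content": f"Extract fields from:\n{inv['text']}"}],
            },
        })
    return requests

def failed_ids(results):
    """Collect custom_ids whose batch result did not succeed, for retry."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]
```

With the official SDK these requests would be submitted via the batches endpoint and results streamed back 24 hours later at the discounted rate; failed_ids feeds the retry queue.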

  Step 5: Route to human review using field-level confidence scores
    • Add a confidence scoring step after extraction: score each field 0.0-1.0 based on how clear the document was
    • Example: {vendor_name: 0.95, invoice_number: 0.9, total_amount: 0.85, tax_amount: 0.3}
    • Route to human review if: any field has confidence < 0.7 (unclear extraction), validation produced a warning but not an error (e.g., vendor_name found in an unusual location), or document_type = "other" (does not fit a standard format)
    • Store human review feedback as examples for future improvement
    • This creates a feedback loop in which hard cases improve the system over time
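The three routing conditions above reduce to a small function (thresholds and field names follow the exercise):

```python
# Routing sketch: decide whether an extraction goes to human review.
CONFIDENCE_THRESHOLD = 0.7

def needs_human_review(extraction, confidences, warnings):
    if any(score < CONFIDENCE_THRESHOLD for score in confidences.values()):
        return True                       # at least one field is unclear
    if warnings:
        return True                       # validation warnings (not errors)
    if extraction.get("document_type") == "other":
        return True                       # does not fit a standard format
    return False
```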

Domains Reinforced: - Prompt Engineering & Structured Output (JSON schema, structured output, tool_use) - Context Management & Reliability (validation, error handling, retry logic) - Claude Code Configuration & Workflows (batch processing, API selection)


Exercise 4: Design and Debug a Multi-Agent Research Pipeline

Objective: Practice orchestrating subagents with proper context passing, error propagation, and provenance tracking. You'll build a research system where a coordinator delegates work to subagents and synthesizes conflicting information.

Context: You're building a research system to answer "What are the latest developments in AI safety?" A coordinator breaks this into subagent tasks, gathers results, and synthesizes findings. Challenges: subagents might return contradictory data, some subagents timeout, and the synthesis must track sources.

Steps:

  Step 1: Build a coordinator with 2+ subagents (AgentDefinition + Task tool)
    • Create a coordinator AgentDefinition that can call the Task tool to spawn subagents
    • Define 2-3 subagents: one to find recent academic papers on AI safety alignment, one to find industry reports and policy statements on AI safety, and (optionally) one to find news articles on recent AI safety incidents
    • Set allowedTools on the coordinator to include Task (to call subagents) and Read/Grep (to synthesize)
    • Write the coordinator prompt: "You are coordinating research on AI safety. Use the Task tool to ask subagents for specific information. Their allowedTools are limited to web search and document reading. Synthesize their findings into a coherent report."
    • Document what each subagent CAN and CANNOT do: "Subagent 1 can: search papers, read PDFs, cite sources. Subagent 1 cannot: run code, access databases."

  Step 2: Implement parallel subagent execution
    • In the coordinator's first response, make multiple Task tool calls in a single turn rather than sequentially
    • Example Task prompts: "Search for papers on AI alignment published in the last 6 months; return title, author, date, URL." "Find policy statements from OpenAI, Anthropic, and DeepMind on AI safety; return title, date, key points." "Search news for AI safety incidents in the last 3 months; return headline, date, summary."
    • Parallel dispatch is more efficient than waiting for each subagent to complete before starting the next
    • Verify the Task calls execute in parallel rather than sequentially by checking timestamps

  Step 3: Design structured subagent output
    • Define the format for subagent responses as JSON: {"claim": "string describing the finding", "evidence": ["supporting fact 1", "supporting fact 2"], "source_url": "URL or citation", "publication_date": "ISO 8601 date", "confidence": 0.8}
    • This structure makes synthesis easier because evidence is explicitly separated from claims
    • Include in subagent prompts: "Format each finding as: claim, evidence, source URL, date. Confidence 0.0-1.0."
    • Test with varied outputs: some subagents return fully structured data, some return partial data (no URL), and some return prose that needs parsing
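Since subagents return full, partial, or malformed data, the coordinator benefits from a normalization step. A sketch using the finding format above (the neutral default confidence of 0.5 is an assumption, not part of the exercise):

```python
# Sketch: normalize subagent output into the structured finding format.
# Missing fields become None so synthesis can handle partial data uniformly.
FIELDS = ("claim", "evidence", "source_url", "publication_date", "confidence")

def normalize_finding(raw):
    finding = {f: raw.get(f) for f in FIELDS}
    if finding["evidence"] is None:
        finding["evidence"] = []
    if finding["confidence"] is None:
        finding["confidence"] = 0.5   # unknown confidence defaults to neutral
    return finding
```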

  Step 4: Implement error propagation
    • Simulate a subagent timeout: Subagent 2 times out after 30 seconds while trying to fetch a policy document
    • Expect a structured error such as: {errorType: "timeout", attemptedQuery: "OpenAI policy on AI safety", partialResults: ["one policy found from Anthropic"], suggestedAlternatives: ["search Anthropic and DeepMind only", "search in smaller time increments"]}
    • Verify the coordinator receives this error context and decides whether to proceed with partial results (Anthropic + DeepMind, skipping OpenAI), retry Subagent 2 with a narrower scope, or escalate and note "OpenAI policy not available"
    • Test with multiple subagent failures and verify the coordinator can still work with partial results
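The coordinator's three options (proceed, retry, escalate) can be sketched as a decision function over the structured error above; field names follow the exercise's example, and the preference ordering is one reasonable policy, not the only one:

```python
# Sketch of coordinator logic reacting to a structured subagent error.
# Field names follow the exercise's example timeout error.
def handle_subagent_error(error, retries_left):
    if error.get("partialResults"):
        return {"action": "proceed_with_partial",
                "note": f"Missing data for: {error['attemptedQuery']}"}
    if retries_left > 0 and error.get("suggestedAlternatives"):
        return {"action": "retry",
                "query": error["suggestedAlternatives"][0]}  # narrower scope
    return {"action": "escalate", "note": error.get("attemptedQuery", "unknown")}
```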

  Step 5: Test with conflicting source data
    • Create a test case: Subagent 1 finds a paper saying "Alignment approach X is infeasible" while Subagent 2 finds an industry statement saying "Approach X is promising"
    • Verify the synthesis preserves both findings with attribution: "Academia argues X is infeasible (cite: Paper 1), while industry reports X is promising (cite: Statement 1). This discrepancy reflects different evaluation methodologies."
    • Do NOT let the coordinator pick one side; ensure both perspectives are preserved
    • Test with 3+ conflicting sources and verify the synthesis presents them clearly rather than averaging or forcing agreement

Domains Reinforced: - Agentic Architecture & Orchestration (coordinator design, subagent orchestration, task decomposition, error propagation) - Context Management & Reliability (context passing, error handling, partial results) - Claude Agent SDK (Task tool, AgentDefinition, allowedTools, stop_reason)


How to Use These Exercises

  1. Do them in order: Exercises 1-4 progress from single-agent patterns → team configuration → data extraction → multi-agent orchestration
  2. Time allocation: 1-2 hours per exercise
  3. Test thoroughly: Include test cases that should fail (to verify error handling) and test cases that should succeed
  4. Document your setup: Take screenshots or notes of your final configurations; these are good interview preparation materials
  5. Relate to real work: After each exercise, think about how you'd apply it to a real project you've worked on

