AI & Machine Learning

Hands-On Preparation Exercises

lex@lexgaines.com · 13 min read
Four hands-on labs mapping to CCA exam domains: build a multi-tool agent, configure Claude Code for a team, implement a structured extraction pipeline, and design a multi-agent research system.

This guide contains four hands-on exercises designed to reinforce core CCA exam concepts through practical implementation. Each exercise includes an objective, detailed steps, and the exam domains it reinforces.


Exercise 1: Build a Multi-Tool Agent with Escalation Logic

Objective: Practice agentic loop design, tool integration, structured error handling, and escalation patterns. By the end, you'll have a working agent that makes intelligent decisions about when to handle issues autonomously vs. escalate to humans.

Context: You're building a customer support agent that processes refund requests. The agent has access to get_customer, lookup_order, process_refund, and escalate_to_human tools. The challenge is implementing decision logic and error handling that improves first-contact resolution while maintaining safety.

Steps:

  Step 1: Define 3-4 MCP tools with detailed descriptions
    • Create tool definitions for: get_customer (input: customer_name or email; returns: customer_id, account_status, history), lookup_order (input: customer_id + order_id; returns: order details, damage report, purchase date), process_refund (input: customer_id, order_id, reason, amount; returns: confirmation or error), escalate_to_human (input: reason, context; returns: ticket_id)
    • Include in each tool description: input formats (required vs. optional fields), example queries, edge cases, and clear boundaries (when NOT to use it)
    • Example: "lookup_order: Requires customer_id (from get_customer) and order_id. Does NOT accept customer name. Edge case: fails if order_id is malformed. Use only after customer identity is verified."
    • Document what each tool should NOT do (explicit boundaries reduce ambiguity)
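One tool from the list above can be sketched in the Anthropic tool-use format (name, description, input_schema). The field names mirror the exercise text and are illustrative, not a fixed contract:

```python
# Sketch of one tool definition in the Anthropic tool-use format.
# The schema fields mirror the exercise text and are illustrative.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": (
        "Look up an order for a verified customer. Requires customer_id "
        "(obtained from get_customer) and order_id. Does NOT accept a "
        "customer name. Fails if order_id is malformed. Use only after "
        "customer identity is verified."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "ID from get_customer"},
            "order_id": {"type": "string", "description": "Order ID, e.g. ORD-1042"},
        },
        "required": ["customer_id", "order_id"],
    },
}
```

Note how the description itself carries the boundaries ("does NOT accept", "use only after"), which is where the model reads them.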

  Step 2: Implement an agentic loop that checks stop_reason
    • Create a loop that processes user messages and executes tool calls until stop_reason is "end_turn"
    • Implement each tool call with its response handling
    • Log all tool calls and responses so you can understand the agent's behavior
    • Add a maximum iteration limit (e.g., 10 calls) to prevent infinite loops
    • Example structure: while stop_reason != "end_turn" and iterations < 10: call_tool() → check_response() → continue_loop()
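The loop structure above can be sketched as follows. call_model and execute_tool are hypothetical stand-ins for your API client and tool dispatcher; the model is stubbed here so the control flow is runnable on its own:

```python
# Minimal agentic-loop sketch. call_model and execute_tool are hypothetical
# stand-ins for your API client and tool dispatcher; the model is stubbed
# so the control flow can run on its own.
MAX_ITERATIONS = 10

def call_model(messages):
    # Stub: pretend the model asks for one tool call, then finishes.
    if not any(m["role"] == "tool" for m in messages):
        return {"stop_reason": "tool_use",
                "tool_call": {"name": "get_customer", "input": {"email": "a@b.com"}}}
    return {"stop_reason": "end_turn", "text": "Refund request handled."}

def execute_tool(name, tool_input):
    return {"customer_id": "C-1", "account_status": "active"}  # stubbed tool result

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_ITERATIONS):          # hard cap prevents infinite loops
        response = call_model(messages)
        if response["stop_reason"] == "end_turn":
            return response["text"]
        call = response["tool_call"]
        result = execute_tool(call["name"], call["input"])
        print(f"tool={call['name']} input={call['input']} result={result}")  # log every call
        messages.append({"role": "tool", "content": str(result)})
    return "Max iterations reached; escalating to a human."
```

In a real implementation, call_model wraps the Messages API and execute_tool dispatches to your MCP tools; everything else (the stop_reason check, the iteration cap, the logging) stays the same.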

  Step 3: Add structured error responses
    • Define an error response format: {errorCategory: "validation_error" | "timeout" | "business_rule" | "unknown", isRetryable: boolean, message: string, suggestion: string}
    • Implement error handling in tool responses (e.g., if process_refund is called with amount > $500, return errorCategory = "business_rule", isRetryable = false, suggestion = "escalate to supervisor")
    • Test error responses with invalid customer IDs, orders that are not found, and refund amounts outside policy
    • Verify that the agent interprets these structured errors and decides whether to retry or escalate
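A minimal sketch of the error format and the agent-side decision it enables (field names follow the exercise; check_refund is a hypothetical business-rule check inside the process_refund tool):

```python
# Sketch of the structured error format and how the harness interprets it.
def make_error(category, retryable, message, suggestion):
    return {"errorCategory": category, "isRetryable": retryable,
            "message": message, "suggestion": suggestion}

def check_refund(amount):
    """Hypothetical business-rule check run inside the process_refund tool."""
    if amount > 500:
        return make_error("business_rule", False,
                          f"Refund of ${amount} exceeds the $500 limit.",
                          "escalate to supervisor")
    return {"status": "ok", "refunded": amount}

def next_action(result):
    """Decide what the agent should do with a tool result."""
    if "errorCategory" not in result:
        return "continue"
    return "retry" if result["isRetryable"] else "escalate"
```

The point of the structure: isRetryable turns "retry or escalate" into a deterministic branch instead of something the model must infer from free-text error messages.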

  Step 4: Implement a hook for business rule enforcement (block refunds above a threshold)
    • Create a PreToolUse hook that intercepts process_refund calls before they execute (a PostToolUse hook runs after the tool has already executed, which is too late to block it)
    • Add logic: if amount > $500, block the call and return a structured error instead of executing it
    • Ensure that even if the agent tries to call process_refund directly, it cannot bypass the $500 limit without escalation
    • This demonstrates how hooks enforce deterministic business rules that prompts alone cannot guarantee
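In Claude Code, blocking a tool call before it runs is the job of a PreToolUse hook: the hook script receives the tool-call event as JSON on stdin, and a blocking exit code (2 in current Claude Code releases, with the stderr message fed back to the model) rejects the call. A sketch, with the decision isolated in a pure function:

```python
# Sketch of a refund-blocking hook script. The stdin/exit-code contract
# follows Claude Code's hook interface; verify against your version's docs.
import json
import sys

LIMIT = 500

def check(event):
    """Return (allow, message) for a tool-call event."""
    if event.get("tool_name") != "process_refund":
        return True, ""
    amount = event.get("tool_input", {}).get("amount", 0)
    if amount > LIMIT:
        return False, (f"Blocked: refund ${amount} exceeds ${LIMIT}. "
                       "Use escalate_to_human instead.")
    return True, ""

def main():
    # Entry point when Claude Code invokes the hook: event JSON on stdin.
    allow, msg = check(json.load(sys.stdin))
    if not allow:
        print(msg, file=sys.stderr)
        sys.exit(2)  # exit code 2 tells Claude Code to block the tool call
```

Call main() when installing this as an actual hook script; keeping check() pure makes the threshold logic unit-testable without Claude Code in the loop.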

  Step 5: Test with multi-concern messages
    • Create test messages with multiple concerns: "My order arrived damaged AND I want to return it for a refund AND I have questions about my account"
    • Verify the agent calls get_customer first to establish identity
    • Verify it can handle sequential concerns (damage report → lookup_order → process_refund)
    • Test the edge case where the agent tries to call lookup_order before get_customer (this should be blocked by a hook or discouraged by the tool description)
    • Verify the agent escalates when appropriate (e.g., refund > $500, damaged item requiring supervisor approval)

Domains Reinforced: - Agentic Architecture & Orchestration (agentic loops, escalation patterns, tool ordering) - Tool Design & MCP Integration (tool definitions, descriptions, boundaries) - Context Management & Reliability (error handling, structured responses)


Exercise 2: Configure Claude Code for a Team Development Workflow

Objective: Practice setting up a complete Claude Code environment with project-level configuration, path-specific rules, team commands, and MCP server integration. You'll learn how CLAUDE.md hierarchies, rules, and commands scale across a team.

Context: Your team has 3 developers working on a full-stack app with frontend (React), backend (Python Flask), and infrastructure (Terraform). Different areas have different coding conventions, dependencies, and tooling. You want Claude Code to automatically apply the right conventions in each area without developers thinking about it.

Steps:

  Step 1: Create a project-level CLAUDE.md
    • Create CLAUDE.md at the project root with global context: "This is a full-stack app. Frontend: React + Jest. Backend: Python 3.11 + Pytest. Infra: Terraform + AWS. Common conventions: error handling, logging, testing."
    • Add instructions for when to ask for clarification: "Before writing infrastructure code, ask whether it is for dev/staging/prod. Before database migrations, clarify the impact scope."
    • This file applies to all work in the project
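An illustrative CLAUDE.md built from the bullets above (the wording and directory names are examples, not a required format):

```markdown
# CLAUDE.md (project root)

This is a full-stack app.
- Frontend: React + Jest (app/frontend/)
- Backend: Python 3.11 + Pytest (app/backend/)
- Infra: Terraform + AWS (infra/)

Common conventions: consistent error handling, structured logging,
tests for all new code.

Before writing infrastructure code, ask whether it targets dev/staging/prod.
Before database migrations, clarify the impact scope.
```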

  Step 2: Create .claude/rules/ files with YAML frontmatter glob patterns
    • Create /app/frontend/.claude/rules/frontend.yaml with frontmatter path: "frontend/**" and rules for React (components in src/components/, tests in __tests__/, use Jest, import styles as modules)
    • Create /app/backend/.claude/rules/backend.yaml with frontmatter path: "backend/**" and rules for Python (Flask conventions, tests in tests/, use pytest, type hints required)
    • Create /infra/.claude/rules/terraform.yaml with frontmatter path: "infra/**" and rules for Terraform (variables in variables.tf, outputs in outputs.tf, comments required on all resources)
    • Test that the right rules apply automatically based on which file is being edited (e.g., /app/frontend/hooks/useData.js matches frontend/**)
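A sketch of one path-scoped rule file, assuming the YAML-frontmatter-plus-glob shape the exercise describes (the exact rule-file syntax may vary across Claude Code versions, so check your version's documentation):

```markdown
---
path: "frontend/**"
---
- Components live in src/components/
- Tests live in __tests__/ and use Jest
- Import styles as CSS modules
```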

  Step 3: Create a project-scoped skill with context and allowed tools
    • Create /app/.claude/skills/SKILL.md with a skill definition, e.g., a "code-review" skill
    • Set context: fork_session so the skill runs in a new session for isolated code analysis
    • Set allowed-tools: ["Read", "Write", "Bash", "Grep"] to limit what the skill can do (no deletion, no external API calls)
    • Document when developers should invoke the skill: "Use /code-review before opening a PR to catch issues early"
    • This demonstrates how to create safe, repeatable team processes

  Step 4: Configure an MCP server in .mcp.json with env var expansion, plus a personal server
    • Create .mcp.json at the project root defining a stdio GitHub MCP server with env: {"GITHUB_TOKEN": "${GITHUB_TOKEN}", "GITHUB_ORG": "my-org"}
    • Use environment variable expansion so GITHUB_TOKEN is loaded from the developer's shell environment rather than hardcoded
    • Create ~/.claude.json (personal config) with a personal MCP server, e.g., a stdio server running mcp-server-personal-tools
    • Test that both project servers (from .mcp.json) and personal servers (from ~/.claude.json) are available to Claude
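An illustrative .mcp.json for the GitHub server described above. In current Claude Code releases the top-level key is mcpServers (an object keyed by server name); confirm against your version's documentation, and note that ${GITHUB_TOKEN} is expanded from the shell environment rather than committed to the repo:

```json
{
  "mcpServers": {
    "github": {
      "type": "stdio",
      "command": "mcp-server-github",
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}",
        "GITHUB_ORG": "my-org"
      }
    }
  }
}
```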

  Step 5: Compare plan mode vs. direct execution
    • Run a complex task in plan mode: "Refactor the user authentication system to support OAuth 2.0." Observe plan mode exploring the codebase, identifying files, and designing the approach
    • Run the same task in direct execution mode (without planning) and compare the approaches
    • Run a simple task ("Add a console.log statement to App.js") in direct execution and verify it completes quickly without planning
    • Takeaway: plan mode suits architectural decisions; direct execution suits straightforward changes

Domains Reinforced: - Claude Code Configuration & Workflows (CLAUDE.md, .claude/rules/, .claude/commands/, .claude/skills/) - Claude Code Architect (hierarchies, path-specific rules, skill definition) - Tool Design & MCP Integration (MCP server configuration, environment variables)


Exercise 3: Build a Structured Data Extraction Pipeline

Objective: Practice JSON schema design, tool_use in structured output, validation-retry loops, and batch processing at scale. You'll build a pipeline that reliably extracts data from varied document formats and handles failures gracefully.

Context: Your company processes invoices from different vendors. Each invoice has a different format, but you need to extract: vendor_name, invoice_number, total_amount, line_items, tax_amount, payment_terms. Some invoices are scanned PDFs (text-only), others are digital. You need to handle extraction failures and route low-confidence extractions for human review.

Steps:

  Step 1: Define the extraction tool with a JSON schema
    • Required fields: vendor_name (string), invoice_number (string), total_amount (number)
    • Optional fields: line_items (array of {description, quantity, unit_price}), tax_amount (number), payment_terms (string)
    • Enum with "other": document_type: enum ["invoice", "receipt", "quote", "other"], so documents that do not fit a standard type can still be classified
    • Nullable field: po_number (string | null), present in the schema but possibly missing from the document
    • This schema allows flexible handling of varied document formats
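The schema above can be written out as an Anthropic tool definition. The structure follows the exercise's field list; names and descriptions are illustrative:

```python
# Sketch of the extraction tool in the Anthropic tool-use format.
# Fields follow the exercise text; tighten constraints for production.
EXTRACT_INVOICE_TOOL = {
    "name": "extract_invoice",
    "description": "Extract structured fields from an invoice-like document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                },
            },
            "tax_amount": {"type": "number"},
            "payment_terms": {"type": "string"},
            "document_type": {
                "type": "string",
                "enum": ["invoice", "receipt", "quote", "other"],
            },
            # Nullable: key exists in the schema, value may be null.
            "po_number": {"type": ["string", "null"]},
        },
        "required": ["vendor_name", "invoice_number", "total_amount"],
    },
}
```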

  Step 2: Implement a validation-retry loop
    • Send a document to Claude with the extraction tool
    • First attempt: extract data from the document using the JSON schema
    • Validation: check that required fields are present and reasonable (e.g., total_amount > 0)
    • On failure (e.g., missing vendor_name, total_amount = 0): send the document, the failed extraction, and the validation error back to Claude with a retry prompt: "The extraction failed because vendor_name is missing. Re-examine the document and try again. Look for company letterhead, signatures, or 'Bill From' sections."
    • On success: return the extracted data
    • Test with documents where vendor_name appears in unusual locations (footer, watermark, small print)
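The retry loop above, sketched with the model call stubbed out. extract() stands in for the real API call; the stub fails once and then succeeds so the loop is runnable as-is:

```python
# Validation-retry loop sketch. extract() stands in for the real model call;
# stub_extract fails once and then succeeds, so the loop is runnable.
MAX_ATTEMPTS = 3

def validate(data):
    """Return a list of validation errors (empty list = valid)."""
    errors = []
    if not data.get("vendor_name"):
        errors.append("vendor_name is missing")
    if data.get("total_amount", 0) <= 0:
        errors.append("total_amount must be > 0")
    return errors

def extract_with_retry(document, extract):
    feedback = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        data = extract(document, feedback)   # real version: model call with tool
        errors = validate(data)
        if not errors:
            return data
        # Feed the failure back so the model re-examines the document.
        feedback = ("The extraction failed because " + "; ".join(errors) +
                    ". Re-examine the document and try again.")
    raise ValueError(f"Extraction failed after {MAX_ATTEMPTS} attempts: {errors}")

def stub_extract(document, feedback):
    # First pass "misses" the vendor; the retry (with feedback) finds it.
    if not feedback:
        return {"vendor_name": "", "invoice_number": "INV-7", "total_amount": 120.0}
    return {"vendor_name": "Acme Corp", "invoice_number": "INV-7", "total_amount": 120.0}
```

The key design point is that the feedback string carries the specific validation failure back into the retry prompt rather than blindly re-asking.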

  Step 3: Add few-shot examples for varied document formats
    • Create examples showing extraction from: a standard digital invoice with clear fields, a scanned PDF with skewed text and OCR errors, a receipt with minimal information, and an invoice with multiple line items and taxes
    • Each example shows the input (document text), the expected output (JSON), and any special handling (e.g., "Tax was listed as '10% of subtotal'; calculate it from the subtotal amount")
    • Include edge cases: "If vendor_name appears only in a logo, use nearby text; if no clear vendor is found, use 'other'."
    • These examples improve extraction accuracy across document types

  Step 4: Design batch processing with the Message Batches API
    • Create a batch of 100 invoices to process
    • Use custom_id for tracking: invoice_12345 maps back to the invoice ID
    • Chunk oversized inputs: if a document is larger than ~100KB, split it into smaller chunks before batching
    • Plan around the processing SLA: the Batch API completes within 24 hours and costs 50% less, making it suitable for overnight batch processing
    • Handle failures by custom_id: if invoice_12345 fails validation, retry that specific invoice with additional context
    • Example workflow: process 100 invoices in one batch, track failures by custom_id, and queue failed invoices for retry with augmented prompts
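The batch construction and failure tracking above can be sketched without any network call. The request shape (a list of {custom_id, params} objects) follows the Anthropic Message Batches API; the model name and prompt are illustrative:

```python
# Build Message Batches API requests locally (no network call here).
# Request shape follows the Anthropic Message Batches API; the model
# name and prompt text are illustrative.
def build_batch_requests(invoices, model="claude-sonnet-4-5"):
    requests = []
    for inv in invoices:
        requests.append({
            "custom_id": f"invoice_{inv['id']}",   # maps results back to invoices
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user",
                              "content": f"Extract fields from:\n{inv['text']}"}],
            },
        })
    return requests

def failed_ids(results):
    """Collect custom_ids whose batch result did not succeed, for retry."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]
```

With the official SDK these requests would be submitted via the batches endpoint and results streamed back 24 hours later at the discounted rate; failed_ids feeds the retry queue.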

  Step 5: Route to human review using field-level confidence scores
    • Add a confidence scoring step after extraction: score each field 0.0-1.0 based on how clear the document was
    • Example: {vendor_name: 0.95, invoice_number: 0.9, total_amount: 0.85, tax_amount: 0.3}
    • Route to human review if: any field has confidence < 0.7 (unclear extraction), validation produced a warning but not an error (e.g., vendor_name found in an unusual location), or document_type = "other" (does not fit a standard format)
    • Store human review feedback as examples for future improvement
    • This creates a feedback loop in which hard cases improve the system over time
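The three routing conditions above reduce to a small function (thresholds and field names follow the exercise):

```python
# Routing sketch: decide whether an extraction goes to human review.
CONFIDENCE_THRESHOLD = 0.7

def needs_human_review(extraction, confidences, warnings):
    if any(score < CONFIDENCE_THRESHOLD for score in confidences.values()):
        return True                       # at least one field is unclear
    if warnings:
        return True                       # validation warnings (not errors)
    if extraction.get("document_type") == "other":
        return True                       # does not fit a standard format
    return False
```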

Domains Reinforced: - Prompt Engineering & Structured Output (JSON schema, structured output, tool_use) - Context Management & Reliability (validation, error handling, retry logic) - Claude Code Configuration & Workflows (batch processing, API selection)


Exercise 4: Design and Debug a Multi-Agent Research Pipeline

Objective: Practice orchestrating subagents with proper context passing, error propagation, and provenance tracking. You'll build a research system where a coordinator delegates work to subagents and synthesizes conflicting information.

Context: You're building a research system to answer "What are the latest developments in AI safety?" A coordinator breaks this into subagent tasks, gathers results, and synthesizes findings. Challenges: subagents might return contradictory data, some subagents timeout, and the synthesis must track sources.

Steps:

  Step 1: Build a coordinator with 2+ subagents (AgentDefinition + Task tool)
    • Create a coordinator AgentDefinition that can call the Task tool to spawn subagents
    • Define 2-3 subagents: one to find recent academic papers on AI safety alignment, one to find industry reports and policy statements on AI safety, and (optionally) one to find news articles on recent AI safety incidents
    • Set allowedTools on the coordinator to include Task (to call subagents) and Read/Grep (to synthesize)
    • Write the coordinator prompt: "You are coordinating research on AI safety. Use the Task tool to ask subagents for specific information. Their allowedTools are limited to web search and document reading. Synthesize their findings into a coherent report."
    • Document what each subagent CAN and CANNOT do: "Subagent 1 can: search papers, read PDFs, cite sources. Subagent 1 cannot: run code, access databases."

  Step 2: Implement parallel subagent execution
    • In the coordinator's first response, make multiple Task tool calls in a single turn rather than sequentially
    • Example Task prompts: "Search for papers on AI alignment published in the last 6 months; return title, author, date, URL." "Find policy statements from OpenAI, Anthropic, and DeepMind on AI safety; return title, date, key points." "Search news for AI safety incidents in the last 3 months; return headline, date, summary."
    • Parallel dispatch is more efficient than waiting for each subagent to complete before starting the next
    • Verify the Task calls execute in parallel rather than sequentially by checking timestamps

  Step 3: Design structured subagent output
    • Define the format for subagent responses as JSON: {"claim": "string describing the finding", "evidence": ["supporting fact 1", "supporting fact 2"], "source_url": "URL or citation", "publication_date": "ISO 8601 date", "confidence": 0.8}
    • This structure makes synthesis easier because evidence is explicitly separated from claims
    • Include in subagent prompts: "Format each finding as: claim, evidence, source URL, date. Confidence 0.0-1.0."
    • Test with varied outputs: some subagents return fully structured data, some return partial data (no URL), and some return prose that needs parsing
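Since subagents return full, partial, or malformed data, the coordinator benefits from a normalization step. A sketch using the finding format above (the neutral default confidence of 0.5 is an assumption, not part of the exercise):

```python
# Sketch: normalize subagent output into the structured finding format.
# Missing fields become None so synthesis can handle partial data uniformly.
FIELDS = ("claim", "evidence", "source_url", "publication_date", "confidence")

def normalize_finding(raw):
    finding = {f: raw.get(f) for f in FIELDS}
    if finding["evidence"] is None:
        finding["evidence"] = []
    if finding["confidence"] is None:
        finding["confidence"] = 0.5   # unknown confidence defaults to neutral
    return finding
```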

  Step 4: Implement error propagation
    • Simulate a subagent timeout: Subagent 2 times out after 30 seconds while trying to fetch a policy document
    • Expect a structured error such as: {errorType: "timeout", attemptedQuery: "OpenAI policy on AI safety", partialResults: ["one policy found from Anthropic"], suggestedAlternatives: ["search Anthropic and DeepMind only", "search in smaller time increments"]}
    • Verify the coordinator receives this error context and decides whether to proceed with partial results (Anthropic + DeepMind, skipping OpenAI), retry Subagent 2 with a narrower scope, or escalate and note "OpenAI policy not available"
    • Test with multiple subagent failures and verify the coordinator can still work with partial results
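The coordinator's three options (proceed, retry, escalate) can be sketched as a decision function over the structured error above; field names follow the exercise's example, and the preference ordering is one reasonable policy, not the only one:

```python
# Sketch of coordinator logic reacting to a structured subagent error.
# Field names follow the exercise's example timeout error.
def handle_subagent_error(error, retries_left):
    if error.get("partialResults"):
        return {"action": "proceed_with_partial",
                "note": f"Missing data for: {error['attemptedQuery']}"}
    if retries_left > 0 and error.get("suggestedAlternatives"):
        return {"action": "retry",
                "query": error["suggestedAlternatives"][0]}  # narrower scope
    return {"action": "escalate", "note": error.get("attemptedQuery", "unknown")}
```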

  Step 5: Test with conflicting source data
    • Create a test case: Subagent 1 finds a paper saying "Alignment approach X is infeasible" while Subagent 2 finds an industry statement saying "Approach X is promising"
    • Verify the synthesis preserves both findings with attribution: "Academia argues X is infeasible (cite: Paper 1), while industry reports X is promising (cite: Statement 1). This discrepancy reflects different evaluation methodologies."
    • Do NOT let the coordinator pick one side; ensure both perspectives are preserved
    • Test with 3+ conflicting sources and verify the synthesis presents them clearly rather than averaging or forcing agreement

Domains Reinforced: - Agentic Architecture & Orchestration (coordinator design, subagent orchestration, task decomposition, error propagation) - Context Management & Reliability (context passing, error handling, partial results) - Claude Agent SDK (Task tool, AgentDefinition, allowedTools, stop_reason)


How to Use These Exercises

  1. Do them in order: Exercises 1-4 progress from single-agent patterns → team configuration → data extraction → multi-agent orchestration
  2. Time allocation: 1-2 hours per exercise
  3. Test thoroughly: Include test cases that should fail (to verify error handling) and test cases that should succeed
  4. Document your setup: Take screenshots or notes of your final configurations; these are good interview preparation materials
  5. Relate to real work: After each exercise, think about how you'd apply it to a real project you've worked on

