CCAF Foundations — Study Guide

Official exam guide · Version 0.1 · Feb 2025 · 720/1000 to pass

D1: Agentic Architecture 27% D2: Tool Design & MCP 18% D3: Claude Code 20% D4: Prompting 20% D5: Context & Reliability 15%
TS 1.1 Design and implement agentic loops for autonomous task execution
Knowledge
  • Loop lifecycle: send request → inspect stop_reason → execute tools → return results for next iteration
  • "tool_use" means continue; "end_turn" means terminate
  • Tool results are appended to conversation history so the model can reason about its next action
  • Model-driven decision-making (Claude reasons which tool to call next) vs pre-configured decision trees
Skills
  • Implement loop control: continue on "tool_use", terminate on "end_turn"
  • Append tool results to conversation context between iterations
  • Avoid anti-patterns: parsing natural language for termination, using only iteration caps, checking assistant text content as a completion indicator
TS 1.2 Orchestrate multi-agent systems with coordinator-subagent patterns
Knowledge
  • Hub-and-spoke: coordinator manages all inter-subagent communication, error handling, information routing
  • Subagents have isolated context — they do NOT automatically inherit coordinator's conversation history
  • Coordinator role: task decomposition, delegation, result aggregation, selecting which subagents to invoke
  • Risk: overly narrow decomposition leads to incomplete coverage of broad research topics
Skills
  • Design coordinators that dynamically select subagents based on query complexity (not always the full pipeline)
  • Partition research scope across subagents to minimize duplication (assign distinct subtopics or source types)
  • Implement iterative refinement loops: coordinator evaluates output for gaps → re-delegates → re-synthesizes
  • Route all subagent communication through coordinator for observability and consistent error handling
TS 1.3 Configure subagent invocation, context passing, and spawning
Knowledge
  • Task tool is the mechanism for spawning subagents; allowedTools must include "Task"
  • Subagent context must be explicitly provided in the prompt — no automatic inheritance or shared memory
  • AgentDefinition config: descriptions, system prompts, tool restrictions per subagent type
  • Fork-based session management for exploring divergent approaches from a shared baseline
Skills
  • Include complete prior findings directly in the subagent's prompt (pass web search results, document analysis outputs)
  • Use structured data formats to separate content from metadata (source URLs, document names, page numbers) when passing context
  • Spawn parallel subagents by emitting multiple Task calls in a single coordinator response
  • Design coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions
TS 1.4 Implement multi-step workflows with enforcement and handoff patterns
Knowledge
  • Programmatic enforcement (hooks, prerequisite gates) vs prompt-based guidance — prompt instructions have non-zero failure rate
  • Deterministic compliance required (e.g., identity verification before financial operations) → must use programmatic enforcement
  • Structured handoff protocols for mid-process escalation: customer details, root cause analysis, recommended actions
Skills
  • Implement programmatic prerequisites that block downstream tool calls (e.g., block process_refund until get_customer returns a verified ID)
  • Decompose multi-concern customer requests into distinct items, investigate each in parallel using shared context, then synthesize a unified response
  • Compile structured handoff summaries (customer ID, root cause, refund amount, recommended action) when escalating to human agents who lack conversation access
TS 1.5 Apply Agent SDK hooks for tool call interception and data normalization
Knowledge
  • PostToolUse hooks intercept tool results for transformation before the model processes them
  • Hook patterns intercept outgoing tool calls to enforce compliance (e.g., blocking refunds above a threshold)
  • Hooks = deterministic guarantees; prompt instructions = probabilistic compliance
Skills
  • Implement PostToolUse hooks to normalize heterogeneous data formats (Unix timestamps → ISO 8601, numeric status codes → strings)
  • Implement tool call interception hooks that block policy-violating actions (e.g., refunds > $500) and redirect to alternative workflows (human escalation)
  • Choose hooks over prompt-based enforcement when business rules require guaranteed compliance
TS 1.6 Design task decomposition strategies for complex workflows
Knowledge
  • Fixed sequential pipelines (prompt chaining) vs dynamic adaptive decomposition based on intermediate findings
  • Prompt chaining pattern: analyze each file individually → then run a cross-file integration pass
  • Adaptive investigation plans generate subtasks based on what is discovered at each step
Skills
  • Select prompt chaining for predictable multi-aspect reviews; dynamic decomposition for open-ended investigation tasks
  • Split large code reviews into per-file local analysis passes + separate cross-file integration pass to avoid attention dilution
  • Decompose open-ended tasks by first mapping structure, identifying high-impact areas, then creating a prioritized adaptive plan
TS 1.7 Manage session state, resumption, and forking
Knowledge
  • Named session resumption using --resume <session-name> to continue a specific prior conversation
  • fork_session for creating independent branches from a shared analysis baseline to explore divergent approaches
  • Must inform agent about changed files when resuming after code modifications
  • Starting fresh with structured summary is more reliable than resuming with stale tool results
Skills
  • Use --resume with session names to continue named investigation sessions across work sessions
  • Use fork_session to create parallel exploration branches (comparing testing strategies, refactoring approaches)
  • Choose session resumption when prior context is mostly valid; start fresh with injected summaries when prior tool results are stale
  • Inform resumed session about specific file changes for targeted re-analysis rather than requiring full re-exploration
TS 2.1 Design effective tool interfaces with clear descriptions and boundaries
Knowledge
  • Tool descriptions are the primary mechanism LLMs use for tool selection — minimal descriptions → unreliable selection among similar tools
  • Good descriptions include: input formats, example queries, edge cases, and boundary explanations
  • Ambiguous/overlapping descriptions cause misrouting (e.g., analyze_content vs analyze_document with near-identical descriptions)
  • System prompt wording can create unintended tool associations (keyword-sensitive instructions)
Skills
  • Write tool descriptions that clearly differentiate each tool's purpose, expected inputs, outputs, and when to use it vs similar alternatives
  • Rename tools and update descriptions to eliminate functional overlap (e.g., rename analyze_content to extract_web_results with a web-specific description)
  • Split generic tools into purpose-specific tools with defined input/output contracts
  • Review system prompts for keyword-sensitive instructions that might override well-written tool descriptions
TS 2.2 Implement structured error responses for MCP tools
Knowledge
  • MCP isError flag pattern for communicating tool failures back to the agent
  • Error categories: transient (timeouts), validation (invalid input), business (policy violations), permission errors
  • Uniform generic errors ("Operation failed") prevent the agent from making appropriate recovery decisions
  • Retryable vs non-retryable errors — structured metadata prevents wasted retry attempts
Skills
  • Return structured error metadata: errorCategory (transient/validation/permission), isRetryable boolean, human-readable descriptions
  • Include retriable: false flags and customer-friendly explanations for business rule violations so the agent can communicate appropriately
  • Implement local error recovery within subagents for transient failures; propagate to coordinator only errors that cannot be resolved locally
  • Distinguish access failures (needing retry decisions) from valid empty results (successful queries with no matches)
TS 2.3 Distribute tools appropriately across agents and configure tool choice
Knowledge
  • Too many tools (e.g., 18 instead of 4–5) degrades tool selection reliability by increasing decision complexity
  • Agents with tools outside their specialization tend to misuse them (synthesis agent attempting web searches)
  • Scoped tool access: give each agent only the tools needed for its role
  • tool_choice options: "auto", "any" (must call a tool), forced {"type": "tool", "name": "..."}
Skills
  • Restrict each subagent's tool set to those relevant to its role, preventing cross-specialization misuse
  • Replace generic tools with constrained alternatives (e.g., replace fetch_url with load_document that validates document URLs)
  • Provide scoped cross-role tools for high-frequency needs (e.g., a verify_fact tool for the synthesis agent) while routing complex cases through coordinator
  • Use tool_choice: "any" to guarantee the model calls a tool rather than returning conversational text
  • Use forced tool selection to ensure a specific tool is called first (e.g., force extract_metadata before enrichment tools)
TS 2.4 Integrate MCP servers into Claude Code and agent workflows
Knowledge
  • MCP server scoping: project-level (.mcp.json) for shared team tooling vs user-level (~/.claude.json) for personal/experimental
  • Environment variable expansion in .mcp.json (e.g., ${GITHUB_TOKEN}) for credential management without committing secrets
  • Tools from all configured MCP servers are discovered at connection time and available simultaneously
  • MCP resources expose content catalogs (issue summaries, documentation hierarchies, database schemas) to reduce exploratory tool calls
Skills
  • Configure shared MCP servers in project-scoped .mcp.json with environment variable expansion for authentication tokens
  • Configure personal/experimental MCP servers in user-scoped ~/.claude.json
  • Enhance MCP tool descriptions to explain capabilities in detail, preventing the agent from preferring built-in tools (like Grep) over more capable MCP tools
  • Choose existing community MCP servers over custom implementations for standard integrations (Jira); reserve custom servers for team-specific workflows
  • Expose content catalogs as MCP resources to give agents visibility into available data without requiring exploratory tool calls
TS 2.5 Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively
Knowledge
  • Grep: search file contents for patterns (function names, error messages, import statements)
  • Glob: file path pattern matching (finding files by name or extension)
  • Read/Write: full file operations; Edit: targeted modifications using unique text matching
  • When Edit fails due to non-unique text matches → use Read + Write as a reliable fallback
Skills
  • Select Grep for searching code content across a codebase (finding all callers of a function, locating error messages)
  • Select Glob for finding files matching naming patterns (e.g., **/*.test.tsx)
  • Use Read to load full file contents then Write when Edit cannot find unique anchor text
  • Build codebase understanding incrementally: Grep to find entry points → Read to follow imports and trace flows (rather than reading all files upfront)
  • Trace function usage across wrapper modules: first identify all exported names, then search each name across the codebase
TS 3.1 Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization
Knowledge
  • Hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), directory-level (subdirectory CLAUDE.md files)
  • User-level settings apply only to that user — NOT shared with teammates via version control
  • @import syntax for referencing external files to keep CLAUDE.md modular
  • .claude/rules/ directory for topic-specific rule files as an alternative to a monolithic CLAUDE.md
Skills
  • Diagnose configuration hierarchy issues (e.g., new team member not receiving instructions because they're in user-level vs project-level configuration)
  • Use @import to selectively include relevant standards files in each package's CLAUDE.md
  • Split large CLAUDE.md files into focused topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md, deployment.md)
  • Use /memory command to verify which memory files are loaded and diagnose inconsistent behavior across sessions
TS 3.2 Create and configure custom slash commands and skills
Knowledge
  • Project-scoped commands in .claude/commands/ (version-controlled, team-wide) vs user-scoped in ~/.claude/commands/ (personal)
  • Skills in .claude/skills/ with SKILL.md files; frontmatter options: context: fork, allowed-tools, argument-hint
  • context: fork runs the skill in an isolated sub-agent context, preventing output from polluting the main conversation
  • Personal skill customization: create personal variants in ~/.claude/skills/ with different names to avoid affecting teammates
Skills
  • Create project-scoped slash commands in .claude/commands/ for team-wide availability via version control
  • Use context: fork to isolate skills that produce verbose output (e.g., codebase analysis, brainstorming alternatives) from the main session
  • Configure allowed-tools in skill frontmatter to restrict tool access during execution (e.g., limiting to file write operations)
  • Use argument-hint frontmatter to prompt developers for required parameters when invoking the skill without arguments
  • Distinguish: skills (on-demand, task-specific workflows) vs CLAUDE.md (always-loaded universal standards)
TS 3.3 Apply path-specific rules for conditional convention loading
Knowledge
  • .claude/rules/ files with YAML frontmatter paths fields containing glob patterns for conditional rule activation
  • Path-scoped rules load only when editing matching files, reducing irrelevant context and token usage
  • Key advantage over directory-level CLAUDE.md: handles conventions that span multiple directories (e.g., test files spread throughout codebase)
Skills
  • Create .claude/rules/ files with YAML frontmatter path scoping (e.g., paths: ["terraform/**/*"]) so rules load only when editing matching files
  • Use glob patterns to apply conventions to files by type regardless of directory location (e.g., **/*.test.tsx for all test files)
  • Choose path-specific rules over subdirectory CLAUDE.md when conventions must apply to files spread across many directories
TS 3.4 Determine when to use plan mode vs direct execution
Knowledge
  • Plan mode: complex tasks involving large-scale changes, multiple valid approaches, architectural decisions, multi-file modifications
  • Direct execution: simple, well-scoped changes (single-file bug fix with clear stack trace, adding a date validation conditional)
  • Plan mode enables safe codebase exploration and design before committing to changes, preventing costly rework
  • Explore subagent isolates verbose discovery output and returns summaries to preserve main conversation context
Skills
  • Select plan mode for: microservice restructuring, library migrations affecting 45+ files, choosing between integration approaches with different infrastructure requirements
  • Select direct execution for: single-file bug fix with clear stack trace, adding a date validation conditional
  • Use the Explore subagent for verbose discovery phases to prevent context window exhaustion during multi-phase tasks
  • Combine plan mode for investigation with direct execution for implementation (plan the library migration, then execute the planned approach)
TS 3.5 Apply iterative refinement techniques for progressive improvement
Knowledge
  • Concrete input/output examples are the most effective way to communicate expected transformations when prose descriptions produce inconsistent results
  • Test-driven iteration: write test suites first, then iterate by sharing test failures to guide progressive improvement
  • Interview pattern: have Claude ask questions to surface considerations the developer may not have anticipated before implementing
  • Multiple interacting issues → provide all in a single detailed message; independent issues → fix sequentially
Skills
  • Provide 2–3 concrete input/output examples to clarify transformation requirements when natural language descriptions produce inconsistent results
  • Write test suites covering expected behavior, edge cases, and performance requirements before implementation; then iterate by sharing test failures
  • Use the interview pattern to surface design considerations (cache invalidation strategies, failure modes) before implementing in unfamiliar domains
  • Provide specific test cases with example input and expected output to fix edge case handling (e.g., null values in migration scripts)
TS 3.6 Integrate Claude Code into CI/CD pipelines
Knowledge
  • The -p (or --print) flag for running Claude Code in non-interactive mode in automated pipelines
  • --output-format json and --json-schema CLI flags for enforcing structured output in CI contexts
  • CLAUDE.md is the mechanism for providing project context (testing standards, fixture conventions, review criteria) to CI-invoked Claude Code
  • Session context isolation: the same Claude session that generated code is less effective at reviewing its own changes
Skills
  • Run Claude Code in CI with the -p flag to prevent interactive input hangs
  • Use --output-format json with --json-schema to produce machine-parseable structured findings for automated posting as inline PR comments
  • Include prior review findings in context when re-running reviews after new commits; instruct Claude to report only new or still-unaddressed issues
  • Provide existing test files in context so test generation avoids suggesting duplicate scenarios
  • Document testing standards, valuable test criteria, and available fixtures in CLAUDE.md to improve test generation quality
TS 4.1 Design prompts with explicit criteria to improve precision and reduce false positives
Knowledge
  • Explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate")
  • General instructions like "be conservative" or "only report high-confidence findings" fail to improve precision vs specific categorical criteria
  • High false positive rates in any category undermine developer trust in accurate categories too
Skills
  • Write specific review criteria that define which issues to report (bugs, security) vs skip (minor style, local patterns) rather than relying on confidence-based filtering
  • Temporarily disable high false-positive categories to restore developer trust while improving prompts for those categories
  • Define explicit severity criteria with concrete code examples for each severity level to achieve consistent classification
TS 4.2 Apply few-shot prompting to improve output consistency and quality
Knowledge
  • Few-shot examples: most effective technique for consistently formatted, actionable output when detailed instructions alone produce inconsistent results
  • Few-shot examples demonstrate ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps)
  • Few-shot examples enable generalization to novel patterns rather than matching only pre-specified cases
  • Effective for reducing hallucination in extraction tasks (handling informal measurements, varied document structures)
Skills
  • Create 2–4 targeted few-shot examples for ambiguous scenarios showing reasoning for why one action was chosen over plausible alternatives
  • Include few-shot examples demonstrating specific desired output format (location, issue, severity, suggested fix) to achieve consistency
  • Provide few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization
  • Use few-shot examples demonstrating correct extraction from documents with varied formats (inline citations vs bibliographies, narrative vs structured tables)
TS 4.3 Enforce structured output using tool use and JSON schemas
Knowledge
  • tool_use with JSON schemas: most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
  • tool_choice: "auto" (model may return text), "any" (model must call a tool but can choose which), forced (model must call a specific named tool)
  • Strict JSON schemas eliminate syntax errors but do NOT prevent semantic errors (line items that don't sum to total, values in wrong fields)
  • Schema design: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories
Skills
  • Define extraction tools with JSON schemas as input parameters and extract structured data from the tool_use response
  • Set tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
  • Force a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
  • Design schema fields as optional (nullable) when source documents may not contain the information — prevents the model from fabricating values
  • Add enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
TS 4.4 Implement validation, retry, and feedback loops for extraction quality
Knowledge
  • Retry-with-error-feedback: append specific validation errors to the prompt on retry to guide the model toward correction
  • Retries are ineffective when required information is simply absent from the source document (vs format or structural errors)
  • Feedback loop design: tracking detected_pattern fields to enable systematic analysis of dismissal patterns
  • Semantic validation errors (values don't sum, wrong field placement) vs schema syntax errors (eliminated by tool use)
Skills
  • Implement follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction
  • Identify when retries will be ineffective (information exists only in an external document not provided) vs effective (format mismatches, structural output errors)
  • Add detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings
  • Design self-correction validation flows: extract calculated_total alongside stated_total to flag discrepancies; add conflict_detected booleans for inconsistent source data
TS 4.5 Design efficient batch processing strategies
Knowledge
  • Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA
  • Appropriate for: non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation)
  • Batch API does NOT support multi-turn tool calling within a single request
  • custom_id fields for correlating batch request/response pairs
Skills
  • Match API to workflow: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis
  • Calculate batch submission frequency based on SLA constraints (e.g., 4-hour submission windows to guarantee 30-hour SLA with 24-hour batch processing)
  • Handle batch failures: resubmit only failed documents (identified by custom_id) with appropriate modifications (e.g., chunking oversized documents)
  • Use prompt refinement on a sample set before batch-processing large volumes to maximize first-pass success rates and reduce resubmission costs
TS 4.6 Design multi-instance and multi-pass review architectures
Knowledge
  • Self-review limitation: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session
  • Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking
  • Multi-pass review: per-file local analysis passes + separate cross-file integration passes to avoid attention dilution and contradictory findings
Skills
  • Use a second independent Claude instance to review generated code without the generator's reasoning context
  • Split large multi-file reviews into focused per-file passes for local issues + separate integration passes for cross-file data flow analysis
  • Run verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing
TS 5.1 Manage conversation context to preserve critical information across long interactions
Knowledge
  • Progressive summarization risk: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries loses critical data
  • "Lost in the middle" effect: models reliably process information at the beginning and end of long inputs, but may omit findings from middle sections
  • Tool results accumulate in context and consume tokens disproportionately to their relevance (40+ fields per order lookup when only 5 are relevant)
  • Must pass complete conversation history in subsequent API requests to maintain conversational coherence
Skills
  • Extract transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside the summarized history
  • Trim verbose tool outputs to only relevant fields before they accumulate in context
  • Place key findings summaries at the beginning of aggregated inputs; organize detailed results with explicit section headers to mitigate position effects
  • Require subagents to include metadata (dates, source locations, methodological context) in structured outputs
  • Modify upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content when downstream agents have limited context budgets
TS 5.2 Design effective escalation and ambiguity resolution patterns
Knowledge
  • Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps (not just complex cases), inability to make meaningful progress
  • Escalate IMMEDIATELY when customer explicitly requests it; offer to resolve when the issue is within the agent's capability and customer is merely frustrated
  • Sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity
  • Multiple customer matches require clarification (additional identifiers) rather than heuristic selection
Skills
  • Add explicit escalation criteria with few-shot examples to system prompt demonstrating when to escalate vs resolve autonomously
  • Honor explicit customer requests for human agents immediately without first attempting investigation
  • Acknowledge frustration while offering resolution when the issue is within the agent's capability; escalate only if customer reiterates their preference
  • Escalate when policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching when policy only addresses own-site adjustments)
  • Instruct agent to ask for additional identifiers when tool results return multiple matches (rather than selecting based on heuristics)
TS 5.3 Implement error propagation strategies across multi-agent systems
Knowledge
  • Structured error context (failure type, attempted query, partial results, alternative approaches) enables intelligent coordinator recovery decisions
  • Access failures (timeouts needing retry) vs valid empty results (successful queries with no matches) — coordinator needs to distinguish these
  • Generic error statuses ("search unavailable") hide valuable context from the coordinator
  • Anti-patterns: silently suppressing errors (returning empty results as success) OR terminating entire workflows on single failures
Skills
  • Return structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery
  • Distinguish access failures from valid empty results in error reporting so the coordinator can make appropriate decisions
  • Have subagents implement local recovery for transient failures; only propagate errors they cannot resolve, including what was attempted and partial results
  • Structure synthesis output with coverage annotations indicating which findings are well-supported vs which topic areas have gaps due to unavailable sources
TS 5.4 Manage context effectively in large codebase exploration
Knowledge
  • Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
  • Scratchpad files persist key findings across context boundaries
  • Subagent delegation isolates verbose exploration output while main agent coordinates high-level understanding
  • Structured state persistence for crash recovery: each agent exports state to a known location; coordinator loads manifest on resume
Skills
  • Spawn subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while main agent preserves high-level coordination
  • Have agents maintain scratchpad files recording key findings; reference them for subsequent questions to counteract context degradation
  • Summarize key findings from one exploration phase before spawning sub-agents for the next phase, injecting summaries into initial context
  • Design crash recovery using structured agent state exports (manifests) that coordinator loads on resume and injects into agent prompts
  • Use /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output
TS 5.5 Design human review workflows and confidence calibration
Knowledge
  • Aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields
  • Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns
  • Field-level confidence scores calibrated using labeled validation sets for routing review attention
  • Must validate accuracy by document type and field segment before automating high-confidence extractions
Skills
  • Implement stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection
  • Analyze accuracy by document type and field to verify consistent performance across all segments before reducing human review
  • Have models output field-level confidence scores, then calibrate review thresholds using labeled validation sets
  • Route extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity
TS 5.6 Preserve information provenance and handle uncertainty in multi-source synthesis
Knowledge
  • Source attribution is lost during summarization when findings are compressed without preserving claim-source mappings
  • Importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings
  • Conflicting statistics from credible sources: annotate conflicts with source attribution rather than arbitrarily selecting one value
  • Temporal data: require publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions
Skills
  • Require subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis
  • Structure reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations
  • Complete document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis
  • Require subagents to include publication or data collection dates in structured outputs for correct temporal interpretation
  • Render different content types appropriately in synthesis outputs (financial data as tables, news as prose, technical findings as structured lists)
In-Scope Topics

Agent SDK & Agentic Loops

  • Agentic loop implementation: control flow based on stop_reason, tool result handling, loop termination
  • Multi-agent orchestration: coordinator-subagent patterns, task decomposition, parallel subagent execution, iterative refinement
  • Subagent context management: explicit context passing, structured state persistence, crash recovery using manifests
  • Agent SDK: agent definitions, hooks (PostToolUse, tool call interception), subagent spawning via Task tool, allowedTools configuration

Tool Design & MCP

  • Tool interface design: writing effective tool descriptions, splitting vs consolidating tools, tool naming to reduce ambiguity
  • MCP tool and resource design: resources for content catalogs, tools for actions, description quality for adoption
  • MCP server configuration: project vs user scope (.mcp.json vs ~/.claude.json), environment variable expansion, multi-server simultaneous access
  • Error handling and propagation: structured error responses, transient vs business vs permission errors, local recovery before escalation
  • tool_choice configuration: "auto", "any", forced tool selection

Claude Code Configuration

  • CLAUDE.md configuration: hierarchy (user/project/directory), @import patterns, .claude/rules/ with glob patterns
  • Custom commands and skills: project vs user scope, context: fork, allowed-tools, argument-hint frontmatter
  • Plan mode vs direct execution: complexity assessment, architectural decisions, single-file changes
  • Claude Code CLI: -p flag for non-interactive mode, --output-format json, --json-schema for structured CI output
  • Session management: --resume, fork_session, named sessions, session context isolation

Prompting & Structured Output

  • Iterative refinement: input/output examples, test-driven iteration, interview pattern, sequential vs parallel issue resolution
  • Structured output via tool use: schema design, tool_choice configuration, nullable fields to prevent hallucination
  • Few-shot prompting: ambiguous scenario targeting, format consistency, false positive reduction
  • Batch processing: Message Batches API appropriateness, latency tolerance assessment, failure handling by custom_id
  • Multi-instance and multi-pass review architectures

Context & Reliability

  • Context window optimization: trimming verbose tool outputs, structured fact extraction, position-aware input ordering
  • Escalation decision-making: explicit criteria, honoring customer preferences, policy gap identification
  • Human review workflows: confidence calibration, stratified sampling, accuracy segmentation by document type and field
  • Information provenance: claim-source mappings, temporal data handling, conflict annotation, coverage gap reporting
  • /compact for reducing context usage during extended exploration sessions

Built-in Tools

  • Read/Write — full file operations
  • Edit — targeted modifications using unique text matching
  • Bash — shell command execution
  • Grep — content search within files
  • Glob — file path pattern matching
  • When Edit fails on non-unique text → use Read + Write as fallback
Out-of-Scope Topics (will NOT appear on exam)

Do not study these

  • Fine-tuning Claude models or training custom models
  • Claude API authentication, billing, or account management
  • Detailed implementation of specific programming languages or frameworks (beyond tool/schema configuration)
  • Deploying or hosting MCP servers (infrastructure, networking, container orchestration)
  • Claude's internal architecture, training process, or model weights
  • Constitutional AI, RLHF, or safety training methodologies
  • Embedding models or vector database implementation details
  • Computer use (browser automation, desktop interaction) · Vision/image analysis capabilities
  • Streaming API implementation or server-sent events · Rate limiting, quotas, or API pricing calculations
  • OAuth, API key rotation, or authentication protocol details
  • Specific cloud provider configurations (AWS, GCP, Azure)
  • Performance benchmarking or model comparison metrics
  • Prompt caching implementation details (beyond knowing it exists)
  • Token counting algorithms or tokenization specifics
Exam Preparation Recommendations
1 Build an agent with the Claude Agent SDK: Implement a complete agentic loop with tool calling, error handling, and session management. Practice spawning subagents and passing context between them.
2 Configure Claude Code for a real project: Set up CLAUDE.md with a configuration hierarchy, create path-specific rules in .claude/rules/, build custom skills with frontmatter options (context: fork, allowed-tools), and integrate at least one MCP server.
3 Design and test MCP tools: Write tool descriptions that clearly differentiate similar tools. Implement structured error responses with error categories and retryable flags. Test tool selection reliability with ambiguous requests.
4 Build a structured data extraction pipeline: Use tool_use with JSON schemas, implement validation-retry loops, design schemas with optional/nullable fields, and practice batch processing with the Message Batches API.
5 Practice prompt engineering techniques: Write few-shot examples for ambiguous scenarios. Define explicit review criteria to reduce false positives. Design multi-pass review architectures for large code reviews.
6 Study context management patterns: Practice extracting structured facts from verbose tool outputs, implementing scratchpad files for long sessions, and designing subagent delegation to manage context limits.
7 Review escalation and human-in-the-loop patterns: Understand when to escalate (policy gaps, customer requests, inability to progress) vs resolve autonomously. Practice designing human review workflows with confidence-based routing.