CCAF Foundations — Cheat Sheet

5 domains · Scenario-based MCQ · 720/1000 to pass

D1: Agentic Architecture 27% D2: Tool Design & MCP 18% D3: Claude Code 20% D4: Prompting 20% D5: Context & Reliability 15%
Agentic Loop (1.1)
DO

Correct loop control

  • Continue when stop_reason = "tool_use"
  • Terminate when stop_reason = "end_turn"
  • Append tool results to conversation history each iteration
DON'T

Anti-patterns

  • Parse natural language signals to detect completion
  • Use arbitrary iteration caps as primary stop mechanism
  • Check for assistant text content as completion indicator
Multi-Agent Orchestration (1.2 + 1.3)

Hub-and-spoke pattern

  • Coordinator manages ALL inter-subagent communication and error handling
  • Subagents have isolated context — they do not inherit coordinator history
  • Pass complete findings explicitly in each subagent's prompt
  • Use Task tool to spawn subagents; allowedTools must include "Task"
  • Spawn parallel subagents by emitting multiple Task calls in a single coordinator response (not across separate turns)
  • Coordinator prompts: specify goals + quality criteria, not step-by-step procedures
Risk

Narrow decomposition

If coordinator assigns subtasks that don't cover the full topic, downstream agents succeed but output is incomplete. Design coordinator to map the full domain before assigning subtasks.

Session

Forking & resumption

  • --resume <name> continues a named session
  • fork_session branches from a shared baseline
  • If tool results are stale, start fresh with injected summary
  • When resuming, tell the agent which files changed
Enforcement vs Prompt-Based (1.4 + 1.5)
Programmatic (hooks)

Deterministic. Use when compliance is non-negotiable: identity verification, financial thresholds, policy gates. Zero failure rate.

Prompt-based

Probabilistic. Non-zero failure rate. OK for guidance and suggestions. Never rely on this for critical business logic.

SDK Hooks (PostToolUse)

  • Intercept tool results to normalize formats before model sees them (Unix timestamps, ISO 8601, numeric codes)
  • Intercept outgoing tool calls to enforce rules (block refunds > $500, redirect to escalation)
Task Decomposition (1.6)

Prompt chaining

Fixed sequential steps. Best for predictable multi-aspect reviews. Example: analyze each file, then run a cross-file integration pass.

Dynamic decomposition

Generate subtasks based on what is discovered at each step. Best for open-ended investigation. Map structure first, then prioritize adaptively.

Tool Descriptions (2.1)

What a good tool description includes

  • Purpose (specific, not generic)
  • Expected input formats and example queries
  • Edge cases and what it does NOT handle
  • When to use it vs similar alternatives
DON'T

Overlapping descriptions

Two tools with near-identical descriptions cause misrouting. Fix: rename + rewrite to eliminate overlap. System prompt wording can also create unintended tool associations.

DO

Split generic tools

Replace one vague tool with 2-3 purpose-specific tools with defined input/output contracts. E.g., analyze_documentextract_data_points, summarize_content, verify_claim.

Structured Error Responses (2.2)

Error response must include:

  • errorCategory: transient / validation / permission / business
  • isRetryable: boolean
  • Human-readable description
  • For business violations: retriable: false + customer-friendly message
Access failure (timeout)

Needs retry logic. Propagate with structured context so coordinator can decide.

Valid empty result

Successful query, no matches. Do NOT treat as an error. Distinguish clearly.

Tool Distribution (2.3)

Too many tools = worse performance

18 tools vs 4-5 tools — decision complexity degrades reliability. Give each agent only the tools its role needs. Agents with out-of-scope tools will misuse them.

tool_choice options

  • "auto" — model may return text or call a tool
  • "any" — model must call some tool (prevents conversational text)
  • {"type":"tool","name":"..."} — forces a specific named tool first
MCP Server Configuration (2.4)

Project-level (.mcp.json)

Shared via version control. Use for shared team tooling. Use ${ENV_VAR} expansion for auth tokens — never commit secrets.

User-level (~/.claude.json)

Personal/experimental only. Not shared with teammates. Both levels can be active simultaneously.

MCP Resources

Expose content catalogs (issue summaries, DB schemas, doc hierarchies) as resources to reduce exploratory tool calls. Use existing community MCP servers for standard integrations (e.g., Jira) — build custom only for team-specific workflows.

Built-in Tools (2.5)

Grep vs Glob

  • Grep: search file contents (function names, imports, error strings)
  • Glob: find files by name/extension pattern (**/*.test.tsx)

Edit vs Read+Write

  • Edit: targeted modification using unique text match
  • If Edit fails (non-unique match), fall back to Read then Write
CLAUDE.md Hierarchy (3.1)

Three levels

  • ~/.claude/CLAUDE.md — user-level: personal only, not version-controlled
  • .claude/CLAUDE.md or root CLAUDE.md — project-level: shared via git
  • Subdirectory CLAUDE.md — directory-level: scoped to that folder

@import syntax

Reference external files to keep CLAUDE.md modular. Import only the standards relevant to each package.

.claude/rules/

Topic-specific rule files (e.g., testing.md, api-conventions.md). Use YAML frontmatter with paths: glob patterns for conditional loading only when editing matching files.

Common mistake

User-level config doesn't share to teammates

If a new team member doesn't receive instructions, check whether they live in ~/.claude/CLAUDE.md instead of project-level. Use /memory to verify which files are loaded in a session.

Custom Commands & Skills (3.2)

Commands

  • .claude/commands/ — project-scoped, version-controlled, team-wide
  • ~/.claude/commands/ — personal only, not shared

Skills (.claude/skills/)

  • context: fork — runs in isolated sub-agent context, prevents output polluting main session
  • allowed-tools — restricts tools during skill execution
  • argument-hint — prompts for required params if invoked without arguments
Skills

On-demand invocation for task-specific workflows (e.g., codebase analysis, brainstorming).

CLAUDE.md

Always-loaded universal standards. Rules that must apply to every single session.

Path-Specific Rules (3.3)

Use .claude/rules/ with glob patterns when:

  • Conventions span multiple directories (test files spread throughout codebase)
  • You need automatic loading without manual invocation
  • Rules apply to file types regardless of location (**/*.test.tsx)

Preferred over subdirectory CLAUDE.md files for cross-directory conventions.

Plan Mode vs Direct Execution (3.4)

Plan Mode when:

  • Large-scale changes across many files
  • Multiple valid approaches exist
  • Architectural decisions required
  • Monolith-to-microservices, library migrations (45+ files)
  • You need to explore before committing

Direct Execution when:

  • Simple, well-scoped single-file changes
  • Clear stack trace pointing to one fix
  • Adding a single validation or conditional
  • Scope and approach are already clear
CI/CD Integration (3.6)

Key CLI flags

  • -p / --print — non-interactive mode (prevents pipeline hangs waiting for input)
  • --output-format json + --json-schema — machine-parseable structured output for inline PR comments

Session isolation for code review

The same session that generated code is less effective at reviewing it (retains reasoning context, less likely to question its own decisions). Use an independent review instance without the generator's reasoning context.

Prompt Precision (4.1)
Specific criteria (works)

"Flag comments only when claimed behavior contradicts actual code behavior"

Vague instructions (don't work)

"Be conservative" / "Only report high-confidence findings" — no meaningful improvement in precision

False positive effect

High false-positive categories undermine trust in accurate ones. Temporarily disable noisy categories to restore developer trust while improving those prompts separately. Define explicit severity criteria with concrete code examples per severity level.

Few-Shot Prompting (4.2)

When few-shot is the most effective technique

  • Detailed instructions alone produce inconsistent output
  • Ambiguous scenarios where judgment must be demonstrated
  • Reducing hallucination in extraction tasks with varied formats
  • Showing model what to accept vs flag (false positive reduction)
  • Demonstrating correct handling of varied document structures

Structure of good few-shot examples

  • 2-4 targeted examples covering ambiguous cases
  • Show reasoning: why this choice over plausible alternatives
  • Demonstrate desired output format (location, issue, severity, suggested fix)
  • Include varied document structures to enable generalization to novel patterns
Structured Output via Tool Use (4.3)

Key rules

  • tool_use + JSON schema = most reliable structured output (eliminates syntax errors)
  • Strict schemas prevent syntax errors but not semantic errors (line items not summing, values placed in wrong fields)
  • Make fields optional/nullable when source may not contain that data — prevents fabrication to satisfy required fields
  • Use "other" + detail field pattern for extensible enum categories
  • Add "unclear" enum value for genuinely ambiguous cases
Validation + Retry Loops (4.4)

Retry works for:

  • Format mismatches
  • Structural output errors
  • Schema compliance failures

Include on retry: original doc + failed extraction + specific validation error.

Retry won't fix:

  • Information absent from the source document
  • Data only in an external doc not provided

Detect these cases and route to human review. Don't loop indefinitely.

Key trap

Semantic error detection

Extract calculated_total alongside stated_total and flag mismatches for human review. Never silently reconcile financial data or pick the "most likely correct" value automatically.

Batch Processing (4.5)

Use Message Batches API for:

  • Overnight reports, weekly audits
  • Nightly test generation
  • Any latency-tolerant workload

50% cost savings. Up to 24h processing window. No guaranteed latency SLA.

Do NOT use batch for:

  • Blocking pre-merge checks (devs wait for results)
  • Any workflow requiring fast response
  • Multi-turn tool calling (not supported)

Batch SLA formula

Worst-case total = max wait time + 24h batch window. Must be under SLA. Example: 30h SLA → submit every 4h (4+24=28h, 2h buffer). 6h submission windows leave zero buffer.

Multi-Instance Review (4.6)

Independent review beats self-review

A model retains reasoning context from generation — less likely to question its own decisions. An independent Claude instance (without prior context) catches more subtle issues than self-review instructions or extended thinking. Split large reviews into per-file local passes + cross-file integration pass to avoid attention dilution.

Context Window Management (5.1)

Lost in the middle

Models reliably process info at the beginning and end of long inputs. Middle sections get attention gaps. Put key findings summaries first. Use explicit section headers for detailed middle content.

Structured fact extraction

Extract transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt — outside summarized history. Don't let these get condensed.

Trim verbose tool outputs

Tool results accumulate and consume tokens disproportionately to their relevance. Keep only relevant fields (e.g., 5 return-relevant fields from an order lookup with 40+ fields).

Key trap

Long multi-issue sessions

Progressive summarization for resolved earlier issues — compress into narrative summaries, keep full history only for the active issue. Simpler and more reliable than sidecar context layers or sliding windows.

Escalation Decision-Making (5.2)

Escalate when:

  • Customer explicitly requests a human — honor immediately, no investigation first
  • Policy is ambiguous or silent on the specific request (not just "complex")
  • Unable to make meaningful progress on the case
Key trap

Frustration without explicit escalation request

Acknowledge frustration + offer to resolve. Only escalate if the customer reiterates their preference for a human. Never take unilateral action (e.g., process the refund anyway) without honoring their stated preference first.

Unreliable escalation triggers — don't use:

  • Sentiment analysis / frustration detection (doesn't correlate with case complexity)
  • Self-reported LLM confidence scores (poorly calibrated on hard cases)
  • "Case complexity" heuristics without explicit criteria

Multiple customer matches

When tool returns multiple matches: ask for additional identifiers. Never select based on heuristics (e.g., "most recent" or "most likely").

Error Propagation in Multi-Agent Systems (5.3)

Structured error context to coordinator must include:

  • Failure type
  • What was attempted (query, parameters)
  • Any partial results obtained
  • Potential alternative approaches

Anti-pattern: silent suppression

Returning empty results as "success" prevents recovery and risks incomplete research outputs. Never mask errors this way.

Anti-pattern: full workflow termination

Terminating the entire workflow on one subagent failure. Handle locally when possible; propagate with context when not. Proceed with partial results.

Large Codebase Context (5.4)

Scratchpad files

Persist key findings across context boundaries. Agents reference scratchpad for subsequent questions to counteract context degradation (model referencing "typical patterns" instead of actual classes found).

Crash recovery pattern

Each agent exports state to a known location. Coordinator loads a manifest on resume and injects state into agent prompts. Don't require full re-exploration.

Key trap

Coordinator spawning subagents for simple follow-ups

If the coordinator already has findings in its context, do not spawn a subagent to re-ingest 80K tokens for a simple summarization request. Subagent spawning is for complex analysis — not follow-up questions on already-retrieved data.

/compact

Use to reduce context usage during extended exploration sessions when context fills with verbose discovery output. Combine with Explore subagent to isolate verbose discovery phases.

Human Review Workflows (5.5)

Don't trust aggregate accuracy metrics

97% overall accuracy can mask poor performance on specific document types or fields. Validate accuracy by document type and field segment before reducing human review. Use stratified random sampling to detect novel error patterns.

Confidence-based routing

  • Output field-level confidence scores per extraction
  • Calibrate review thresholds using labeled validation sets
  • Route low-confidence or conflicting-source extractions to human review
  • Prioritize limited reviewer capacity toward highest-impact cases
Information Provenance (5.6)

Claim-source mapping requirements

  • Subagents must output structured claim-source mappings (URL, doc name, excerpt)
  • Synthesis agents must preserve attribution through the synthesis step
  • Conflicting statistics from credible sources: annotate with source attribution, don't arbitrarily pick one
  • Always include publication/collection dates to prevent temporal differences being misread as contradictions
Golden rules — answer selection shortcuts
Programmatic enforcement beats prompt instructions for anything with financial or safety consequences.
Tool descriptions are the first lever for tool selection problems — before few-shot, before routing layers.
Few-shot beats instruction refinement when output format or judgment is inconsistent.
Explicit criteria beats vague modifiers ("be conservative") for precision/false positive problems.
Batch API only for latency-tolerant workloads. Never for blocking workflows.
Independent review instance beats self-review instructions or extended thinking for catching subtle bugs.
Plan mode for architectural/multi-file changes. Direct execution for scoped, clear changes.
Honor explicit customer escalation immediately — no investigation first, no "let me try to help."
Structured error context (failure type + partial results + alternatives) over generic status strings.
Project-level = version-controlled = shared. User-level = personal, invisible to teammates.
Subagents need explicit context. They don't inherit coordinator history automatically.
Nullable fields prevent hallucination when source documents may not contain required information.
Common distractors to reject
Sentiment analysis or LLM confidence scores for escalation routing — poorly calibrated.
Routing classifiers / keyword parsing layers in front of tools — over-engineered when descriptions are the actual fix.
Arbitrary iteration caps as the primary loop-stopping mechanism.
Larger context windows to fix attention quality in large reviews — split into focused passes instead.
Generic error status ("search unavailable") — hides recovery context from the coordinator.
Consensus-required flagging across multiple passes — suppresses real bugs that only appear intermittently.
Consolidating tools to fix misrouting — separate them with better descriptions first.
User-level CLAUDE.md for team-wide instructions — teammates don't get them.
Batch API for pre-merge checks — no latency SLA, blocks developers from merging.
Catching all timeouts and returning empty results as success — prevents coordinator from recovering.
Exam logistics

Format

  • Multiple choice, 4 options, 1 correct answer
  • 4 scenarios drawn randomly from 6
  • No penalty for guessing (answer everything)

Scoring

  • Scaled score: 100–1000
  • Minimum passing score: 720
  • Pass/Fail designation only