CCAF Cheat Sheet

Agentic Loop (1.1)

DO

Correct loop control

Continue when stop_reason = "tool_use"
Terminate when stop_reason = "end_turn"
Append tool results to conversation history each iteration

DON'T

Anti-patterns

Parse natural language signals to detect completion
Use arbitrary iteration caps as primary stop mechanism
Check for assistant text content as completion indicator

Multi-Agent Orchestration (1.2 + 1.3)

Hub-and-spoke pattern

Coordinator manages ALL inter-subagent communication and error handling
Subagents have isolated context — they do not inherit coordinator history
Pass complete findings explicitly in each subagent's prompt
Use Task tool to spawn subagents; allowedTools must include "Task"
Spawn parallel subagents by emitting multiple Task calls in a single coordinator response (not across separate turns)
Coordinator prompts: specify goals + quality criteria, not step-by-step procedures

Risk

Narrow decomposition

If coordinator assigns subtasks that don't cover the full topic, downstream agents succeed but output is incomplete. Design coordinator to map the full domain before assigning subtasks.

Session

Forking & resumption

--resume <name> continues a named session
fork_session branches from a shared baseline
If tool results are stale, start fresh with injected summary
When resuming, tell the agent which files changed

Enforcement vs Prompt-Based (1.4 + 1.5)

Programmatic (hooks)

Deterministic. Use when compliance is non-negotiable: identity verification, financial thresholds, policy gates. Zero failure rate.

Prompt-based

Probabilistic. Non-zero failure rate. OK for guidance and suggestions. Never rely on this for critical business logic.

SDK Hooks (`PostToolUse`)

Intercept tool results to normalize formats before model sees them (Unix timestamps, ISO 8601, numeric codes)
Intercept outgoing tool calls to enforce rules (block refunds > $500, redirect to escalation)

Task Decomposition (1.6)

Prompt chaining

Fixed sequential steps. Best for predictable multi-aspect reviews. Example: analyze each file, then run a cross-file integration pass.

Dynamic decomposition

Generate subtasks based on what is discovered at each step. Best for open-ended investigation. Map structure first, then prioritize adaptively.

Tool Descriptions (2.1)

What a good tool description includes

Purpose (specific, not generic)
Expected input formats and example queries
Edge cases and what it does NOT handle
When to use it vs similar alternatives

DON'T

Overlapping descriptions

Two tools with near-identical descriptions cause misrouting. Fix: rename + rewrite to eliminate overlap. System prompt wording can also create unintended tool associations.

DO

Split generic tools

Replace one vague tool with 2-3 purpose-specific tools with defined input/output contracts. E.g., analyze_document → extract_data_points, summarize_content, verify_claim.

Structured Error Responses (2.2)

Error response must include:

errorCategory: transient / validation / permission / business
isRetryable: boolean
Human-readable description
For business violations: retriable: false + customer-friendly message

Access failure (timeout)

Needs retry logic. Propagate with structured context so coordinator can decide.

Valid empty result

Successful query, no matches. Do NOT treat as an error. Distinguish clearly.

Tool Distribution (2.3)

Too many tools = worse performance

18 tools vs 4-5 tools — decision complexity degrades reliability. Give each agent only the tools its role needs. Agents with out-of-scope tools will misuse them.

`tool_choice` options

"auto" — model may return text or call a tool
"any" — model must call some tool (prevents conversational text)
{"type":"tool","name":"..."} — forces a specific named tool first

MCP Server Configuration (2.4)

Project-level (`.mcp.json`)

Shared via version control. Use for shared team tooling. Use ${ENV_VAR} expansion for auth tokens — never commit secrets.

User-level (`~/.claude.json`)

Personal/experimental only. Not shared with teammates. Both levels can be active simultaneously.

MCP Resources

Expose content catalogs (issue summaries, DB schemas, doc hierarchies) as resources to reduce exploratory tool calls. Use existing community MCP servers for standard integrations (e.g., Jira) — build custom only for team-specific workflows.

Built-in Tools (2.5)

Grep vs Glob

Grep: search file contents (function names, imports, error strings)
Glob: find files by name/extension pattern (**/*.test.tsx)

Edit vs Read+Write

Edit: targeted modification using unique text match
If Edit fails (non-unique match), fall back to Read then Write

CLAUDE.md Hierarchy (3.1)

Three levels

~/.claude/CLAUDE.md — user-level: personal only, not version-controlled
.claude/CLAUDE.md or root CLAUDE.md — project-level: shared via git
Subdirectory CLAUDE.md — directory-level: scoped to that folder

`@import` syntax

Reference external files to keep CLAUDE.md modular. Import only the standards relevant to each package.

`.claude/rules/`

Topic-specific rule files (e.g., testing.md, api-conventions.md). Use YAML frontmatter with paths: glob patterns for conditional loading only when editing matching files.

Common mistake

User-level config doesn't share to teammates

If a new team member doesn't receive instructions, check whether they live in ~/.claude/CLAUDE.md instead of project-level. Use /memory to verify which files are loaded in a session.

Custom Commands & Skills (3.2)

Commands

.claude/commands/ — project-scoped, version-controlled, team-wide
~/.claude/commands/ — personal only, not shared

Skills (`.claude/skills/`)

context: fork — runs in isolated sub-agent context, prevents output polluting main session
allowed-tools — restricts tools during skill execution
argument-hint — prompts for required params if invoked without arguments

Skills

On-demand invocation for task-specific workflows (e.g., codebase analysis, brainstorming).

CLAUDE.md

Always-loaded universal standards. Rules that must apply to every single session.

Path-Specific Rules (3.3)

Use `.claude/rules/` with glob patterns when:

Conventions span multiple directories (test files spread throughout codebase)
You need automatic loading without manual invocation
Rules apply to file types regardless of location (**/*.test.tsx)

Preferred over subdirectory CLAUDE.md files for cross-directory conventions.

Plan Mode vs Direct Execution (3.4)

Plan Mode when:

Large-scale changes across many files
Multiple valid approaches exist
Architectural decisions required
Monolith-to-microservices, library migrations (45+ files)
You need to explore before committing

Direct Execution when:

Simple, well-scoped single-file changes
Clear stack trace pointing to one fix
Adding a single validation or conditional
Scope and approach are already clear

CI/CD Integration (3.6)

Key CLI flags

-p / --print — non-interactive mode (prevents pipeline hangs waiting for input)
--output-format json + --json-schema — machine-parseable structured output for inline PR comments

Session isolation for code review

The same session that generated code is less effective at reviewing it (retains reasoning context, less likely to question its own decisions). Use an independent review instance without the generator's reasoning context.

Prompt Precision (4.1)

Specific criteria (works)

"Flag comments only when claimed behavior contradicts actual code behavior"

Vague instructions (don't work)

"Be conservative" / "Only report high-confidence findings" — no meaningful improvement in precision

False positive effect

High false-positive categories undermine trust in accurate ones. Temporarily disable noisy categories to restore developer trust while improving those prompts separately. Define explicit severity criteria with concrete code examples per severity level.

Few-Shot Prompting (4.2)

When few-shot is the most effective technique

Detailed instructions alone produce inconsistent output
Ambiguous scenarios where judgment must be demonstrated
Reducing hallucination in extraction tasks with varied formats
Showing model what to accept vs flag (false positive reduction)
Demonstrating correct handling of varied document structures

Structure of good few-shot examples

2-4 targeted examples covering ambiguous cases
Show reasoning: why this choice over plausible alternatives
Demonstrate desired output format (location, issue, severity, suggested fix)
Include varied document structures to enable generalization to novel patterns

Structured Output via Tool Use (4.3)

Key rules

tool_use + JSON schema = most reliable structured output (eliminates syntax errors)
Strict schemas prevent syntax errors but not semantic errors (line items not summing, values placed in wrong fields)
Make fields optional/nullable when source may not contain that data — prevents fabrication to satisfy required fields
Use "other" + detail field pattern for extensible enum categories
Add "unclear" enum value for genuinely ambiguous cases

Validation + Retry Loops (4.4)

Retry works for:

Format mismatches
Structural output errors
Schema compliance failures

Include on retry: original doc + failed extraction + specific validation error.

Retry won't fix:

Information absent from the source document
Data only in an external doc not provided

Detect these cases and route to human review. Don't loop indefinitely.

Key trap

Semantic error detection

Extract calculated_total alongside stated_total and flag mismatches for human review. Never silently reconcile financial data or pick the "most likely correct" value automatically.

Batch Processing (4.5)

Use Message Batches API for:

Overnight reports, weekly audits
Nightly test generation
Any latency-tolerant workload

50% cost savings. Up to 24h processing window. No guaranteed latency SLA.

Do NOT use batch for:

Blocking pre-merge checks (devs wait for results)
Any workflow requiring fast response
Multi-turn tool calling (not supported)

Batch SLA formula

Worst-case total = max wait time + 24h batch window. Must be under SLA. Example: 30h SLA → submit every 4h (4+24=28h, 2h buffer). 6h submission windows leave zero buffer.

Multi-Instance Review (4.6)

Independent review beats self-review

A model retains reasoning context from generation — less likely to question its own decisions. An independent Claude instance (without prior context) catches more subtle issues than self-review instructions or extended thinking. Split large reviews into per-file local passes + cross-file integration pass to avoid attention dilution.

Context Window Management (5.1)

Lost in the middle

Models reliably process info at the beginning and end of long inputs. Middle sections get attention gaps. Put key findings summaries first. Use explicit section headers for detailed middle content.

Structured fact extraction

Extract transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt — outside summarized history. Don't let these get condensed.

Trim verbose tool outputs

Tool results accumulate and consume tokens disproportionately to their relevance. Keep only relevant fields (e.g., 5 return-relevant fields from an order lookup with 40+ fields).

Key trap

Long multi-issue sessions

Progressive summarization for resolved earlier issues — compress into narrative summaries, keep full history only for the active issue. Simpler and more reliable than sidecar context layers or sliding windows.

Escalation Decision-Making (5.2)

Escalate when:

Customer explicitly requests a human — honor immediately, no investigation first
Policy is ambiguous or silent on the specific request (not just "complex")
Unable to make meaningful progress on the case

Key trap

Frustration without explicit escalation request

Acknowledge frustration + offer to resolve. Only escalate if the customer reiterates their preference for a human. Never take unilateral action (e.g., process the refund anyway) without honoring their stated preference first.

Unreliable escalation triggers — don't use:

Sentiment analysis / frustration detection (doesn't correlate with case complexity)
Self-reported LLM confidence scores (poorly calibrated on hard cases)
"Case complexity" heuristics without explicit criteria

Multiple customer matches

When tool returns multiple matches: ask for additional identifiers. Never select based on heuristics (e.g., "most recent" or "most likely").

Error Propagation in Multi-Agent Systems (5.3)

Structured error context to coordinator must include:

Failure type
What was attempted (query, parameters)
Any partial results obtained
Potential alternative approaches

Anti-pattern: silent suppression

Returning empty results as "success" prevents recovery and risks incomplete research outputs. Never mask errors this way.

Anti-pattern: full workflow termination

Terminating the entire workflow on one subagent failure. Handle locally when possible; propagate with context when not. Proceed with partial results.

Large Codebase Context (5.4)

Scratchpad files

Persist key findings across context boundaries. Agents reference scratchpad for subsequent questions to counteract context degradation (model referencing "typical patterns" instead of actual classes found).

Crash recovery pattern

Each agent exports state to a known location. Coordinator loads a manifest on resume and injects state into agent prompts. Don't require full re-exploration.

Key trap

Coordinator spawning subagents for simple follow-ups

If the coordinator already has findings in its context, do not spawn a subagent to re-ingest 80K tokens for a simple summarization request. Subagent spawning is for complex analysis — not follow-up questions on already-retrieved data.

`/compact`

Use to reduce context usage during extended exploration sessions when context fills with verbose discovery output. Combine with Explore subagent to isolate verbose discovery phases.

Human Review Workflows (5.5)

Don't trust aggregate accuracy metrics

97% overall accuracy can mask poor performance on specific document types or fields. Validate accuracy by document type and field segment before reducing human review. Use stratified random sampling to detect novel error patterns.

Confidence-based routing

Output field-level confidence scores per extraction
Calibrate review thresholds using labeled validation sets
Route low-confidence or conflicting-source extractions to human review
Prioritize limited reviewer capacity toward highest-impact cases

Information Provenance (5.6)

Claim-source mapping requirements

Subagents must output structured claim-source mappings (URL, doc name, excerpt)
Synthesis agents must preserve attribution through the synthesis step
Conflicting statistics from credible sources: annotate with source attribution, don't arbitrarily pick one
Always include publication/collection dates to prevent temporal differences being misread as contradictions

Golden rules — answer selection shortcuts

✓

Programmatic enforcement beats prompt instructions for anything with financial or safety consequences.

✓

Tool descriptions are the first lever for tool selection problems — before few-shot, before routing layers.

✓

Few-shot beats instruction refinement when output format or judgment is inconsistent.

✓

Explicit criteria beats vague modifiers ("be conservative") for precision/false positive problems.

✓

Batch API only for latency-tolerant workloads. Never for blocking workflows.

✓

Independent review instance beats self-review instructions or extended thinking for catching subtle bugs.

✓

Plan mode for architectural/multi-file changes. Direct execution for scoped, clear changes.

✓

Honor explicit customer escalation immediately — no investigation first, no "let me try to help."

✓

Structured error context (failure type + partial results + alternatives) over generic status strings.

✓

Project-level = version-controlled = shared. User-level = personal, invisible to teammates.

✓

Subagents need explicit context. They don't inherit coordinator history automatically.

✓

Nullable fields prevent hallucination when source documents may not contain required information.

Common distractors to reject

✗

Sentiment analysis or LLM confidence scores for escalation routing — poorly calibrated.

✗

Routing classifiers / keyword parsing layers in front of tools — over-engineered when descriptions are the actual fix.

✗

Arbitrary iteration caps as the primary loop-stopping mechanism.

✗

Larger context windows to fix attention quality in large reviews — split into focused passes instead.

✗

Generic error status ("search unavailable") — hides recovery context from the coordinator.

✗

Consensus-required flagging across multiple passes — suppresses real bugs that only appear intermittently.

✗

Consolidating tools to fix misrouting — separate them with better descriptions first.

✗

User-level CLAUDE.md for team-wide instructions — teammates don't get them.

✗

Batch API for pre-merge checks — no latency SLA, blocks developers from merging.

✗

Catching all timeouts and returning empty results as success — prevents coordinator from recovering.

Exam logistics

Format

Multiple choice, 4 options, 1 correct answer
4 scenarios drawn randomly from 6
No penalty for guessing (answer everything)

Scoring

Scaled score: 100–1000
Minimum passing score: 720
Pass/Fail designation only

CCAF Foundations — Cheat Sheet

Correct loop control

Anti-patterns

Hub-and-spoke pattern

Narrow decomposition

Forking & resumption

SDK Hooks (PostToolUse)

Prompt chaining

Dynamic decomposition

What a good tool description includes

Overlapping descriptions

Split generic tools

Error response must include:

Too many tools = worse performance

tool_choice options

Project-level (.mcp.json)

User-level (~/.claude.json)

MCP Resources

Grep vs Glob

Edit vs Read+Write

Three levels

@import syntax

.claude/rules/

User-level config doesn't share to teammates

Commands

Skills (.claude/skills/)

Use .claude/rules/ with glob patterns when:

Plan Mode when:

Direct Execution when:

Key CLI flags

Session isolation for code review

False positive effect

When few-shot is the most effective technique

Structure of good few-shot examples

Key rules

Retry works for:

Retry won't fix:

Semantic error detection

Use Message Batches API for:

Do NOT use batch for:

Batch SLA formula

Independent review beats self-review

Lost in the middle

Structured fact extraction

Trim verbose tool outputs

Long multi-issue sessions

Escalate when:

Frustration without explicit escalation request

Unreliable escalation triggers — don't use:

Multiple customer matches

Structured error context to coordinator must include:

Anti-pattern: silent suppression

Anti-pattern: full workflow termination

Scratchpad files

Crash recovery pattern

Coordinator spawning subagents for simple follow-ups

/compact

Don't trust aggregate accuracy metrics

Confidence-based routing

Claim-source mapping requirements

Format

Scoring

SDK Hooks (`PostToolUse`)

`tool_choice` options

Project-level (`.mcp.json`)

User-level (`~/.claude.json`)

`@import` syntax

`.claude/rules/`

Skills (`.claude/skills/`)

Use `.claude/rules/` with glob patterns when:

`/compact`