Prompt Engineering Patterns#

Role + task + format#

The simplest reliable structure. Provide a role, a concrete task, and explicit output constraints.

You are a senior Linux sysadmin.

Diagnose why the following systemd service fails to start and provide
an actionable fix in plain English.

Reply in this format:
**Root cause:** <one sentence>
**Fix:** <numbered steps>
**Verification:** <command to confirm it worked>

Service log:
"""
{log_output}
"""

Chain of thought (CoT)#

Ask Claude to reason before answering. Wrap reasoning in a tag to separate it from the final answer.

{problem_statement}

Think step by step before giving your final answer.
Enclose your reasoning in <thinking> tags.
After </thinking>, give only the answer — no explanation.

[!TIP] For short tasks, <thinking> adds token cost with little benefit. Use it for multi-step math, logic puzzles, code debugging, or anything where intermediate reasoning reduces errors.

Extended thinking#

Use the thinking parameter for complex problems where Claude should spend more compute reasoning privately before responding.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",           # thinking requires Opus
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000         # max tokens Claude can think privately
    },
    messages=[{
        "role": "user",
        "content": "What is 47 × 83 × 129? Show only the final answer."
    }]
)

# Content list contains ThinkingBlock + TextBlock
for block in response.content:
    if block.type == "thinking":
        print(f"[thinking: {len(block.thinking)} chars]")
    elif block.type == "text":
        print(block.text)

Output:

[thinking: 847 chars]
503,289

[!NOTE] Extended thinking is billed for thinking tokens. Set budget_tokens to balance quality vs cost. For most tasks 5,000–10,000 is sufficient; use up to 100,000 for very hard problems.

[!WARNING] Temperature must be 1 (the default) when extended thinking is enabled. Streaming is supported. Tool use and extended thinking can be combined.

XML for structured output#

Use XML tags to separate sections in both your prompt and Claude’s response. Claude follows XML structure reliably.

Analyze the code below and return your analysis in XML.

<code>
{source_code}
</code>

Return:
<analysis>
  <summary>one sentence description</summary>
  <complexity>O(?) with explanation</complexity>
  <bugs>
    <bug line="N">description</bug>
    <!-- repeat for each bug -->
  </bugs>
  <suggestions>
    <suggestion>improvement idea</suggestion>
  </suggestions>
</analysis>

Parse the response in Python:

import xml.etree.ElementTree as ET
import re

content = response.content[0].text
xml_match = re.search(r"<analysis>.*?</analysis>", content, re.DOTALL)
root = ET.fromstring(xml_match.group())

summary = root.findtext("summary")
bugs = [{"line": b.get("line"), "desc": b.text} for b in root.findall(".//bug")]
print(summary)
print(bugs)

Output:

Recursive Fibonacci with exponential time complexity.
[{'line': '3', 'desc': 'No memoization; recomputes subproblems exponentially'}]

Structured extraction (JSON)#

Extract the following fields from the invoice text below.
Output as JSON only — no prose, no markdown fences.

Fields:
- invoice_number (string)
- date (ISO-8601)
- total_amount (float)
- currency (3-letter ISO code)
- vendor_name (string)
- line_items (array of {description: string, quantity: int, unit_price: float})

If a field is missing, use null.

Invoice:
"""
{invoice_text}
"""

[!TIP] For guaranteed JSON output, use tool_choice={"type": "tool", "name": "extract"} with a tool whose schema matches your target structure. Claude will always return valid JSON matching the schema.

Classification with confidence#

Classify the support ticket below into exactly one category.

Categories:
- billing        — payment, invoice, refund
- access         — login, permissions, account
- performance    — slow, timeout, latency
- bug            — unexpected behavior, error
- feature        — new capability request
- other          — anything else

Return JSON only:
{"category": "...", "confidence": 0.0–1.0, "reason": "<one sentence>"}

Ticket:
"""
{ticket_text}
"""

Few-shot examples#

Provide 2–5 examples before the actual input. Highly effective for formatting and style consistency.

Convert each sentence to past tense.

Input: "She walks to school."
Output: "She walked to school."

Input: "They are building a house."
Output: "They were building a house."

Input: "The server processes 1,000 requests per second."
Output: "The server processed 1,000 requests per second."

Input: "{user_sentence}"
Output:

Negative constraints#

Explicit “do not” instructions often outperform positive-only instructions for controlling output format.

Summarize the article below.

Requirements:
- Maximum 3 bullet points
- Each bullet under 20 words
- Do NOT include statistics or numbers
- Do NOT start any bullet with "The"
- Do NOT use passive voice

Article:
"""
{article_text}
"""

Self-critique / reflection#

Ask Claude to evaluate and improve its own output. Useful for high-stakes outputs.

Step 1 — Draft:
Write a Python function that {task}.

Step 2 — Critique:
Review your draft for:
- Edge cases not handled
- Performance issues
- Security risks
- Missing type annotations

Step 3 — Improved version:
Rewrite the function addressing all issues found in Step 2.

Output only the final improved function. No explanation.

Constitutional / constraint checking#

Add an explicit evaluation step before returning output:

You are a code reviewer. A developer submitted the following diff.

<diff>
{diff_text}
</diff>

Before responding, evaluate against these rules:
1. No hardcoded secrets or credentials
2. All functions have type annotations
3. No `print()` statements in library code
4. Test coverage for new public functions

For each rule: PASS / FAIL / N/A with a one-line reason.
Then: overall verdict (APPROVE / REQUEST CHANGES) with 1–3 action items.

Vision — image input#

Send images as base64 or URL. Claude can reason about diagrams, screenshots, charts, and photos.

import base64
import anthropic

client = anthropic.Anthropic()

with open("diagram.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data,
                }
            },
            {
                "type": "text",
                "text": "Describe the architecture shown in this diagram. List each component and its connections."
            }
        ]
    }]
)
print(response.content[0].text)

Output:

The diagram shows a three-tier web architecture:
1. Load Balancer (HAProxy) — distributes traffic across two app servers
2. App Servers (Node.js) — process requests, connect to the cache and database
3. Redis Cache — shared session store between app servers
4. PostgreSQL Primary + Replica — primary handles writes, replica handles reads

Image from URL#

{
    "type": "image",
    "source": {
        "type": "url",
        "url": "https://example.com/chart.png"
    }
}

[!NOTE] Supported media types: image/jpeg, image/png, image/gif, image/webp. Max image size: 5 MB. For PDFs use the files API (client.beta.files).

System prompt vs user message split#

Put persistent, session-wide instructions in the system parameter; keep per-request data in messages.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=(
        "You are a technical writer. "
        "Use bullet points. "
        "Be concise — no filler phrases. "
        "Target audience: senior engineers."
    ),
    messages=[{"role": "user", "content": f"Summarize this RFC:\n\n{rfc_text}"}]
)

Prompt caching#

Cache large, reused context (documents, instructions, tool definitions) to reduce latency and cost by up to 90% on cache hits. TTL is 5 minutes.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a technical support agent for Acme Corp.",
        },
        {
            "type": "text",
            "text": large_knowledge_base_text,       # 50,000 tokens of docs
            "cache_control": {"type": "ephemeral"},  # cache this block
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": previous_conversation_history,
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": user_question}
            ]
        }
    ]
)

print(response.usage)

Output (first call — writes cache):

Usage(cache_creation_input_tokens=52000, cache_read_input_tokens=0, input_tokens=120, output_tokens=95)

Output (subsequent calls within 5 min — reads cache):

Usage(cache_creation_input_tokens=0, cache_read_input_tokens=52000, input_tokens=120, output_tokens=95)

[!TIP] Cache the longest, most stable prefix. Place cache_control on the last content block you want cached — everything before it is included in the cache. Multiple cache breakpoints are supported (up to 4).

Temperature guidance#

Task	Temperature	Notes
Structured extraction / classification	`0.0`	Maximum determinism
Code generation	`0.0–0.3`	Reproducible, correct
Summarization	`0.3–0.5`	Slight variety OK
Creative writing	`0.7–1.0`	More originality
Brainstorming (multiple options)	`1.0`	Maximum diversity
Extended thinking	`1.0`	Required — fixed

Context window management#

Model	Context window	Recommended max input
claude-opus-4-7	200K tokens	~150K (leave room for output)
claude-sonnet-4-6	200K tokens	~150K
claude-haiku-4-5	200K tokens	~150K

# Count tokens before sending
token_count = client.messages.count_tokens(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": large_text}]
)
print(token_count.input_tokens)   # e.g. 45320

Output:

[!TIP] Use /compact in Claude Code or client.messages.create with a summarization step to condense long conversations when approaching context limits.

Batch processing#

For high-volume offline workloads, use the Message Batches API to process up to 10,000 requests at 50% cost:

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 200,
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}]
            }
        }
        for i, doc in enumerate(documents)
    ]
)

print(batch.id)                   # keep this to poll for results
print(batch.processing_status)    # "in_progress"

Output:

msgbatch_01XVn...
in_progress

g h	home
g l	Linux section
g w	Windows section
g z	z/OS section
g o	macOS section
g a	AI section
g p	Python section
g g	graph view
g t	tags index

⌘K / /	open search palette
t	cycle theme (dark → light → system)
?	toggle this panel

[ / ]	previous / next sheet in section
j / k	scroll down / up