Prompt Engineering Patterns#
Role + task + format#
The simplest reliable structure. Provide a role, a concrete task, and explicit output constraints.
You are a senior Linux sysadmin.
Diagnose why the following systemd service fails to start and provide
an actionable fix in plain English.
Reply in this format:
**Root cause:** <one sentence>
**Fix:** <numbered steps>
**Verification:** <command to confirm it worked>
Service log:
"""
{log_output}
"""
Chain of thought (CoT)#
Ask Claude to reason before answering. Wrap reasoning in a tag to separate it from the final answer.
{problem_statement}
Think step by step before giving your final answer.
Enclose your reasoning in <thinking> tags.
After </thinking>, give only the answer β no explanation.
[!TIP] For short tasks,
<thinking>adds token cost with little benefit. Use it for multi-step math, logic puzzles, code debugging, or anything where intermediate reasoning reduces errors.
Extended thinking#
Use the thinking parameter for complex problems where Claude should spend more compute reasoning privately before responding.
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7", # thinking requires Opus
max_tokens=16000,
thinking={
"type": "enabled",
"budget_tokens": 10000 # max tokens Claude can think privately
},
messages=[{
"role": "user",
"content": "What is 47 Γ 83 Γ 129? Show only the final answer."
}]
)
# Content list contains ThinkingBlock + TextBlock
for block in response.content:
if block.type == "thinking":
print(f"[thinking: {len(block.thinking)} chars]")
elif block.type == "text":
print(block.text)
Output:
[thinking: 847 chars]
503,289
[!NOTE] Extended thinking is billed for thinking tokens. Set
budget_tokensto balance quality vs cost. For most tasks 5,000β10,000 is sufficient; use up to 100,000 for very hard problems.
[!WARNING] Temperature must be 1 (the default) when extended thinking is enabled. Streaming is supported. Tool use and extended thinking can be combined.
XML for structured output#
Use XML tags to separate sections in both your prompt and Claudeβs response. Claude follows XML structure reliably.
Analyze the code below and return your analysis in XML.
<code>
{source_code}
</code>
Return:
<analysis>
<summary>one sentence description</summary>
<complexity>O(?) with explanation</complexity>
<bugs>
<bug line="N">description</bug>
<!-- repeat for each bug -->
</bugs>
<suggestions>
<suggestion>improvement idea</suggestion>
</suggestions>
</analysis>
Parse the response in Python:
import xml.etree.ElementTree as ET
import re
content = response.content[0].text
xml_match = re.search(r"<analysis>.*?</analysis>", content, re.DOTALL)
root = ET.fromstring(xml_match.group())
summary = root.findtext("summary")
bugs = [{"line": b.get("line"), "desc": b.text} for b in root.findall(".//bug")]
print(summary)
print(bugs)
Output:
Recursive Fibonacci with exponential time complexity.
[{'line': '3', 'desc': 'No memoization; recomputes subproblems exponentially'}]
Structured extraction (JSON)#
Extract the following fields from the invoice text below.
Output as JSON only β no prose, no markdown fences.
Fields:
- invoice_number (string)
- date (ISO-8601)
- total_amount (float)
- currency (3-letter ISO code)
- vendor_name (string)
- line_items (array of {description: string, quantity: int, unit_price: float})
If a field is missing, use null.
Invoice:
"""
{invoice_text}
"""
[!TIP] For guaranteed JSON output, use
tool_choice={"type": "tool", "name": "extract"}with a tool whose schema matches your target structure. Claude will always return valid JSON matching the schema.
Classification with confidence#
Classify the support ticket below into exactly one category.
Categories:
- billing β payment, invoice, refund
- access β login, permissions, account
- performance β slow, timeout, latency
- bug β unexpected behavior, error
- feature β new capability request
- other β anything else
Return JSON only:
{"category": "...", "confidence": 0.0β1.0, "reason": "<one sentence>"}
Ticket:
"""
{ticket_text}
"""
Few-shot examples#
Provide 2β5 examples before the actual input. Highly effective for formatting and style consistency.
Convert each sentence to past tense.
Input: "She walks to school."
Output: "She walked to school."
Input: "They are building a house."
Output: "They were building a house."
Input: "The server processes 1,000 requests per second."
Output: "The server processed 1,000 requests per second."
Input: "{user_sentence}"
Output:
Negative constraints#
Explicit βdo notβ instructions often outperform positive-only instructions for controlling output format.
Summarize the article below.
Requirements:
- Maximum 3 bullet points
- Each bullet under 20 words
- Do NOT include statistics or numbers
- Do NOT start any bullet with "The"
- Do NOT use passive voice
Article:
"""
{article_text}
"""
Self-critique / reflection#
Ask Claude to evaluate and improve its own output. Useful for high-stakes outputs.
Step 1 β Draft:
Write a Python function that {task}.
Step 2 β Critique:
Review your draft for:
- Edge cases not handled
- Performance issues
- Security risks
- Missing type annotations
Step 3 β Improved version:
Rewrite the function addressing all issues found in Step 2.
Output only the final improved function. No explanation.
Constitutional / constraint checking#
Add an explicit evaluation step before returning output:
You are a code reviewer. A developer submitted the following diff.
<diff>
{diff_text}
</diff>
Before responding, evaluate against these rules:
1. No hardcoded secrets or credentials
2. All functions have type annotations
3. No `print()` statements in library code
4. Test coverage for new public functions
For each rule: PASS / FAIL / N/A with a one-line reason.
Then: overall verdict (APPROVE / REQUEST CHANGES) with 1β3 action items.
Vision β image input#
Send images as base64 or URL. Claude can reason about diagrams, screenshots, charts, and photos.
import base64
import anthropic
client = anthropic.Anthropic()
with open("diagram.png", "rb") as f:
image_data = base64.standard_b64encode(f.read()).decode("utf-8")
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": image_data,
}
},
{
"type": "text",
"text": "Describe the architecture shown in this diagram. List each component and its connections."
}
]
}]
)
print(response.content[0].text)
Output:
The diagram shows a three-tier web architecture:
1. Load Balancer (HAProxy) β distributes traffic across two app servers
2. App Servers (Node.js) β process requests, connect to the cache and database
3. Redis Cache β shared session store between app servers
4. PostgreSQL Primary + Replica β primary handles writes, replica handles reads
Image from URL#
{
"type": "image",
"source": {
"type": "url",
"url": "https://example.com/chart.png"
}
}
[!NOTE] Supported media types:
image/jpeg,image/png,image/gif,image/webp. Max image size: 5 MB. For PDFs use the files API (client.beta.files).
System prompt vs user message split#
Put persistent, session-wide instructions in the system parameter; keep per-request data in messages.
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
system=(
"You are a technical writer. "
"Use bullet points. "
"Be concise β no filler phrases. "
"Target audience: senior engineers."
),
messages=[{"role": "user", "content": f"Summarize this RFC:\n\n{rfc_text}"}]
)
Prompt caching#
Cache large, reused context (documents, instructions, tool definitions) to reduce latency and cost by up to 90% on cache hits. TTL is 5 minutes.
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
system=[
{
"type": "text",
"text": "You are a technical support agent for Acme Corp.",
},
{
"type": "text",
"text": large_knowledge_base_text, # 50,000 tokens of docs
"cache_control": {"type": "ephemeral"}, # cache this block
}
],
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": previous_conversation_history,
"cache_control": {"type": "ephemeral"},
},
{"type": "text", "text": user_question}
]
}
]
)
print(response.usage)
Output (first call β writes cache):
Usage(cache_creation_input_tokens=52000, cache_read_input_tokens=0, input_tokens=120, output_tokens=95)
Output (subsequent calls within 5 min β reads cache):
Usage(cache_creation_input_tokens=0, cache_read_input_tokens=52000, input_tokens=120, output_tokens=95)
[!TIP] Cache the longest, most stable prefix. Place
cache_controlon the last content block you want cached β everything before it is included in the cache. Multiple cache breakpoints are supported (up to 4).
Temperature guidance#
| Task | Temperature | Notes |
|---|---|---|
| Structured extraction / classification | 0.0 | Maximum determinism |
| Code generation | 0.0β0.3 | Reproducible, correct |
| Summarization | 0.3β0.5 | Slight variety OK |
| Creative writing | 0.7β1.0 | More originality |
| Brainstorming (multiple options) | 1.0 | Maximum diversity |
| Extended thinking | 1.0 | Required β fixed |
Context window management#
| Model | Context window | Recommended max input |
|---|---|---|
| claude-opus-4-7 | 200K tokens | ~150K (leave room for output) |
| claude-sonnet-4-6 | 200K tokens | ~150K |
| claude-haiku-4-5 | 200K tokens | ~150K |
# Count tokens before sending
token_count = client.messages.count_tokens(
model="claude-opus-4-7",
messages=[{"role": "user", "content": large_text}]
)
print(token_count.input_tokens) # e.g. 45320
Output:
45320
[!TIP] Use
/compactin Claude Code orclient.messages.createwith a summarization step to condense long conversations when approaching context limits.
Batch processing#
For high-volume offline workloads, use the Message Batches API to process up to 10,000 requests at 50% cost:
batch = client.messages.batches.create(
requests=[
{
"custom_id": f"doc-{i}",
"params": {
"model": "claude-haiku-4-5",
"max_tokens": 200,
"messages": [{"role": "user", "content": f"Summarize: {doc}"}]
}
}
for i, doc in enumerate(documents)
]
)
print(batch.id) # keep this to poll for results
print(batch.processing_status) # "in_progress"
Output:
msgbatch_01XVn...
in_progress