Chain-of-Thought and Reasoning Prompts
Language models predict tokens left-to-right. Chain-of-thought prompting exploits this by forcing the model to generate reasoning tokens before the answer — giving it more compute to work with and dramatically improving accuracy on multi-step problems.
Language models are token prediction machines. They generate one token at a time, left to right. Each token is conditioned on every token that came before it.
This architecture has a direct implication for how you should prompt them on complex tasks: the model can only be as good as the reasoning that precedes the answer. If you ask for an answer immediately — "What is the best architecture for this system?" — the model has no reasoning tokens to condition on. It jumps directly to an answer shaped by its training priors, not by deliberate analysis of your specific problem.
Chain-of-thought prompting changes this. By instructing the model to reason step by step before answering, you force it to generate intermediate conclusions that then condition the final answer. The answer is better because the reasoning that preceded it was better.
The Mechanism
The standard formulation is simple: append "Let's think step by step" to your prompt, or instruct the model to "reason through this before answering."
What this triggers: instead of predicting the answer token directly, the model first generates a chain of intermediate tokens that constitute reasoning. "First, we need to consider... The key constraint here is... Given these factors, the most logical conclusion is... Therefore the answer is..."
The final answer token is now conditioned on a reasoning chain rather than jumping from question directly to answer. For simple questions, this makes no difference. For complex multi-step problems, the accuracy improvement can be dramatic.
Standard CoT
The baseline pattern: add a reasoning instruction to your prompt.
Analyze whether this business plan is viable.
Think step by step. First, identify the core assumptions. Then, evaluate whether each assumption is realistic. Then, identify the three biggest risks. Finally, give your overall assessment.
This explicit step list is more reliable than "think step by step" alone because it tells the model which reasoning steps matter. The model's default step decomposition may not match your analytical framework. Explicit steps anchor the reasoning to your domain.
Self-Consistency CoT
Self-consistency extends standard CoT by generating multiple independent reasoning paths and taking a majority vote.
The approach: run the same problem through 5–10 separate completions with high temperature (0.7–1.0). Each run produces a different reasoning chain. Collect the final answers across all runs. The most frequently occurring answer is your output.
This pattern is expensive (5–10x the token cost) but valuable for high-stakes decisions where you cannot afford a single-path reasoning error. It works because independent reasoning chains are unlikely to share the same systematic error.
The Scratchpad Pattern
The scratchpad pattern separates visible reasoning from the final output. It is the production version of chain-of-thought for user-facing applications.
Implementation:
You are [role]. [Context]. [Task].
Use <thinking> tags to reason through the problem before responding. Your thinking is private. After closing the </thinking> tag, provide only the final answer — no meta-commentary, no "Based on my analysis," just the answer.
What this achieves: the model reasons fully in the scratchpad, but the end user sees only a clean, direct answer. The reasoning quality is preserved without the verbose intermediate steps polluting the output.
The scratchpad pattern is essential for production deployments where you want reasoning-quality answers but user-experience-quality presentation. Extended thinking models (like Claude 3.7 Sonnet with extended thinking enabled) implement a version of this natively.
When CoT Helps and When It Does Not
Chain-of-thought improves accuracy on tasks that require:
- Multi-step arithmetic or logic
- Reasoning under constraints (scheduling, resource allocation, legal analysis)
- Comparative evaluation with multiple criteria
- Causal reasoning ("why did X happen?")
- Complex classification where categories have overlapping features
Chain-of-thought adds cost without benefit on:
- Simple factual lookups
- Single-step classification
- Format transformation tasks (JSON to CSV, markdown to plain text)
- Creative tasks where the intermediate reasoning is not the bottleneck
The operator test: can you decompose your task into 3+ steps that each require a conclusion? If yes, CoT helps. If the task is a single lookup or transformation, standard prompting is sufficient.
Prompt Templates for CoT
Three templates you can drop into any context:
Standard CoT:
"[Task]. Think step by step. Show your reasoning before giving the final answer."
Structured CoT:
"[Task]. Work through this in order: 1) [first reasoning step], 2) [second reasoning step], 3) [third reasoning step]. Then state your conclusion."
Scratchpad CoT:
"[Task]. Use
<thinking>tags to reason privately. After</thinking>, provide only the final answer with no preamble."
Save these as templates in your prompt library. They apply to 80% of complex reasoning tasks without modification.
Lesson 51 Drill
Take a decision or analysis you have struggled to get good AI output on. It should be multi-step — at least three intermediate conclusions before you can reach the final answer.
Apply structured CoT: write out the exact reasoning steps the model should follow before answering. Run it. Compare the output to previous zero-shot attempts on the same problem.
Document: did the reasoning chain match how you would approach the problem manually? Where did the model's reasoning deviate? What constraint or context injection would fix the deviation?
Bottom Line
Chain-of-thought prompting works because language models can only be as good as the tokens that precede the answer. Forcing reasoning tokens before the answer token gives the model more compute to apply to the problem. Standard CoT handles most cases. Self-consistency handles high-stakes ambiguous problems. The scratchpad pattern handles production deployments where clean output matters.
On multi-step problems, CoT is not optional — it is the difference between a first-draft answer and a reasoned one. The next lesson covers eight reusable patterns that cover the full range of production prompting scenarios.