Chain-of-Thought Prompting — A Practical Guide With 6 Real Examples
Chain-of-Thought (CoT) is the single highest-ROI prompt-engineering technique. Here's exactly when to use it, when not to, and 6 real before/after examples that boost accuracy on multi-step problems.
Chain-of-Thought (CoT) prompting is the single highest-ROI move in prompt engineering. It's a one-line addition that can turn a confidently wrong answer into a correct one on multi-step problems, without changing the model or the rest of the prompt.
This guide covers: what it is, when it works, when it doesn't, and 6 real before/after examples across coding, math, decisions, and writing.
What CoT actually is
Instead of asking the model to answer directly, you instruct it to show its reasoning step by step first, then give the final answer. The reasoning tokens condition the final answer on more deliberate context — essentially making the model “think out loud” before committing.
The simplest form is the famous “Let's think step by step” suffix. The scaffolded form forces specific stages: restate the problem → list what's known → list what's unknown → state the bridge → walk the steps → sanity-check → answer. The scaffolded version is what we use in our interactive CoT template.
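Both forms are easy to build in code. The sketch below is a minimal illustration, assuming plain string prompts; the function names and the exact stage wording are our own choices here, not the contents of the template.

```python
# Hypothetical helpers: the stage order follows the scaffold described above;
# the names and exact wording are illustrative assumptions.
SCAFFOLD_STAGES = [
    "Restate the problem in your own words.",
    "List what is known.",
    "List what is unknown.",
    "State the bridge between the known and the unknown.",
    "Walk through the steps one at a time.",
    "Sanity-check the result.",
    "State the final answer.",
]


def zero_shot_cot(question: str) -> str:
    """Simplest form: append the classic suffix to the question."""
    return f"{question.strip()}\n\nLet's think step by step."


def scaffolded_cot(question: str, stages=SCAFFOLD_STAGES) -> str:
    """Scaffolded form: numbered stages first, then the problem itself."""
    numbered = "\n".join(f"{i}. {stage}" for i, stage in enumerate(stages, start=1))
    return (
        "Solve step by step. Do not jump to the answer.\n"
        f"{numbered}\n\n"
        f"Problem: {question.strip()}"
    )
```

Calling `scaffolded_cot("How many minutes are in 2.5 hours?")` returns a prompt you can paste into any chat model; swap the stage list to fit the task.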
When CoT actually helps
- Multi-step math, word problems, unit conversions.
- Logic puzzles, deductive reasoning, classification with edge cases.
- Code debugging — “trace through this function” before suggesting a fix.
- Decision analysis — comparing options against multiple criteria.
- Anywhere the model would otherwise jump to a confident but wrong answer.
When CoT hurts (or wastes tokens)
- Simple factual lookups — reasoning adds noise, not signal.
- Creative writing — explicit reasoning can flatten voice.
- Long-context tasks where token budget is tight.
- Tasks the model already does correctly without it.
6 before/after examples
1. Math word problem
Before:
A bakery sells 3 muffins for $5 and 5 cookies for $4. If I have $30 and want a 2:1 muffin-to-cookie ratio by count, how many of each can I buy? Maximize total items.
After — wrap with CoT scaffolding:
Solve step by step. Do not jump to the answer.
1. Restate the problem in your own words.
2. List the constraints (budget, ratio, item counts).
3. Define variables.
4. Set up the equations.
5. Solve.
6. Verify with substitution.
7. State the answer.
A bakery sells 3 muffins for $5 and 5 cookies for $4 ...
Without CoT, GPT-4 and Claude both occasionally get this wrong (off by one or two items). With scaffolded CoT, both get it right consistently.
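If you want to check the model's answer yourself, the "verify with substitution" step can be brute-forced. This is a rough sketch under one big assumption the problem does not state: items can be bought singly at the pro-rated bundle price. If only whole bundles are allowed, the result is different.

```python
from fractions import Fraction

# ASSUMPTION (not in the problem statement): single items are sold at the
# pro-rated bundle price. Fractions keep the arithmetic exact.
MUFFIN_PRICE = Fraction(5, 3)  # $5 per 3 muffins
COOKIE_PRICE = Fraction(4, 5)  # $4 per 5 cookies
BUDGET = 30


def best_purchase(budget=BUDGET):
    """Return (muffins, cookies, cost) with muffins = 2 * cookies, maximizing items."""
    best = (0, 0, Fraction(0))
    cookies = 0
    while True:
        muffins = 2 * cookies  # enforce the 2:1 ratio by count
        cost = muffins * MUFFIN_PRICE + cookies * COOKIE_PRICE
        if cost > budget:
            break
        best = (muffins, cookies, cost)
        cookies += 1
    return best


muffins, cookies, cost = best_purchase()
print(muffins, cookies, float(cost))
```

Under that assumption the search lands on 14 muffins and 7 cookies; one more cookie (and two more muffins) would exceed $30.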
2. Code debug
Before: “Find the bug in this code.”
After: “Trace through this function with the following input. State each variable's value after each line. Then identify where the actual output diverges from the expected output. Then explain the root cause. Then suggest a fix.”
Forcing the trace step catches subtle off-by-one and state-mutation bugs that “find the bug” misses.
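Here is one way the trace prompt might be assembled in practice. The prompt wording comes from the "After" version above; the template layout, the helper name, and the deliberately buggy `sum_first_n` function are hypothetical, made up for illustration.

```python
# Prompt wording is from the "After" version; the layout and names below
# are illustrative assumptions.
TRACE_TEMPLATE = (
    "Trace through this function with the following input. "
    "State each variable's value after each line. "
    "Then identify where the actual output diverges from the expected output. "
    "Then explain the root cause. Then suggest a fix.\n\n"
    "Function:\n{source}\n"
    "Input: {call}\n"
    "Expected output: {expected}\n"
    "Actual output: {actual}"
)

# Deliberately buggy example: the loop stops one element short of n.
BUGGY_SOURCE = """def sum_first_n(values, n):
    total = 0
    for i in range(n - 1):
        total += values[i]
    return total
"""


def build_trace_prompt(source, call, expected, actual):
    """Fill the trace template with the code, the call, and both outputs."""
    return TRACE_TEMPLATE.format(
        source=source, call=call, expected=expected, actual=actual
    )


# Run the buggy function once so the prompt carries the real actual output.
namespace = {}
exec(BUGGY_SOURCE, namespace)
actual = namespace["sum_first_n"]([10, 20, 30], 3)

prompt = build_trace_prompt(
    BUGGY_SOURCE, "sum_first_n([10, 20, 30], 3)", expected=60, actual=actual
)
print(prompt)
```

Giving the model a concrete input plus the expected and actual outputs is what makes the trace checkable: it has a specific divergence to find rather than a vague "something is wrong".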
3. Decision analysis
Before: “Should I take this job offer?”
After: “Walk through this in order. (1) What are the explicit pros? (2) What are the explicit cons? (3) What's the opportunity cost? (4) What's the reversibility — can I un-take this? (5) What do I assume? (6) What would change my answer? (7) Recommendation + confidence.”
4. Classification with edge cases
Before: “Classify this customer message: billing / technical / feature request.”
After: “For this customer message: (1) Restate the literal complaint in 1 sentence. (2) Identify any sub-issues. (3) Match each to the closest category. (4) If multiple match, explain the precedence rule. (5) Final classification + confidence.”
5. Comparison / tradeoff
Before: “Compare React and Vue for our project.”
After: “Step through 5 dimensions: (1) developer experience, (2) hiring pool, (3) bundle size, (4) ecosystem maturity, (5) future-proofing. For each, state what we know vs what we assume. End with a recommendation tied to specific project constraints.”
6. Self-check on a written draft
Before: “Is this email professional?”
After: “Read the email below. (1) Identify the tone in one phrase. (2) Find any phrase that could be misread. (3) Check for missing context the reader would need. (4) Check the ask is unambiguous. (5) Rate professionalism 1-10 with reasoning.”
Why CoT works (the boring real reason)
LLMs are autoregressive — each new token is conditioned on every token that came before. When the model generates reasoning tokens first, those tokens become the “workspace” the final answer token gets to use. The final answer is no longer the model's first instinct on the raw question; it's the model's considered answer after it's done some work.
This is also why “think harder” doesn't help — the model can't think without writing tokens. Structuring the reasoning gives it more tokens to think with.
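In symbols, as a rough sketch: let q be the question, r_1 ... r_k the reasoning tokens, and a the answer (the notation is ours, chosen for illustration). A direct prompt samples the answer from the question alone, while a CoT prompt samples it from the question plus everything the model has already written:

```latex
\text{Direct: } a \sim p_\theta(a \mid q)
\qquad
\text{CoT: } r_i \sim p_\theta(r_i \mid q, r_{<i}), \quad
a \sim p_\theta(a \mid q, r_{1:k})
```

The model and its weights are identical in both cases; only the conditioning context differs.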
Variants worth knowing
- Tree-of-Thoughts (ToT): generate several branches per decision point (3 is a common choice), evaluate each, prune. Better than linear CoT when a problem needs exploring and discarding alternatives, such as high-stakes decisions.
- Self-Refine: generate → critique own output → revise. Loop 2-3 times. Often improves drafts on creative tasks, though the size of the gain depends on the task and model.
- Self-Consistency: run the same CoT prompt N times with temperature > 0, take the majority answer. Typically beats single-sample CoT on reasoning benchmarks, at N times the token cost.
- Step-Back Prompting: ask for the general principle behind the question first, then answer the specific question using it. Tends to generalize better and hallucinate less.
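Two of these variants are short enough to sketch. The code below is a minimal illustration, not a reference implementation: `Sample` stands in for whatever function calls your model (a hypothetical placeholder), and the "Final answer:" marker is a convention you would need to request in your own prompt.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Hypothetical stand-in for a model call: takes a prompt, returns the full
# completion. For self-consistency it should sample with temperature > 0.
Sample = Callable[[str], str]

# Assumed convention: the prompt asks the model to end with "Final answer: ...".
ANSWER_RE = re.compile(r"final answer\s*:\s*(.+)", re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, if any."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def self_consistency(prompt: str, sample: Sample, n: int = 5) -> Optional[str]:
    """Sample n reasoning paths and return the most common final answer."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


def self_refine(task: str, generate: Sample, rounds: int = 2) -> str:
    """Generate a draft, then alternate critique and revision for `rounds` loops."""
    draft = generate(task)
    for _ in range(rounds):
        critique = generate(
            f"Critique this draft for the task.\nTask: {task}\nDraft: {draft}"
        )
        draft = generate(
            "Revise the draft using the critique.\n"
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}"
        )
    return draft
```

Note the cost: `self_consistency` makes n model calls and `self_refine` makes 1 + 2 × rounds, so both are best reserved for prompts where a wrong answer is expensive.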
Try it in 30 seconds
Paste a multi-step problem into our interactive Chain-of-Thought template — fill the placeholder, copy the result, paste into ChatGPT or Claude, and watch accuracy jump on whatever problem you're stuck on.
Or use the Prompt Fixer — when it detects a multi-step task in your prompt, it auto-adds CoT scaffolding to the corrected version.
FAQ
What is Chain-of-Thought prompting?
Chain-of-Thought (CoT) is a prompting technique where you instruct the model to show its reasoning step by step before giving a final answer. It dramatically improves accuracy on multi-step problems — arithmetic, logic, planning — because the model uses the reasoning tokens to ground the answer.
Does 'Let's think step by step' really work?
Yes, but the structured version works better. Zero-shot CoT (just appending "Let's think step by step") gives a modest boost. Scaffolded CoT ("First state what you know, then what you don't, then bridge them, then check") gives a much bigger one. Our Chain-of-Thought template uses the scaffolded version.
When should I NOT use Chain-of-Thought?
On simple factual lookups ("What's the capital of France?"), on short creative tasks where reasoning flattens the prose, and on token-budget-sensitive flows where the extra reasoning tokens cost more than the accuracy gain. Save CoT for problems with more than one logical step.
Is CoT the same as 'reasoning models' like o1?
Closely related. Models like OpenAI's o1 and Claude's extended-thinking mode run CoT internally before answering, and may show only a summary of that reasoning or hide it altogether. CoT prompting is the manual version: it works on any chat model and costs extra output tokens, but you see the full reasoning.
Now try it on your own prompt
The FixAIPrompt auto-fixer applies every pattern in this article automatically — paste any rough prompt and get a polished, model-aware version back. Free, no signup, no API key.