Self-Consistency
Also known as: Majority voting CoT
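Sample the same Chain-of-Thought prompt N times at a nonzero temperature, then take the most frequent final answer. In Wang et al. (2022), this improved accuracy over single-sample (greedy) CoT on arithmetic and commonsense reasoning benchmarks.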
When to use it
- Numerical reasoning, logic, and multi-step math.
- Anywhere a single CoT trace is sometimes confidently wrong.
- When you can afford N× the API cost for a meaningful accuracy bump.
- Production systems with quality SLAs on reasoning correctness.
When not to use it
- Open-ended creative tasks — there's no 'majority' answer.
- Cost-sensitive flows where paying for N samples per query is prohibitive.
- Real-time chat with strict latency budgets.
How it works
1. Run the same Chain-of-Thought prompt N times with temperature > 0.
2. Each sample produces its own (usually different) reasoning chain and final answer.
3. Tally the final answers and pick the most frequent one.
4. Optionally weight each vote by a reasoning-quality heuristic (e.g. shorter chains, fewer hedges).
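The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `sample_fn` is a hypothetical stand-in for whatever call returns one completion at temperature > 0, `extract_answer` assumes each completion ends with a line of the form `Answer: <value>`, and `weight_fn` is an optional, hypothetical scoring hook for step 4.

```python
import re
from typing import Callable, Dict, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer: <value>' found in a completion (assumed format)."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def self_consistency(
    sample_fn: Callable[[str], str],   # hypothetical: one sampled completion per call
    prompt: str,
    n: int = 5,
    weight_fn: Optional[Callable[[str], float]] = None,  # hypothetical quality score
) -> Optional[str]:
    """Sample the prompt n times and return the answer with the highest vote total."""
    scores: Dict[str, float] = {}
    for _ in range(n):
        completion = sample_fn(prompt)
        answer = extract_answer(completion)
        if answer is None:
            continue  # skip samples with no parseable final answer
        weight = weight_fn(completion) if weight_fn else 1.0
        scores[answer] = scores.get(answer, 0.0) + weight
    if not scores:
        return None
    # On a tie, max() keeps the answer that was seen first.
    return max(scores, key=scores.get)
```

With `weight_fn` left as `None`, every sample counts as one vote, which is the plain most-frequent-answer rule. Answers are compared as strings here, so in practice you would likely normalize them first (for example, so that "42" and "42.0" count as the same vote).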
Example
Lazy prompt
Let's think step by step about <hard problem>.
Using the technique
Sample this CoT prompt 5 times (temperature 0.7). For each, record the final answer. Return the answer that appears most often, and flag if no answer reached majority.
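The "flag if no answer reached majority" part of this prompt can be handled in code once the final answers have been collected. The sketch below assumes the answers are already extracted and normalized to strings; `vote_with_majority_flag` is a hypothetical helper name, and "majority" is taken to mean strictly more than half of the samples.

```python
from collections import Counter
from typing import List, Optional, Tuple


def vote_with_majority_flag(answers: List[str]) -> Tuple[Optional[str], bool]:
    """Return (most frequent answer, whether it got more than half the votes)."""
    if not answers:
        return None, False
    top_answer, top_count = Counter(answers).most_common(1)[0]
    reached_majority = top_count > len(answers) / 2
    return top_answer, reached_majority


# Five sampled final answers (made-up values for illustration).
answer, ok = vote_with_majority_flag(["12", "12", "15", "12", "9"])
if not ok:
    print("No majority; consider sampling more or escalating to review.")
```

Separating the winner from the flag lets the caller decide what a missing majority means: sample again, fall back to a single answer, or route the question to a human.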
Common pitfalls
- N× cost — only worth it if accuracy matters.
- Temperature too high = noise; too low = samples collapse onto near-identical chains, so the vote adds little and can repeat the same wrong answer.
- Majority isn't always right; on adversarial questions it can lock in the popular-but-wrong answer.
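One rough way to notice the temperature pitfall is to look at how the votes are spread. The sketch below is a heuristic under stated assumptions, not an established method: `agreement_hint` is a hypothetical helper, and the `min_agreement` threshold of 0.4 is an arbitrary illustrative value that would need tuning per task.

```python
from collections import Counter
from typing import List


def agreement_hint(answers: List[str], min_agreement: float = 0.4) -> str:
    """Classify a set of sampled answers as 'unanimous', 'scattered', or 'ok'."""
    if not answers:
        raise ValueError("need at least one answer")
    top_count = Counter(answers).most_common(1)[0][1]
    if top_count == len(answers):
        # Every sample agrees: either the question is easy or the samples
        # are too similar (temperature possibly too low) to add information.
        return "unanimous"
    if top_count / len(answers) < min_agreement:
        # Votes are spread thin: temperature may be too high, or the
        # question may be ambiguous.
        return "scattered"
    return "ok"
```

A "unanimous" result is not evidence of correctness on its own, which matches the last pitfall above: agreement measures consistency between samples, not truth.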
Where this came from
Wang et al., 2022 — "Self-Consistency Improves Chain of Thought Reasoning in Language Models".
Related techniques
Chain-of-Thought (CoT) Prompting
Force the model to think step-by-step before answering. Dramatically improves accuracy on multi-step problems.
Tree-of-Thoughts (ToT) Prompting
Generate multiple reasoning branches per step, evaluate each, and prune. Beats single-path Chain-of-Thought on hard decisions.
Self-Refine
Generate → critique own output → revise → repeat. Pushes a model's output much closer to its capability ceiling.