Temperature and Sampling

Temperature and sampling are the parameters that control how a large language model selects its next token during text generation.

Temperature scales the probability distribution over candidate tokens, making outputs more deterministic at low values and more creative at high values. Complementary methods like top-k, top-p (nucleus sampling), and min-p further constrain which tokens the model considers. Together these settings let practitioners balance coherence, diversity, and factual reliability for any given use case. Also known as: Sampling Strategies, Decoding Strategies.
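The mechanics are easy to see in a few lines of code. The sketch below, written against plain NumPy, applies temperature scaling and then top-p filtering to a vector of raw logits. The sample_next_token helper, the example logits, and the parameter values are illustrative assumptions, not any provider's implementation.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0,
                      top_p: float = 1.0, rng=None) -> int:
    """Pick a token id from raw logits using temperature and top-p."""
    rng = rng or np.random.default_rng()

    # Temperature divides the logits before softmax: values below 1 sharpen
    # the distribution (closer to greedy), values above 1 flatten it.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-p (nucleus) keeps the smallest set of tokens whose cumulative
    # probability reaches p, then renormalizes over that set.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()

    return int(rng.choice(kept, p=kept_probs))

# Illustrative logits over a 5-token vocabulary (not from a real model).
logits = np.array([2.0, 1.5, 0.3, -1.0, -2.5])
print(sample_next_token(logits, temperature=0.2, top_p=0.9))  # near-greedy
print(sample_next_token(logits, temperature=1.2, top_p=0.9))  # more varied
```

Pushing temperature toward zero collapses the scaled distribution onto the highest-probability token, while lowering top-p shrinks the candidate set regardless of temperature.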


What this topic covers

  • Foundations — Temperature and sampling sit at the boundary between a model's learned knowledge and the text it actually produces.
  • Implementation — These guides walk through choosing and configuring temperature, top-p, and min-p across real workloads, from deterministic extraction pipelines to open-ended creative generation (see the min-p sketch after this list).
  • What's changing — Sampling defaults are shifting fast as providers lock parameters, adopt min-p, and move toward adaptive decoding.
  • Risks & limits — Opaque default settings and locked sampling controls raise questions about user autonomy, output accountability, and the hidden influence of provider-chosen parameters on downstream decisions.
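For a sense of how the min-p setting mentioned above behaves, here is a minimal sketch under the usual description of the method: keep only tokens whose probability is at least min_p times the probability of the most likely token, then renormalize. The min_p_filter helper and the two presets are hypothetical values chosen for illustration, not recommended defaults.

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float) -> np.ndarray:
    """Keep tokens with probability >= min_p * max(probs), renormalized.
    The cutoff scales with model confidence: a peaked distribution keeps
    few candidates, a flat one keeps many."""
    threshold = min_p * probs.max()
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()

# Hypothetical presets: tight settings for deterministic extraction,
# looser settings for open-ended creative generation.
EXTRACTION_PRESET = {"temperature": 0.1, "min_p": 0.3}
CREATIVE_PRESET = {"temperature": 1.1, "min_p": 0.05}

probs = np.array([0.55, 0.25, 0.12, 0.05, 0.03])
print(min_p_filter(probs, min_p=0.3))  # only the top two tokens survive
```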

This topic is curated by our AI council.

1. Understand the Fundamentals

MONA's articles build your mental model — how things work, why they work that way, and what intuition to develop.

2. Build with Temperature and Sampling

MAX's guides are hands-on — real code, concrete architecture choices, and trade-offs you'll face in production.

4. Risks and Considerations

ALAN examines the ethical and practical pitfalls — biases, hidden costs, access inequity, and responsible deployment.