LLM Inference Explorer
Softmax & Temperature
Softmax converts raw scores (logits) into a probability distribution. Temperature controls how peaked or flat that distribution is.
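This can be sketched in a few lines of plain Python (the function name and example logits below are illustrative, not part of the explorer):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities. T < 1 sharpens, T > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
print(softmax(logits, 0.1))  # nearly all mass on the top token
print(softmax(logits, 1.0))  # the model's unmodified distribution
print(softmax(logits, 3.0))  # close to uniform
```

Dividing the logits by T before exponentiating is the whole trick: small T exaggerates gaps between scores, large T shrinks them.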
Temperature slider: T ranges from 0.1 (greedy) through 1.0 (default) to 3.0 (random).
Statistics panel: Top-1 Prob, Entropy (bits), and Effective Vocabulary (2^H tokens), updated as T changes.
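All three panel statistics follow directly from the probabilities. A minimal sketch (the example distribution is illustrative); 2^H is the "effective vocabulary" because a uniform distribution over 2^H tokens has exactly H bits of entropy:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.7, 0.2, 0.05, 0.05]  # illustrative post-softmax distribution
H = entropy_bits(probs)
top1 = max(probs)
eff_vocab = 2 ** H  # how many "equally likely" tokens this is equivalent to
print(f"Top-1 Prob: {top1:.2f}  Entropy: {H:.2f} bits  Eff. Vocab: {eff_vocab:.1f} tokens")
```

As T rises the distribution flattens, so entropy and effective vocabulary grow while the top-1 probability falls.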
Top-k tokens chart: logits (raw model scores, fixed) alongside the probabilities after softmax(logits / T).
Presets
Click a preset to load an interesting configuration.
Greedy (T=0.1)
Default (T=1.0)
Creative (T=1.5)
Random (T=3.0)
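The four presets apply the same softmax at different temperatures. A sketch comparing them on a fixed, illustrative logit vector (the scores are made up; the explorer's are its own):

```python
import math

def softmax(logits, T):
    """Temperature-scaled softmax with max-subtraction for stability."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [3.0, 1.5, 1.0, 0.2, -0.5]  # fixed raw scores, as in the chart
for name, T in [("Greedy", 0.1), ("Default", 1.0), ("Creative", 1.5), ("Random", 3.0)]:
    top1 = max(softmax(logits, T))
    print(f"{name:8s} T={T:3.1f}  top-1 prob = {top1:.3f}")
```

The top-1 probability falls monotonically as T rises, which is exactly what the probability bars show when you drag the slider.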
Explore
Drag the temperature from 0.1 to 3.0 and watch the probability bars change.
At T = 0.1, what fraction of the probability mass sits on the top token?
At what temperature does the top-1 probability drop below 50%?
Watch the entropy and effective vocabulary size: how do they relate?
Questions
Why does temperature = 1 give the "true" model distribution? What makes other temperatures "distorted"?
Speculative decoding's acceptance rate depends on temperature. Why would a lower temperature mean a higher acceptance rate?
Top-k sampling cuts off low-probability tokens. How does temperature interact with top-k?
What Next
Now see how the model generates tokens one at a time → Autoregressive Generation
See how attention uses softmax to compute its weights → Attention