LLM Inference Explorer
Softmax & Temperature
Softmax converts raw scores (logits) into a probability distribution. Temperature controls how peaked or flat that distribution is.
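This can be sketched in a few lines of plain Python (the function name and example logits below are illustrative, not part of the explorer):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities. T < 1 sharpens, T > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
print(softmax(logits, 0.1))  # nearly all mass on the top token
print(softmax(logits, 1.0))  # the model's unmodified distribution
print(softmax(logits, 3.0))  # close to uniform
```

Dividing the logits by T before exponentiating is the whole trick: small T exaggerates gaps between scores, large T shrinks them.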
Temperature slider: T ranges from 0.1 (greedy) through 1.0 (default) to 3.0 (random).
Statistics panel: Top-1 Prob, Entropy (bits), and Effective Vocabulary (2^H tokens), updated as T changes.
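All three panel statistics follow directly from the probabilities. A minimal sketch (the example distribution is illustrative); 2^H is the "effective vocabulary" because a uniform distribution over 2^H tokens has exactly H bits of entropy:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.7, 0.2, 0.05, 0.05]  # illustrative post-softmax distribution
H = entropy_bits(probs)
top1 = max(probs)
eff_vocab = 2 ** H  # how many "equally likely" tokens this is equivalent to
print(f"Top-1 Prob: {top1:.2f}  Entropy: {H:.2f} bits  Eff. Vocab: {eff_vocab:.1f} tokens")
```

As T rises the distribution flattens, so entropy and effective vocabulary grow while the top-1 probability falls.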
Top-k tokens chart: logits (raw model scores, fixed) alongside the probabilities after softmax(logits / T).
Presets
Click a preset to load an interesting configuration.
Greedy (T=0.1)
Default (T=1.0)
Creative (T=1.5)
Random (T=3.0)
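The four presets apply the same softmax at different temperatures. A sketch comparing them on a fixed, illustrative logit vector (the scores are made up; the explorer's are its own):

```python
import math

def softmax(logits, T):
    """Temperature-scaled softmax with max-subtraction for stability."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [3.0, 1.5, 1.0, 0.2, -0.5]  # fixed raw scores, as in the chart
for name, T in [("Greedy", 0.1), ("Default", 1.0), ("Creative", 1.5), ("Random", 3.0)]:
    top1 = max(softmax(logits, T))
    print(f"{name:8s} T={T:3.1f}  top-1 prob = {top1:.3f}")
```

The top-1 probability falls monotonically as T rises, which is exactly what the probability bars show when you drag the slider.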
Explore
Drag the temperature from 0.1 to 3.0 and watch the probability bars change.
At T = 0.1, what fraction of the probability mass sits on the top token?
At what temperature does the top-1 probability drop below 50%?
Watch the entropy and effective vocabulary size: how do they relate?
Questions
Why does temperature = 1 give the "true" model distribution? What makes other temperatures "distorted"?
Speculative decoding's acceptance rate depends on temperature. Why would a lower temperature mean a higher acceptance rate?
Top-k sampling cuts off low-probability tokens. How does temperature interact with top-k?
What Next
Now see how the model generates tokens one at a time → Autoregressive Generation
See how attention uses softmax to compute its weights → Attention