
Decoding

The model outputs a logit (a raw, unnormalized score) for every token in its vocabulary. Softmax converts these logits into a probability distribution, and a sampling strategy then selects the next token from that distribution.
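The logits → softmax → sample pipeline described above fits in a few lines of Python. This is a minimal sketch, not the page's own implementation. The four-token vocabulary and its logits are hypothetical, and the sampler uses the standard library's `random.choices`.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(probs, rng=random):
    """Draw one token index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical vocabulary and logits, for illustration only.
vocab = ["the", "a", "cat", "dog"]
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(vocab[sample_token(probs, random.Random(0))])
```

Subtracting the maximum logit does not change the result of softmax, but it keeps `math.exp` from overflowing when logits are large.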

Temperature

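Temperature T divides the logits before softmax is applied. The slider ranges from 0.1 to 3.0, with T = 1.0 as the default. Values below 1 sharpen the distribution so it becomes peaked around the top token (0.1), and values above 1 flatten it toward uniform (3.0).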
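The following is a minimal sketch of temperature scaling. It assumes the standard formulation where each logit is divided by T before softmax. The logits are hypothetical, and the function name `softmax_with_temperature` is illustrative.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Divide every logit by T, then apply a numerically stable softmax."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical example logits
for T in (0.1, 1.0, 3.0):
    probs = softmax_with_temperature(logits, T)
    print(T, [round(p, 3) for p in probs])
```

At T = 0.1, the gaps between logits are multiplied by 10, so almost all of the mass lands on the top token. At T = 3.0, the gaps shrink, and the probabilities move closer together.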

Sampling Strategy

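Top-k (default k = 5): keep only the k most probable tokens, renormalize their probabilities, and sample from them.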
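Here is a minimal sketch of a top-k filter that works on a probability list. The helper name `top_k_filter` and the six logits are hypothetical. Ties are broken by token index because Python's sort is stable.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    k = min(k, len(probs))
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i]
    total = sum(filtered)
    return [p / total for p in filtered]

probs = softmax([2.0, 1.0, 0.5, -1.0, -2.0, -3.0])  # hypothetical logits
print([round(p, 3) for p in top_k_filter(probs, k=3)])
```

The number of survivors is always exactly k, regardless of how peaked or flat the distribution is.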

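Top-p, or nucleus sampling (default p = 0.90): keep the smallest set of most probable tokens whose cumulative probability is at least p, renormalize, and sample from them.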
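The following is a minimal sketch of a top-p (nucleus) filter. The helper name `top_p_filter` is hypothetical, and so are the two example distributions, one peaked and one flat.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i]
    total = sum(filtered)
    return [q / total for q in filtered]

# Hypothetical distributions: one peaked, one flat.
peaked = [0.85, 0.08, 0.04, 0.02, 0.01]
flat = [0.22, 0.21, 0.20, 0.19, 0.18]
for name, dist in (("peaked", peaked), ("flat", flat)):
    kept = sum(1 for q in top_p_filter(dist, 0.90) if q > 0)
    print(name, "keeps", kept, "tokens")
```

With p = 0.90, the peaked distribution keeps 2 tokens (0.85 + 0.08 = 0.93), while the flat one needs all 5. This adaptivity is what a fixed k lacks.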

Statistics

For the current settings, the statistics panel reports Top-1 Prob, Candidates, Filtered, Entropy (bits), Eff. Vocab (tokens), and Strategy.
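Three of these statistics can be computed directly from a probability list: top-1 probability, entropy in bits, and effective vocabulary size. This sketch uses the standard Shannon entropy in bits. The page's exact definition of Eff. Vocab is not stated here, so the sketch *assumes* the common choice 2^H, the perplexity of the distribution. The function names are hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability tokens contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def effective_vocab(probs):
    """Assumed definition: 2**H, the number of equally likely tokens
    that would produce the same entropy."""
    return 2 ** entropy_bits(probs)

def top1_prob(probs):
    """Probability of the single most likely token."""
    return max(probs)

uniform = [0.25] * 4
peaked = [0.97, 0.01, 0.01, 0.01]
for dist in (uniform, peaked):
    print(top1_prob(dist), round(entropy_bits(dist), 3), round(effective_vocab(dist), 3))
```

A uniform distribution over 4 tokens has exactly 2 bits of entropy and an effective vocabulary of 4. The peaked distribution has entropy well below 1 bit, so its effective vocabulary is close to a single token.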

[Interactive figure: Top Tokens. Panels: Logits (raw model scores, fixed); Sampling Distribution; Samples.]


Set top-k to 5 and click Resample repeatedly. How many distinct tokens appear?
Compare top-p = 0.9 at T = 0.5 vs. T = 2.0. How does the candidate count change?
Why does greedy decoding always produce the same token, while top-p sampling gives variety?
Set top-k = 20 (all tokens). How does this compare to no filtering?
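The greedy-versus-sampling contrast can also be checked outside the page. In this sketch, `greedy` and `sample` are illustrative helpers, not the page's code, and the distribution is hypothetical. Greedy decoding is an argmax, so it is deterministic. Sampling draws in proportion to probability, so repeated draws differ.

```python
import random

def greedy(probs):
    """Greedy decoding: always pick the single most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng):
    """Stochastic decoding: draw a token in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical distribution after filtering
rng = random.Random(42)
greedy_picks = {greedy(probs) for _ in range(100)}
sampled_picks = {sample(probs, rng) for _ in range(100)}
print("greedy:", greedy_picks)    # always a single index
print("sampled:", sampled_picks)  # typically several distinct indices
```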

Why does top-p adapt to the distribution shape while top-k doesn't?
If T=0.1 makes the distribution very peaked, does top-k even matter at low temperature?
How do top-k and top-p interact when combined? Which filter is applied first?
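The combined pipeline can be explored with the following sketch. It *assumes* the order temperature, then top-k, then top-p, then sample. That order is one common convention, but implementations differ, so check the library you actually use. Top-p here measures cumulative mass relative to whatever survived top-k, which amounts to renormalizing first. All names and logits are hypothetical.

```python
import math
import random

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: divide logits by T, then normalize."""
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Zero out everything except the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return [q if i in keep else 0.0 for i, q in enumerate(probs)]

def top_p(probs, p):
    """Nucleus filter on possibly unnormalized weights: keep the smallest
    prefix of the sorted tokens holding at least a fraction p of the mass."""
    total = sum(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        if probs[i] == 0.0:
            break  # everything from here on was already removed by top-k
        keep.add(i)
        cumulative += probs[i] / total
        if cumulative >= p:
            break
    return [q if i in keep else 0.0 for i, q in enumerate(probs)]

def decode_step(logits, T=1.0, k=None, p=None, rng=random):
    """Assumed order: temperature, then top-k, then top-p, then sample."""
    probs = softmax(logits, T)
    if k is not None:
        probs = top_k(probs, k)
    if p is not None:
        probs = top_p(probs, p)
    total = sum(probs)
    probs = [q / total for q in probs]
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, probs

logits = [2.0, 1.0, 0.5, -1.0, -2.0]  # hypothetical logits
for T in (0.5, 2.0):
    _, probs = decode_step(logits, T=T, k=3, p=0.9, rng=random.Random(0))
    print(f"T={T}: {sum(1 for q in probs if q > 0)} candidates")
```

The example uses k = 3 and p = 0.9. At T = 0.5, the top two tokens already hold more than 90% of the remaining mass, so top-p trims the candidates to 2. At T = 2.0, the distribution is flatter, so top-p keeps all 3 tokens that top-k allowed.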
