A small draft model proposes K tokens quickly.
The large target model verifies all K in one forward pass.
If accepted, you get K tokens for ~1 large-model step.
Target Model
Draft Model
Hardware
Parameters
Token Generation — Step by Step
Click "Step" to simulate one speculative cycle.
—
expected speedup
Time Breakdown per Cycle
Speedup vs Acceptance Rate
Click a preset to load an interesting configuration.
Run at K=4, α=0.80 and step through the animation — watch the accept/reject pattern
Drag α from 0.95 down to 0.30 — at what point does the speedup drop below 1.1×?