Speculative Decoding

A small draft model proposes K tokens quickly. The large target model verifies all K in one forward pass. If accepted, you get K tokens for ~1 large-model step.

Target Model

Draft Model

Hardware

Parameters

Speculative Decoding — Cycle View

Ready
Draft
Verify ↓
Target
Output Sequence
Tokens will appear here as cycles complete.
expected speedup

Time Breakdown per Cycle

Speedup vs Acceptance Rate

Click a preset to load an interesting configuration.