Speculative Decoding

A small draft model proposes K tokens quickly. The large target model verifies all K in one forward pass. If accepted, you get K tokens for ~1 large-model step.

Target Model

Draft Model

Hardware

Parameters

Token Generation — Step by Step

Click "Step" to simulate one speculative cycle.
expected speedup

Time Breakdown per Cycle

Speedup vs Acceptance Rate

Click a preset to load an interesting configuration.