Batching Simulator

Decode is memory-bound at batch size 1: every step the GPU must stream all model weights from memory, yet performs only a little compute per generated token. Increasing the batch size reuses each weight load across more tokens, raising arithmetic intensity (FLOPs per byte moved) and pushing the GPU toward its compute roofline.
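A quick sketch of why this holds, under the usual approximation that one decode token costs about 2 FLOPs per parameter while the weights are read once per step (the function name and the FP16 default are illustrative, not part of the simulator):

```python
def decode_arithmetic_intensity(batch_size: int, n_params: float,
                                bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one decode step.

    Each generated token costs ~2 * n_params FLOPs (multiply + add per
    weight), while the weights are streamed once per step regardless of
    how many sequences are in the batch.
    """
    flops = 2 * n_params * batch_size
    bytes_moved = n_params * bytes_per_param
    return flops / bytes_moved

# At FP16 (2 bytes/param), intensity works out to exactly the batch size:
print(decode_arithmetic_intensity(1, 7e9))   # 1.0 FLOP/byte
print(decode_arithmetic_intensity(32, 7e9))  # 32.0 FLOP/byte
```

With a hardware ratio of hundreds of FLOPs per byte of bandwidth, batch 1 leaves the compute units almost entirely idle; larger batches close that gap linearly.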

Controls
- Model
- Hardware
- Workload
- Quantization

Panels
- GPU Compute Utilization
- Throughput & Latency
- Arithmetic Intensity vs Batch Size
- Static vs Continuous Batching
- Memory Budget
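The throughput and latency panels can be reasoned about with a minimal roofline-style model: a decode step takes the longer of the compute time and the weight-streaming time. The hardware numbers below are illustrative assumptions (roughly A100-class), not values the simulator necessarily uses:

```python
def decode_step_time(batch: int, n_params: float, bytes_per_param: int = 2,
                     peak_flops: float = 312e12, mem_bw: float = 2.0e12) -> float:
    """Seconds per decode step: the slower of compute and weight streaming."""
    compute_s = 2 * n_params * batch / peak_flops
    memory_s = n_params * bytes_per_param / mem_bw
    return max(compute_s, memory_s)

def throughput_tokens_per_s(batch: int, n_params: float) -> float:
    return batch / decode_step_time(batch, n_params)

n = 7e9  # a 7B-parameter model at FP16
t1, t32 = throughput_tokens_per_s(1, n), throughput_tokens_per_s(32, n)
# While memory-bound, per-step latency is flat, so throughput scales
# linearly with batch size: t32 is ~32x t1 here.
```

The crossover to compute-bound happens where the two terms are equal, i.e. at a batch size of roughly `peak_flops * bytes_per_param / (2 * mem_bw)`; past that point, latency grows with batch and throughput flattens.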

Click a preset to load an interesting configuration.
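The Memory Budget panel's arithmetic can be sketched as follows: once the weights are resident, the remaining memory caps how many sequences' KV caches fit, and therefore the maximum batch size. All model and hardware numbers here are illustrative assumptions:

```python
def max_batch_from_memory(gpu_mem_gb: float, n_params: float, seq_len: int,
                          n_layers: int, n_kv_heads: int, head_dim: int,
                          bytes_per_param: int = 2, kv_bytes: int = 2) -> int:
    """Largest batch whose KV cache fits alongside the weights."""
    weights = n_params * bytes_per_param
    free = gpu_mem_gb * 1e9 - weights
    # KV cache per sequence: 2 tensors (K and V) per layer,
    # each n_kv_heads * head_dim values per token, for seq_len tokens.
    kv_per_seq = 2 * n_layers * n_kv_heads * head_dim * seq_len * kv_bytes
    return max(0, int(free // kv_per_seq))

# An illustrative 7B-style config at FP16 on an 80 GB part:
b = max_batch_from_memory(80, 7e9, seq_len=4096,
                          n_layers=32, n_kv_heads=32, head_dim=128)
print(b)  # ~30 concurrent 4k-token sequences
```

This is also why quantization and shorter contexts raise the batch ceiling: both shrink the per-sequence KV footprint or free up more memory for it.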