Inference Cost Estimator

Model

Hardware

Workload

Quantization

Estimator

Memory Budget

Rules of Thumb

Throughput Estimates

Per-Layer Op Breakdown (Decode, 1 token)

GPU Comparison

GPUVRAMFits?GPUsDecode tok/sPrefill tok/sTotal Time$/hr$/1M tokens

Worked Example

Click a preset to load an interesting configuration.