LLM Inference Explorer
Tensor Shapes
Follow a tensor through one transformer layer. See how dimensions change at each stage; this is why FFN dominates FLOPs.
Controls: Model · Batch Size: 1 · Sequence Length: 64
Tensor Flow Through One Layer
FLOPs Summary
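The two panels above are interactive; as a static stand-in, here is a minimal sketch of what they show: the shape and matmul FLOP count at each stage of one decoder layer. The dimensions (d_model = 768, 12 heads, 4× FFN expansion) are an assumed GPT-2-small-sized configuration, not values read from the explorer, and only matmul FLOPs are counted.

```python
# Sketch of the per-stage shapes and matmul FLOPs in one decoder layer.
# Counting rule: an [m, k] @ [k, n] matmul costs ~2*m*k*n FLOPs.
# Dims below are an assumption (GPT-2-small-like), not read from the explorer.

def trace_layer(B=1, S=64, d=768, h=12):
    d_head = d // h
    stages = []

    def matmul(name, a, b):
        *batch, m, k = a
        n = b[-1]
        reps = 1
        for r in batch:
            reps *= r
        flops = 2 * reps * m * k * n
        out = (*batch, m, n)
        stages.append((name, a, b, out, flops))
        return out

    x = (B, S, d)                                          # layer input
    matmul("Q proj",    x, (d, d))                         # [B,S,d] @ [d,d]  -> [B,S,d]
    matmul("K proj",    x, (d, d))
    matmul("V proj",    x, (d, d))
    # split into heads: [B, h, S, d_head]
    matmul("QK^T",      (B, h, S, d_head), (d_head, S))    # -> [B,h,S,S], grows as S^2
    matmul("Score x V", (B, h, S, S), (S, d_head))         # -> [B,h,S,d_head], also S^2
    matmul("O proj",    x, (d, d))                         # heads merged back to [B,S,d] first
    matmul("FFN up",    x, (d, 4 * d))                     # [B,S,d] @ [d,4d] -> [B,S,4d]
    matmul("FFN down",  (B, S, 4 * d), (4 * d, d))         # back to [B,S,d]

    total = sum(s[-1] for s in stages)
    for name, a, b, out, f in stages:
        print(f"{name:10s} {str(a):>17} @ {str(b):>12} -> {str(out):>17} "
              f"{f / 1e9:6.3f} GFLOPs ({100 * f / total:4.1f}%)")
    ffn = sum(s[-1] for s in stages if s[0].startswith("FFN"))
    print(f"total: {total / 1e9:.3f} GFLOPs  (FFN share ~{100 * ffn / total:.0f}%)")

trace_layer()
```

With the default B=1, S=64, the two FFN matmuls account for roughly two thirds of the layer's matmul FLOPs, which is the point the FLOPs Summary panel makes.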
Presets
Click a preset to load an interesting configuration.
GPT-2 decode
LLaMA 7B prefill
LLaMA 70B batch
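For orientation, here is a sketch of the sort of configuration each preset plausibly loads. The model dimensions are the published sizes of each model; the batch and sequence values are illustrative guesses about what each preset is meant to demonstrate, not values taken from the explorer.

```python
# Published model dims; batch/seq values are illustrative guesses, not the explorer's.
PRESETS = {
    "GPT-2 decode":     dict(d=768,  heads=12, layers=12, batch=1,  seq=1),    # one new token at a time
    "LLaMA 7B prefill": dict(d=4096, heads=32, layers=32, batch=1,  seq=2048), # long prompt, S^2 attention bites
    "LLaMA 70B batch":  dict(d=8192, heads=64, layers=80, batch=16, seq=128),  # many requests served together
}
```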
Explore
Increase sequence length: which operations grow fastest? (QKᵀ and Score×V scale as S².)
Compare FFN Up dimensions to Q Projection dimensions — FFN Up is [d, 4d] vs Q's [d, d]. That's why FFN dominates FLOPs.
Switch between GPT-2 and LLaMA 70B — how do the relative proportions change?
Questions
Why does FFN typically account for ~65% of FLOPs? Look at the weight matrix dimensions.
At what sequence length do the attention FLOPs (which scale as S²) overtake the FFN FLOPs (which scale as S)? (Worked out in the sketch below.)
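A back-of-the-envelope take on the last two questions, counting only matmul FLOPs per layer at batch 1 and using the [m, k] @ [k, n] ≈ 2·m·k·n rule from the sketch above: the Q/K/V/O projections cost about 8·S·d² FLOPs and the two FFN matmuls about 16·S·d², so the FFN holds 16/24 = two thirds of the linear FLOPs, which works out to roughly 65% of the layer total at short sequence lengths. The two S²-scaling score matmuls (QKᵀ and Score×V) cost about 4·S²·d, so they overtake the FFN when 4·S²·d > 16·S·d², i.e. around S = 4·d.

```python
# Crossover where the S^2 score matmuls (~4*S^2*d FLOPs) pass the FFN (~16*S*d^2 FLOPs):
# 4*S^2*d > 16*S*d^2  =>  S > 4*d
for name, d in [("GPT-2 small", 768), ("LLaMA 7B", 4096), ("LLaMA 70B", 8192)]:
    print(f"{name:12s} S ~ {4 * d}")
```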
What Next
Now zoom out to see all layers stacked →
Model Overview