Positional Encoding
Without position information, attention treats "the cat sat" and "sat the cat" identically. Position encodings tell the model where each token is.
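As a concrete reference point, here is a minimal numpy sketch of the classic sinusoidal scheme; the function name and sizes are illustrative, not the demo's actual code:

```python
import numpy as np

def sinusoidal_encoding(max_positions: int, d_model: int) -> np.ndarray:
    """Build a (max_positions, d_model) matrix of sinusoidal position vectors."""
    positions = np.arange(max_positions)[:, None]      # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model // 2)
    # Each dimension pair gets its own frequency: 1 / 10000^(2i / d_model).
    angles = positions / (10000 ** (dims / d_model))   # (P, d_model // 2)
    pe = np.zeros((max_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dims: sine
    pe[:, 1::2] = np.cos(angles)                       # odd dims: cosine
    return pe

pe = sinusoidal_encoding(64, 128)
print(pe.shape)  # (64, 128): one d_model-dim vector per position
```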
[Interactive demo: choose the encoding type (Sinusoidal or Rotary) and the number of positions (default 64), then drag Position A (pos = 5) and Position B (pos = 6). A Dot Product Similarity readout updates live; nearby positions give higher similarity. Panels: Positional Encoding Heatmap, and a Position Vectors Comparison of Position A vs Position B.]
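The similarity readout follows from an identity: for sinusoidal vectors, PE(m) . PE(n) = sum_i cos((m - n) w_i), which depends only on the offset m - n and so decays as the positions move apart. Reusing the hypothetical sinusoidal_encoding sketched above:

```python
pe = sinusoidal_encoding(128, 128)
# PE(m) @ PE(n) = sum_i cos((m - n) * w_i): the result depends only on the offset.
print(pe[5] @ pe[6])    # close pair (5, 6): high similarity
print(pe[5] @ pe[100])  # distant pair (5, 100): noticeably lower
```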
Presets
Click a preset to load an interesting configuration: Sinusoidal, Rotary, Close (5, 6), Distant (5, 100).
Explore
Compare the heatmap patterns for sinusoidal vs rotary: how do the stripe patterns differ?
Drag positions A and B to see how the dot product decays with distance.
Notice the different frequency bands: early dimensions oscillate rapidly, while later dimensions change slowly across positions (see the check after this list).
Increase max positions: the high-frequency stripes become denser.
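Those bands exist because the per-pair frequencies form a geometric progression, from about 1 down to 1/10000. A quick check using the same constants as the sketch above:

```python
import numpy as np

d_model = 128
freqs = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
print(freqs[0])   # ~1.0: wavelength ~6 positions (fast stripes)
print(freqs[-1])  # ~1.2e-4: wavelength ~55,000 positions (slow bands)
```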
Questions
Why use different frequencies for different dimensions? What would happen if all dimensions used the same frequency?
Why can RoPE generalize to longer sequences than it was trained on, while absolute sinusoidal encodings struggle? (A sketch after these questions hints at the answer.)
The dot product of two position vectors encodes relative distance. Why is this useful for attention?
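A hint for the RoPE question: rotary encodings rotate each two-dimensional slice of a query or key by an angle proportional to its position, so the attention score q_m . k_n depends only on the offset m - n, never on absolute position. A minimal sketch, illustrative rather than the demo's implementation (real models apply this per attention head):

```python
import numpy as np

def rope(x: np.ndarray, pos: int) -> np.ndarray:
    """Rotate consecutive pairs of x by angles proportional to pos (RoPE sketch)."""
    d = x.shape[-1]
    freqs = 1.0 / (10000 ** (np.arange(0, d, 2) / d))
    theta = pos * freqs                # one rotation angle per 2-D pair
    x1, x2 = x[0::2], x[1::2]          # split into pair components
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(theta) - x2 * np.sin(theta)
    out[1::2] = x1 * np.sin(theta) + x2 * np.cos(theta)
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=128), rng.normal(size=128)
# Shifting both positions by the same amount leaves the score unchanged:
print(rope(q, 5) @ rope(k, 6))    # offset 1 at positions (5, 6)
print(rope(q, 15) @ rope(k, 16))  # same offset 1 at (15, 16): same value
```

Because only relative offsets enter the score, RoPE has no fixed "position vocabulary" to run off the end of, which is one reason it extrapolates to longer sequences more gracefully than absolute sinusoidal encodings.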
What Next
Now see how attention uses these position-aware vectors → Attention
See how tokens get their initial representations → Embeddings