Attention

See how each token decides what to pay attention to — the core mechanism of transformers.
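The mechanism behind the visualization can be sketched as scaled dot-product attention: each token's query is compared against every token's key, the similarities are normalized with a softmax, and the resulting weights mix the value vectors. This is a minimal NumPy illustration, not the page's actual implementation; the shapes and random inputs are assumptions for demonstration.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query row mixes the value
    rows, weighted by how well the query matches each key."""
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)              # (seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy example: 3 tokens, d_head = 4 (illustrative values only)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
# Each row of w is a probability distribution over the 3 tokens —
# it is exactly the row you inspect when hovering in the visualization.
```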

Input Sequence

Head Dimension (d_head)

Hover over a row in any matrix to see which tokens it attends to.

Q, K, V Projections
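The Q, K, and V matrices shown here come from projecting the same token embeddings through three separate learned weight matrices. A minimal sketch, with made-up dimensions (`d_model = 8`, `d_head = 4`) and random weights standing in for trained parameters:

```python
import numpy as np

d_model, d_head, seq_len = 8, 4, 3
rng = np.random.default_rng(1)

X = rng.normal(size=(seq_len, d_model))   # one embedding per input token
W_q = rng.normal(size=(d_model, d_head))  # learned during training
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

# Each projection maps every token from d_model down to d_head.
Q, K, V = X @ W_q, X @ W_k, X @ W_v       # each is (seq_len, d_head)
```

Because all three projections read the same input `X`, changing a token's embedding moves its row in Q, K, and V simultaneously — which is why hovering a single row highlights related entries across the matrices.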

Click a preset to load an interesting configuration.