Flash Attention

Standard attention materializes the full S×S score matrix in GPU memory (HBM). Flash Attention tiles the computation into blocks so score tiles stay in fast on-chip SRAM, combining them with an online softmax — same math, far less memory traffic.
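The tiling idea can be sketched in a few lines of NumPy. This is an illustrative reference implementation, not a kernel: it processes K/V in blocks and maintains running softmax statistics (a per-row max `m` and denominator `l`) so only one S×block score tile ever exists at a time, yet the result matches standard attention exactly.

```python
import numpy as np

def standard_attention(Q, K, V):
    """Materializes the full S x S score matrix at once."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])              # (S, S) scores in memory
    P = np.exp(S - S.max(axis=-1, keepdims=True))   # numerically stable softmax
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def flash_attention(Q, K, V, block=4):
    """Tiled attention with an online softmax (reference sketch).
    Only an S x block score tile exists per iteration."""
    S_len, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(V, dtype=np.float64)  # output accumulator
    m = np.full(S_len, -np.inf)             # running row-wise max
    l = np.zeros(S_len)                     # running softmax denominator
    for j in range(0, S_len, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S_blk = (Q @ Kj.T) * scale                   # (S, block) tile only
        m_new = np.maximum(m, S_blk.max(axis=-1))
        P = np.exp(S_blk - m_new[:, None])
        alpha = np.exp(m - m_new)                    # rescale old statistics
        l = alpha * l + P.sum(axis=-1)
        O = alpha[:, None] * O + P @ Vj
        m = m_new
    return O / l[:, None]
```

Because the rescaling factor `alpha` folds earlier blocks into the new running max, the final `O / l` is bit-for-bit the same softmax-weighted sum the standard version computes, up to floating-point rounding.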

[Interactive demo: sliders set the sequence length, head dimension (d_head), and Flash block size; side-by-side readouts compare standard vs. Flash Attention on HBM traffic (with the reduction factor) and peak score-matrix memory, alongside a block-by-block animation. Presets load interesting configurations.]
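The readouts above can be approximated with a back-of-envelope model. The formulas below are assumptions for illustration, not the demo's exact accounting: standard attention writes the S×S score matrix to HBM and reads it back, while Flash keeps scores in SRAM but re-streams Q and the output accumulator once per K/V block.

```python
import math

def hbm_traffic(S, d, block, bytes_per_elt=2):
    """Rough bytes moved to/from HBM (simplified model, fp16 elements).
    Standard: read Q, K, V; write then re-read the S x S scores; write output.
    Flash: read K, V once; scores never leave SRAM; Q and the output
    accumulator are re-streamed once per K/V block."""
    std = (3 * S * d) + (2 * S * S) + (S * d)
    n_blocks = math.ceil(S / block)
    flash = (2 * S * d) + n_blocks * (2 * S * d)
    return std * bytes_per_elt, flash * bytes_per_elt

def peak_score_memory(S, block, bytes_per_elt=2):
    """Peak score-matrix footprint: full S x S for standard,
    one block x block tile for Flash."""
    return S * S * bytes_per_elt, block * block * bytes_per_elt
```

Under this model the Flash traffic scales as S²·d/block rather than S², so larger blocks (bounded by SRAM capacity) directly shrink HBM traffic, while the score-matrix footprint drops from quadratic in S to a constant tile.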