Layer Overview

This view traces every operation inside a single transformer layer: how input tokens flow through attention and the feed-forward network (FFN), and the matrix multiplications each step performs.

Click any block in the diagram to see its details. Change the model or workload to see how dimensions scale.

Model

Hardware

Workload

Optimizations

Selected Operation

Name
Category
Matrix (M×K×N)
FLOPs
Traffic
Arith. Intensity
Bound
Time
Explore tiling for this op →
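The per-operation metrics listed above can be derived from the matmul shape alone. The sketch below shows one plausible way to compute them for a single M×K×N matrix multiply; the dtype size and the peak-compute/bandwidth figures are illustrative placeholders, not values tied to any particular model or GPU in this tool.

```python
# Hedged sketch: derive FLOPs, traffic, arithmetic intensity, bound, and
# time for one M x K x N matmul. bytes_per_elem, peak_flops, and peak_bw
# are assumed placeholder values (fp16 elements, a ~312 TFLOP/s,
# 2 TB/s accelerator), not taken from this tool's hardware presets.

def matmul_metrics(M, K, N, bytes_per_elem=2,
                   peak_flops=312e12, peak_bw=2.0e12):
    flops = 2 * M * K * N                          # one multiply + one add per MAC
    traffic = bytes_per_elem * (M*K + K*N + M*N)   # read A, read B, write C once
    ai = flops / traffic                           # FLOPs per byte moved
    t_compute = flops / peak_flops
    t_memory = traffic / peak_bw
    bound = "compute" if t_compute >= t_memory else "memory"
    return {"flops": flops, "traffic": traffic, "ai": ai,
            "bound": bound, "time_ms": max(t_compute, t_memory) * 1e3}
```

For example, a square 4096×4096×4096 GEMM lands compute-bound under these assumptions, while an M=1 matrix-vector shape (typical of decode) lands memory-bound.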

Transformer Layer — Operation Flow

FLOPs Breakdown (per layer)

Memory Traffic Breakdown (per layer)

All Operations

Operation | Type | M | K | N | FLOPs | Traffic | AI (F/B) | Bound | Time (ms) | % FLOPs

Roofline Model

Click a preset to load an interesting configuration.