Attention Visualizer
Self-attention mechanism explorer
Input sequence:
Tokenize
Load Example
Tokens (click a token to select the attention source)
Attention Matrix (Head 1)
Attention Heads
Attention Flow
Input → Q = XWq
Input → K = XWk
Input → V = XWv
A = softmax(QK^T / √d)
Output = AV
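A minimal NumPy sketch of this flow, assuming a toy embedding matrix X and random Wq/Wk/Wv standing in for learned projection weights (all sizes here are illustrative, not the tool's actual defaults):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8          # illustrative toy sizes

X = rng.normal(size=(seq_len, d_model))      # token embeddings (one row per token)
Wq = rng.normal(size=(d_model, d_head))      # stand-ins for learned weights
Wk = rng.normal(size=(d_model, d_head))
Wv = rng.normal(size=(d_model, d_head))

Q = X @ Wq                                   # Input → Q = XWq
K = X @ Wk                                   # Input → K = XWk
V = X @ Wv                                   # Input → V = XWv

A = softmax(Q @ K.T / np.sqrt(d_head))      # A = softmax(QK^T / √d)
output = A @ V                               # Output = AV

print(A.shape)       # (5, 5): the attention matrix the visualizer draws
print(output.shape)  # (5, 8)
```

Each row of A sums to 1 and gives the weights with which that token mixes the value vectors of every token in the sequence.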
Settings
Number of heads: 4 / 8 / 12
Head dimension: 32 / 64 / 128
Temperature: 1.0
Causal masking: off (0) / on (1)
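A sketch of how these four settings could enter the computation, under the common convention (an assumption here, not confirmed by the tool) that Wq/Wk/Wv pack all heads side by side. Temperature divides the score logits, and causal masking sets future positions to -inf before the softmax:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(X, Wq, Wk, Wv, n_heads=4, temperature=1.0, causal=False):
    # Wq/Wk/Wv are assumed to have shape (d_model, n_heads * d_head).
    seq_len = X.shape[0]
    split = lambda W: (X @ W).reshape(seq_len, n_heads, -1).transpose(1, 0, 2)
    Q, K, V = split(Wq), split(Wk), split(Wv)   # each: (n_heads, seq_len, d_head)
    d_head = Q.shape[-1]
    # Temperature rescales the logits: >1 flattens the weights, <1 sharpens them.
    scores = Q @ K.transpose(0, 2, 1) / (np.sqrt(d_head) * temperature)
    if causal:
        # Hide future positions: strict upper triangle becomes -inf.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    A = softmax(scores)                          # one seq_len × seq_len matrix per head
    out = (A @ V).transpose(1, 0, 2).reshape(seq_len, -1)
    return out, A
```

With n_heads=4, A has shape (4, seq_len, seq_len): one attention matrix per head, matching the per-head panels above.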
Q/K/V Projections
Query (selected token)
Keys (all tokens)
Values (all tokens)
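What these three panels would display, continuing the single-head sketch above with a hypothetical clicked-token index `selected` (a name introduced here for illustration):

```python
selected = 2                 # hypothetical index of the token clicked in the UI
query_panel = Q[selected]    # Query (selected token): a single d_head vector
keys_panel = K               # Keys (all tokens): seq_len × d_head
values_panel = V             # Values (all tokens): seq_len × d_head
# Row `selected` of A weighs the values: output[selected] == A[selected] @ V
```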