Transformer & Attention Internals

Transformer internals are the mechanisms that make modern language models work: attention, positional encoding, and the encoder-decoder designs that have displaced recurrent networks since the transformer's introduction in 2017.

39 articles · 377 min total read

This theme is curated by our AI council.

What topics does this domain cover?

5 topics

Each topic below is a key concept in this domain. Pick any for the full picture: foundations, implementation, what's changing, and risks to consider.

Attention Mechanism →

An attention mechanism is a neural network component that lets a model dynamically focus on the most relevant parts of its input when computing each output.

11 articles
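To make the idea concrete before diving into the articles, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind this topic. The function names and toy shapes are ours for illustration, not taken from any article in the collection.

```python
# A minimal sketch of scaled dot-product attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays; returns attended values and attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # how similar each query is to each key
    weights = softmax(scores, axis=-1)    # each row sums to 1: where a position looks
    return weights @ V, weights           # weighted mix of values per position

# Toy example: 4 tokens, 8-dimensional head.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```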

Decoder-Only Architecture →

Decoder-only architecture is a transformer design where a single decoder stack generates output tokens one at a time, with each new token conditioned only on the tokens that came before it.

5 articles
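A rough sketch of the two ideas that define this design, assuming nothing beyond the description above: a causal mask that hides future positions, and an autoregressive loop that appends one token at a time. The stand-in model below is a placeholder, not a real decoder stack.

```python
# Causal masking plus an autoregressive generation loop (toy illustration).
import numpy as np

def causal_mask(seq_len):
    # True above the diagonal marks "future" positions each token must not attend to.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def toy_next_token(tokens, vocab_size=100):
    # Placeholder for a decoder stack: any function mapping a prefix to a next-token choice.
    rng = np.random.default_rng(len(tokens))
    return int(np.argmax(rng.normal(size=vocab_size)))

prompt = [5, 17, 42]
tokens = list(prompt)
for _ in range(4):                     # generate 4 tokens, one at a time
    tokens.append(toy_next_token(tokens))

print(causal_mask(4))                  # mask a 4-token prefix would use
print(tokens)                          # prompt followed by generated token ids
```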

Encoder-Decoder Architecture →

Encoder-decoder architecture is a neural network design pattern where an encoder network compresses an input sequence into a representation that a decoder network then expands into an output sequence.

5 articles
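The sketch below shows the step that connects the two halves: cross-attention, where decoder states act as queries over the encoder's output. The random "encoded" vectors stand in for a real encoder; names and dimensions are ours.

```python
# Cross-attention between a decoder and an encoder's output (toy illustration).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8
src = rng.normal(size=(6, d))          # stand-in for encoder output: 6 source tokens
tgt = rng.normal(size=(3, d))          # decoder states for 3 target positions

scores = tgt @ src.T / np.sqrt(d)      # each target position scores every source position
weights = softmax(scores, axis=-1)     # attention over the source sequence
context = weights @ src                # source information pulled into the decoder
print(context.shape)                   # (3, 8)
```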

Tokenizer Architecture →

Tokenizer architecture is the subsystem that converts raw text into numeric tokens a language model can process. It defines the model's vocabulary and how text is split into units the model can look up.

5 articles
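As a bare-bones illustration of that conversion, here is a toy word-level tokenizer with a fixed vocabulary and an unknown-token fallback. Production tokenizers use subword schemes such as BPE; the vocabulary and helper names here are invented for this sketch.

```python
# Toy tokenizer: map text to integer ids with a fixed vocabulary (illustrative only).
vocab = {"<unk>": 0, "the": 1, "transformer": 2, "uses": 3, "attention": 4, ".": 5}

def encode(text):
    # Split on whitespace, treating "." as its own token; unknown words map to <unk>.
    words = text.lower().replace(".", " .").split()
    return [vocab.get(word, vocab["<unk>"]) for word in words]

def decode(ids):
    inv = {i: w for w, i in vocab.items()}
    return " ".join(inv[i] for i in ids)

ids = encode("The transformer uses attention.")
print(ids)          # [1, 2, 3, 4, 5]
print(decode(ids))  # the transformer uses attention .
```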

Transformer Architecture →

The transformer architecture is a neural network design that uses self-attention to process all parts of an input sequence in parallel, rather than step by step as recurrent networks do.

13 articles
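Putting the pieces together, here is a minimal sketch of a single transformer block: self-attention followed by a position-wise feed-forward network, each with a residual connection and layer normalization. It is single-headed and randomly initialized, so it illustrates the data flow rather than any particular published model.

```python
# One transformer block: self-attention + feed-forward, with residuals and layer norm.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    # Self-attention sublayer: every position attends to every position.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) @ v
    x = layer_norm(x + attn @ Wo)          # residual connection + normalization
    # Feed-forward sublayer applied independently at each position.
    ff = np.maximum(0, x @ W1) @ W2        # two-layer ReLU MLP
    return layer_norm(x + ff)              # residual connection + normalization

rng = np.random.default_rng(0)
d, d_ff, n = 16, 64, 5                     # model dim, hidden dim, sequence length
params = [rng.normal(scale=0.1, size=s) for s in
          [(d, d), (d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]]
x = rng.normal(size=(n, d))
print(transformer_block(x, *params).shape)  # (5, 16)
```

A full model stacks many such blocks and adds positional information at the input, since attention by itself is order-agnostic.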

Four perspectives on this domain