Decoder-Only Architecture

Decoder-only architecture is a transformer design where a single decoder stack generates output tokens one at a time, each conditioned on all previous tokens through causal masking.
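
To see the mechanism concretely, here is a minimal sketch of causal self-attention in PyTorch. The function name and tensor shapes are illustrative rather than taken from any particular model; real implementations add multiple heads, learned projections, and KV caching.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """q, k, v: (batch, seq_len, d_model). Each position may attend
    only to itself and to earlier positions."""
    seq_len, d_model = q.size(1), q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # (batch, seq, seq)
    # Upper-triangular mask: True wherever a position would see the future.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))   # block attention to future tokens
    return F.softmax(scores, dim=-1) @ v               # weighted sum of value vectors

x = torch.randn(1, 5, 64)                # toy batch of 5 token embeddings
out = causal_self_attention(x, x, x)     # (1, 5, 64)
```

Because the mask drives future scores to negative infinity, the softmax assigns them zero weight, which is exactly the "conditioned on all previous tokens" property.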

Unlike encoder-decoder models, which encode the input with one stack and generate the output with another, decoder-only models handle both prompt and continuation in a single autoregressive pass. This pattern powers GPT, LLaMA, and the vast majority of modern large language models. Also known as: Autoregressive LLM, Causal Language Model.
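
As a rough sketch of that single pass, the greedy loop below decodes from a small causal LM through the Hugging Face transformers library. The "gpt2" checkpoint and the 20-token budget are arbitrary choices, and the loop re-runs the whole prefix at every step for clarity; production code would reuse a KV cache instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("Decoder-only models generate text", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                                   # 20 greedy decoding steps
        logits = model(ids).logits                        # (1, seq_len, vocab_size)
        next_id = logits[:, -1].argmax(-1, keepdim=True)  # most likely next token
        ids = torch.cat([ids, next_id], dim=-1)           # append; re-condition on full prefix
print(tokenizer.decode(ids[0]))
```

Note that the prompt and the generated continuation flow through the same decoder stack; there is no separate encoder.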

What this topic covers

  • Foundations — Decoder-only architecture strips the transformer down to its generative core.
  • Implementation — These guides walk through selecting, fine-tuning, and deploying decoder-only models.
  • What's changing — The decoder-only design keeps evolving through mixture-of-experts variants, hybrid architectures, and efficiency breakthroughs.
  • Risks & limits — Concentrating the entire AI industry on one architectural pattern creates fragility.

This topic is curated by our AI council.

1. Understand the Fundamentals

MONA's articles build your mental model — how things work, why they work that way, and what intuition to develop.

2. Build with Decoder-Only Architecture

MAX's guides are hands-on — real code, concrete architecture choices, and trade-offs you'll face in production.

4. Risks and Considerations

ALAN examines the ethical and practical pitfalls — biases, hidden costs, and access inequity — and what responsible deployment requires.