Glitch Tokens
Also known as: anomalous tokens, undefined tokens, rogue tokens
Glitch tokens are anomalous entries in a language model's vocabulary that produce erratic behavior (nonsensical outputs, hallucinations, or refusals) because the tokenizer included them during vocabulary construction, but the model's training data contained too few examples for it to learn stable representations.
What It Is
When you train or select a custom tokenizer, you're building the vocabulary your model uses to read and write text. Glitch tokens arise when that vocabulary includes entries the model's training data rarely or never contained. The tokenizer learned these tokens from one dataset, but the model trained on a different one, leaving gaps. When one of these orphaned tokens reaches the model during inference (the moment it generates a response), the result is unpredictable: gibberish, repeated phrases, hallucinated content, or an outright refusal to respond.
Think of it like a phrasebook for a language you’ve never heard spoken. The words are listed, the spelling looks right, but you have no idea how to use them in a sentence. When someone asks you to say one of those phrases in context, you freeze or improvise badly. That’s what a language model does when it encounters a glitch token — it has a slot in its vocabulary but no learned understanding of what belongs there.
The phenomenon gained attention in January 2023 when researchers Jessica Rumbelow and Matthew Watkins discovered that feeding the token “SolidGoldMagikarp” — a Reddit username — into GPT-3.5 triggered bizarre outputs. The model would repeat the word, hallucinate stories, or refuse to acknowledge what it had been told. According to Li et al. 2024, the root cause is a token present in the vocabulary but absent or extremely rare in the model’s training corpus (its training dataset), producing an undertrained embedding — the internal numeric representation for that token — that destabilizes output.
Detection is harder than it sounds. You can’t scan a vocabulary list and flag suspicious entries by eye. According to Li et al. 2024, their GlitchHunter method uses iterative clustering in embedding space — grouping tokens by how similar their internal representations look — to spot tokens that cluster abnormally, a signal they never received enough training to develop stable meanings.
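The intuition behind embedding-space detection can be sketched with toy data. This is a much simpler proxy than GlitchHunter's iterative clustering, not the actual method: undertrained embeddings tend to sit in a tight clump near their shared initialization, while trained tokens spread out, so tokens abnormally close to the vocabulary centroid are suspects. All token names and dimensions here are hypothetical.

```python
# Toy sketch of embedding-space anomaly detection (a simple proxy for
# GlitchHunter-style clustering, not the real algorithm).
import math
import random

random.seed(0)
DIM = 16

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vocabulary: 95 "trained" tokens with spread-out vectors,
# 5 "orphaned" tokens left near a common initialization point.
init = [0.01] * DIM
embeddings = {}
for i in range(95):
    embeddings[f"tok_{i}"] = [random.gauss(0, 1.0) for _ in range(DIM)]
for i in range(5):
    embeddings[f"glitch_{i}"] = [v + random.gauss(0, 0.01) for v in init]

# Flag tokens whose distance to the vocabulary centroid is abnormally
# small: they never received enough gradient signal to move anywhere.
centroid = [sum(vec[d] for vec in embeddings.values()) / len(embeddings)
            for d in range(DIM)]
dists = {tok: dist(vec, centroid) for tok, vec in embeddings.items()}
cutoff = sorted(dists.values())[int(0.05 * len(dists))]  # bottom 5%
suspects = sorted(tok for tok, d in dists.items() if d < cutoff)
print(suspects)
```

In a real model you would read the embedding matrix from the checkpoint rather than simulate it, and clustering rather than a single centroid distance is what handles vocabularies where "normal" tokens form several groups.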
Newer tokenizers have reduced the problem. According to LessWrong, GPT-4’s tokenizer splits “SolidGoldMagikarp” into five normal subtokens, eliminating the glitch behavior entirely. But the underlying risk persists whenever a tokenizer vocabulary and model training data diverge — a direct concern for anyone building a custom tokenizer with tools like tiktoken, SentencePiece, or HF Tokenizers.
How It’s Used in Practice
Most practitioners encounter glitch tokens not by looking for them, but by bumping into unexplained model failures. A prompt that should produce a clean summary instead outputs repetitions or nonsense. The first instinct is to blame the model or the prompt, but the actual culprit can be a single token in the input that the model cannot process coherently.
For teams building custom tokenizers, awareness of glitch tokens changes how you validate your vocabulary. After training a tokenizer on your domain-specific corpus, you need to cross-check: does every token in the resulting vocabulary appear frequently enough in the data your model will actually train on? If your tokenizer trains on a web scrape but your model fine-tunes on curated documents, the gap between those datasets is exactly where glitch tokens hide.
Pro Tip: After training a custom tokenizer, run your model’s most common prompts through it and inspect the token IDs. If any token ID appears in the output but has fewer than a handful of occurrences in your training corpus, flag it. Removing or merging rare tokens before model training is far cheaper than debugging mysterious inference failures after deployment.
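The cross-check above can be sketched in a few lines. The vocabulary, corpus, and threshold are toy examples, and whitespace splitting stands in for real tokenization, which would use the tokenizer's own encoding.

```python
# Sketch of the vocabulary cross-check: count how often each vocabulary
# token actually appears in the model's training corpus and flag the
# rare ones. Vocab, corpus, and threshold are illustrative.
from collections import Counter

vocab = {"the": 0, "model": 1, "token": 2, "SolidGoldMagikarp": 3}
training_corpus = "the model reads the token the model writes the token"

MIN_OCCURRENCES = 2  # anything rarer is a glitch-token candidate

# Whitespace split is a stand-in for running the real tokenizer here.
counts = Counter(training_corpus.split())
flagged = [tok for tok in vocab if counts[tok] < MIN_OCCURRENCES]
print(flagged)  # tokens with too little training signal
```

With a real tokenizer (tiktoken, SentencePiece, HF Tokenizers) you would encode the corpus and count token IDs instead of words, but the gate is the same: every vocabulary entry must clear a minimum frequency in the data the model will actually train on.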
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Building a custom tokenizer from a domain-specific corpus | ✅ | |
| Debugging unexplained model gibberish or refusal patterns | ✅ | |
| Merging vocabularies from different training datasets | ✅ | |
| Using a well-maintained commercial API with an updated tokenizer | | ❌ |
| Optimizing prompt wording for better response quality | | ❌ |
| Auditing a fine-tuned model trained on narrow domain data | ✅ | |
Common Misconception
Myth: Glitch tokens are bugs in the model's neural network that disappear with more training. Reality: They're a vocabulary-level problem, not a general weight-level problem. Most of the model's weights are fine; the issue is that certain tokens in the tokenizer's vocabulary never saw enough training examples to develop stable embeddings, and more epochs over the same data won't create those examples. The fix is updating the tokenizer's vocabulary, or retraining it on a corpus that matches the model's training data.
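The vocabulary-level fix can be sketched as a pruning pass before model training. The function name, threshold, and token names are illustrative; real BPE or SentencePiece tokenizers would also need their merge rules rebuilt, not just the ID table.

```python
# Illustrative fix: prune undertrained tokens from the vocabulary and
# reassign contiguous IDs before model training begins. A real tokenizer
# (BPE, SentencePiece) also needs its merge rules rebuilt.
def prune_vocab(vocab, corpus_counts, min_count=2, fallback="<unk>"):
    kept = [tok for tok in vocab if corpus_counts.get(tok, 0) >= min_count]
    if fallback not in kept:  # always keep an unknown-token slot
        kept.insert(0, fallback)
    return {tok: i for i, tok in enumerate(kept)}

vocab = ["<unk>", "the", "model", "SolidGoldMagikarp"]
counts = {"the": 120, "model": 45, "SolidGoldMagikarp": 0}
print(prune_vocab(vocab, counts))
```

Pruned tokens fall back to the unknown-token slot (or get re-split into smaller subtokens, as newer tokenizers do with "SolidGoldMagikarp"), which is predictable behavior, unlike an undertrained embedding.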
One Sentence to Remember
If your tokenizer’s vocabulary includes tokens your model never learned from real data, those tokens become landmines — unpredictable in output and invisible until someone triggers them.
FAQ
Q: How did the “SolidGoldMagikarp” glitch token get discovered? A: In January 2023, researchers Rumbelow and Watkins found that this Reddit username, present in GPT-3.5’s tokenizer but nearly absent from its training data, caused nonsensical and erratic model outputs when prompted.
Q: Can glitch tokens appear in any language model? A: Yes. Any model whose tokenizer was trained on a different corpus than the model itself can have glitch tokens. The problem is not specific to one vendor or architecture.
Q: How do you detect glitch tokens in a custom tokenizer? A: Cross-reference your tokenizer’s vocabulary against your training corpus. Tokens with very low or zero frequency in training data are candidates. Automated methods like embedding-space clustering can also flag them.
Sources
- Li et al. 2024: Glitch Tokens in LLMs: Categorization Taxonomy and Effective Detection - Academic study categorizing glitch tokens across multiple LLMs and proposing detection methods
- Alignment Forum: SolidGoldMagikarp (plus, prompt generation) - Original discovery and analysis of glitch tokens in GPT-3.5
Expert Takes
Glitch tokens reveal a gap between tokenization and representation learning. The tokenizer builds vocabulary from raw corpus statistics, but the model learns embeddings from a potentially different distribution. When those distributions diverge, certain tokens receive embedding vectors with insufficient gradient signal — their representations cluster near random initialization rather than forming meaningful semantic neighborhoods. Not a software bug. A statistical orphan.
You’re building a custom tokenizer and your vocab set doesn’t match your training set. Before any model training starts, cross-reference every token against your training corpus. Zero-frequency tokens in your vocabulary get removed or merged. Period. This isn’t optional cleanup — it’s a production gate that prevents bizarre inference failures from reaching users.
The SolidGoldMagikarp moment was a wake-up call. If a single orphaned token can make a model produce nonsense in production, what does that mean for teams deploying custom models with custom tokenizers? The organizations auditing their tokenizer vocabularies before deployment are the ones avoiding embarrassing incidents. Those skipping the audit are betting on luck.
A model that fails silently on certain inputs raises serious reliability questions. If a user triggers a glitch token and receives hallucinated output, who is responsible for the misinformation that follows? The developer who shipped the tokenizer? The team that chose the training data? The absence of vocabulary auditing standards means these failure modes remain an open question nobody wants to own.