LLM Foundations

Core mechanics of large language models — training, inference, tokenization, and the mathematics of next-token prediction.

Text tokens flowing into a diffusion latent space, becoming geometric attention maps that resolve into a generated image
MONA explainer 13 min

Prompt Engineering for Image Generation: How Diffusion Models Read Text

Image prompts steer probability, not pixels. Learn how diffusion models, cross-attention, and CFG turn text into images …

Layered diagram of prompt parsing across diffusion, autoregressive, and multimodal image models
MONA explainer 9 min

Negative Prompts, Weights, Seeds: Image Prompting Limits 2026

Negative prompts and weight syntax aren't universal — and seed reproducibility breaks across model versions. Inside the …

Diagram of an alpha matte separating a portrait from its background, with hair edges marked as the unknown band
MONA explainer 12 min

Alpha Channels, Trimaps, and the Hard Limits of AI Background Removal

Background removal is alpha estimation, not subject detection. Learn how trimaps and matting work, and why hair, glass, …

Low-rank adapter matrices BA layered onto a frozen diffusion model for image generation fine-tuning
MONA explainer 11 min

How LoRA Fine-Tunes Diffusion Models for Image Generation

LoRA fine-tunes Stable Diffusion and FLUX without retraining. Learn how rank, alpha, and the BA decomposition turn a …

AI image upscaling structural limits at 4K and 8K - diffusion priors hallucinate faces and tile-local processing produces visible seams
MONA explainer 12 min

Why AI Upscalers Hallucinate Faces and Tile Seams at 4K and 8K

AI upscalers don't break at 4K and 8K because of weak hardware. The failures are structural — rooted in diffusion priors …

Low-resolution pixels expanding into a high-resolution image through generative neural-network inference
MONA explainer 11 min

What Is Image Upscaling and How AI Super-Resolution Reconstructs Detail Beyond the Original Pixels

AI image upscaling doesn't enlarge what was captured — it generates plausible pixels from a learned prior. Learn how GAN …

Frozen diffusion model weights with low-rank adapter matrices flowing into the UNet attention block during LoRA training
MONA explainer 11 min

Training Image LoRAs: Diffusion Math, Rank-Alpha, and VRAM Limits

Image LoRAs retarget diffusion models with small adapter files. Learn the rank-alpha math, VRAM ranges from SD 1.5 to …

Anatomy of an AI upscaler — residual dense blocks on one side, a diffusion prior on the other, sharing one degraded input
MONA explainer 13 min

From RRDB Blocks to Diffusion Priors: Inside Modern AI Upscalers

How modern AI upscalers are built — from ESRGAN's RRDB blocks and Real-ESRGAN to SUPIR's diffusion prior, plus the …

Diagram of AI image editing: mask-guided inpainting, canvas outpainting, and instruction-based diffusion edit
MONA explainer 12 min

What Is AI Image Editing? Inpainting, Outpainting, Edit Models

AI image editing uses diffusion to modify pixels under a mask or follow text instructions. Learn how inpainting, …

Noise-to-image diffusion process with a text instruction transforming a latent representation into an edited output
MONA explainer 10 min

From Diffusion to InstructPix2Pix: AI Image Editing Prerequisites

Before using GPT Image or FLUX, understand diffusion, classifier-free guidance, and why InstructPix2Pix made …

Layered neural network architecture showing signal propagation and gradient flow through weighted connections
MONA explainer 13 min

What Is a Neural Network and How It Learns to Generate Language

Neural networks learn language by adjusting millions of weights through backpropagation. Learn how layers, gradients, …

MONA tracing signal flow through neural network layers from ReLU to SwiGLU activation functions
MONA explainer 10 min

From ReLU to SwiGLU: How Activation and Loss Functions Shape LLM Training

Trace the path from ReLU to SwiGLU and understand how activation functions, cross-entropy loss, and gradient dynamics …

Gradient arrows flowing backward through layered neural network nodes toward a loss function surface
MONA explainer 9 min

Backpropagation and Gradient Descent: How Neural Networks Learn From Errors

Learn how backpropagation and gradient descent train neural networks by propagating error signals backward through …

Geometric visualization of benchmark scores converging and diverging across evaluation dimensions
MONA explainer 11 min

What Is Model Evaluation and How Benchmarks, Metrics, and Human Judgment Measure LLM Quality

Model evaluation combines benchmarks, automated metrics, and human judgment to measure LLM quality. Learn why high …

Four divergent scoring dimensions representing probability, text overlap, recall, and preference intersecting around a
MONA explainer 10 min

Perplexity, BLEU, ROUGE, and Elo: The Core Metrics Behind LLM Evaluation Explained

Perplexity, BLEU, ROUGE, and Elo measure fundamentally different properties of language models. Learn when each metric …

Abstract visualization of benchmark scores fracturing as contamination patterns distort evaluation metrics
MONA explainer 10 min

Benchmark Contamination, Metric Gaming, and the Hard Limits of LLM Evaluation

Benchmark contamination inflates LLM scores while real-world performance lags. Learn why metric gaming and saturated …

Mathematical proof that language model hallucination cannot be eliminated, showing fundamental limits of autoregressive
MONA explainer 9 min

Why Zero-Hallucination LLMs Remain Impossible: Autoregressive Limits and Benchmark Ceilings in 2026

LLM hallucination is mathematically inevitable. Explore the autoregressive limits, benchmark ceilings, and why …

Probability distribution branching into confident but factually diverging output paths from a language model
MONA explainer 9 min

What Is AI Hallucination and How Statistical Next-Token Prediction Creates Confident Falsehoods

AI hallucinations aren't bugs — they emerge from how next-token prediction works. Learn why LLMs produce confident …

Branching classification tree with split pathways representing hallucination taxonomy categories
MONA explainer 10 min

Intrinsic vs. Extrinsic, Closed vs. Open Domain: The Taxonomy and Prerequisites of LLM Hallucination

LLM hallucination isn't one problem — it's four. Learn the intrinsic vs. extrinsic taxonomy, the domain split, and the …

GPU scheduling pipeline visualization showing requests entering and leaving batch slots at each forward-pass iteration
MONA explainer 10 min

What Is Continuous Batching and How Iteration-Level Scheduling Maximizes GPU Throughput

Continuous batching replaces request-level scheduling with iteration-level scheduling, keeping GPUs busy on every …

Token sequences flowing through GPU memory blocks with active slots recycling while idle slots wait for reallocation
MONA explainer 11 min

From Static Batching to PagedAttention: Prerequisites and Hard Limits of Continuous Batching

Continuous batching swaps finished LLM requests every decode step. Learn how PagedAttention cuts KV cache waste to under …

Weight matrix grid transitioning from high to low precision with format labels and accuracy indicators
MONA explainer 11 min

GPTQ vs AWQ vs GGUF vs bitsandbytes: Quantization Formats and Their Tradeoffs Explained

GPTQ, AWQ, GGUF, and bitsandbytes each shrink LLM weights differently. Compare speed, accuracy, and hardware reach to …

Precision grid transitioning from smooth high-bit gradients to fragmented low-bit patterns with visible accuracy gaps
MONA explainer 10 min

Accuracy Collapse, Task-Specific Degradation, and the Hard Limits of Sub-4-Bit Quantization

Sub-4-bit quantization promises smaller LLMs, but accuracy collapses unevenly across tasks and languages. Learn where …

Probability curves shifting between sharp peaks and flat noise as a temperature dial moves between repetition and
MONA explainer 12 min

Repetition Loops, Hallucination Spikes, and the Hard Limits of Sampling Parameter Tuning

Wrong sampling parameters trap LLMs in repetition loops or hallucination. Trace the probability math behind both failure …

Abstract visualization of tokens flowing sequentially through a neural network during autoregressive decoding
MONA explainer 11 min

What Is Model Inference and How LLMs Generate Text Through Autoregressive Decoding

Model inference generates LLM text one token at a time via autoregressive decoding. Learn why this sequential bottleneck …

Abstract visualization of memory blocks fragmenting across GPU architecture with quadratic growth curves overlaid
MONA explainer 10 min

Memory Walls, Quadratic Context Costs, and the Hard Engineering Limits of LLM Inference in 2026

LLM inference hits hard physical walls — memory, quadratic attention, bandwidth. Learn the engineering limits and 2026 …

Abstract visualization of memory blocks flowing through a transformer attention layer during token generation
MONA explainer 11 min

KV-Cache, PagedAttention, and the Building Blocks Every LLM Inference Pipeline Needs

KV-cache, PagedAttention, and continuous batching form the inference pipeline core. Learn how memory management …

Probability distribution curves shifting shape as a temperature dial turns from cold precision to warm randomness
MONA explainer 10 min

What Is Temperature in LLMs and How Softmax Scaling Controls Text Generation Randomness

Temperature divides logits before softmax, reshaping the token probability distribution. Learn how this parameter, …

MONA examining neural network weights being compressed from wide floating-point blocks into compact integer representations
MONA explainer 10 min

What Is Quantization and How FP32-to-INT4 Compression Makes LLMs Run on Consumer Hardware

Quantization compresses LLM weights from FP32 to INT4, cutting memory up to 8x. Learn how GPTQ, AWQ, and calibration …

Probability distributions carved into different geometric shapes by four sampling filters applied in sequence
MONA explainer 10 min

Top-K, Top-P, Min-P, and Beam Search: Every LLM Sampling Method Compared

Compare top-k, top-p, min-p, and beam search LLM sampling methods. Learn how each reshapes probability distributions and …