AI Principles

The science behind AI — transformer architectures, training dynamics, and evaluation methodology. MONA explains how AI actually works, with precision over hype.

Frozen diffusion model weights with low-rank adapter matrices flowing into the U-Net attention block during LoRA training
MONA explainer 11 min

Training Image LoRAs: Diffusion Math, Rank-Alpha, and VRAM Limits

Image LoRAs retarget diffusion models with small adapter files. Learn the rank-alpha math, VRAM ranges from SD 1.5 to …
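The rank-alpha scaling the explainer covers can be sketched in a few lines. All sizes, names, and init values below are illustrative, but the structure is the standard LoRA rule: a frozen weight W augmented by a trainable low-rank product scaled by alpha / rank.

```python
import numpy as np

d_out, d_in, rank, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    # Effective weight is W + (alpha / rank) * B @ A
    return x @ (W + (alpha / rank) * (B @ A)).T

x = rng.standard_normal((1, d_in))
# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), x @ W.T)
```

The adapter file only needs to store A and B — rank × (d_in + d_out) values instead of d_in × d_out — which is why LoRA files are small.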

Anatomy of an AI upscaler — residual dense blocks on one side, a diffusion prior on the other, sharing one degraded input
MONA explainer 13 min

From RRDB Blocks to Diffusion Priors: Inside Modern AI Upscalers

How modern AI upscalers are built — from ESRGAN's RRDB blocks and Real-ESRGAN to SUPIR's diffusion prior, plus the …

Diagram of AI image editing: mask-guided inpainting, canvas outpainting, and instruction-based diffusion edit
MONA explainer 12 min

What Is AI Image Editing? Inpainting, Outpainting, Edit Models

AI image editing uses diffusion to modify pixels under a mask or follow text instructions. Learn how inpainting, …

Noise-to-image diffusion process with a text instruction transforming a latent representation into an edited output
MONA explainer 10 min

From Diffusion to InstructPix2Pix: AI Image Editing Prerequisites

Before using GPT Image or FLUX, understand diffusion, classifier-free guidance, and why InstructPix2Pix made …
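Classifier-free guidance, one of the prerequisites named above, is a one-line blend of two noise predictions. The arrays here are toy stand-ins for model outputs; the formula itself is the standard CFG rule.

```python
import numpy as np

# eps_uncond / eps_cond stand in for the model's unconditional and
# text-conditioned noise predictions at one sampling step.
def cfg(eps_uncond, eps_cond, guidance_scale):
    # Extrapolate away from the unconditional prediction toward the
    # conditional one; scale > 1 pushes harder toward the prompt.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.array([0.1, 0.2])
eps_c = np.array([0.3, 0.0])
# scale 1.0 recovers the plain conditional prediction,
# scale 0.0 ignores the prompt entirely.
assert np.allclose(cfg(eps_u, eps_c, 1.0), eps_c)
assert np.allclose(cfg(eps_u, eps_c, 0.0), eps_u)
```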

Diagram of noise progressively resolving into a coherent image across diffusion sampling steps
MONA explainer 11 min

What Is a Diffusion Model? How Reversing Noise Creates Images and Video

Diffusion models generate images by reversing noise. Learn how forward and reverse processes differ, and why predicting …
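The forward (noising) half of that process has a closed form worth sketching: x_t = sqrt(ᾱ_t)·x₀ + sqrt(1 − ᾱ_t)·ε. The linear beta schedule below is a common illustrative choice, not tied to any particular model.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # per-step noise variances
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal retention

def q_sample(x0, t, eps):
    # Jump straight to timestep t without simulating every step.
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)

# By the final step almost no signal survives — the sample is
# essentially pure Gaussian noise, which is what the reverse
# process learns to undo.
assert alphas_bar[-1] < 1e-3
```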

Geometric diagram of a diffusion pipeline with latent compression, a denoising backbone, cross-attention conditioning, and an ODE sampler
MONA explainer 12 min

U-Net, VAE, Schedulers, and Text Encoders: The Anatomy of a Modern Diffusion Model

A modern diffusion model is not one network but four: a VAE for compression, a U-Net or DiT denoiser, a text encoder, …

Geometric visualization of a neural network fusing text, image, audio, and video streams into a shared latent space
MONA explainer 12 min

Multimodal Architecture: How Models Fuse Text, Images, Audio & Video

Multimodal models like GPT-5 and Gemini 3.1 Pro don't see images — they translate them into token space. Here's the …

Multimodal architecture prerequisites, vision transformers, modality gap, and cross-modal grounding failure in 2026 AI models
MONA explainer 12 min

From Vision Transformers to Modality Gaps: Prerequisites and Technical Limits of Multimodal AI in 2026

Before multimodal AI works, you need vision transformers; modality gaps and grounding decay then define its limits. The mechanics of …

Diffusion model sampling visualized as iterative denoising steps from noise toward a coherent image
MONA explainer 10 min

Diffusion Models in 2026: Slow Sampling and Hard Engineering Limits

Why diffusion models still need many sampling steps, why FLUX and SD 3.5 stumble on text and hands, and where the 2026 …

Geometric grid of image patches transforming into a token sequence representing vision transformer patch embedding architecture
MONA explainer 13 min

What Is a Vision Transformer and How Image Patches Replaced Convolutions in Computer Vision

Vision Transformers treat images as token sequences, not pixel grids. Learn how 16x16 patches, self-attention, and …
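The patch-to-token step is just a reshape: a 224×224×3 image becomes (224/16)² = 196 tokens of 16·16·3 = 768 values each. Dimensions follow the common ViT-Base setup; the code is a sketch, not any library's implementation.

```python
import numpy as np

H = W = 224   # image size
P = 16        # patch size
C = 3         # channels

img = np.zeros((H, W, C))

# Carve the image into a grid of P x P patches, then flatten each
# patch into one token vector.
patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
tokens = patches.reshape(-1, P * P * C)

assert tokens.shape == (196, 768)   # 196 tokens, 768 values each
```

In a real ViT these flattened patches are then linearly projected and given positional embeddings before self-attention.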

Selective state space model hidden state recurrence versus quadratic self-attention on long sequences
MONA explainer 10 min

What Is a State Space Model and How Selective SSMs Replace Quadratic Attention

State space models trade quadratic attention for linear recurrence. See how Mamba's selection works and why long-context …
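The linear recurrence at the heart of that trade can be sketched directly: h_t = A·h_{t−1} + B·x_t, y_t = C·h_t. The matrices below are random stand-ins; real SSMs (S4, Mamba) parameterize A carefully, and Mamba additionally makes B and C input-dependent — the "selection" in the title.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 4, 32
A = 0.9 * np.eye(d_state)           # stable diagonal state transition
B = rng.standard_normal(d_state)    # input projection
C = rng.standard_normal(d_state)    # output projection

def ssm_scan(x):
    h = np.zeros(d_state)
    ys = []
    for x_t in x:                   # one pass, O(seq_len): linear, not quadratic
        h = A @ h + B * x_t         # fixed-size state absorbs each token
        ys.append(C @ h)
    return np.array(ys)

y = ssm_scan(rng.standard_normal(seq_len))
assert y.shape == (seq_len,)
```

The fixed-size state h is both the speed win and the recall limit: everything the model remembers must fit in it.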

Compressed state vector losing early tokens while a small attention layer recovers recall in a hybrid sequence model
MONA explainer 11 min

In-Context Learning Gaps, Hybrid Complexity, and the Hard Technical Limits of State Space Models

State space models trade recall for speed. Learn why pure Mamba breaks on in-context tasks and how hybrid SSM-attention …

Image patches flowing through a Vision Transformer encoder with a class token aggregating features for classification
MONA explainer 12 min

Patch Embeddings, Class Tokens, and 2D Positional Encoding: Inside the Vision Transformer

How Vision Transformers turn images into token sequences — inside patch embeddings, the CLS token, and the shift from 1D …

Diagram of SSM components: hidden state, A/B/C matrices, and selective scan across a token sequence
MONA explainer 11 min

From HiPPO to Selective Scan: The Components and Prerequisites of State Space Models

State space models rebuilt recurrence on new math. Trace the components — HiPPO, S4, selective scan, gating — and the …

Diagram of an image cut into 16x16 patches feeding a transformer encoder with attention arrows and a data-cliff curve
MONA explainer 12 min

From CNN Intuition to Data Hunger: Prerequisites and Hard Limits of Vision Transformers

Vision Transformers drop CNN priors for learned attention — a trade that changes everything. Learn the prerequisites, …

Sparse neural network with glowing active pathways routing through specialized expert sub-networks
MONA explainer 11 min

What Is Mixture of Experts and How Sparse Gating Routes Inputs to Specialized Sub-Networks

Mixture of experts activates only selected sub-networks per token. Learn how sparse gating makes trillion-parameter …
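Top-k sparse gating is small enough to sketch end to end. Expert count, k, and the linear experts below are illustrative placeholders; the point is that only k of the n expert matrices ever run per token.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16
router_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

def moe_forward(x):
    logits = x @ router_w                 # router scores every expert
    topk = np.argsort(logits)[-k:]        # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()              # softmax over the selected k only
    # Only k of n_experts matrices are multiplied — the sparsity win.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

out = moe_forward(rng.standard_normal(d))
assert out.shape == (d,)
```

All n experts still sit in memory, which is why MoE cuts compute per token but not VRAM.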

Routing collapse in mixture of experts with token paths converging to dominant experts while idle capacity goes unused
MONA explainer 10 min

Routing Collapse, Load Balancing Failures, and the Hard Engineering Limits of Mixture of Experts

MoE models promise scale at fractional compute cost. Understand routing collapse, memory tradeoffs, and communication …
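The standard countermeasure to routing collapse is an auxiliary load-balancing loss in the Switch Transformer style: n_experts · Σᵢ fᵢ·pᵢ, where fᵢ is the fraction of tokens dispatched to expert i and pᵢ the mean router probability. The numbers below are made up to illustrate a collapsed router.

```python
import numpy as np

n_experts = 4
# Fraction of tokens actually dispatched to each expert (collapsed):
f = np.array([0.85, 0.05, 0.05, 0.05])
# Mean softmax probability the router assigns to each expert:
p = np.array([0.70, 0.10, 0.10, 0.10])

aux_loss = n_experts * np.sum(f * p)

# Under perfect balance, f = p = 1/n_experts, and the loss hits
# its minimum value of exactly 1.0.
balanced = n_experts * np.sum(np.full(n_experts, 1 / n_experts) ** 2)

# Collapse inflates the loss well above the balanced minimum.
assert aux_loss > balanced
```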

Geometric visualization of parallel expert networks with a routing gate selecting active pathways through a sparse architecture
MONA explainer 10 min

From Feedforward Layers to Expert Pools: Prerequisites and Building Blocks of MoE Architecture

Mixture of experts replaces one feedforward layer with many expert networks and a router. Learn how MoE gating and …

Message passing in a graph neural network — node embeddings propagating information across connected nodes
MONA explainer 10 min

What Is a Graph Neural Network and How Message Passing Propagates Information Across Nodes

Graph neural networks learn from connections, not grids. Understand message passing, how graph convolution differs from …
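One round of mean-aggregation message passing fits in a few lines: each node's new feature is a transformed average of its neighbors' features. The 4-node adjacency matrix and shared weight are illustrative.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # who is connected to whom
X = np.eye(4)                               # one-hot node features
W = np.full((4, 4), 0.5)                    # weight shared by all nodes

deg = A.sum(axis=1, keepdims=True)          # neighbor counts
messages = (A @ X) / deg                    # mean over each node's neighbors
H = np.tanh(messages @ W)                   # updated node embeddings

assert H.shape == (4, 4)
```

Stacking this step k times lets information travel k hops — which is also where the depth problems covered in the limits explainer begin.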

Signal diffusion across graph neural network layers with node features converging toward uniformity
MONA explainer 9 min

Oversmoothing, Scalability Walls, and the Hard Technical Limits of Graph Neural Networks

Oversmoothing and neighbor explosion set hard ceilings on graph neural network depth and scale. Learn the mathematical …
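Oversmoothing is easy to reproduce numerically: repeated neighbor averaging drives every node's features toward the same vector. The graph below is illustrative; the propagation operator is the usual self-loop-normalized adjacency.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                        # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True) # row-normalized propagation

X = np.eye(4)                                # initially distinct features
for _ in range(100):                         # 100 propagation "layers"
    X = P @ X

# After enough layers, all rows are (numerically) identical:
# the nodes have become indistinguishable.
spread = X.max(axis=0) - X.min(axis=0)
assert spread.max() < 1e-3
```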

Abstract geometric visualization of interconnected nodes and edges forming a graph structure with mathematical notation overlays
MONA explainer 10 min

Adjacency Matrices, Node Features, and the Prerequisites for Understanding Graph Neural Networks

Graph neural networks consume matrices, not pixels. Learn how adjacency matrices, node features, and message passing …

Probability distributions flowing through an encoder-decoder bottleneck with sampling points in latent space
MONA explainer 12 min

What Is a Variational Autoencoder and How the Reparameterization Trick Enables Generative Learning

VAEs compress data into structured probability spaces for generation. Learn how the reparameterization trick and ELBO …
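The reparameterization trick itself is one line: instead of sampling z ~ N(μ, σ²) directly (which blocks gradients), sample ε ~ N(0, I) and compute z = μ + σ·ε. The values below are illustrative encoder outputs; the KL formula is the standard Gaussian-prior term of the ELBO.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])           # encoder mean output
log_var = np.array([0.0, 0.2])       # encoder log-variance output

eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps # differentiable w.r.t. mu and log_var

# Analytic KL(N(mu, sigma^2) || N(0, I)) — the regularizer that keeps
# the latent space structured enough to sample from:
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
assert kl >= 0.0
```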

Geometric latent space visualization showing compression paths diverging between deterministic and probabilistic autoencoders
MONA explainer 10 min

From Autoencoders to KL Divergence: Prerequisites and Hard Limits of Variational Autoencoders

Learn the math behind variational autoencoders — KL divergence, ELBO, the reparameterization trick — and why VAEs blur …

Two neural networks locked in adversarial competition with fracture lines revealing mode collapse failure points
MONA explainer 10 min

Mode Collapse, Training Instability, and the Hard Technical Limits of Generative Adversarial Networks

Mode collapse and training instability aren't GAN bugs — they're structural limits of adversarial training. Learn the …

Diagram of two opposing neural networks connected by latent space vectors and adversarial loss signals
MONA explainer 10 min

From Latent Vectors to Adversarial Loss: The Building Blocks and Prerequisites of GAN Architecture

Understand GAN architecture from the ground up: generator, discriminator, latent space, and the adversarial loss that …
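The adversarial loss in the title reduces to two tiny functions on discriminator scores. The probabilities below are toy placeholders; the generator term is the common "non-saturating" variant rather than the original minimax form.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator wants real -> 1 and fake -> 0.
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss: push D's score on fakes toward 1.
    return -np.log(d_fake)

# A confident discriminator (real -> 0.9, fake -> 0.1) has lower loss
# than a guessing one...
assert d_loss(0.9, 0.1) < d_loss(0.5, 0.5)
# ...and leaves the generator a large loss to push against.
assert g_loss(0.1) > g_loss(0.5)
```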

Gradient signals fading across unrolled recurrent network time steps with eigenvalue decay
MONA explainer 10 min

Backpropagation Through Time, Vanishing Gradients, and Why Transformers Replaced Recurrent Networks

Gradients decay exponentially in recurrent networks during backpropagation through time. Learn the math, how LSTM gates …
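The exponential decay is visible in a few lines: backprop through time multiplies one Jacobian per step, so the gradient norm shrinks roughly like (largest singular value)^T. The matrix and horizon below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# A recurrent weight whose singular values sit near 0.5 — well inside
# the unit circle, as tanh saturation typically forces in practice.
W = 0.5 * np.eye(4) + 0.01 * rng.standard_normal((4, 4))

grad = np.ones(4)
norms = []
for t in range(50):                 # 50 unrolled time steps
    grad = W.T @ grad               # one Jacobian product per step
    norms.append(np.linalg.norm(grad))

# The signal from early time steps is effectively gone.
assert norms[-1] < 1e-10 * norms[0]
```

Singular values above 1 produce the mirror-image failure, exploding gradients — which is why LSTM gates and, eventually, attention were needed.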

Convolutional filter kernels evolving from simple edge detectors to deep spatial feature hierarchies
MONA explainer 11 min

From LeNet to ConvNeXt: How CNN Architectures Evolved and Where Spatial Inductive Bias Falls Short

Trace CNN evolution from LeNet to ConvNeXt. Understand how spatial inductive bias enables efficient vision but limits …

Layered neural network architecture showing signal propagation and gradient flow through weighted connections
MONA explainer 13 min

What Is a Neural Network and How It Learns to Generate Language

Neural networks learn language by adjusting millions of weights through backpropagation. Learn how layers, gradients, …

Hidden state vectors flowing through recurrent loops in a neural network processing sequential data
MONA explainer 10 min

What Is a Recurrent Neural Network and How Hidden States Process Sequential Data

RNNs use hidden states to carry memory across time steps. Learn how recurrent neural networks process sequences, why …
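A vanilla RNN cell makes the "memory in a hidden state" idea concrete: the same weights run at every step, and the state is the only thing carried forward. Sizes and weights below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 5
W_x = rng.standard_normal((d_h, d_in)) * 0.1   # input-to-hidden weights
W_h = rng.standard_normal((d_h, d_h)) * 0.1    # hidden-to-hidden weights

def rnn(xs):
    h = np.zeros(d_h)                          # memory starts empty
    for x in xs:                               # same weights reused every step
        h = np.tanh(W_x @ x + W_h @ h)
    return h                                   # final state summarizes the sequence

h = rnn(rng.standard_normal((10, d_in)))
assert h.shape == (5,)
```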

Learnable filters extracting edge and texture features from image pixels in a convolutional neural network
MONA explainer 10 min

What Is a Convolutional Neural Network and How Learnable Filters Extract Visual Features

Convolutional neural networks detect visual features through learnable filters, not pixel matching. Understand the …
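A single convolution makes the "learnable filter" idea concrete. The 3×3 vertical-edge kernel below is hand-set rather than learned, but it is the classic example of what early CNN layers converge to; the naive sliding-window loop is a sketch, not an optimized implementation.

```python
import numpy as np

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)   # responds to vertical edges

def conv2d(img, k):
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))             # valid convolution, no padding
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out

# A step image: dark left half, bright right half.
img = np.concatenate([np.zeros((6, 3)), np.ones((6, 3))], axis=1)
resp = conv2d(img, kernel)

# The filter fires only along the vertical edge, staying silent on
# the uniform regions — feature detection, not pixel matching.
assert np.abs(resp).max() == 3.0
```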