AI Principles

The science behind AI — transformer architectures, training dynamics, and evaluation methodology. MONA explains how AI actually works, with precision over hype.

Conceptual visualization of agent guardrails enforcing permission boundaries on autonomous AI tool calls and outputs
MONA explainer 11 min

What Are Agent Guardrails? How Permission Systems Constrain AI

Agent guardrails enforce permission boundaries on autonomous AI. Learn how Claude SDK, NeMo, and Llama Guard constrain …
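
As a taste of the mechanism: the simplest guardrail is a permission check wrapped around every tool call. A minimal sketch, with invented tool names and an allowlist policy standing in for the richer systems the article covers:

```python
# Minimal permission-boundary sketch: every tool call passes through a
# policy check before it executes. The tool names and the policy itself
# are illustrative, not any specific framework's API.

ALLOWED_TOOLS = {"search_docs", "read_file"}   # the agent may call these
DENIED_ARGS = {"delete", "drop", "rm -rf"}     # crude argument screen

class GuardrailViolation(Exception):
    pass

def guarded_call(tool_name: str, arg: str, tools: dict):
    """Run a tool only if it clears the permission boundary."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool '{tool_name}' is outside the boundary")
    if any(bad in arg.lower() for bad in DENIED_ARGS):
        raise GuardrailViolation(f"argument '{arg}' matched a deny rule")
    return tools[tool_name](arg)

tools = {
    "search_docs": lambda q: f"results for {q!r}",
    "read_file": lambda p: f"contents of {p}",
    "shell": lambda c: f"ran {c}",             # registered but not allowed
}

print(guarded_call("search_docs", "agent guardrails", tools))
try:
    guarded_call("shell", "rm -rf /", tools)
except GuardrailViolation as e:
    print("blocked:", e)
```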

Concentric runtime checkpoints around an LLM agent showing input, output, and tool-call boundaries with permeable filters
MONA explainer 11 min

Prerequisites for Agent Guardrails: Tool Use and Runtime Limits

Agent guardrails are runtime classifiers wrapped around tool-use loops — useful, partial, and demonstrably evadable. …

Autonomous agent paused at an interrupt checkpoint awaiting human approval before resuming a workflow
MONA explainer 12 min

Prerequisites and Technical Limits of HITL for AI Agents

HITL for agents is easy to start and hard to scale. Learn the prerequisites — durable state, idempotency, escalation — …

Geometric visualization of an approval gate paused between an autonomous agent and a tool call
MONA explainer 11 min

Human-in-the-Loop for AI Agents: How Approval Gates Work

Human-in-the-loop for AI agents pauses autonomous workflows at risky steps and routes them to a human gate. Here's how …
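
A minimal sketch of the gate itself, with an invented risk rule and `input()` standing in for a real escalation channel:

```python
# Approval-gate sketch: risky steps pause the loop and wait for a human
# decision before the workflow resumes. The risk set and the prompt are
# placeholders for a real policy and escalation channel.

RISKY_ACTIONS = {"send_email", "transfer_funds"}

def run_step(action: str, payload: str) -> str:
    return f"executed {action}({payload})"

def run_workflow(steps):
    results = []
    for action, payload in steps:
        if action in RISKY_ACTIONS:
            answer = input(f"approve {action}({payload})? [y/N] ")
            if answer.strip().lower() != "y":
                results.append(f"skipped {action}: human rejected")
                continue
        results.append(run_step(action, payload))
    return results

steps = [("search", "q3 report"), ("transfer_funds", "$500 to vendor")]
for line in run_workflow(steps):
    print(line)
```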

Diagram of an LLM agent loading checkpoint snapshots from a thread before each reasoning step
MONA explainer 10 min

Agent State Management: Threads, Checkpointers, Hard Limits

Agent state is not memory — it is plumbing that replays snapshots between steps. MONA explains threads, checkpointers, …
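
A framework-neutral sketch of that plumbing (not LangGraph's actual interface): snapshots keyed by a thread id, saved after each step and replayed before the next:

```python
# Toy checkpointer: agent state is a snapshot saved after every step and
# reloaded before the next one, keyed by a thread id.

import copy

class Checkpointer:
    def __init__(self):
        self._threads: dict[str, list[dict]] = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._threads.setdefault(thread_id, []).append(copy.deepcopy(state))

    def load(self, thread_id: str) -> dict:
        snaps = self._threads.get(thread_id)
        return copy.deepcopy(snaps[-1]) if snaps else {"messages": []}

def agent_step(state: dict, user_msg: str) -> dict:
    state["messages"].append(("user", user_msg))
    state["messages"].append(("assistant", f"echo: {user_msg}"))
    return state

cp = Checkpointer()
for msg in ["hello", "do you remember me?"]:
    state = cp.load("thread-1")      # replay the latest snapshot
    state = agent_step(state, msg)   # one reasoning turn
    cp.save("thread-1", state)       # persist the new snapshot

print(cp.load("thread-1")["messages"])
```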

Graph of state snapshots linked by a checkpoint thread across reasoning turns inside an agent runtime
MONA explainer 10 min

Agent State Management: How Checkpointing Persists Memory Across Turns

Agent state management decides whether your agent remembers. See how LangGraph checkpointers, threads, and reducers …

Sequence of tool calls forming an agent trajectory graded against a reference path
MONA explainer 10 min

Agent Evaluation: How Trajectory Analysis Measures AI Agents

Agent evaluation grades the path, not just the final answer. Learn how trajectory analysis exposes silent reasoning …
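
A toy version of the idea, using a deliberately simple longest-common-prefix metric over invented tool names:

```python
# Trajectory analysis in miniature: grade the sequence of tool calls an
# agent took against a reference path, not just the final answer.

def trajectory_score(actual: list[str], reference: list[str]) -> float:
    """Fraction of reference steps matched in order (longest common prefix)."""
    matched = 0
    for a, r in zip(actual, reference):
        if a != r:
            break
        matched += 1
    return matched / len(reference)

reference = ["search_flights", "check_visa", "book_flight"]
good_run  = ["search_flights", "check_visa", "book_flight"]
bad_run   = ["search_flights", "book_flight"]  # right answer, skipped a step

print(trajectory_score(good_run, reference))   # 1.0
print(trajectory_score(bad_run, reference))    # 0.33: the shortcut is exposed
```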

Layered diagram of agent evaluation showing outcome judgment, trajectory analysis, and cost-per-task observability stacked over a benchmark surface
MONA explainer 11 min

Agent Evaluation Prerequisites: LLM-as-Judge to Cost-Per-Task

Agent evaluation needs three signals: outcome, trajectory, cost. Learn why LLM-as-judge has known biases and where major …

Layered diagram of an agent loop showing thought, action, and observation stages with branching planning paths
MONA explainer 14 min

From Chain-of-Thought to Tool Use: Prerequisites and Technical Limits of Agent Planning

Agent planning rests on three primitives — chain-of-thought, tool use, and the ReAct loop. Learn the prerequisites and …
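
The ReAct control flow fits in a few lines; here a scripted stand-in replaces the LLM call so only the loop itself is shown:

```python
# Skeletal ReAct loop: the model alternates thought, action, and
# observation until it emits a final answer. fake_model is a scripted
# stand-in for an LLM; the control flow is the point.

def fake_model(history: list[str]) -> str:
    script = [
        "Thought: I should look up the population.",
        "Action: lookup[Tokyo population]",
        "Thought: I have the figure, so I can answer.",
        "Final: about 14 million people.",
    ]
    # Count only the model's own turns, not injected observations.
    step = sum(1 for h in history if not h.startswith("Observation:"))
    return script[step]

def lookup(query: str) -> str:
    return "Observation: Tokyo has roughly 14 million residents."

history: list[str] = []
while True:
    line = fake_model(history)
    history.append(line)
    if line.startswith("Final:"):
        break
    if line.startswith("Action:"):
        query = line.split("[", 1)[1].rstrip("]")
        history.append(lookup(query))  # the observation feeds the next step

print("\n".join(history))
```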

Diagram of three multi-agent architectures: supervisor, debate, and swarm patterns coordinating AI agents
MONA explainer 12 min

Multi-Agent Systems: Supervisor, Debate, and Swarm Patterns

Multi-agent systems coordinate specialized AI agents through supervisor, debate, or swarm patterns. Here is how each …
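
The supervisor pattern reduced to a sketch, with a keyword rule standing in for the LLM routing decision:

```python
# Supervisor pattern in miniature: one router assigns each task to a
# specialist and collects the result. The specialists are scripted; in
# practice each would be its own LLM call.

def research_agent(task: str) -> str:
    return f"[research] findings on {task}"

def writer_agent(task: str) -> str:
    return f"[writer] draft about {task}"

SPECIALISTS = {"research": research_agent, "write": writer_agent}

def supervisor(task: str) -> str:
    """Route by a keyword rule standing in for an LLM routing decision."""
    kind = "write" if task.startswith("draft") else "research"
    return SPECIALISTS[kind](task)

for task in ["draft a launch announcement", "competitor pricing"]:
    print(supervisor(task))
```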

Layered diagram of multi-agent prerequisites: tool use as the atomic primitive, the ReAct loop, and short- and long-term memory
MONA explainer 13 min

Multi-Agent Systems: Prerequisites and Hard Technical Limits

Before multi-agent systems, master tool use, the ReAct loop, and memory. Then face the limits: context blow-up, error …

Layered diagram of an LLM agent memory architecture with vector store, temporal graph, and self-editing memory blocks
MONA explainer 12 min

Agent Memory Systems: How LLMs Get Persistent Recall Across Sessions

Agent memory systems give LLMs persistent recall across sessions. Inside the architectures: temporal graphs, …
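
A toy version of cross-session recall, with a bag-of-words counter standing in for a real embedding model:

```python
# Long-term memory as a tiny vector store: facts written in one session
# are retrieved by similarity in the next.

import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # stand-in for a real embedder

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memory: list[tuple[str, Counter]] = []  # persists across sessions

def remember(fact: str) -> None:
    memory.append((fact, embed(fact)))

def recall(query: str, k: int = 1) -> list[str]:
    ranked = sorted(memory, key=lambda m: similarity(embed(query), m[1]),
                    reverse=True)
    return [fact for fact, _ in ranked[:k]]

# Session 1: the agent writes a fact to memory.
remember("the user prefers summaries in bullet points")
# Session 2: a fresh conversation retrieves it by meaning overlap.
print(recall("how does the user like summaries formatted?"))
```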

Three architectural diagrams contrasting graph state, actor message passing, and crew task handoff patterns in agent orchestration
MONA explainer 11 min

Graph vs Conversation vs Crew: LangGraph, AutoGen, CrewAI Patterns

LangGraph, AutoGen, and CrewAI commit to three different theories of how AI agents coordinate. The pattern you pick …

Diagram of an AI agent loop showing reasoning traces, tool actions, and a self-reflection memory feeding the next step
MONA explainer 10 min

Agent Planning and Reasoning: ReAct, Plan-and-Execute, Reflexion

Agent planning is not human cognition — it is token generation conditioned on observations. How ReAct, Plan-and-Execute, …

Tiered memory layers compressing into a temporal knowledge graph for AI agents
MONA explainer 10 min

Agent Memory Architectures: Prerequisites and Hard Limits

Agent memory isn't a bigger context window. Learn the prerequisites for designing agent memory systems and the hard …

LLM agent loop wiring reasoning to tools, memory, and a control plane across three orchestration frameworks
MONA explainer 12 min

Agent Frameworks: How LangGraph, CrewAI, and AutoGen Orchestrate LLMs

Agent frameworks orchestrate LLM calls, tools, and memory — but each one bets on a different abstraction. Learn what …

Geometric diagram showing text, image, and table embeddings projected into a shared vector space for cross-modal retrieval
MONA explainer 10 min

What Is Multimodal RAG and How It Retrieves Across Images, Tables, and Text

Multimodal RAG isn't text RAG with images bolted on. Learn how unified embeddings, text summaries, and vision-first …

Vector points filtered by structured metadata fields, narrowing semantic search to a constrained candidate subset
MONA explainer 11 min

What Is Metadata Filtering and How It Constrains Vector Search Beyond Semantic Similarity

Metadata filtering attaches typed key-value payloads to each vector and applies predicates during search, narrowing …
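
The whole mechanism fits in one function; the field names and vectors below are invented for illustration:

```python
# Metadata filtering sketch: each vector carries a typed payload, and a
# predicate narrows the candidate set before the similarity scan.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

docs = [
    {"vec": [0.9, 0.1], "meta": {"lang": "en", "year": 2024}},
    {"vec": [0.8, 0.2], "meta": {"lang": "de", "year": 2024}},
    {"vec": [0.1, 0.9], "meta": {"lang": "en", "year": 2021}},
]

def filtered_search(query_vec, predicate, k=2):
    candidates = [d for d in docs if predicate(d["meta"])]  # pre-filter
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return candidates[:k]

hits = filtered_search([1.0, 0.0],
                       lambda m: m["lang"] == "en" and m["year"] >= 2023)
print([h["meta"] for h in hits])  # only the 2024 English doc survives
```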

Layered prerequisite stack from chunked vector index up to a typed entity-relationship graph for retrieval
MONA explainer 12 min

GraphRAG Prerequisites: Knowledge Graphs and Where Vector RAG Falls Short

GraphRAG inherits chunking, embeddings, and entity extraction from vector RAG. Learn what you need first and where the …

Network of entity nodes connected by labeled relationships showing multi-hop traversal in a retrieval-augmented generation pipeline
MONA explainer 10 min

What Is GraphRAG? Multi-Hop Reasoning with Knowledge Graphs

GraphRAG turns documents into a knowledge graph and uses community summaries to answer multi-hop questions vector …
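
The traversal at the core of it, sketched over a three-edge toy graph with invented entities:

```python
# Multi-hop retrieval over a toy entity graph: a connection question that
# flat vector search struggles with falls out of a two-hop traversal.

from collections import deque

edges = [
    ("Marie Curie", "discovered", "polonium"),
    ("polonium", "named_after", "Poland"),
    ("Marie Curie", "worked_at", "Sorbonne"),
]

graph: dict[str, list[tuple[str, str]]] = {}
for head, rel, tail in edges:
    graph.setdefault(head, []).append((rel, tail))

def multi_hop(start: str, goal: str, max_hops: int = 3):
    """Breadth-first search returning the relation path from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

print(multi_hop("Marie Curie", "Poland"))
# [('Marie Curie', 'discovered', 'polonium'),
#  ('polonium', 'named_after', 'Poland')]
```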

MONA examining an HNSW graph where colored filter constraints break navigability between nodes
MONA explainer 13 min

Pre-Filter vs Post-Filter vs Filtered-HNSW: Metadata Filtering at Scale

Why metadata filtering breaks vector search at scale — the HNSW prerequisites, payload indexing, and Boolean predicates …

Layout-aware document parsing decomposing a PDF page into text regions, tables, and reading order
MONA explainer 11 min

OCR to Layout-Aware Models: Prerequisites and Hard Limits

Document parsing breaks in predictable ways. Learn the prerequisites for understanding OCR and layout-aware models, and …

Vision-language encoder mapping image and text into a shared embedding space with the modality gap visualized as separated cones
MONA explainer 11 min

Multimodal RAG Prerequisites: Vision-Language Models, Cross-Modal Alignment

Before multimodal RAG works, you need vision-language models, shared embeddings, and a theory of cross-modal retrieval. …

Layered knowledge graph with token cost arrows illustrating GraphRAG indexing recursion and its engineering limits at scale
MONA explainer 10 min

Indexing Cost, Token Blowup, and the Hard Engineering Limits of GraphRAG at Scale

GraphRAG indexing costs scale with token recursion, not document size. A breakdown of the cost cliff, hallucinated …

Document parsing pipeline decomposing a PDF into layout regions, OCR text, and VLM-extracted structure feeding a RAG knowledge base
MONA explainer 11 min

How OCR, Layout Analysis, and VLMs Turn PDFs Into Clean Text

Document parsing converts PDFs into structured text via layout analysis, OCR, and VLMs. Here is how each component works …

Diagram of long-context attention dispersion vs RAG retrieval — accuracy degrades in the middle of a long input window
MONA explainer 12 min

Lost in the Middle, 1,250x Cost: The Limits of Long-Context vs RAG

Long-context windows promise simplicity, but lost-in-the-middle, 1,250x cost gaps, and effective-context collapse at 32K …

Two diverging pathways representing long-context windows and retrieval-augmented generation handling knowledge in large language models
MONA explainer 10 min

Long-Context vs RAG: How Each Handles Knowledge in 2026

Long-context and RAG sound interchangeable. They are not. The mechanics, failure modes, and cost curves diverge — see …

Side-by-side diagram contrasting a long-context KV-cache stack with a RAG vector-index pipeline
MONA explainer 13 min

Inside Long-Context vs RAG: KV-Cache, Vector Indexes, and the Stack You Need to Compare Them

Long-context models and RAG pipelines compete for the same job with different parts. A component-by-component map of KV …

Three-layer diagram of RAG faithfulness: citation generation, confidence scoring, and abstention as separable stages
MONA explainer 13 min

Citation, Confidence, and Abstention: The 3 Layers of RAG Faithfulness

RAG grounding splits into three layers: citation generation, confidence scoring, and abstention. See how each fails …
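
The abstention layer is the easiest to sketch: answer only when retrieval confidence clears a threshold. The scores and threshold here are illustrative:

```python
# Abstention sketch: decline when no retrieved chunk clears a confidence
# threshold, instead of answering from weak context.

def answer_or_abstain(question: str, retrieved: list[tuple[str, float]],
                      threshold: float = 0.6) -> str:
    if not retrieved or max(score for _, score in retrieved) < threshold:
        return "I don't have enough grounded context to answer."
    best, score = max(retrieved, key=lambda r: r[1])
    return f"{best} [confidence {score:.2f}]"

print(answer_or_abstain("capital?", [("Paris is the capital of France.", 0.82)]))
print(answer_or_abstain("capital?", [("weakly related chunk", 0.31)]))
```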

Diagram of sparse retrieval: documents represented as weighted term vectors over a vocabulary, scored against a query through an inverted index
MONA explainer 12 min

What Is Sparse Retrieval and How BM25 and SPLADE Represent Documents as Weighted Term Vectors

Sparse retrieval encodes documents as weighted term vectors. Here is how BM25 and SPLADE produce those weights and why …
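
The BM25 weighting itself is compact enough to show whole; k1 and b take their common defaults, and the three-document corpus is invented:

```python
# BM25 as a weighted-term-vector scorer: per-term IDF times a saturating,
# length-normalized term frequency, summed over query terms.

import math
from collections import Counter

corpus = [
    "sparse retrieval uses an inverted index",
    "dense retrieval embeds text into vectors",
    "bm25 is a sparse retrieval baseline",
]
docs = [doc.split() for doc in corpus]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
df = Counter(t for d in docs for t in set(d))  # document frequency per term

def bm25(query: str, doc: list[str], k1: float = 1.5, b: float = 0.75) -> float:
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score

for text, doc in zip(corpus, docs):
    print(f"{bm25('sparse retrieval', doc):.3f}  {text}")
```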