LLM Foundations

Core mechanics of large language models — training, inference, tokenization, and the mathematics of next-token prediction.

Branching retrieval graph that converges into a reasoning loop with reflection and tool-call nodes

MONA explainer 11 min May 16, 2026

From RAG to Agentic RAG: Prerequisites and Technical Limits of Retrieval-Augmented Agents

DOM accessibility tree and raw screenshot views of a webpage, showing the two ways computer use agents perceive interfaces

MONA explainer 11 min May 16, 2026

DOM Trees vs Screenshots: Prerequisites and Technical Limits of Computer Use Agents in 2026

Sandboxed Python interpreter receiving generated code from a language model, isolated from the host system

MONA explainer 12 min May 14, 2026

What Are Code Execution Agents and How Sandboxed Interpreters Let LLMs Run Their Own Code

Layered diagram of an AI code-execution stack: reasoning loop, sandbox runtime, microVM isolation primitives.

MONA explainer 11 min May 14, 2026

Prerequisites for Code Execution Agents: From ReAct Loops to microVM Isolation

Three concentric rings representing sandbox isolation, benchmark consistency, and context collapse in code execution agents

MONA explainer 11 min May 14, 2026

Cold Starts, Flaky Tests, and Context Blowup: The Technical Limits of Code Execution Agents in 2026

Geometric diagram of an LLM pipeline branching, looping, and checkpointing across workflow steps

MONA explainer 12 min May 14, 2026

What Is Workflow Orchestration for AI and How DAGs, State Machines, and Conditional Branching Structure LLM Pipelines

DAG and state machine orchestration patterns side by side, with retry arrows showing how AI workflows recover from failures.

MONA explainer 11 min May 14, 2026

DAGs vs. State Machines, Retry Logic, and the Hard Technical Limits of AI Workflow Orchestration

Cascading failure points branching across an agent execution graph with recovery checkpoints

MONA explainer 12 min May 12, 2026

Agent Error Handling: How Agents Recover From Tool and LLM Failures

Agent error handling turns brittle LLM loops into resilient systems. Learn how guardrails, retries, and checkpoints …

Nested timeline of agent spans showing tool calls, retrieval steps, and token counters arranged as a causal graph

MONA explainer 12 min May 12, 2026

What Is Agent Observability? Traces, Spans, and Token Attribution

Agent observability records every step an AI agent takes. Learn how traces, spans, and token attribution reveal what …

Layered diagram of agent failure modes, idempotency boundaries, and durable execution checkpoints

MONA explainer 11 min May 12, 2026

Resilient AI Agents: Failure Modes, Idempotency, Durable Execution

Reliable AI agents need three foundations: a failure-mode taxonomy, idempotent action boundaries, and durable execution …

Distributed trace graph branching across agent tool calls and LLM invocations

MONA explainer 11 min May 12, 2026

OpenTelemetry GenAI: Prerequisites and Limits of Agent Tracing

OpenTelemetry GenAI semconv is still in Development. What you need to know about tracing prerequisites and hard limits …

Geometric diagram of an LLM agent loop split into routing, caching, and token-budget control layers

MONA explainer 11 min May 12, 2026

Agent Cost Optimization: Routing, Caching, and Token Budgets for LLMs

Agent cost optimization routes requests to the right model, caches reusable computation, and caps runaway loops before …

Diagram of three agent cost vectors: pricing asymmetry, prefill vs decode latency, prompt cache preconditions

MONA explainer 9 min May 12, 2026

Agent Cost Optimization Prerequisites: Pricing, Latency, Caching Limits

Before optimizing agent costs, understand token pricing asymmetry, prefill vs decode latency, and why prompt and …

Conceptual visualization of agent guardrails enforcing permission boundaries on autonomous AI tool calls and outputs

MONA explainer 11 min May 10, 2026

What Are Agent Guardrails? How Permission Systems Constrain AI

Agent guardrails enforce permission boundaries on autonomous AI. Learn how Claude SDK, NeMo, and Llama Guard constrain …

Concentric runtime checkpoints around an LLM agent showing input, output, and tool-call boundaries with permeable filters

MONA explainer 11 min May 10, 2026

Prerequisites for Agent Guardrails: Tool Use and Runtime Limits

Agent guardrails are runtime classifiers wrapped around tool-use loops — useful, partial, and demonstrably evadable. …

Autonomous agent paused at an interrupt checkpoint awaiting human approval before resuming a workflow

MONA explainer 12 min May 10, 2026

Prerequisites and Technical Limits of HITL for AI Agents

HITL for agents is easy to start and hard to scale. Learn the prerequisites — durable state, idempotency, escalation — …

Geometric visualization of an approval gate paused between an autonomous agent and a tool call

MONA explainer 11 min May 10, 2026

Human-in-the-Loop for AI Agents: How Approval Gates Work

Human-in-the-loop for AI agents pauses autonomous workflows at risky steps and routes them to a human gate. Here's how …

Diagram of an LLM agent loading checkpoint snapshots from a thread before each reasoning step

MONA explainer 10 min May 8, 2026

Agent State Management: Threads, Checkpointers, Hard Limits

Agent state is not memory — it is plumbing that replays snapshots between steps. Mona explains threads, checkpointers, …

Graph of state snapshots linked by a checkpoint thread across reasoning turns inside an agent runtime

MONA explainer 10 min May 8, 2026

Agent State Management: How Checkpointing Persists Memory Across Turns

Agent state management decides whether your agent remembers. See how LangGraph checkpointers, threads, and reducers …

Sequence of tool calls forming an agent trajectory graded against a reference path

MONA explainer 10 min May 8, 2026

Agent Evaluation: How Trajectory Analysis Measures AI Agents

Agent evaluation grades the path, not just the final answer. Learn how trajectory analysis exposes silent reasoning …

Layered diagram of agent evaluation showing outcome judgment, trajectory analysis, and cost-per-task observability stacked over a benchmark surface.

MONA explainer 11 min May 8, 2026

Agent Evaluation Prerequisites: LLM-as-Judge to Cost-Per-Task

Agent evaluation needs three signals: outcome, trajectory, cost. Learn why LLM-as-judge has known biases and where major …

Layered diagram of an agent loop showing thought, action, and observation stages with branching planning paths

MONA explainer 14 min May 7, 2026

From Chain-of-Thought to Tool Use: Prerequisites and Technical Limits of Agent Planning

Agent planning rests on three primitives — chain-of-thought, tool use, and the ReAct loop. Learn the prerequisites and …

Diagram of three multi-agent architectures: supervisor, debate, and swarm patterns coordinating AI agents

MONA explainer 12 min May 7, 2026

Multi-Agent Systems: Supervisor, Debate, and Swarm Patterns

Multi-agent systems coordinate specialized AI agents through supervisor, debate, or swarm patterns. Here is how each …

Layered diagram of multi-agent prerequisites: tool use as the atomic primitive, the ReAct loop, and short- and long-term memory

MONA explainer 13 min May 7, 2026

Multi-Agent Systems: Prerequisites and Hard Technical Limits

Before multi-agent systems, master tool use, the ReAct loop, and memory. Then face the limits: context blow-up, error …

Layered diagram of an LLM agent memory architecture with vector store, temporal graph, and self-editing memory blocks

MONA explainer 12 min May 7, 2026

Agent Memory Systems: How LLMs Get Persistent Recall Across Sessions

Agent memory systems give LLMs persistent recall across sessions. Inside the architectures: temporal graphs, …

Three architectural diagrams contrasting graph state, actor message passing, and crew task handoff patterns in agent orchestration

MONA explainer 11 min May 7, 2026

Graph vs Conversation vs Crew: LangGraph, AutoGen, CrewAI Patterns

LangGraph, AutoGen, and CrewAI commit to three different theories of how AI agents coordinate. The pattern you pick …

Diagram of an AI agent loop showing reasoning traces, tool actions, and a self-reflection memory feeding the next step

MONA explainer 10 min May 7, 2026

Agent Planning and Reasoning: ReAct, Plan-and-Execute, Reflexion

Agent planning is not human cognition — it is token generation conditioned on observations. How ReAct, Plan-and-Execute, …

Tiered memory layers compressing into a temporal knowledge graph for AI agents

MONA explainer 10 min May 7, 2026

Agent Memory Architectures: Prerequisites and Hard Limits

Agent memory isn't a bigger context window. Learn the prerequisites for designing agent memory systems and the hard …

LLM agent loop wiring reasoning to tools, memory, and a control plane across three orchestration frameworks.

MONA explainer 12 min May 7, 2026

Agent Frameworks: How LangGraph, CrewAI, and AutoGen Orchestrate LLMs

Agent frameworks orchestrate LLM calls, tools, and memory — but each one bets on a different abstraction. Learn what …

Salient object segmentation pipeline isolating a foreground subject from a busy background using alpha matting and per-pixel opacity

MONA explainer 10 min Apr 27, 2026

What Is AI Background Removal? How Salient Object Segmentation Works

AI background removal is not one model — it's salient object detection plus alpha matting. See how U2-Net, BiRefNet, and …