AI Glossary

A living reference of 365 artificial intelligence and machine learning terms — from foundational concepts like gradient descent and backpropagation to emerging topics such as agentic RAG, mixture of experts, and multi-agent orchestration. Written for technical professionals who want precision without the PhD prerequisite.

A

Ablation Study

A controlled experiment that removes or disables individual components of a machine learning model — such as layers, features, or training steps — to measure how much each part contributes to overall performance.

Activation Function

A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to model complex relationships. Without activation functions, stacking layers would only produce linear transformations, making deep learning impossible.
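
A minimal sketch of that point, using ReLU and illustrative weights:

```python
def relu(x):
    # ReLU, the most common activation: passes positives, zeroes negatives
    return max(0.0, x)

def two_linear_layers(x):
    # Stacked linear maps collapse into one linear map: 3*(2*x) == 6*x
    return 3.0 * (2.0 * x)

def two_layers_with_relu(x):
    # Inserting ReLU between the layers breaks linearity for negative inputs
    return 3.0 * relu(2.0 * x)

print(two_linear_layers(-1.0))     # still linear: -6.0
print(two_layers_with_relu(-1.0))  # non-linear: 0.0
```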

Adam Optimizer

Adam (Adaptive Moment Estimation) is an optimization algorithm that combines momentum tracking and per-parameter learning rate scaling to train neural networks efficiently. It adapts step sizes automatically, making it the default optimizer for most deep learning tasks including large language model pre-training.
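
A simplified single-parameter sketch of the update rule (default hyperparameters; the gradient value is illustrative):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Update biased estimates of the first moment (mean) and second moment (variance)
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction compensates for the zero-initialized moments
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Per-parameter step: a large second moment (noisy gradient) shrinks the step
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    w, m, v = adam_step(w, grad=2.0, m=m, v=v, t=t)
print(w)  # each early step has magnitude close to the learning rate
```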

Adjacency Matrix

A square matrix where each row and column represents a graph node, and each cell indicates whether an edge connects two nodes. It encodes graph structure so algorithms — especially graph neural networks — know which nodes can exchange information during message passing.
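
A small illustrative example for a four-node undirected graph:

```python
# Nodes 0..3; undirected edges (toy graph)
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
n = 4
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1  # symmetric, because the graph is undirected

# Row i lists the neighbours node i can exchange messages with
neighbours_of_0 = [j for j in range(n) if A[0][j]]
print(neighbours_of_0)  # [1, 3]
```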

Adobe Firefly

Adobe Firefly is Adobe's family of generative AI models and the branded generative surface embedded across Creative Cloud (Photoshop Generative Fill, Illustrator Generative Shape, Express), trained on Adobe Stock and licensed content for commercially-safe image and video generation.

Adversarial Attack

A deliberate manipulation of inputs, training data, or model parameters to cause an AI system to produce incorrect or unintended outputs, forming the core threat model that security frameworks like OWASP LLM Top 10 and MITRE ATLAS are designed to classify and defend against.

Agent Debate

Agent debate is a multi-agent coordination technique where multiple LLM agents independently propose answers, critique each other's reasoning across rounds, and converge on a final response that is typically more accurate than any single agent's output.

Agent Evaluation And Testing

Agent evaluation and testing measures whether AI agents — systems that plan, call tools, and produce multi-step outputs — perform correctly. It scores both outcome (did the task succeed?) and trajectory (were the right tool calls made in the right order?).

Agent Frameworks Comparison

An agent frameworks comparison evaluates software libraries that orchestrate LLM-based agents across reasoning, tool use, and memory. The mainstream contenders are LangGraph (graph-based control), CrewAI (role-based crews), and AutoGen and its successor Microsoft Agent Framework (event-driven async messaging).

Agent Guardrails

Programmable controls that limit what an autonomous AI agent can perceive, say, and do — applied across input, prompt, retrieval, tool-call, and output layers to prevent excessive agency, unsafe actions, and unauthorized resource access.

Agent Memory Systems

An agent memory system is an external storage layer that lets a large language model retain information across sessions — such as user preferences, past conversations, and project facts — by writing data to a database and retrieving relevant entries before each new prompt.

Agent Orchestration

Agent orchestration is the software layer that coordinates multiple AI agents, defining the sequence in which they run, how they share state, and how their outputs combine into a single workflow result.

Agent Planning And Reasoning

Agent planning and reasoning describes how an AI agent decomposes a goal into ordered steps, picks tools or actions for each step, and revises its plan based on intermediate results — the cognitive engine behind autonomous task execution.

Agent State Management

Agent state management is how an AI agent tracks and persists its conversation history, tool results, plans, and intermediate reasoning across turns so it can pause, resume, or hand off work without losing context.

Agentic RAG

Agentic RAG is an architecture where an autonomous LLM agent drives the retrieval process — deciding which sources to query, reflecting on intermediate results, and looping until it has enough evidence — instead of running a single retrieve-then-generate step over a static index.

AI Background Removal

AI background removal is a computer vision technique that uses deep learning models — typically trained on salient object segmentation or image matting — to automatically detect a foreground subject and isolate it from the surrounding background, producing a transparent or replaceable backdrop.

AI Fairness 360

An open-source Python toolkit providing fairness metrics and bias mitigation algorithms that help teams detect, measure, and reduce discrimination in machine learning models across different stages of the ML pipeline.

AI Image Editing

AI image editing is the conditional modification of an existing image using a generative model, steered by a mask, a text instruction, or a reference image. It covers inpainting, outpainting, and instruction-based edits, now typically unified in a single diffusion or flow-matching architecture.

Alpha Matting

Alpha matting estimates a per-pixel opacity value (alpha) that determines how much of each pixel belongs to the foreground versus the background. The result is a grayscale matte used to composite the subject onto new backgrounds while preserving soft edges like hair and translucent regions.

Answer Relevancy

Answer Relevancy is a generation-side RAG evaluation metric that measures how directly a system's response addresses the user's original question. Scores fall between 0 and 1; the metric does not check factual correctness, only whether the answer stays on-topic and avoids irrelevant padding.

Attention Mechanism

A deep learning technique that lets models dynamically weigh which parts of an input matter most for each output, enabling context-aware predictions instead of treating all input tokens equally.
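
A bare-bones sketch of dot-product attention with toy vectors (no scaling or learned projections, which real transformers add):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    # Score each key against the query, normalize to weights, blend the values
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# The query matches the second key most strongly, so its value dominates the output
out = attention(query=[0.0, 1.0],
                keys=[[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]],
                values=[[10.0], [20.0], [30.0]])
print(out)  # a weighted blend pulled toward 20.0
```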

Autoregressive Generation

A sequential text generation method where a language model produces one token at a time, conditioning each new prediction on all previously generated tokens to build coherent output.
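
The loop can be sketched with a toy bigram table standing in for the model (greedy decoding, illustrative scores):

```python
# Toy "model": each token deterministically scores possible successors
bigram_scores = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"</s>": 1.0},
}

def generate(max_tokens=10):
    tokens = ["<s>"]
    for _ in range(max_tokens):
        # Condition on the last generated token (a real LLM conditions on all of them)
        choices = bigram_scores.get(tokens[-1], {"</s>": 1.0})
        next_tok = max(choices, key=choices.get)  # greedy decoding
        if next_tok == "</s>":
            break
        tokens.append(next_tok)
    return tokens[1:]

print(generate())  # ['the', 'cat', 'sat']
```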

AWQ

A post-training quantization method developed at MIT that compresses large language model weights to 4-bit precision by identifying and protecting the most important weight channels through activation analysis, enabling high-quality inference on consumer GPUs without retraining.

B

Backpropagation

The core training algorithm for neural networks that computes how much each connection weight contributed to prediction errors by applying the chain rule from output back to input, enabling the network to learn from mistakes and improve its predictions iteratively.
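
A one-weight sketch of the chain rule in action (toy numbers):

```python
# One-weight "network": prediction = w * x, loss = (prediction - target)**2
def forward(w, x):
    return w * x

def grad_w(w, x, target):
    # Chain rule from loss back to the weight:
    # dL/dw = dL/dpred * dpred/dw = 2*(pred - target) * x
    return 2 * (forward(w, x) - target) * x

w, x, target, lr = 0.0, 2.0, 6.0, 0.1
for _ in range(50):
    w -= lr * grad_w(w, x, target)  # gradient descent update
print(w)  # converges toward 3.0, since 3.0 * 2.0 == 6.0
```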

Backpropagation Through Time

Backpropagation Through Time (BPTT) is the standard algorithm for training recurrent neural networks. It unfolds the network across all time steps in a sequence, then applies standard backpropagation to compute gradients, enabling the network to learn temporal dependencies in sequential data.

BART

BART is a sequence-to-sequence model by Meta AI built on the encoder-decoder architecture, pre-trained by corrupting text and learning to reconstruct it, combining bidirectional encoding with autoregressive decoding to excel at summarization and text generation.

Baseline Model

A simple reference model that establishes the minimum acceptable performance level in machine learning experiments. Baseline models serve as the control condition in ablation studies and model comparisons, revealing whether added complexity delivers genuine improvement over straightforward approaches like predicting the most common class or the average value.

Batch Normalization

A training technique that normalizes inputs to each neural network layer using mini-batch statistics, stabilizing the optimization process and enabling faster convergence. Introduced in 2015, it became the standard normalization method for convolutional neural networks and enabled training of much deeper architectures.
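
The core normalization step, sketched for a single feature; gamma and beta stand in for the learnable scale and shift:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize using the mini-batch's own mean and variance
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print(out)  # roughly zero mean and unit variance
```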

Beam Search

A heuristic decoding algorithm that maintains multiple candidate sequences (beams) during text generation, expanding and scoring them at each step to find a high-probability output sequence without exhaustively searching every possibility.
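
A toy sketch with illustrative bigram log-probabilities; note that the greedy first choice ("the") loses to the higher-probability full sequence, which beam search recovers:

```python
import math

# Toy next-token log-probabilities conditioned on the last token
logprobs = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.5), "end": math.log(0.5)},
    "a":   {"dog": math.log(0.9), "end": math.log(0.1)},
    "cat": {"end": math.log(1.0)},
    "dog": {"end": math.log(1.0)},
}

def beam_search(beam_width=2, steps=3):
    beams = [(["<s>"], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in logprobs.get(seq[-1], {}).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the beam_width highest-scoring sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0][1:]

print(beam_search())  # ['a', 'dog', 'end']: 0.4*0.9 beats greedy's 0.6*0.5
```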

BEIR Benchmark

BEIR (Benchmarking Information Retrieval) is a heterogeneous zero-shot evaluation benchmark of 18 publicly available datasets across 9 task types. Models train on MS MARCO and are tested out-of-domain on BEIR using nDCG@10, measuring how well retrieval methods generalize beyond their training distribution.

Benchmark Contamination

Benchmark contamination happens when an AI model's training data accidentally includes questions or answers from evaluation benchmarks, inflating test scores and making the model appear more capable than it actually is.

BGE Reranker

BGE Reranker is an open-source family of cross-encoder models from BAAI that re-scores candidate documents against a search query, sharpening retrieval results in RAG pipelines without sending data to a commercial reranking API.

Bi-Encoder

A bi-encoder is a transformer architecture that encodes a query and a document independently into fixed-size vectors, enabling fast similarity search via precomputed embeddings. It is the standard first-stage retriever in vector search and RAG pipelines.

Bias And Fairness Metrics

Quantitative measures that evaluate whether a machine learning model treats demographic groups equitably, detecting discriminatory patterns in predictions by comparing outcomes across protected attributes like race, gender, or age.

Binary Classification

A supervised machine learning task that assigns each data point to one of exactly two mutually exclusive classes, such as spam or not spam, forming the foundation of the 2x2 confusion matrix used to evaluate classifier performance.

bitsandbytes

A Python library that reduces large language model memory requirements through k-bit quantization, enabling both inference and fine-tuning on consumer-grade GPUs by compressing model weights to 8-bit or 4-bit precision.

BLEU

BLEU is an automated metric that scores machine-generated text by counting how many word sequences (n-grams) match a human-written reference, producing a value from 0 to 1 where higher means closer to human output.

Bradley Terry Model

A probabilistic framework that converts pairwise preference comparisons into numerical strength scores. Originally from statistics (1952), it now serves as the standard mathematical loss function for training reward models in reinforcement learning from human feedback.
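
The core formula, sketched in Python (scores are illustrative):

```python
import math

def bt_win_probability(score_i, score_j):
    # P(i beats j) = exp(s_i) / (exp(s_i) + exp(s_j)),
    # equivalently a sigmoid of the score gap
    return 1.0 / (1.0 + math.exp(score_j - score_i))

print(bt_win_probability(0.0, 0.0))  # equal scores: 0.5
print(bt_win_probability(1.0, 0.0))  # one-point gap: ~0.73
```

In RLHF, the reward model is trained so that this probability matches observed human preferences between pairs of responses.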

BRIA RMBG

BRIA RMBG is Bria AI's background removal model family. The current version, RMBG-2.0, uses dichotomous image segmentation trained exclusively on licensed, manually labeled images. Open weights ship under CC BY-NC 4.0; commercial use requires a Bria license or hosted API.

Byterover

Byterover is an agent-native memory system for AI agents that stores knowledge as a hierarchical tree of human-readable markdown files. The same LLM that handles reasoning curates and retrieves entries, with most lookups resolving in milliseconds without vector databases or graph stores.

C

Calibration

Calibration measures the alignment between a model's expressed confidence and its actual accuracy. A well-calibrated model that reports 80% confidence should be correct 80% of the time, providing a reliable signal for when to trust or question its outputs.

Catastrophic Forgetting

The tendency of neural networks to lose previously acquired knowledge when trained sequentially on new data. In LLM fine-tuning, a model specialized on one task may lose its general abilities, making the choice between full fine-tuning and parameter-efficient methods critical.

Causal Masking

Causal masking is an attention restriction in decoder-only transformer models that prevents each token from attending to future tokens, enforcing the left-to-right generation order that makes autoregressive language models produce text one token at a time.
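
A minimal sketch of the mask itself:

```python
def causal_mask(seq_len):
    # mask[i][j] is True when position i may attend to position j (j <= i)
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

for row in causal_mask(4):
    print(["x" if ok else "." for ok in row])
# Lower-triangular: token 0 sees only itself; token 3 sees every earlier token
```

In practice the blocked positions are set to negative infinity before the softmax so they receive zero attention weight.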

Chain-of-Thought

A prompting technique that instructs large language models to produce explicit, step-by-step reasoning before reaching a final answer, making the model's logic visible and improving accuracy on tasks requiring multi-step thinking.

Chatbot Arena

A human-preference evaluation platform where anonymous users compare AI model responses side by side, generating crowdsourced Elo ratings that rank large language models by real-world conversational quality rather than performance on static benchmark datasets.

Chinchilla Scaling

A set of scaling laws showing that for a fixed compute budget, large language models perform best when model size and training data are scaled in roughly equal proportion, rather than prioritizing one over the other.

Chunking Strategy

The rule that splits source documents into smaller passages before they are embedded and stored in a vector index for retrieval-augmented generation. It defines chunk size, overlap, and split boundary, all of which directly affect retrieval quality.
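
A minimal fixed-size chunker with overlap (character-based for illustration; production splitters usually respect token counts and sentence boundaries):

```python
def chunk_text(text, chunk_size=40, overlap=10):
    # Overlap repeats the tail of each chunk at the head of the next,
    # so context that straddles a boundary is not lost
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "Retrieval quality depends heavily on how source documents are split."
for c in chunk_text(doc):
    print(repr(c))
```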

Class Imbalance

A condition in classification tasks where one class contains significantly more examples than others, causing models to favor the majority class and making standard accuracy misleading as a performance metric.

Class Token

A single learnable embedding prepended to a transformer's input sequence that aggregates information from all other tokens through self-attention, producing one summary vector used by the classification head.

Classification Threshold

The probability cutoff value that converts a machine learning classifier's continuous confidence score into a binary class prediction, determining the tradeoff between false positives and false negatives.

Classifier-Free Guidance

Classifier-Free Guidance is a sampling technique for diffusion models that steers generation toward a prompt by blending predictions from a single network run with and without the conditioning signal, removing the need for a separate trained classifier.

Claude Agent SDK

The Claude Agent SDK is Anthropic's official framework for building agents powered by Claude. It provides a Python and TypeScript runtime with built-in tools for reading files, writing code, running shell commands, and connecting custom tools, originally released as the Claude Code SDK.

CLIP Model

CLIP (Contrastive Language-Image Pre-training) is a vision-language model from OpenAI that jointly trains an image encoder and a text encoder so matching image-caption pairs land close in a shared embedding space, enabling zero-shot image classification.

Cohere Rerank

A managed cross-encoder reranking model from Cohere that scores how relevant each candidate document is to a query and re-sorts the list. Used as a second-stage refinement after vector or hybrid retrieval to sharpen the context passed to an LLM in RAG systems.

ColPali

A vision-language retrieval model that searches documents by processing page images directly through a vision encoder, generating multi-vector patch embeddings and using late interaction scoring to rank pages without OCR or text extraction.

ComfyUI

ComfyUI is an open-source, node-based workflow editor for running diffusion image and video models. Users wire nodes — model loaders, samplers, ControlNets, upscalers, VAE encoders — on a canvas to build custom pipelines, making it the standard tool for advanced workflows like tiled diffusion upscaling.

Community Detection

Community detection is a graph algorithm that identifies clusters of densely connected nodes inside a network or knowledge graph. In GraphRAG systems, it groups related entities and concepts so the model can summarize each cluster and answer questions that span multiple topics.

Compute Optimal Training

A training methodology that uses scaling law predictions to find the ideal balance between model size and training data volume for a fixed compute budget, maximizing performance rather than simply increasing parameter count.

Confusion Matrix

A table summarizing a classification model's predictions against actual outcomes, divided into true positives, true negatives, false positives, and false negatives. These four counts form the basis for precision, recall, accuracy, and the error rates central to algorithmic fairness evaluation.
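
Computing the four counts and the derived metrics from toy labels:

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found
print(tp, tn, fp, fn, precision, recall)
```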

Content Moderation

The process of screening user-generated content against platform policies using AI classifiers, human reviewers, or both to detect and remove harmful material before it reaches audiences.

Context Precision

Context Precision is a retrieval-side RAG evaluation metric that scores whether relevant chunks appear higher than irrelevant ones in the retrieved context, calculated as a weighted mean of Precision@k across the ranked top K results.

Context Recall

Context Recall is a retrieval-side RAG evaluation metric that measures how completely the retrieved documents cover the information required to produce the ideal answer, scored against a human-labeled ground truth.

Context Vector

The single fixed-length vector an encoder network produces after processing an entire input sequence, compressing all source information into one representation that the decoder uses to generate output. Its limited capacity motivated the invention of attention mechanisms.

Context Window

The maximum number of tokens a language model can process in a single interaction, covering both the input prompt and the generated output combined.

Contextual Retrieval

A retrieval-augmented generation technique where each document chunk is prefixed with a short, model-generated context summary before embedding and indexing, so retrieved passages remain meaningful and unambiguous when surfaced to the language model in isolation.

Continuous Batching

A request scheduling technique for LLM inference that inserts new requests into a running batch at every forward pass, replacing static batching to maximize GPU throughput and reduce wait times.

Contrastive Learning

A self-supervised machine learning technique that trains models to produce meaningful embeddings by maximizing similarity between related (positive) pairs while minimizing similarity between unrelated (negative) pairs, forming the core training objective behind Sentence Transformers and modern sentence-level embedding models.

Convolutional Neural Network

A neural network architecture that slides small learnable filters across input data to automatically detect spatial patterns such as edges and textures, making it the standard approach for image recognition and computer vision tasks.

Cosine Similarity

A mathematical metric that computes the cosine of the angle between two vectors, producing a score from −1 (opposite) to +1 (identical direction), widely used to measure semantic closeness between embeddings.
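
A pure-Python sketch:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))    # 1.0: same direction
print(cosine_similarity([1, 0], [0, 1]))    # 0.0: orthogonal
print(cosine_similarity([1, 0], [-1, 0]))   # -1.0: opposite
print(cosine_similarity([1, 0], [100, 0]))  # 1.0: magnitude is ignored
```

The last line is why cosine similarity suits embeddings: only direction (semantic orientation) matters, not vector length.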

Counterfactual Fairness

A causal fairness criterion requiring that an AI model's prediction for any individual remains unchanged in a hypothetical scenario where only their protected attribute, such as race or gender, is altered — grounded in structural causal models rather than statistical group comparisons.

CrewAI

CrewAI is an open-source Python framework for orchestrating role-playing AI agents in multi-agent systems. It defines four primitives — Agents, Tasks, Tools, and Crew — and runs them through a sequential or hierarchical process to coordinate work without a LangChain dependency.

Cross Attention

An attention mechanism where queries originate from one sequence and keys and values come from a different sequence, enabling a model to focus on relevant information across two distinct inputs like encoder and decoder representations.

Cross Entropy Loss

A loss function that measures how far a neural network's predicted probability distribution diverges from the correct answer, producing steep gradients that drive effective weight updates during backpropagation — especially punishing confident wrong predictions to accelerate training convergence.
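
For a single example, the loss reduces to the negative log-probability assigned to the correct class (toy probabilities):

```python
import math

def cross_entropy(predicted_probs, true_index):
    # Negative log-probability of the correct class
    return -math.log(predicted_probs[true_index])

confident_right = cross_entropy([0.9, 0.05, 0.05], true_index=0)
confident_wrong = cross_entropy([0.05, 0.9, 0.05], true_index=0)
print(confident_right)  # ~0.105: small loss
print(confident_wrong)  # ~3.0: confident mistakes are punished hard
```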

Cross-Encoder

A cross-encoder is a transformer that processes a query and a candidate document jointly through a single network and outputs a relevance score (typically 0–1), capturing fine-grained interactions between every query and document token. It is the standard architecture for reranking shortlisted results.

Cypher Query Language

Cypher is a declarative pattern-matching language for property graphs. Created at Neo4j in 2011 and opened via openCypher in 2015, it became the primary input dialect of the ISO/IEC 39075:2024 GQL standard. It expresses graph traversals as ASCII-art patterns rather than imperative code.

D

Data Deduplication

A preprocessing technique that identifies and removes duplicate or near-duplicate documents from training datasets before model training, reducing wasted compute, preventing memorization of repeated text, and improving a model's ability to generalize.

DDIM

DDIM is a sampling scheduler for diffusion models that replaces DDPM's noisy reverse process with a deterministic, non-Markovian one, producing images in 20–50 steps instead of 1000. It shares DDPM's training objective and enables latent inversion for image editing.

Decoder Only Architecture

A neural network design based on the transformer decoder block that generates text autoregressively, predicting one token at a time by attending only to previous tokens in the sequence without a separate encoder component.

Deep Graph Library

Deep Graph Library (DGL) is an open-source Python framework for building and training graph neural networks across multiple deep learning backends, offering optimized message-passing primitives and built-in model implementations for graph-structured data.

DeepEval

An open-source Python framework for unit testing LLM applications with automated evaluation metrics including faithfulness, hallucination detection, and answer relevancy, built on a Pytest-like testing workflow.

DeepSpeed

An open-source library from Microsoft that distributes model training across multiple GPUs using its ZeRO optimizer, reducing memory requirements so teams can train models too large for any single device.

Demographic Parity

A fairness metric requiring that a machine learning model's positive prediction rate is equal across all demographic groups, regardless of whether those groups differ in actual outcomes.

Denoising Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models are generative models that gradually corrupt training data with Gaussian noise, then learn a neural network to reverse that process step by step, turning pure noise into realistic images, audio, or video.

Dense Retrieval

A neural search method that encodes queries and documents into vector embeddings, then finds relevant results by measuring semantic similarity rather than matching exact keywords.

Diffusion Models

A diffusion model is a generative AI system that creates images, video, or audio by iteratively reversing a gradual noising process. Starting from random noise, the model predicts and removes noise step by step, producing structured outputs conditioned on text prompts or other inputs.

Diffusion Transformer

A diffusion model whose denoising network is a Transformer acting on small patches of a compressed latent image, replacing the U-Net used in earlier diffusion architectures. Timestep and conditioning are injected through adaptive layer-norm blocks, and the backbone scales predictably with compute.

Dimensionality Reduction

A set of techniques that compress high-dimensional data into fewer dimensions while preserving meaningful patterns, making storage cheaper, computation faster, and visualization possible.

DINOv2

DINOv2 is Meta's self-supervised Vision Transformer family, released in 2023 and trained without labels. Its frozen features serve as reusable backbones for downstream tasks such as classification, semantic segmentation, depth estimation, and instance retrieval.

DiskANN

Microsoft's open-source library for approximate nearest neighbor search on billion-scale datasets using a single machine with SSD storage, combining a Vamana graph index with product quantization to keep costs low while maintaining high recall.

Disparate Impact

A legal and algorithmic fairness concept where a neutral-seeming policy or model disproportionately harms a protected group, measured by the four-fifths rule requiring each group's selection rate to be at least 80% of the highest group's rate.
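
A sketch of the four-fifths check (selection rates are illustrative):

```python
def passes_four_fifths(selection_rates):
    # Each group's rate must be at least 80% of the highest group's rate
    highest = max(selection_rates.values())
    return all(rate / highest >= 0.8 for rate in selection_rates.values())

print(passes_four_fifths({"group_a": 0.50, "group_b": 0.35}))  # False: 0.70 < 0.80
print(passes_four_fifths({"group_a": 0.50, "group_b": 0.45}))  # True: ratio 0.90
```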

Document Parsing And Extraction

Document parsing and extraction is the process of converting unstructured documents — PDFs, scans, images, and office files — into structured, machine-readable formats like Markdown, JSON, or HTML that preserve layout, tables, and reading order so RAG pipelines, agents, and knowledge graphs can consume them.

Dot Product

A mathematical operation that multiplies corresponding components of two vectors and sums the results into a single number, measuring how similar two vectors are in both direction and magnitude.

DPO

Direct Preference Optimization (DPO) is an alignment technique that fine-tunes language models directly on human preference pairs without training a separate reward model, replacing the reinforcement learning step in RLHF with a simple classification loss.

E

Elo Rating

A numerical scoring system that ranks competitors by relative skill through head-to-head comparisons. Named after its inventor, Arpad Elo, and originally developed for chess, it is now widely used to evaluate and compare AI language models on platforms like Chatbot Arena.
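
A sketch of the standard update (K-factor 32; ratings are illustrative):

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    # Expected score of A from the rating gap (logistic curve, base 10, scale 400)
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # Move A's rating toward its actual result; B moves symmetrically
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b - k * (score_a - expected_a)
    return new_a, new_b

# An upset: the lower-rated model wins (score_a = 1)
a, b = elo_update(1400, 1600, score_a=1)
print(a, b)  # the underdog gains more points than it would against an equal opponent
```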

ELSER

ELSER (Elastic Learned Sparse EncodeR) is a proprietary retrieval model from Elastic that produces sparse term-weight vectors for English-language semantic search inside Elasticsearch, working out-of-the-box without fine-tuning and accessible through the standard inference API.

Embedding

A mathematical representation that converts discrete data like words or tokens into dense numerical vectors in a continuous space, where similar items are positioned closer together. Embeddings serve as the input layer for transformer models and most modern neural networks.

Emergent Abilities

Capabilities that appear in large language models only beyond a certain training scale, absent in smaller models but present in larger ones, raising questions about predictability when applying scaling laws to training decisions.

Encoder Decoder

A neural network design where an encoder compresses input into a fixed representation and a decoder generates output from that representation, forming the original transformer blueprint for tasks like translation and summarization.

Encoder Decoder Architecture

A two-part neural network design that processes sequences by first encoding input into a compressed internal representation, then decoding that representation into the desired output sequence, powering tasks like translation and summarization.

Entity Extraction

A natural language processing technique that scans unstructured text to identify and label named items — such as people, organizations, locations, products, dates, and domain concepts — converting raw prose into structured data that downstream systems can query, link, or feed into knowledge graphs.

Episodic Memory

Episodic memory in AI agents is a dedicated store of time-stamped past events — interactions, tool calls, decisions, and outcomes — distinct from semantic facts and procedural skills, so the agent can recall what happened, when, and in what context.

Equalized Odds

A group fairness criterion requiring that a classifier's true positive rate and false positive rate are equal across demographic groups. Introduced by Hardt, Price, and Srebro in 2016, it ensures prediction accuracy does not depend on protected attributes like race or gender.

ESRGAN

ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) is a 2018 GAN-based image upscaling model that produces sharper, more realistic textures than earlier super-resolution methods. It pioneered architecture and loss-function changes still used in modern upscalers like Real-ESRGAN, the GAN baseline most production image pipelines rely on today.

Euclidean Distance

The straight-line distance between two points in multi-dimensional space, calculated as the square root of the sum of squared differences between coordinates. In vector search, it quantifies how far apart two embeddings are, with zero meaning identical.

Evaluation Harness

A software framework that automates running language models against standardized benchmarks, handling task loading, prompt formatting, model inference, and metric calculation to produce comparable scores across different models.

Evidence Lower Bound

A mathematical lower bound on the log-likelihood of observed data, used as the training objective in variational autoencoders. ELBO combines reconstruction loss measuring output fidelity with KL divergence penalizing deviation from a chosen prior distribution.

Expert Parallelism

A distributed training and inference strategy for Mixture-of-Experts models where individual experts reside on separate GPUs. A gating network decides which expert handles each token, and all-to-all communication moves data between devices.

F

Factual Consistency

The measure of whether AI-generated text aligns with verifiable real-world facts. Distinguished from faithfulness (alignment with input context), factual consistency evaluates whether a model's claims about the world are true, making it a core metric in hallucination detection.

Fairlearn

An open-source Python toolkit from Microsoft Research that helps teams assess and improve fairness in machine learning models by measuring performance disparities across demographic groups and applying mitigation algorithms to reduce observed bias.

Faiss

An open-source C++ and Python library by Meta for efficient similarity search and clustering of dense vectors. Faiss implements index types including IVF, HNSW, and product quantization, enabling nearest-neighbor search across billion-scale datasets with CPU and GPU support.

Faithfulness

Faithfulness measures whether a RAG system's generated answer is factually consistent with the retrieved context. It is calculated as the ratio of claims in the response that are supported by source documents to total claims, producing a score between 0 and 1.

Falcon H1

Falcon-H1 is a family of open-weight hybrid language models from Technology Innovation Institute (TII) that runs Transformer attention heads and Mamba-2 state space model heads in parallel inside each mixer block, released in sizes from 0.5B to 34B parameters.

Feature Map

The 2D output grid produced when a convolutional filter scans across an input, where each value represents the filter's activation strength at that spatial position, revealing where specific visual patterns were detected.

Feedforward Network

A neural network where data moves in one direction from input to output with no loops or cycles, used as a core processing sub-layer inside each Transformer block to transform learned representations.

Few-Shot Learning

A prompting technique where a small number of input-output examples are included in the prompt to guide an LLM's response, allowing the model to recognize task patterns without any retraining or weight updates.

Fine Tuning

Fine-tuning adapts a pre-trained machine learning model to a specific task or domain by continuing training on a smaller, targeted dataset, adjusting the model's weights so it performs better on that particular use case.

Flash Attention

An algorithm that computes exact attention scores without storing the full attention matrix in GPU memory, reducing memory use from quadratic to linear while maintaining mathematical equivalence to standard attention.

Flow Matching

Flow Matching is a simulation-free training method where a neural network learns a velocity field that transports random noise into realistic data along a chosen probability path, generalizing classical diffusion training.

FLUX

FLUX is a family of image generation and editing models from Black Forest Labs built on rectified-flow architecture in latent space. The FLUX.1 Kontext variant accepts both text and image inputs to perform single-pass in-context image edits.

Four-Fifths Rule

A threshold from US employment law stating that a selection rate for any group below 80% of the highest group's rate signals potential adverse impact. Originally designed for hiring decisions, the rule now applies to AI-powered screening tools that automate candidate evaluation.
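
The check itself is simple arithmetic. A minimal sketch, with hypothetical applicant and selection counts:

```python
# Four-fifths rule: flag any group whose selection rate falls below
# 80% of the highest group's selection rate.
def adverse_impact(selected, applicants):
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    return {g: rate / top < 0.8 for g, rate in rates.items()}

applicants = {"group_a": 100, "group_b": 100}
selected = {"group_a": 60, "group_b": 40}
flags = adverse_impact(selected, applicants)
# group_b's rate (0.40) is ~0.667 of group_a's (0.60), below 0.8 -> flagged
```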

FP8

An 8-bit floating-point format used to store and compute AI model weights and activations at half the memory cost of FP16, with two encodings supported natively on modern GPU hardware: E4M3, which favors precision and typically holds weights and activations, and E5M2, which favors range and typically holds training gradients.

G

Gating Mechanism

A learned routing layer inside a Mixture of Experts model that scores every input token and sends it to only a few specialist sub-networks, keeping the rest idle so the model stays fast despite its large total size.

Generative Adversarial Network

A machine learning framework consisting of two competing neural networks: a generator that creates synthetic data from random latent vectors, and a discriminator that distinguishes real from generated samples. Through adversarial training, the generator learns to produce increasingly realistic outputs.

GGUF

GGUF is a single-file model format from the llama.cpp project for local large language model inference, packaging quantized weights and metadata into one portable file. Used by llama.cpp, Ollama, and LM Studio, it enables efficient CPU and hybrid CPU/GPU inference on consumer hardware.

Glitch Tokens

Anomalous tokens in a language model's vocabulary that produce erratic outputs — gibberish, hallucinations, or refusals — because the tokenizer included them during vocabulary construction but the model's training data contained too few examples for the model to learn stable representations.

Google ADK

An open-source framework from Google for building, evaluating, and deploying AI agents and multi-agent systems with code-first control. Available in Python and TypeScript, it pairs with Google Cloud's Gemini Enterprise Agent Platform for managed production deployment.

GPT Image

GPT Image is OpenAI's natively multimodal image model family that handles text-to-image generation, instruction-based editing, and mask-based inpainting through a unified Images API. It powers image creation in ChatGPT and is available to third-party apps through the OpenAI platform.

GPTQ

A post-training quantization method that compresses large language model weights from 16-bit to 3-4 bits using Hessian-based optimization, enabling models to run on consumer GPUs with minimal accuracy loss.

GPU Utilization

A metric measuring the percentage of a GPU's compute capacity actively performing work, used to evaluate how efficiently AI inference servers process requests and allocate hardware resources.

Gradient Descent

An optimization algorithm that trains neural networks by iteratively computing the gradient of a loss function and adjusting model weights in the direction that reduces prediction errors, enabling the model to learn from data.
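
A one-parameter sketch shows the core loop, minimizing f(w) = (w - 3)^2 by repeatedly stepping against its gradient:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
# Each step moves w opposite the gradient, toward the minimum at w = 3.
def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of the loss at the current w
        w -= lr * grad       # step in the direction that reduces the loss
    return w

w = gradient_descent(w=0.0)   # converges to ~3.0
```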

Graph Attention Network

A neural network layer for graph-structured data that applies attention mechanisms to weigh the importance of each neighboring node's features during aggregation, allowing the model to focus on the most relevant connections rather than treating all neighbors equally.

Graph Convolution

A mathematical operation that extends convolution from regular grids to irregular graph structures by aggregating features from neighboring nodes, enabling graph neural networks to learn meaningful node representations through local information exchange.

Graph Neural Network

A deep learning architecture that processes graph-structured data by propagating information between connected nodes through message passing, enabling pattern recognition in relational data where standard neural networks fall short.

GraphSAGE

GraphSAGE is an inductive graph neural network algorithm that learns to generate node embeddings by sampling and aggregating features from a node's local neighborhood, enabling predictions on previously unseen nodes without retraining the entire model.

Greedy Decoding

A text generation strategy where a language model always selects the single most probable next token at each step, producing deterministic output without any randomness.
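
In code, each decoding step is an argmax over the model's scores; the logits below are hypothetical stand-ins for a real model call:

```python
# Greedy decoding picks the single highest-scoring token at each step,
# so the same prompt always yields the same continuation.
def greedy_step(logits):
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.2, 4.7, 0.3, 2.1]   # hypothetical scores over a 4-token vocab
choice = greedy_step(logits)    # index 1, the highest score
```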

Groq

An AI inference chip company that designed the Language Processing Unit, a custom silicon accelerator built for low-latency large language model inference using deterministic, compiler-driven execution instead of traditional GPU parallelism.

Grounding

Grounding connects AI-generated text to verifiable external knowledge sources, reducing hallucinations by anchoring model responses in real-world facts rather than relying solely on patterns learned during training.

Grouped Query Attention

An attention mechanism variant that groups multiple query heads to share key-value heads, balancing the output quality of multi-head attention with the inference speed of multi-query attention. Adopted by most frontier language models.

GRPO

A reinforcement learning alignment method that estimates policy advantages by comparing multiple outputs within a group, eliminating the need for a separate critic model required by PPO-based RLHF.

Guardrails

Runtime safety mechanisms that validate, filter, and enforce policies on AI system inputs and outputs, preventing failures like hallucinations, prompt injection, data leakage, and toxic content before they reach end users.

H

Hallucination

When a language model produces text that appears coherent and authoritative but contains factually incorrect, logically inconsistent, or entirely fabricated information, often with no indication that anything is wrong.

HarmBench

A standardized evaluation framework created by the Center for AI Safety that benchmarks AI model resistance to automated red-teaming attacks using hundreds of curated harmful behaviors across multiple semantic categories, enabling reproducible comparison of attack methods and model safety.

Harmonic Mean

A type of average calculated as the reciprocal of the arithmetic mean of reciprocals. In machine learning, it forms the mathematical basis of the F1 score, ensuring neither precision nor recall can mask the other's weakness.
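
A short sketch of the computation, showing why the harmonic mean punishes imbalance:

```python
# Harmonic mean of two values, as used by the F1 score. It is dominated
# by the smaller value, so high precision cannot hide low recall.
def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if (a + b) else 0.0

f1 = harmonic_mean(0.9, 0.1)   # 0.18, far below the arithmetic mean of 0.5
```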

HELM Benchmark

An open-source evaluation framework from Stanford that tests language models across multiple dimensions — accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency — using standardized scenarios to produce multi-dimensional scorecards instead of single rankings.

Hidden State

The internal vector within a recurrent neural network that stores a compressed summary of all previously processed inputs in a sequence. Updated at each time step, the hidden state allows the network to retain context and make predictions based on sequential patterns.

Hugging Face

An open-source AI platform that hosts pre-trained models, datasets, and deployment tools, serving as the central repository where researchers and practitioners share, discover, and run machine learning models — particularly transformers.

Human-in-the-Loop for Agents

Human-in-the-Loop for agents is a design pattern where an autonomous AI agent pauses at defined checkpoints — usually before high-risk tool calls — and waits for a person to approve, edit, reject, or redirect the next step before continuing.

HumanEval

A benchmark of hand-written Python programming problems created by OpenAI that measures AI code generation through automated unit tests, serving as one of the core metrics used to evaluate large language model capabilities.

Hunyuan Image

Hunyuan Image is Tencent's open-source text-to-image and image-editing model family. It uses a Mixture-of-Experts autoregressive architecture that jointly handles visual understanding and generation, placing it in the same architectural camp as closed multimodal models rather than diffusion-based systems like FLUX.

Hybrid Architecture

A neural network design that mixes Transformer attention layers with State Space Model layers (like Mamba) inside a single model, so attention handles precise recall while SSM layers handle long sequences at linear cost.

Hybrid Search

A retrieval method that runs keyword search (typically BM25) and dense vector search in parallel, then fuses the ranked results — usually with Reciprocal Rank Fusion — to combine exact-term precision with semantic understanding.
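
A minimal sketch of the fusion step with Reciprocal Rank Fusion, using hypothetical document IDs and the k = 60 constant from the original RRF formulation:

```python
# Reciprocal Rank Fusion: each document earns 1/(k + rank) from every
# ranked list it appears in, and the contributions are summed.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword ranking
dense = ["doc_b", "doc_d", "doc_a"]   # hypothetical vector ranking
fused = rrf([bm25, dense])
# doc_b (ranks 2 and 1) and doc_a (ranks 1 and 3) rise to the top
```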

Hyperparameter Tuning

The systematic process of finding the best external configuration values for a machine learning model. Unlike parameters learned during training, hyperparameters are set before training begins and directly influence model accuracy, training speed, and generalization ability.

I

Image Matting

Image matting is the computer vision task of estimating, for every pixel, a foreground color, a background color, and a continuous opacity (alpha) so that semi-transparent regions like hair, fur, smoke, and glass composite cleanly onto a new background.

Image Upscaling

Image upscaling is the process of increasing an image's pixel resolution, either by deterministic interpolation that smooths existing pixels or by AI super-resolution that uses learned priors — CNNs, GANs, or diffusion models — to reconstruct plausible detail beyond the source.

Impossibility Theorem

A mathematical proof that three group-fairness criteria — calibration, balance for the positive class, and balance for the negative class — cannot all be satisfied simultaneously unless the predictor is perfect or base rates are equal across groups.

Inductive Bias

Inductive bias is the set of assumptions a machine learning model relies on to generalize from training data to unseen inputs. These assumptions live in the architecture, loss function, or training procedure, and they determine which patterns the model prefers to learn.

Inference

The process of running a trained machine learning model on new input data to produce a prediction or output. For large language models, inference uses autoregressive decoding — generating text one token at a time, with each token conditioned on all preceding tokens.

Inference Time Scaling

The practice of allocating additional computational resources during a model's response generation rather than during training, enabling the model to reason through problems more thoroughly and produce higher-quality outputs on complex tasks.

Inspect AI

An open-source Python framework created by the UK AI Security Institute for evaluating large language models, offering pre-built evaluations, prompt engineering tools, multi-turn dialog testing, and model-graded scoring to measure LLM performance on safety and capability benchmarks.

Inverted Index

An inverted index is a data structure that maps each term in a corpus to the list of documents containing it, with optional term frequencies and positions, enabling BM25 and keyword retrieval to find matching documents in milliseconds instead of scanning the full corpus.
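
A toy sketch of the structure, mapping each term to the set of document IDs that contain it (postings here are plain sets rather than the frequency-and-position lists of a production index):

```python
from collections import defaultdict

# Build a minimal inverted index: term -> set of document IDs.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {1: "vector search at scale", 2: "keyword search with BM25"}
index = build_index(docs)
matches = index["search"]   # both documents contain "search"
```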

IVF (Inverted File Index)

A partition-based vector indexing method that groups vectors into clusters using k-means, then searches only the nearest cluster partitions at query time, enabling fast approximate nearest-neighbor search at scale.

L

LangChain

LangChain is an open-source framework for building LLM applications—chains, agents, and RAG pipelines—by composing prompts, retrievers, models, and tools into reusable workflows, with the October 2025 release refocusing the core API around agent loops running on LangGraph.

LangGraph

LangGraph is a graph-based runtime from LangChain for building stateful AI agents and multi-agent systems. Each step is a node, control flow is an edge, and state, persistent memory, and human-in-the-loop checkpoints are built into the runtime.

LangMem

LangMem is a Python SDK from LangChain that adds long-term memory to AI agents by exposing memory operations as agent-callable tools and running a background manager that consolidates memories asynchronously, with native integration into LangGraph's long-term memory store.

Latent Diffusion

A generative AI technique that runs the image creation process inside a compressed mathematical space produced by a variational autoencoder, rather than working directly with pixels, cutting computation costs while preserving output quality.

Latent Space

A compressed mathematical representation where neural networks store learned patterns from training data. In GAN architecture, the generator samples from this space to create new outputs, making it the source of variation behind every generated image or data point.

Learning Rate

A hyperparameter that controls how much a model's weights change during each training step, directly determining whether fine-tuning converges smoothly or destroys pre-trained knowledge.

Letta

Letta is an open-source agent framework that gives large language models persistent memory, letting AI agents store user facts, recall past conversations, and maintain consistent state across sessions through structured memory blocks the agent itself reads and edits.

LightRAG

LightRAG is an open-source graph-based retrieval-augmented generation framework that uses dual-level retrieval over an LLM-built knowledge graph and supports incremental updates by merging new nodes and edges instead of rebuilding the entire graph and its community summaries.

Linear Attention

A family of attention mechanisms that approximates standard softmax attention while scaling linearly with sequence length instead of quadratically, making it practical to process very long contexts and enabling hybrid architectures that combine it with state-space models.

Listwise Reranking

Listwise reranking reorders search results by evaluating the entire candidate list together in one pass, rather than scoring each query-document pair independently. It captures relationships between candidates and is used by LLM-based rerankers like RankGPT and Jina Reranker v3.

llama.cpp

An open-source C/C++ inference engine created by Georgi Gerganov that runs large language models locally on consumer CPUs and GPUs through quantized GGUF files, removing the need for cloud-based GPU infrastructure or heavyweight Python dependencies.

Llama Guard

Llama Guard is Meta's open-weight safety classification model that screens both inputs to and outputs from large language models, flagging content across standardized hazard categories to prevent harmful or toxic AI responses.

LlamaIndex

LlamaIndex is an open-source framework for building data-backed and agentic LLM applications, providing abstractions like Document, Node, Index, Retriever, and Query Engine to connect language models with external knowledge sources for retrieval, search, and document agents.

LLM as Judge

An evaluation technique where a large language model is prompted to assess, score, or rank outputs produced by other AI systems, serving as an automated alternative to human reviewers.

Load Balancing Loss

An additional penalty term added during training of Mixture-of-Experts models that discourages the gating mechanism from routing most tokens to a small subset of experts, ensuring all experts receive enough tokens to learn effectively and preventing wasted model capacity.

Locality Sensitive Hashing

A family of randomized algorithms that map similar data points to the same hash buckets with high probability, enabling approximate nearest neighbor search in high-dimensional spaces without scanning every item — a key index structure in vector similarity search pipelines.
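
A sketch of one common scheme, random-hyperplane hashing for cosine similarity, where each hyperplane contributes one sign bit to the bucket key:

```python
import random

# Random-hyperplane LSH: the sign of the dot product with each random
# hyperplane gives one bit; similar vectors tend to share bit patterns
# and therefore land in the same or nearby buckets.
def hash_vector(vec, hyperplanes):
    bits = ""
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits += "1" if dot >= 0 else "0"
    return bits

random.seed(0)   # fixed seed so the sketch is reproducible
planes = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]
a = hash_vector([1.0, 0.9, 1.1], planes)
b = hash_vector([1.1, 1.0, 0.9], planes)   # near-duplicate of the first vector
```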

LoCoMo Benchmark

LoCoMo (Long Conversational Memory) is a 2024 benchmark from Snap Research and UNC that tests whether AI agents can recall and reason over very long, multi-session conversations through question answering, event summarization, and multimodal dialogue tasks.

Logits

Logits are the raw numerical scores a language model produces for every possible next token before those scores are converted into probabilities through the softmax function.
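
A small sketch of that conversion, using the standard max-subtraction trick for numerical stability:

```python
import math

# Softmax turns raw logits into a probability distribution. Subtracting
# the max logit first keeps the exponentials from overflowing.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# probabilities sum to 1 and preserve the ordering of the logits
```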

Long Context Modeling

Long-context modeling is the ability of a language model to process and reason over input sequences spanning tens of thousands to millions of tokens in a single forward pass, without losing coherence across the document.

Long Context vs RAG

Long context versus RAG describes two approaches for giving language models access to large knowledge bases: long context loads documents directly into the prompt window, while RAG retrieves only the most relevant passages from an external vector store at inference time.

LongMemEval

LongMemEval is an open-source benchmark that tests long-term interactive memory in chat assistants and AI agents. It scores systems across five abilities — information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention — using 500 manually written questions over multi-session conversations.

LoRA

A parameter-efficient fine-tuning method that freezes pre-trained model weights and injects small trainable low-rank matrices into transformer layers, reducing trainable parameters by orders of magnitude while preserving model quality.

LoRA for Image Generation

LoRA for image generation is a parameter-efficient fine-tuning method that freezes a diffusion model's weights and trains tiny low-rank matrices to add a new style, character, or subject. The result is a small file you can load alongside the base model at inference.

Loss Function

A mathematical formula that quantifies the difference between a model's predictions and the correct values, serving as the primary feedback signal during neural network training and the metric that reveals when scaling more compute or data stops improving performance.

M

Magnific

A commercial, cloud-based image upscaler from Freepik that treats super-resolution as a generative diffusion task. Instead of interpolating pixels, it uses Stable Diffusion–family models to invent plausible detail like skin pores, fabric weaves, and brick texture, guided by text prompts and Creativity, Resemblance, and HDR sliders.

Mamba Architecture

A neural network architecture based on selective state space models that processes sequences with linear time complexity, enabling efficient long-context modeling as an alternative to transformer attention.

Masked Autoencoder

A self-supervised pretraining recipe for Vision Transformers that hides most image patches at random and trains the model to reconstruct the missing pixels, learning visual features without any human labels.

Masked Language Modeling

A self-supervised pre-training technique where random tokens in a sentence are hidden behind a mask and the model learns to predict them using surrounding context from both directions, enabling deep bidirectional language understanding.

Matryoshka Embedding

An embedding training method where the first d dimensions of a full vector form a valid lower-dimensional representation. Named after Russian nesting dolls, it lets a single model produce embeddings at multiple sizes, trading vector length for storage and speed.
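
Using a smaller size is just truncation plus re-normalization; the embedding values below are hypothetical:

```python
import math

# Shrink a Matryoshka embedding: keep the first d dimensions and
# re-normalize to unit length so cosine similarity still works.
def truncate(vec, d):
    head = vec[:d]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01]   # full 8-dim embedding
small = truncate(full, 4)                              # valid 4-dim version
```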

Matthews Correlation Coefficient

A classification quality metric that accounts for all four confusion matrix quadrants — true positives, true negatives, false positives, and false negatives — producing a balanced score from -1 (inverse prediction) through 0 (random) to +1 (perfect classification).
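
The formula in code, with hypothetical confusion-matrix counts:

```python
import math

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

score = mcc(tp=90, tn=85, fp=15, fn=10)   # strong but imperfect classifier
```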

Mean Pooling

Mean pooling produces a single fixed-size vector from a transformer model's token-level outputs by averaging all token hidden states, creating sentence embeddings used for semantic similarity comparisons in search and retrieval systems.
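
A sketch with hypothetical token hidden states (real pipelines also mask out padding tokens before averaging):

```python
# Mean pooling: average each dimension across all token vectors to get
# one fixed-size sentence embedding.
def mean_pool(token_vectors):
    n = len(token_vectors)
    dims = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dims)]

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # three token hidden states
sentence = mean_pool(tokens)                     # [3.0, 4.0]
```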

Megatron-LM

NVIDIA's open-source framework that distributes the training of large language models across many GPUs using multiple parallelism strategies, enabling organizations to pre-train models with billions of parameters that no single machine could handle alone.

Mem0

Mem0 is an open-source memory layer that adds persistent recall to AI agents and chatbots. It extracts important facts from conversations, stores them in a structured memory store, and retrieves relevant context during future interactions to keep responses personalized over time.

MemGPT

MemGPT is an agent architecture that gives a large language model two memory tiers — a fast in-context working memory and a slower external store — with the model itself paging information between them through tool calls, much like an operating system.

Message Passing

A computational mechanism where nodes in a graph neural network exchange information with their neighbors, aggregate those signals, and update their own representations to learn structural patterns from connected data.

Metadata Filtering

Metadata filtering is a technique that attaches structured key-value attributes to vectors in a database, then applies predicates over those attributes during similarity search so returned results satisfy both semantic relevance and explicit conditions like date, author, or document type.

Microsoft Agent Framework

An open-source Microsoft SDK for building agentic and multi-agent applications in .NET and Python. It unifies the earlier AutoGen and Semantic Kernel projects, adds a graph-based workflow engine, and supports the Agent-to-Agent and Model Context Protocol standards natively.

Microsoft GraphRAG

Microsoft GraphRAG is an open-source modular graph-based RAG system that uses LLM extraction to build a knowledge graph from source documents, runs hierarchical Leiden community detection, and pre-computes community summaries to support both global query-focused summarization and local entity-anchored search.

Min-P Sampling

A dynamic token filtering strategy that removes low-probability candidates during text generation by setting a threshold relative to the most likely token's probability, automatically tightening selection when the model is confident and loosening it when multiple tokens are equally plausible.
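
A sketch of the filter over a hypothetical next-token distribution:

```python
# Min-p filtering: keep only tokens whose probability is at least
# min_p times the probability of the most likely token.
def min_p_filter(probs, min_p=0.1):
    threshold = min_p * max(probs.values())
    return {tok: p for tok, p in probs.items() if p >= threshold}

probs = {"the": 0.60, "a": 0.25, "of": 0.10, "zebra": 0.005}
kept = min_p_filter(probs)   # threshold 0.06 removes only "zebra"
```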

MITRE ATLAS

A publicly available knowledge base maintained by MITRE that catalogs adversary tactics, techniques, and real-world case studies targeting AI and machine learning systems, modeled after the ATT&CK framework.

Mixed Precision Training

Mixed precision training combines lower-precision formats like FP16 or BF16 for most neural network computations with FP32 for numerically sensitive operations, reducing memory use and speeding up training while preserving model accuracy.

Mixedbread Rerank

Mixedbread Rerank is an open-weight reranker family from Mixedbread AI that reorders search results by relevance. The current generation, mxbai-rerank-v2, is built on Qwen-2.5 and trained with reinforcement learning to improve retrieval accuracy in RAG and search pipelines.

Mixture Of Experts

A neural network architecture that splits a model into multiple specialized sub-networks (experts) and uses a gating function to route each input token to only a few of them, reducing computation per token while preserving the knowledge capacity of a larger model.

MMLU Benchmark

A standardized benchmark of 15,908 multiple-choice questions across 57 academic subjects — from STEM to humanities — that tests how well a large language model handles factual, knowledge-intensive questions. Introduced at ICLR 2021 by Hendrycks et al., it remains a widely reported model comparison metric.

MMLU Pro

A harder evolution of the MMLU benchmark featuring over 12,000 graduate-level questions across 14 subjects with 10 answer choices, designed to reduce noise, minimize prompt sensitivity, and better differentiate reasoning ability among top-performing language models.

Mode Collapse

Mode collapse is a training failure in generative adversarial networks where the generator learns to produce only a small set of similar outputs instead of capturing the full variety present in the training data.

Model Evaluation

The systematic process of testing and scoring AI models against defined criteria — including accuracy, reasoning, safety, and user preference — to determine whether a model is fit for a specific task or ready for deployment.

MS MARCO

MS MARCO is a family of large-scale information retrieval datasets from Microsoft Research, built from anonymized real Bing search queries and web passages, used to train and evaluate ranking models for search and retrieval-augmented generation systems.

Multi-Agent Systems

A multi-agent system (MAS) is an AI architecture where multiple LLM-powered agents — each with its own role, tools, and memory — coordinate through patterns like supervisor, debate, or swarm to handle tasks too complex or parallelizable for a single agent.

Multi-Head Attention

A mechanism inside transformers that splits attention into multiple parallel heads, each learning different relationships in the input, then combines their outputs for richer representations.

Multi-Vector Retrieval

An information retrieval approach where documents and queries are represented as sets of token-level vectors instead of single embeddings, enabling fine-grained similarity matching through late interaction scoring.

Multi-Hop Reasoning

Multi-hop reasoning answers questions that require chaining multiple pieces of evidence across entities or facts, where each hop traverses a relationship. It contrasts with single-shot retrieval and is the central motivating use case for GraphRAG and knowledge-graph-augmented language models.

Multimodal Architecture

A multimodal architecture is a neural-network design that takes in multiple data types — text, images, audio, video, code — and fuses them into a shared internal representation so a single model can reason across them without bouncing between specialized systems.

Multimodal RAG

A retrieval-augmented generation method that indexes and retrieves information across text, images, tables, and other modalities, then feeds the matched evidence to a multimodal language model so the final answer is grounded in the original visual or structured source rather than a paraphrase of it.

N

nDCG

nDCG (Normalized Discounted Cumulative Gain) is a graded-relevance ranking metric that scores how well a result list places the most relevant documents at the top, with a logarithmic position discount and normalization to the [0, 1] range.
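
A compact sketch of the computation, with hypothetical graded relevance labels listed in ranked order:

```python
import math

# DCG discounts each relevance grade by log2(position + 1); nDCG divides
# by the DCG of the ideal (descending-relevance) ordering.
def dcg(relevances):
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

score = ndcg([3, 2, 0, 1])   # slightly misordered list scores below 1.0
```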

NeMo Guardrails

NVIDIA NeMo Guardrails is an open-source Python toolkit that adds programmable input, output, dialog, retrieval, and execution rails to LLM applications, enforcing safety, topic control, PII protection, jailbreak prevention, and grounding through fact-checking and hallucination-detection rails — vendor-agnostic across LLM providers.

Nemotron-H

NVIDIA's family of hybrid Mamba-Transformer language models that replaces most self-attention layers with Mamba-2 state space layers, cutting inference cost at the same accuracy target while keeping a small number of attention layers for precise long-context recall.

Neo4j

Neo4j is a native graph database that stores data as nodes and relationships in the property graph model and is queried with Cypher. It is the most common backing store for GraphRAG-style systems that pair knowledge graphs with large language models.

Neural Network Basics for LLMs

A neural network is a layered computational model that learns patterns by adjusting connection weights during training, serving as the core architecture behind large language models that generate text.

Next Token Prediction

A training method where a language model learns to predict the next token in a sequence based on all preceding tokens, forming the core objective behind decoder-only transformer architectures like GPT and Claude.

Node Embedding

A learned low-dimensional vector representation of a graph node that captures both its features and structural position, enabling downstream tasks like node classification and link prediction in graph neural networks.

Noise Schedule

A noise schedule is the function that governs how variance is added to data across the forward diffusion steps and removed at inference. It determines signal-to-noise ratio at each timestep, shaping what a diffusion model can learn to denoise and how the sampler reverses the process.

O

Open LLM Leaderboard

A public Hugging Face-hosted ranking that evaluates open-source large language models on standardized benchmarks using EleutherAI's evaluation harness, providing transparent and reproducible score comparisons to help developers and researchers identify model strengths across reasoning, math, and instruction-following tasks.

OpenAI Agents SDK

OpenAI's official open-source framework for building agentic and multi-agent systems with five primitives — Agents, Handoffs, Guardrails, Sessions, Tracing — built on the Responses API and provider-agnostic via LiteLLM adapters. Production successor to the experimental Swarm project.

OpenCompass

An open-source LLM evaluation platform developed by Shanghai AI Laboratory that automates benchmarking across a wide range of standardized datasets, with distributed evaluation, report generation, and leaderboard publishing for reproducible model comparison.

OpenRLHF

An open-source framework built on Ray that simplifies reinforcement learning from human feedback (RLHF) training for large language models, supporting multiple alignment algorithms like PPO and GRPO with distributed computing and memory optimization.

Overfitting

Overfitting occurs when a machine learning model memorizes training data patterns too closely, performing well on familiar examples but failing to generalize to new, unseen data.

Oversmoothing

A phenomenon in graph neural networks where stacking too many message-passing layers causes node representations to converge, making nodes from different classes indistinguishable and degrading model performance.

OWASP LLM Top 10

A ranked list of the ten most critical security risks in large language model applications, maintained by the Open Worldwide Application Security Project (OWASP). It provides a shared vocabulary and prioritization framework for teams securing AI-powered systems against threats like prompt injection and data poisoning.

P

Paged Attention

A memory management technique for large language model inference that partitions key-value caches into fixed-size blocks, eliminating wasted GPU memory and allowing more concurrent requests to be served.

Parameter Efficient Fine Tuning

A family of techniques that adapt pre-trained language models to specific tasks by updating only a small fraction of their parameters, achieving comparable results to full fine-tuning at a fraction of the compute and memory cost.

Patch Embedding

The input layer of a Vision Transformer that splits an image into fixed-size patches, flattens each one, and linearly projects them into token vectors the transformer can process as a sequence.

Patronus Lynx

Patronus Lynx is an open-source LLM-as-judge fine-tuned on Llama-3 by Patronus AI to detect hallucinations in RAG outputs. It checks whether a generated answer is contained in retrieved chunks, contains no extra information, and does not contradict them.

Perplexity

A metric that measures how confidently a language model predicts the next word in a sequence, where lower scores indicate better predictive accuracy and stronger language understanding.
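
Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to each token. A sketch with hypothetical per-token probabilities:

```python
import math

# Perplexity = exp(mean negative log-likelihood). A model that assigns
# probability 0.5 to every token has perplexity 2.
def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = perplexity([0.9, 0.8, 0.95])   # low perplexity
uncertain = perplexity([0.2, 0.1, 0.3])    # much higher perplexity
```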

Perspective API

A free toxicity-detection API from Google's Jigsaw that uses machine learning to score how likely a comment is to be perceived as toxic, returning a probability between 0 and 1.

Pinecone

Pinecone is a fully managed, serverless vector database designed to power AI retrieval workloads such as RAG, semantic search, and recommendation systems. It stores high-dimensional embeddings, supports hybrid keyword and semantic queries, and exposes a single API for indexing and similarity search.

Positional Encoding

A technique that injects word-order information into transformer models, which process all tokens simultaneously and would otherwise treat every word as if its position in a sentence did not matter.
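
The original Transformer paper used fixed sinusoidal encodings; this simplified sketch computes the encoding vector for a single position:

```python
import math

def sinusoidal_encoding(position, d_model):
    """Sinusoidal positional encoding for one position: alternating
    sin/cos at wavelengths that grow geometrically with the dimension."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]

# Position 0 encodes as [sin(0), cos(0), ...] = [0.0, 1.0, 0.0, 1.0]
print(sinusoidal_encoding(0, 4))
```

Adding this vector to each token embedding gives every position a distinct, smoothly varying signature; many modern models swap this scheme for learned or rotary encodings.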

Post Training Quantization

Post training quantization compresses a pre-trained model's weights to lower-precision formats using a small calibration dataset, reducing memory requirements and accelerating inference without retraining.

Power Law

A mathematical relationship where one quantity scales as a fixed power of another, producing steep initial gains that gradually flatten. In AI, power laws describe how model performance improves predictably yet with diminishing returns as data, compute, or parameters increase.

PPO (Proximal Policy Optimization)

A reinforcement learning algorithm that updates a language model's behavior in small, stable steps during RLHF. PPO uses a clipped objective function to prevent destructively large changes, ensuring the model improves its responses based on human feedback without losing its core capabilities.

Pre Training

The foundational training phase where a large language model learns from vast quantities of raw text using self-supervised learning, building general language understanding before specialization through fine-tuning.

Precision, Recall, and F1 Score

Three classification metrics that quantify different aspects of prediction accuracy. Precision measures correctness among predicted positives, recall measures coverage of actual positives, and F1 score balances both through a harmonic mean for single-number model comparison.

Preference Data

Structured datasets of paired responses where one answer is rated better than the other, used to train reward models and align language models with human values through RLHF, DPO, and similar post-training methods.

Product Quantization

A vector compression method that divides high-dimensional vectors into smaller subvectors, quantizes each independently using learned codebooks, and stores compact codes that enable fast approximate nearest neighbor search with reduced memory.

Prompt Engineering For Image Generation

Prompt engineering for image generation is the practice of structuring text input — subject, style, composition, lighting, weighted tokens, negative prompts — to control how a text-to-image model interprets the request and renders the output.

Prompt Injection

A security vulnerability where crafted inputs manipulate a large language model into ignoring its system instructions, bypassing safety controls, or executing unauthorized actions. Ranked as the top LLM security risk by OWASP for two consecutive editions.
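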

Promptfoo

An open-source CLI and library for evaluating, testing, and red-teaming LLM applications, scanning for vulnerabilities like prompt injection, jailbreaks, and PII leaks across configurable test suites.

Protected Attribute

A protected attribute is a characteristic — such as race, sex, age, or disability — that laws or fairness policies forbid using as a basis for discriminatory decisions. In machine learning, fairness metrics measure whether model outcomes differ across groups defined by these attributes.

Pyserini

Pyserini is an open-source Python toolkit from the Castorini group at the University of Waterloo that runs reproducible information retrieval experiments, supporting both sparse retrievers like BM25 and SPLADE and dense neural retrievers over a shared, Lucene-backed index.

PyTorch

PyTorch is an open-source deep learning framework maintained under the PyTorch Foundation that provides dynamic computation graphs and Python-native tools for building, training, and deploying neural networks.

PyTorch Geometric

An open-source Python library built on PyTorch that provides tools for building and training graph neural networks, offering ready-made GNN layers, standard graph datasets, and efficient data handling for graph-structured problems.

Q

Qdrant

An open-source vector database written in Rust that runs hybrid search natively, combining dense embeddings, sparse models like BM25 and SPLADE, and fusion methods such as Reciprocal Rank Fusion inside a single Query API call.

QLoRA

QLoRA is a parameter-efficient fine-tuning method that combines 4-bit quantization with Low-Rank Adaptation (LoRA), enabling large language models to be fine-tuned on consumer-grade GPUs without meaningful loss in quality compared to full-precision fine-tuning.

Quantization

Quantization reduces the numerical precision of a model's weights to shrink memory usage and speed up inference. By converting high-precision numbers to lower-precision formats, it enables large language models to run on less hardware while preserving most output quality.

Query Key Value

Query, Key, and Value are three learned vector projections in the transformer attention mechanism that determine how each token weighs and retrieves information from every other token in a sequence.

Query Transformation

Query transformation is the pre-retrieval stage of a Retrieval-Augmented Generation (RAG) pipeline where the user's raw query is rewritten, expanded, abstracted, or decomposed before vector search runs, so the retriever finds documents that match meaning rather than just original phrasing.

Qwen Image Edit

Alibaba's open-weight instruction-based image-editing model that applies natural-language edits — swapping objects, rewriting in-image text, restyling, or generating new views — to an existing image while preserving untouched regions, built on a 20B MMDiT diffusion backbone with dual VLM + VAE encoders.

R

RAG Evaluation

RAG Evaluation measures the quality of a Retrieval-Augmented Generation pipeline by treating the retriever and generator as separate subsystems, scoring each with reference-free LLM-as-a-judge metrics such as faithfulness, answer relevancy, context precision, and context recall.

RAG Guardrails And Grounding

RAG guardrails are programmable checks placed around a retrieval-augmented generation pipeline that filter inputs, validate retrieved chunks, and verify generated claims are grounded in the actual retrieved sources, blocking or rewriting answers that drift beyond the evidence.

RAGatouille

A Python library by Answer.AI that wraps the ColBERT retrieval model for easy integration into RAG pipelines, enabling multi-vector late interaction retrieval with minimal setup through simple indexing, search, and training APIs.

Real-ESRGAN

An open-source GAN-based image super-resolution tool that restores and upscales degraded images using purely synthetic training data, eliminating the need for paired real-world and high-quality image datasets.

Reciprocal Rank Fusion

A parameter-free fusion algorithm that combines multiple ranked result lists into a single ranking by summing reciprocal-rank scores 1/(k+rank) for each document across retrievers, then re-sorting. Operates only on rank positions, so it merges results from algorithms with incompatible score scales.
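
The whole algorithm fits in a few lines; this sketch fuses ranked lists of document IDs using the conventional k = 60:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs by summing 1/(k + rank) per document,
    where rank is the 1-based position in each list."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d1", "d2", "d3"]     # lexical retriever
dense_results = ["d3", "d1", "d4"]    # vector retriever
print(reciprocal_rank_fusion([bm25_results, dense_results]))
```

Because only rank positions enter the score, the BM25 and dense retrievers never need their raw scores reconciled; d1, ranked highly by both, rises to the top.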

Rectified Flow

Rectified flow is a generative-modeling method that trains a neural network to transport data along straight-line trajectories between noise and images. The straight paths let samplers produce results in very few integration steps, making it the dominant training objective for modern text-to-image diffusion transformers.

Recurrent Neural Network

A recurrent neural network (RNN) is a neural network architecture that processes sequential data by passing a hidden state from one time step to the next, allowing the model to retain and use information from earlier inputs when making predictions.

Red Teaming For AI

A structured adversarial testing practice where testers deliberately probe AI systems for security vulnerabilities, safety failures, and harmful behaviors, helping teams identify and fix critical weaknesses before deployment.

Regularization

A family of techniques that add constraints or penalties during model training, discouraging overly complex solutions and helping the model generalize to unseen data rather than memorize the training set.

Rembg

Rembg is an open-source Python tool that removes image backgrounds by running pre-trained segmentation models — including U²-Net, BiRefNet, SAM, and BRIA RMBG — and outputs transparent PNGs through a CLI, library, or HTTP server. Distributed under the MIT license.

Remove.bg

Remove.bg is a hosted SaaS background-removal service launched in 2018 by Kaleido AI and acquired by Canva in 2021, offering web, desktop, plugin, and API access to a proprietary AI segmentation model that automatically isolates subjects from photo backgrounds.

Reparameterization Trick

A mathematical technique that rewrites random sampling as a deterministic function of learnable parameters plus independent noise, enabling gradient-based optimization through stochastic layers in neural networks such as variational autoencoders.
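
For the Gaussian latent used in VAEs, the trick writes the sample as a deterministic function of the learnable mean and log-variance plus independent noise. A minimal sketch, with plain floats standing in for encoder outputs:

```python
import math
import random

def sample_gaussian(mu, log_var):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, 1).
    mu and log_var are learnable; eps carries all the randomness."""
    eps = random.gauss(0.0, 1.0)
    sigma = math.exp(0.5 * log_var)  # log-variance keeps sigma positive
    return mu + sigma * eps
```

Because mu and sigma enter only through ordinary arithmetic, backpropagation can differentiate through the sampling step, which is impossible if you draw directly from N(mu, sigma).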

Reproducibility

Reproducibility is the ability to obtain consistent results when repeating an experiment or computation using the same data, methods, and conditions, confirming that findings reflect genuine patterns rather than random chance.

Reranking

Reranking is the second stage of a two-stage retrieval pipeline: a fast retriever returns a candidate set of documents for recall, then a slower, more accurate model — typically a cross-encoder transformer — rescores them and returns the top results for precision.

Residual Connection

A residual connection is an architectural shortcut that lets data skip over one or more layers in a neural network, adding the original input directly to the layer's output. This enables training of much deeper networks by preserving gradient flow during backpropagation.
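
The idea reduces to y = x + F(x); this toy sketch treats a "layer" as any function over a vector:

```python
def residual_block(x, layer):
    """y = x + F(x): the layer learns a residual on top of the identity."""
    return [xi + fi for xi, fi in zip(x, layer(x))]

# If the layer outputs all zeros, the block is exactly the identity,
# which is why residual networks stay trainable even when very deep.
print(residual_block([1.0, 2.0], lambda v: [0.0] * len(v)))
```

During backpropagation the identity path passes gradients straight through, so early layers still receive a useful signal regardless of what the wrapped layer does.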

Retrieval Augmented Generation

A technique that connects a large language model to an external retrieval system so it can search for and reference real documents before generating a response, reducing hallucinations by grounding outputs in verifiable source material rather than relying on training data alone.

Reward Hacking

A failure mode in RLHF where the AI policy learns to exploit weaknesses in the reward model, maximizing its score without genuinely improving output quality or alignment with human preferences.

Reward Model Architecture

A neural network design where a pretrained language model is extended with a scoring layer that converts human preference judgments into scalar reward signals, used to train AI systems via reinforcement learning from human feedback.

RewardBench

A standardized benchmark and leaderboard from the Allen Institute for AI that measures how accurately reward models score and rank language model outputs, testing whether the preference signals driving RLHF alignment reliably distinguish better responses from worse ones.

RLAIF (Reinforcement Learning from AI Feedback)

A training technique where an AI model generates preference judgments to guide reinforcement learning alignment, replacing or supplementing human annotators in the feedback loop that shapes model behavior.

RLHF (Reinforcement Learning from Human Feedback)

A training method that aligns large language models with human preferences by collecting ranked comparisons of model outputs, training a reward model on those rankings, and optimizing the model using reinforcement learning.

ROC AUC

A threshold-independent metric that evaluates how well a binary classifier separates positive and negative examples across all decision thresholds, scored from 0 to 1 where 0.5 equals random guessing.
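
ROC AUC has an equivalent pairwise reading: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting half. A small illustrative implementation using that equivalence:

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs where the
    positive outranks the negative; ties contribute half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ordered correctly: AUC = 0.75
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```

Production code would use `sklearn.metrics.roc_auc_score`, which computes the same quantity efficiently from the sorted scores.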

RWKV

An attention-free recurrent neural network architecture that combines parallel Transformer-style training with linear-time, constant-memory recurrent inference, positioning it as a lightweight alternative to quadratic Transformers for long-context language modeling.

S

Safety Classifier

A machine learning model that automatically scores or labels content as safe or unsafe against a predefined hazard taxonomy, used to filter harmful inputs and outputs in AI systems and content platforms.

Salient Object Detection

A computer vision task that identifies the single most visually prominent object in an image and outputs a pixel-level mask separating it from the background. SOD is class-agnostic — it locates the subject without naming it — and underpins most one-click background removal tools.

SAM 2

SAM 2 (Segment Anything Model 2) is Meta's open-weight foundation model for promptable image and video segmentation. Released under Apache 2.0, it accepts clicks, boxes, or masks as prompts and uses a memory module to track objects across video frames, even through occlusions.

Scaled Dot Product Attention

The core computation inside transformer models that calculates relevance scores between queries and keys using dot products, scales them to prevent gradient saturation, and produces weighted combinations of values.
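
A pure-Python sketch of the computation softmax(QKᵀ/√d_k)V over tiny lists-of-lists matrices (real implementations use batched tensor ops):

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention for plain lists-of-lists matrices."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # dot-product relevance of this query against every key, scaled
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # softmax turns scores into attention weights
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # weighted combination of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query, two identical keys: attention splits 50/50, so the
# output is the average of the two value vectors.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                [[1.0, 0.0], [3.0, 0.0]]))
```

The √d_k divisor keeps the dot products from growing with dimension, which would otherwise push the softmax into a saturated, near-one-hot regime with vanishing gradients.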

Scaling Laws

Empirical power-law relationships showing how a language model's performance predictably improves as you increase model size, training data, or compute budget, enabling teams to forecast results before committing resources.

ScaNN

An open-source library from Google Research that performs fast approximate nearest neighbor search using anisotropic vector quantization, designed for finding similar items in large collections of high-dimensional vectors.

Scikit-learn

An open-source Python machine learning library providing consistent APIs for classification, regression, clustering, and model evaluation, widely used for computing metrics like precision, recall, and F1 score.

Seedream

Seedream is ByteDance's family of image foundation models that combine text-to-image generation and instruction-based editing in a single unified architecture. The current flagship Seedream 4.5 supports high-resolution output with multiple reference images and is delivered via third-party inference platforms rather than a first-party ByteDance API.

Selective Scan

Selective Scan is the content-aware recurrence at the heart of modern state space models — it updates a compressed hidden state using input-dependent parameters, letting the model emphasize relevant tokens and compress or skip irrelevant ones as it streams through long sequences.

Self Supervised Learning

A training approach where models learn from unlabelled data by generating their own supervisory signal from the input — masking parts, comparing augmented views, or matching across modalities — producing general-purpose representations that transfer to downstream tasks with little labelled data.

Semantic Search

A retrieval method that converts queries and documents into dense vector representations and ranks results by similarity metrics like cosine similarity or dot product, finding matches based on meaning rather than keyword overlap.
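
With embeddings in hand, the ranking step is just a similarity sort. A minimal sketch using cosine similarity over toy 2-dimensional vectors (real systems use hundreds of dimensions and an ANN index):

```python
import math

def cosine(a, b):
    """Cosine similarity: angle-based closeness of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_rank(query_vec, doc_vecs):
    """Return document indices sorted most-similar-first."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)

docs = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
print(semantic_rank([1.0, 0.2], docs))  # most similar first
```

Because cosine compares direction rather than magnitude, two documents of very different lengths can still rank as near-identical in meaning.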

Semantic Segmentation

Semantic segmentation is a computer vision technique that assigns a class label to every pixel in an image. Unlike object detection, which draws bounding boxes, segmentation produces a precise pixel-level map showing exactly which pixels belong to people, cars, sky, or background.

Sentence Transformers

A Python framework that generates sentence-level embeddings by passing text through transformer models and applying pooling strategies, enabling semantic search, clustering, and similarity comparison tasks that require understanding meaning rather than matching exact keywords.

SGLang

An open-source serving framework for large language models that accelerates inference through RadixAttention and automatic prefix caching, enabling faster token generation for production deployments.

Siamese Network

A neural network architecture where two identical sub-networks share the same weights, process separate inputs simultaneously, and produce comparable output vectors, enabling the system to measure how similar or different two inputs are.

SigLIP

SigLIP is Google's family of image-text contrastive encoder models that learn joint vision-language representations using a per-pair sigmoid loss, producing efficient and accurate vision backbones used inside many recent open-source Vision-Language Models.

Similarity Search Algorithms

Methods that find the closest matching vectors in high-dimensional spaces by measuring distance or angle between numerical representations of data. Used in AI systems for semantic search, recommendation engines, and retrieval-augmented generation to match queries to relevant results.

Softmax

A mathematical function that converts raw numerical scores into a probability distribution where all values sum to one, used in attention mechanisms and classification outputs across AI systems.
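
A standard numerically stable implementation subtracts the maximum score before exponentiating, which leaves the result unchanged but prevents overflow:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                           # stability shift
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # largest logit gets the largest probability; sum is 1.0
```

The ordering of the inputs is preserved, only squashed: softmax never changes which score is biggest, just how much probability mass it receives.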

Sparse Activation

A computational strategy where only a small subset of a neural network's parameters activate for each input. Common in Mixture of Experts architectures, it decouples model capacity from inference cost, allowing larger models to run efficiently by routing each token through selected expert sub-networks.

Sparse Retrieval

Sparse retrieval represents queries and documents as high-dimensional vectors over a vocabulary, with almost every coordinate zero. Matching uses an inverted index for efficient top-k lookup. The family includes classical scoring like BM25 and learned encoders like SPLADE.

Specificity

Specificity measures a classifier's ability to correctly identify negative instances — data points that don't belong to the target class. Calculated as true negatives divided by all actual negatives (TN + FP), it reveals how often a model avoids false alarms.
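
The computation from raw binary labels, as a minimal sketch:

```python
def specificity(y_true, y_pred):
    """TN / (TN + FP): fraction of actual negatives correctly rejected."""
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tn / (tn + fp)

# 3 actual negatives, one wrongly flagged as positive: specificity 2/3
print(specificity([0, 0, 0, 1], [0, 0, 1, 1]))
```

Specificity is the complement of the false positive rate, so it pairs naturally with recall (the true positive rate) when tracing a ROC curve.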

Spectral Graph Theory

A mathematical framework that analyzes graph structure through eigenvalues and eigenvectors of associated matrices like the Laplacian, forming the theoretical basis for spectral graph neural networks.

Speculative Decoding

An inference acceleration technique where a small draft model proposes multiple candidate tokens that a larger target model verifies in parallel, reducing latency while preserving output quality identical to standard generation.

State Space Model

A sequence modeling architecture that uses linear recurrence with selective gating to process data in linear time, offering an alternative to transformer attention for tasks involving long sequences.

Static Batching

A batch inference scheduling method where multiple requests are grouped into a fixed batch and processed together, requiring all requests to wait until the longest sequence finishes generating before any output is returned.

Statistical Significance

A statistical measure indicating whether an observed difference between experimental results is likely caused by a real effect rather than random variation, commonly used to validate model comparisons in ablation studies.

StyleGAN

A style-based GAN architecture from NVIDIA Research that introduces a mapping network to separate high-level image attributes from stochastic variation, giving fine-grained control over generated image quality and feature manipulation.

Subword Tokenization

A text preprocessing technique that splits words into smaller units (subwords) based on statistical frequency patterns, enabling language models to represent any word — including rare or unseen terms — using a fixed-size vocabulary of common fragments.
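
A greedy longest-match segmenter against a fixed vocabulary gives the flavor of how BPE or WordPiece inference splits a word; the toy vocab here is purely illustrative:

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match segmentation into subwords from a fixed
    vocab, a simplified stand-in for BPE/WordPiece inference."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append("<unk>")         # no piece covers this char
            i += 1
    return tokens

vocab = {"un", "believ", "able", "a", "b", "le"}
print(subword_tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```

Real tokenizers learn which fragments to keep from corpus statistics and typically include every single byte, so the `<unk>` fallback is rarely needed in practice.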

Supermemory

Supermemory is a managed memory and context infrastructure layer for AI agents, combining connectors, content extractors, hybrid search, a memory graph, and user profiles into a single API. It enables agents to recall facts across conversations and integrate data from sources like Notion, Slack, and Drive.

Supervised Fine Tuning

A training method that adapts a pre-trained large language model to perform a specific task by learning from labeled input-output pairs, adjusting model weights through gradient descent to match ground-truth examples.

Supervisor Agent Pattern

A multi-agent architecture in which a central supervisor agent receives a request, delegates subtasks to specialized worker agents, monitors their progress, and combines the results into a single response. The supervisor controls flow; workers handle domain-specific work like research, coding, or writing.

SUPIR

SUPIR is an open-source diffusion-based image super-resolution and restoration model that pairs a Stable Diffusion XL backbone with multimodal LLM guidance to reconstruct photorealistic detail in heavily degraded photos, faces, and textures far beyond what GAN-based upscalers like Real-ESRGAN can recover.

Swarm Architecture

A multi-agent design pattern in which AI agents pass conversational control to each other through explicit handoffs rather than relying on a central supervisor; popularized by OpenAI's experimental Swarm framework, it favors decentralized routing over top-down orchestration.

SWE-bench

A software engineering benchmark that evaluates large language models by testing their ability to resolve real GitHub issues from Python repositories, requiring each model to generate a code patch that passes the project's existing test suite.

Swin Transformer

A hierarchical Vision Transformer that computes self-attention inside non-overlapping shifted windows and merges patches layer by layer, producing multi-scale feature maps at linear cost in image size — a widely used backbone for object detection and semantic segmentation.

T

T5

T5 is Google's encoder-decoder transformer model that converts every NLP task into a text-to-text format, treating both inputs and outputs as text strings regardless of whether the task involves translation, summarization, classification, or question answering.

Teacher Forcing

A training technique for sequence models where the correct output at each time step feeds into the decoder's next step instead of the model's own prediction, enabling faster convergence but introducing exposure bias at inference time.

Temperature And Sampling

Temperature and sampling are parameters that control how a large language model selects its next token from a probability distribution, with temperature scaling logits before softmax to adjust the randomness of generated text.
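
The effect of temperature is easy to see in code: divide the logits by T before the softmax, then sample. A minimal sketch:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/T before softmax: T < 1 sharpens the
    distribution, T > 1 flattens it, T near 0 approaches argmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw one token index according to the adjusted probabilities
    return random.choices(range(len(logits)), weights=probs)[0]
```

At a very low temperature the highest-logit token is chosen almost every time, reproducing greedy decoding; at high temperatures the model's preferences wash out and output becomes more varied but less reliable.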

TensorRT-LLM

NVIDIA's open-source inference optimization framework that accelerates large language model serving on NVIDIA GPUs using in-flight batching, paged KV cache, quantization, and speculative decoding to maximize throughput and minimize latency.

Term Frequency

Term frequency (TF) is the count of how often a term appears in a document. It is the foundational signal in lexical information retrieval, used directly in TF-IDF and re-weighted by saturation in BM25 or replaced by learned weights in SPLADE and ELSER.

Text Generation Inference

An open-source inference server by Hugging Face that deploys large language models for production use, featuring continuous batching, tensor parallelism, quantization, Flash Attention, and speculative decoding to maximize GPU throughput and minimize response latency.

TF-IDF

TF-IDF is a term-weighting formula that ranks the importance of a word in a document by multiplying its term frequency (how often it appears locally) with its inverse document frequency (how rare it is across the corpus).
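
A minimal sketch over a toy corpus of pre-tokenized documents, using the plain log form of inverse document frequency (real systems often smooth the IDF term):

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF = raw term count in the doc x log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(term in d for d in corpus)   # docs containing the term
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "purred"]]
# "the" appears in every document, so its IDF -- and its score -- is zero.
print(tf_idf("the", corpus[0], corpus))
print(tf_idf("cat", corpus[0], corpus))  # log(3/2), a nonzero weight
```

This is exactly why stop words fall out of TF-IDF rankings automatically: ubiquity drives the IDF factor to zero without any hand-built stop list.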

Tiktoken

Tiktoken is OpenAI's open-source tokenizer library that converts text into subword tokens using Byte Pair Encoding, enabling language models to process input text as numerical sequences for prediction and generation.

Tiled Upscaling

A super-resolution technique that splits a source image into overlapping tiles, processes each tile independently through an AI model, then stitches the results back together — enabling 4K and 8K outputs on consumer GPUs that lack the memory for a full-image pass.

Time To First Token

The latency from when a generation request arrives at an LLM inference engine to when the first output token is produced, encompassing queuing time, prefill computation across the full input prompt, and network overhead. It is the primary metric for perceived responsiveness in interactive AI applications.

Tokenization

Tokenization splits raw text into smaller units called tokens — subwords, characters, or bytes — that language models can process as numerical input for tasks like text generation and understanding.

Tokenizer Architecture

The multi-stage system that converts raw text into numerical token IDs for large language models, consisting of normalization, pre-tokenization, a subword algorithm (BPE, WordPiece, or Unigram), and post-processing steps.

Top K Routing

A gating mechanism in Mixture of Experts models that scores every available expert for each input token, then routes computation to only the k highest-scoring ones while keeping the rest inactive to save processing power.
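
A simplified sketch of the routing step: given the gate's score for each expert, keep the top k and renormalize their weights (real MoE gates typically apply a softmax and add load-balancing terms):

```python
def top_k_route(gate_scores, k=2):
    """Select the k highest-scoring experts for a token and return
    normalized routing weights; all other experts stay inactive."""
    top = sorted(range(len(gate_scores)),
                 key=gate_scores.__getitem__, reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)
    return {i: gate_scores[i] / total for i in top}

# 4 experts scored by the gate; only the best 2 receive this token.
print(top_k_route([0.1, 0.6, 0.2, 0.1], k=2))
```

The token's output is then the weighted sum of just those k experts' outputs, so compute per token scales with k rather than with the total number of experts.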

Top P Sampling

A text generation strategy that dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p, then samples from that reduced set — adapting the candidate pool size to the model's confidence at each step.
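
The filtering step can be sketched directly: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, and renormalize before sampling:

```python
import random

def top_p_filter(probs, p=0.9):
    """Keep the smallest high-probability set whose cumulative mass
    reaches p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
print(filtered)  # the 0.05 tail token is dropped, the rest renormalized
token = random.choices(list(filtered), weights=list(filtered.values()))[0]
```

When the model is confident, one or two tokens already cover the mass p and the pool shrinks to near-greedy; when it is uncertain, the pool widens automatically, which is the advantage over a fixed top-k cutoff.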

Topaz Gigapixel

Topaz Gigapixel is a desktop AI image upscaler from Topaz Labs that enlarges photos using specialized models for faces, textures, line art, and synthetic renders, processing files locally on the user's computer rather than in the cloud.

Toxicity And Safety Evaluation

The systematic process of testing AI models for harmful outputs — toxic language, discriminatory content, jailbreak vulnerability, and policy violations — using benchmarks, safety classifiers, and red teaming to measure and reduce risk before deployment.

ToxiGen

A large-scale machine-generated dataset of implicit hate speech and benign statements about 13 minority groups, created by Microsoft Research for training and evaluating toxicity classifiers that detect subtle harmful language without relying on explicit slurs or profanity.

Transfer Learning

A machine learning technique where knowledge gained from training on one task is reused to improve performance on a different but related task. Transfer learning reduces the need for large labeled datasets and extensive compute, making it the foundation behind all modern fine-tuning approaches including LoRA and QLoRA.

Transformer Architecture

A neural network design that uses self-attention to process entire input sequences in parallel, replacing older sequential approaches and powering most modern large language models and AI systems.

Trimap

A trimap is a 1-channel guide image that partitions a photo into three regions — known foreground (white), known background (black), and an unknown transition zone (gray) — telling an alpha matting algorithm where to estimate per-pixel transparency for hair, fur, and edges.

TRL

TRL (Transformer Reinforcement Learning) is Hugging Face's open-source Python library for aligning language models with human preferences using reinforcement learning and preference optimization methods such as PPO, GRPO, and DPO.

True Positive Rate

The proportion of actual positive cases correctly identified by a classifier, calculated as true positives divided by total actual positives (true positives plus false negatives). Also called sensitivity or recall.

TruLens

TruLens is an open-source evaluation and tracing framework for LLM applications and agents, built around the RAG Triad — Context Relevance, Groundedness, and Answer Relevance — three feedback functions that score retrieval quality, grounding to source documents, and how well the answer addresses the question.

V

Vanishing Gradient

The vanishing gradient problem occurs when gradients shrink exponentially as they travel backward through deep neural network layers during training, preventing early layers from learning effectively and driving the development of modern activation functions like ReLU.

Variational Autoencoder

A generative neural network that encodes input data into a probability distribution over a latent space, then decodes samples from that distribution to produce new data resembling the original training set.

Vectara HHEM

Vectara HHEM (Hughes Hallucination Evaluation Model) is a classifier that compares an LLM's generated text to a source document and produces a faithfulness score, used to detect hallucinations and rank how well models stay grounded in retrieved sources.

Vector Database

A specialized database designed to store, index, and query high-dimensional vector embeddings using approximate nearest neighbor algorithms, enabling fast similarity search for applications like semantic search, RAG pipelines, and recommendation engines.

Vector Indexing

A method of organizing high-dimensional vectors into specialized data structures so approximate nearest-neighbor searches return results in sub-linear time instead of scanning every record.

Vision Transformer

A deep-learning architecture that treats an image as a sequence of small fixed-size patches and processes them with the same Transformer encoder used for language, replacing convolutions with self-attention across all patches at every layer.

vLLM

An open-source inference engine that optimizes how large language models generate text by using PagedAttention for efficient GPU memory management, enabling higher throughput and lower latency during autoregressive decoding.

Voyage Rerank

Voyage Rerank is a family of cross-encoder reranking models from Voyage AI (now part of MongoDB) that re-scores retrieved passages against a query, with the rerank-2.5 generation adding instruction-following so relevance criteria can be steered at query time.

VQ-VAE

A generative model that uses vector quantization to replace the continuous latent space of standard variational autoencoders with a discrete codebook, producing sharper reconstructions, avoiding posterior collapse, and enabling downstream models like transformers to process the resulting discrete codes as token sequences.

365 terms defined

About This Glossary

Unlike conventional glossaries that offer a single paragraph per term, each entry here provides a multi-perspective treatment. MONA grounds the term in its mathematical or architectural foundation. MAX connects it to real tools, frameworks, and implementation patterns. DAN places it within the broader industry context — where it came from, who is pushing it forward, and why it matters now. ALAN examines the ethical dimension — what risks, biases, or accountability gaps the concept introduces.

The glossary is organized alphabetically and interlinked with our article library. When you encounter a term while reading an article, inline glossary links take you directly to the relevant definition — and from there to deeper articles that explore the concept in full.

Terms are added and updated with every content cycle as our coverage expands into new topic clusters. If a concept appears in our articles, it belongs in the glossary.

Q: Who is this glossary for? A: This glossary is for technical professionals navigating the AI landscape — whether you are a developer building your first agent pipeline, an engineer evaluating RAG architectures, or a tech lead making build-vs-buy decisions about AI infrastructure. Definitions assume programming literacy but not prior AI expertise.

Q: How is this glossary different from other AI glossaries? A: Each term receives multi-perspective coverage from four specialized authors — scientific foundations, practical implementation, industry context, and ethical implications. Definitions are interlinked with in-depth articles that explore the concept further.

Q: How are terms selected and how often is the glossary updated? A: Terms are selected based on the topics covered in our article library. The glossary is updated with every content cycle — when new articles introduce concepts that warrant a standalone definition, those terms are added automatically as part of our content pipeline.

Q: Can I use this glossary as a learning path? A: Yes. Each glossary entry links to related terms and full articles. You can start with a foundational term like “transformer” and follow the links through attention mechanisms, embeddings, and into applied topics like retrieval-augmented generation — building understanding layer by layer.