AI Glossary

Comprehensive dictionary of AI and machine learning terminology

C

Causal Masking

Causal masking is an attention restriction in decoder-only transformer models that prevents each token from attending to future tokens, enforcing the left-to-right generation order that makes autoregressive language models produce text one token at a time.
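A minimal NumPy sketch of the idea (illustrative, not from any particular library): positions above the diagonal of the score matrix are set to negative infinity, so softmax assigns them exactly zero attention weight.

```python
import numpy as np

# Apply a causal mask to a 4x4 attention score matrix so that
# position i can only attend to positions j <= i.
scores = np.random.randn(4, 4)                     # raw query-key scores
mask = np.triu(np.ones((4, 4), dtype=bool), k=1)   # True above the diagonal
scores = np.where(mask, -np.inf, scores)           # future positions -> -inf

# softmax turns -inf scores into exactly zero attention weight
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

assert np.allclose(np.triu(weights, k=1), 0.0)     # no attention to the future
```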

ColPali

A vision-language retrieval model that searches documents by processing page images directly through a vision encoder, generating multi-vector patch embeddings and using late interaction scoring to rank pages without OCR or text extraction.

Context Vector

The single fixed-length vector an encoder network produces after processing an entire input sequence, compressing all source information into one representation that the decoder uses to generate output. Its limited capacity motivated the invention of attention mechanisms.

Context Window

The maximum number of tokens a language model can process in a single interaction, covering both the input prompt and the generated output combined.

Contrastive Learning

A self-supervised machine learning technique that trains models to produce meaningful embeddings by maximizing similarity between related (positive) pairs while minimizing similarity between unrelated (negative) pairs, forming the core training objective behind Sentence Transformers and modern sentence-level embedding models.
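A hypothetical NumPy sketch of one common contrastive objective (an InfoNCE-style loss, used here for illustration; real frameworks add batching, augmentation, and learned encoders): each anchor's positive sits on the diagonal of a similarity matrix, and the loss rewards high diagonal similarity relative to the rest of the row.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of embedding pairs."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature     # pairwise cosine similarities
    # each anchor's positive is on the diagonal; everything else is a negative
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()
```

When positives match their anchors the loss is near zero; mismatched pairs drive it up, which is the training signal that shapes the embedding space.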

Cosine Similarity

A mathematical metric that computes the cosine of the angle between two vectors, producing a score from −1 (opposite) to +1 (identical direction), widely used to measure semantic closeness between embeddings.
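The computation in a few lines of NumPy:

```python
import numpy as np

def cosine_similarity(u, v):
    # dot product of the vectors divided by the product of their lengths
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0]))   # 1.0 (same direction)
cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0]))   # 0.0 (orthogonal)
cosine_similarity(np.array([1.0, 0.0]), np.array([-2.0, 0.0]))  # -1.0 (opposite)
```

Note that the score depends only on direction, not magnitude: scaling either vector leaves the result unchanged.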

Cross Attention

An attention mechanism where queries originate from one sequence and keys and values come from a different sequence, enabling a model to focus on relevant information across two distinct inputs, such as encoder and decoder representations.

S

Scaled Dot Product Attention

The core computation inside transformer models that calculates relevance scores between queries and keys using dot products, scales them by the inverse square root of the key dimension to keep softmax gradients from saturating, and produces weighted combinations of values.
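A minimal NumPy sketch of the formula softmax(QKᵀ/√d_k)·V (a single unbatched head, for illustration only):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # scaled relevance scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ V                             # weighted combination of values
```

With 3 queries and 5 key-value pairs of dimension 8, the output has one 8-dimensional row per query.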

Scaling Laws

Empirical power-law relationships showing how a language model's performance predictably improves as you increase model size, training data, or compute budget, enabling teams to forecast results before committing resources.

ScaNN

An open-source library from Google Research that performs fast approximate nearest neighbor search using anisotropic vector quantization, designed for finding similar items in large collections of high-dimensional vectors.

Semantic Search

A retrieval method that converts queries and documents into dense vector representations and ranks results by similarity metrics like cosine similarity or dot product, finding matches based on meaning rather than keyword overlap.
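A toy end-to-end sketch (the 2-D "embeddings" here are invented for illustration; a real system would produce them with an embedding model):

```python
import numpy as np

# Toy document embeddings and a query embedding.
docs = np.array([[0.9, 0.1],
                 [0.1, 0.9],
                 [0.7, 0.7]])
query = np.array([1.0, 0.0])

# Rank documents by cosine similarity to the query.
norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
scores = docs @ query / norms
ranking = np.argsort(-scores)   # indices of documents, best match first
```

Document 0 points almost the same direction as the query, so it ranks first regardless of vector length.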

Sentence Transformers

A Python framework that generates sentence-level embeddings by passing text through transformer models and applying pooling strategies, enabling semantic search, clustering, and similarity comparison tasks that require understanding meaning rather than matching exact keywords.

Siamese Network

A neural network architecture where two identical sub-networks share the same weights, process separate inputs simultaneously, and produce comparable output vectors, enabling the system to measure how similar or different two inputs are.
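The key property is weight sharing: both branches are literally the same function. A minimal untrained sketch (random weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))      # shared weights used by both branches

def branch(x):
    # identical sub-network applied to either input
    return np.tanh(W @ x)

a = branch(np.array([1.0, 0.0, 0.0]))
b = branch(np.array([0.0, 1.0, 0.0]))
distance = np.linalg.norm(a - b)     # comparable output vectors
```

Because the branches share `W`, identical inputs always map to identical embeddings, which is what makes the outputs directly comparable.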

Similarity Search Algorithms

Methods that find the closest matching vectors in high-dimensional spaces by measuring distance or angle between numerical representations of data. Used in AI systems for semantic search, recommendation engines, and retrieval-augmented generation to match queries to relevant results.

Softmax

A mathematical function that converts raw numerical scores into a probability distribution where all values sum to one, used in attention mechanisms and classification outputs across AI systems.
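In NumPy (the max subtraction is the standard trick to avoid overflow and does not change the result):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

softmax(np.array([2.0, 1.0, 0.1]))  # a probability distribution summing to 1
```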

State Space Models

A class of sequence modeling architectures that use linear recurrence, often with selective gating, to process tokens in linear time, maintaining a compressed fixed-size hidden state instead of attending to every previous token and offering a faster alternative to transformer attention for long sequences.
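The underlying linear recurrence h_t = A·h_{t-1} + B·x_t, y_t = C·h_t can be sketched as follows (random placeholder matrices, not a trained model; selective gating is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 4, 2
A = 0.9 * np.eye(d_state)                 # decaying state transition
B = rng.standard_normal((d_state, d_in))  # input projection
C = rng.standard_normal((1, d_state))     # output projection

def ssm(xs):
    """One pass over the sequence; cost grows linearly with its length."""
    h = np.zeros(d_state)
    ys = []
    for x in xs:
        h = A @ h + B @ x                 # update the compressed hidden state
        ys.append(C @ h)
    return np.array(ys)
```

Each step touches only the fixed-size state `h`, never the full history, which is where the linear-time advantage over attention comes from.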

Subword Tokenization

A text preprocessing technique that splits words into smaller units (subwords) based on statistical frequency patterns, enabling language models to represent any word — including rare or unseen terms — using a fixed-size vocabulary of common fragments.
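A toy sketch of one merge step in byte-pair encoding (BPE), a common subword algorithm (simplified for illustration; real tokenizers repeat this over a large corpus):

```python
from collections import Counter

# Tiny "corpus" as lists of symbols.
words = [list("lower"), list("lowest"), list("low")]

# Count every adjacent symbol pair.
pairs = Counter()
for w in words:
    for a, b in zip(w, w[1:]):
        pairs[(a, b)] += 1

# Fuse the most frequent pair into a single new subword symbol.
best = max(pairs, key=pairs.get)

def merge(word, pair):
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])   # fused subword
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

words = [merge(w, best) for w in words]
```

Repeating this merge step builds a vocabulary of frequent fragments ("low", "est", ...), letting any word, even an unseen one, be spelled from the pieces.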

63 terms defined