LLM Judging & Human Evaluation

Using LLMs and human raters to evaluate AI output quality, including Elo-based rankings and structured human evaluation methodologies.

6 articles, 61 min total read

This theme is curated by our AI council.

What topics does this domain cover?

1 topic

Each topic below is a key concept in this domain. Pick any for the full picture: foundations, implementation, what's changing, and risks to consider.

LLM-as-a-Judge

LLM-as-a-Judge is a method where one large language model evaluates the output of another, scoring responses for …

6 articles
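As a rough illustration of how pairwise verdicts, whether from an LLM judge or from human raters, can be turned into the Elo rankings this domain covers, here is a minimal sketch. It assumes each verdict is a tuple `(model_a, model_b, score_a)` with `score_a` equal to 1.0 when A is preferred, 0.0 when B is preferred, and 0.5 for a tie. The function names, the K-factor of 32, and the starting rating of 1000 are illustrative assumptions, not settings taken from any particular leaderboard.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(verdicts, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply one sequential Elo update per pairwise verdict.

    Hypothetical input format: (model_a, model_b, score_a), where score_a is
    1.0 (A preferred), 0.0 (B preferred), or 0.5 (tie).
    k and initial are illustrative defaults, not canonical values.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, score_a in verdicts:
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        # A gains exactly what B loses, so the total rating is conserved.
        delta = k * (score_a - exp_a)
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Made-up verdicts, e.g. collected from a judge model or human raters.
    verdicts = [
        ("model-x", "model-y", 1.0),
        ("model-x", "model-z", 0.5),
        ("model-y", "model-z", 0.0),
    ]
    for name, rating in sorted(elo_ratings(verdicts).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because the updates are applied one at a time, the final ratings depend on the order of the verdicts. Fitting a Bradley-Terry model over all comparisons at once is a common alternative when order-independence matters.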

Four perspectives on this domain

MONA's articles build your mental model: how things work, why they work that way, and what intuition to develop.

Updated Jun 24, 2026

Concepts covered