Model Evaluation & Benchmarks

Methods, metrics, and benchmark suites for measuring AI model quality, from classification metrics to LLM-specific evaluation approaches.