Query By Committee
Also known as: QBC, Committee-Based Sampling, Ensemble Disagreement Sampling
Query By Committee is an active learning query strategy that trains multiple models on the same labeled data, then selects the unlabeled samples where these models disagree most. High disagreement signals uncertainty, so labeling those examples teaches the models the most per annotation.
Query By Committee is an active learning strategy where several models trained on the same data vote on unlabeled examples, and the ones they most disagree on get sent to a human for labeling.
What It Is
Labeling data is expensive. If you have a million unlabeled product reviews and a budget to annotate only a few thousand, which ones do you pick? Picking at random wastes money on easy examples the model already understands. Query By Committee (QBC) solves this by letting the model tell you where it’s confused — and confusion is exactly where new labels pay off.
The idea works like a panel of experts disagreeing on a diagnosis. Imagine three doctors who all studied from the same textbook. For a routine case, they agree instantly. But for an unusual patient, they argue — and that argument is the signal that the case is genuinely hard and worth a closer look. QBC builds a “committee” of several models, trains each on the same small batch of already-labeled data, then asks all of them to predict labels for the unlabeled pool. The examples where the committee splits its vote are the ones a human should label next.
Each committee member is trained to be slightly different (on different subsets of the data, from different starting conditions, or with different model settings) so that they don’t all make identical mistakes. The disagreement between them is condensed into a single score per example. Two common choices are vote entropy, which measures how evenly the members’ votes are split across the classes, and the average KL divergence between each member’s predicted probabilities and the committee’s averaged prediction. A high score means the models can’t agree; a low score means the answer is obvious. QBC ranks the unlabeled pool by this disagreement and hands the top of the list to a human annotator. The newly labeled examples go back into training, the committee retrains, and the cycle repeats. This is the core loop that makes active learning efficient: instead of labeling everything, you label the few examples that resolve the most uncertainty.
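The two disagreement scores described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `vote_entropy` and `mean_kl_to_consensus`, and the sample `pool_votes` data, are made up for this example.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's hard votes for one example.

    votes: list of predicted labels, one per committee member.
    Returns 0.0 when all members agree; larger values mean a more even split.
    """
    counts = Counter(votes)
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def mean_kl_to_consensus(member_probs):
    """Average KL divergence of each member's distribution from the committee mean.

    member_probs: list of probability lists, one per member, same class order.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    consensus = [sum(p[k] for p in member_probs) / n_members for k in range(n_classes)]
    total = 0.0
    for p in member_probs:
        # Classes with p[k] == 0 contribute nothing to KL(p || consensus).
        total += sum(p[k] * math.log(p[k] / consensus[k]) for k in range(n_classes) if p[k] > 0)
    return total / n_members


# Illustrative data: three members voting on three unlabeled reviews.
pool_votes = {
    "review_a": ["pos", "pos", "pos"],      # unanimous
    "review_b": ["pos", "neg", "pos"],      # 2-1 split
    "review_c": ["pos", "neg", "neutral"],  # three-way split
}
ranked = sorted(pool_votes, key=lambda k: vote_entropy(pool_votes[k]), reverse=True)
print(ranked)  # most contested first
```

Vote entropy only needs hard labels, so it works with any classifier. The KL variant needs predicted probabilities, but it can separate a confident split from a hesitant one.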
How It’s Used in Practice
The most common place teams meet QBC is inside a data-labeling workflow for a classification model: spam detection, sentiment analysis, document routing, image tagging. A company has a huge unlabeled dataset and a limited annotation budget. Rather than send random batches to annotators, the team wraps the model in a QBC loop: train a small committee on whatever labels exist, score the unlabeled pool by disagreement, route the most-contested items to the labeling queue, retrain, repeat. Over several rounds, the model can often reach target accuracy with fewer labeled examples than random sampling would need.
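The loop described above (train, score, route, retrain) can be sketched as follows. Everything here is a toy under stated assumptions: the "models" are one-dimensional threshold classifiers, diversity comes from bootstrap resampling, and `oracle` is a hypothetical stand-in for the human annotator. A real workflow would swap in actual models and a labeling queue.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a toy 1-D classifier that predicts 1 when x >= threshold.

    points: list of (x, label) pairs with labels 0 or 1.
    Returns the candidate threshold with the fewest training errors.
    """
    xs = sorted(x for x, _ in points)
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in points)

    return min(candidates, key=errors)


def train_committee(labeled, n_members, rng):
    """Train each member on a bootstrap resample so the members differ."""
    return [fit_stump([rng.choice(labeled) for _ in labeled]) for _ in range(n_members)]


def disagreement(committee, x):
    """Fraction of members voting against the majority (0.0 = unanimous)."""
    votes = Counter(x >= t for t in committee)
    return 1.0 - max(votes.values()) / len(committee)


def qbc_loop(labeled, pool, oracle, rounds, batch_size, n_members=5, seed=0):
    """Run `rounds` of query-by-committee and return (labeled, remaining_pool)."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        committee = train_committee(labeled, n_members, rng)
        # Rank the pool by disagreement and take the most contested items.
        pool.sort(key=lambda x: disagreement(committee, x), reverse=True)
        queried, pool = pool[:batch_size], pool[batch_size:]
        # Labeling step: in practice this goes to a human annotator.
        labeled += [(x, oracle(x)) for x in queried]
    return labeled, pool


# Illustrative setup: the annotator applies a hidden rule the models must learn.
true_boundary = 0.62
oracle = lambda x: int(x >= true_boundary)
seed_labels = [(0.1, 0), (0.9, 1)]
pool = [i / 100 for i in range(100)]
labeled, remaining = qbc_loop(seed_labels, pool, oracle, rounds=4, batch_size=3)
print(len(labeled), len(remaining))
```

Fixing the seed makes runs repeatable, which helps when comparing the labeled set QBC produces against a random-sampling baseline on the same budget.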
QBC fits naturally with pool-based sampling, where the full set of unlabeled candidates is available at once and the strategy picks the best ones to query. It pairs with a human-in-the-loop setup, since a person provides the ground-truth labels the committee then learns from.
Pro Tip: Make your committee members genuinely different, not three copies of the same model with the same seed. If they’re trained identically, they’ll agree on everything — including their blind spots — and QBC quietly degrades into random sampling without telling you. Vary the training subsets or model configurations, and sanity-check that disagreement actually correlates with errors before you trust it to spend your labeling budget.
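One way to run the sanity check suggested above is to compare member predictions on a small labeled validation set. The sketch below is a rough diagnostic under assumptions: the helper names `pairwise_agreement` and `error_rate_by_agreement` and the sample predictions are hypothetical, and any threshold you act on is a judgment call for your own data.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(member_preds):
    """Mean fraction of examples on which each pair of members agrees.

    member_preds: list of prediction lists, one per member, aligned by example.
    A value of 1.0 means the members behave as clones.
    """
    rates = [
        sum(a == b for a, b in zip(p, q)) / len(p)
        for p, q in combinations(member_preds, 2)
    ]
    return sum(rates) / len(rates)


def error_rate_by_agreement(member_preds, true_labels):
    """Majority-vote error rate on contested vs unanimous validation examples."""
    buckets = {"contested": [0, 0], "unanimous": [0, 0]}  # [errors, count]
    for votes, truth in zip(zip(*member_preds), true_labels):
        majority = Counter(votes).most_common(1)[0][0]
        key = "unanimous" if len(set(votes)) == 1 else "contested"
        buckets[key][0] += majority != truth
        buckets[key][1] += 1
    return {k: (e / n if n else None) for k, (e, n) in buckets.items()}


# Illustrative validation predictions from three members on six examples.
member_preds = [
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
]
true_labels = [1, 0, 0, 0, 0, 1]

print(pairwise_agreement(member_preds))
print(error_rate_by_agreement(member_preds, true_labels))
```

If pairwise agreement sits at or near 1.0, the committee lacks diversity. If the contested bucket is not clearly more error-prone than the unanimous one, disagreement is probably not a useful signal yet for spending the labeling budget.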
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Large unlabeled pool, small labeling budget | ✅ | |
| Each label is cheap and instant (auto-labeled) | | ❌ |
| Classification task with clear disagreement signal | ✅ | |
| You can only afford to train one model, not a committee | | ❌ |
| Iterative labeling rounds with retraining are feasible | ✅ | |
| Cold-start with zero labeled data to seed the committee | | ❌ |
Common Misconception
Myth: Query By Committee finds the “hardest” or most unusual examples in your dataset, so it’s a good way to surface outliers and edge cases.
Reality: QBC finds examples the current committee disagrees on, which is not the same as objectively hard or rare. A weak or biased committee can disagree on trivial examples and agree on genuinely tricky ones. It targets model uncertainty, not data difficulty — and if the committee members are too similar, it can miss informative samples entirely. It’s a disagreement detector, not an outlier detector.
One Sentence to Remember
Query By Committee spends your labeling budget where it counts by labeling the examples your models can’t agree on — but it only works as well as the diversity of the committee you build.
FAQ
Q: How is Query By Committee different from uncertainty sampling?
A: Uncertainty sampling uses one model’s confidence to pick examples. QBC uses several models and measures their disagreement instead. Disagreement across a diverse committee is often a more reliable signal than a single model’s confidence.
Q: How many models do you need in the committee?
A: Usually a small handful: enough to produce meaningful disagreement without exploding training cost. The key requirement is that members are diverse, not numerous, since identical models defeat the entire purpose.
Q: Does Query By Committee work for any machine learning task?
A: It fits classification tasks best, where votes and disagreement are easy to measure. It’s harder to apply to regression or generative tasks, and it needs at least a small labeled set to train the initial committee.
Expert Takes
Disagreement is information. When a diverse committee splits its vote on an example, that split measures how much the models still don’t know about that region of the data. Labeling there reduces uncertainty faster than labeling where everyone already agrees. The principle is simple: spend supervision where the model’s internal disagreement is highest, because that is where each new label changes the most.
Treat the committee as a specification for “what to label next,” not as a finished product. The workflow matters more than the math: define how members differ, how disagreement is scored, and how labeled examples flow back into retraining. Write that loop down explicitly. A QBC setup with vague diversity rules drifts toward random sampling, and you won’t notice until the labeling budget is already spent.
Labeling is one of the biggest hidden costs in shipping a model, and budgets are finite. Query By Committee is a lever on that cost — reach target accuracy with a fraction of the annotations. Teams that treat data labeling as a strategy rather than a chore move faster and spend less. The ones that label everything by brute force are paying a tax their competitors stopped paying.
A committee that agrees can still be wrong together. If every member learned from the same biased seed data, QBC will confidently skip the examples that expose that bias — because nobody disagrees. The method optimizes for resolving internal uncertainty, not for fairness or coverage. Worth asking: whose edge cases never trigger disagreement, and therefore never get labeled, reviewed, or corrected?