ALAN opinion

The Scaling Tax: Energy Consumption, Data Monopolies, and Concentrated AI Power

[Image: abstract visualization of growing energy grid towers dwarfing small human figures below]

The Hard Truth

If a technology’s progress is measured by how much electricity it consumes and how few organizations can afford to build it, should we still call it progress — or should we call it extraction?

There is a formula at the heart of modern AI development, and it is remarkably simple: more compute, more data, better models. The math is clean. The implications are not. Behind every leap in capability sits a growing bill — measured in gigawatt-hours, in freshwater, in capital that only a handful of institutions can raise. The industry calls this scaling. It might be more honest to call it a tax.

The Cost Nobody Itemizes

The logic of scaling laws is seductive because it works. Model loss improves as a predictable power-law function of model size, dataset size, and compute — a regularity first established by Kaplan et al. in 2020. Two years later, the Chinchilla results refined the insight: model parameters and training tokens should scale in roughly equal proportion — approximately twenty tokens per parameter — for compute-optimal training (Hoffmann et al.). The empirical regularity was striking. It suggested that intelligence, or at least a statistical approximation of it, could be purchased by the terawatt-hour.
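The heuristic can be sketched in a few lines of arithmetic: a back-of-the-envelope sizing, assuming the rough twenty-tokens-per-parameter ratio and the commonly used C ≈ 6·N·D approximation for training FLOPs (the function names are illustrative, not from any library):

```python
# Back-of-the-envelope Chinchilla-style sizing. Assumes the rough
# ~20 tokens-per-parameter heuristic and the common C ≈ 6·N·D
# approximation for training compute; both are rules of thumb.

def chinchilla_optimal_tokens(n_params: float, ratio: float = 20.0) -> float:
    """Training tokens suggested by the tokens-per-parameter heuristic."""
    return ratio * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C ≈ 6 * N * D floating-point ops."""
    return 6.0 * n_params * n_tokens

# Example: a 70B-parameter model, roughly Chinchilla's own scale.
n = 70e9
d = chinchilla_optimal_tokens(n)   # ≈ 1.4 trillion tokens
c = training_flops(n, d)           # ≈ 5.9e23 FLOPs
print(f"tokens: {d:.2e}, compute: {c:.2e} FLOPs")
```

The exercise shows why the curve reads like an invoice: double the parameters and the heuristic doubles the tokens, so compute (and energy) grows roughly quadratically with model size.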

But purchased by whom? And at what environmental cost?

The Elegant Promise of Scale

To be fair, the argument for scaling carries real weight. Pre-training at scale has produced capabilities that no one planned and few predicted. Emergent abilities — in-context learning, chain-of-thought reasoning, cross-lingual transfer — appeared not because they were programmed but because the models became large enough for latent structure to surface. The scaling hypothesis was not just validated; it seemed to accelerate.

From a researcher’s perspective, this is extraordinary. A single mathematical relationship — loss as a function of compute — provides a roadmap for building systems that do things their designers cannot fully explain. The regularity is convenient. It is also, from a governance perspective, dangerous — because it makes the path forward look like a question of investment rather than a question of values.

The Hidden Ledger

The energy arithmetic is sobering. Data centers consumed approximately 415 TWh of electricity globally in 2024 — roughly 1.5% of global demand (IEA). Projections suggest that figure could reach 945 TWh by 2030, with the United States and China accounting for roughly 80% of the growth (IEA). Training a single frontier model like GPT-4 required an estimated 50-62 GWh of energy — a figure based on leaked estimates rather than official disclosure from OpenAI (Epoch AI). That is roughly forty to forty-eight times the energy consumed by its predecessor. And GPT-3’s own training run evaporated 700,000 liters of freshwater (OECD AI).
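The internal arithmetic of these estimates can be checked directly. The sketch below works only from the figures quoted above; the constants are the cited third-party estimates, not official disclosures:

```python
# Sanity-check the quoted energy figures against each other.
# All constants are third-party estimates cited in the text,
# not official disclosures.

GPT4_GWH_LOW, GPT4_GWH_HIGH = 50.0, 62.0   # estimated GPT-4 training energy
DATACENTER_2024_TWH = 415.0                # IEA global estimate for 2024

# The "forty to forty-eight times its predecessor" claim implies a
# predecessor training run of roughly 1.25-1.3 GWh.
implied_pred_low = GPT4_GWH_LOW / 40.0     # 1.25 GWh
implied_pred_high = GPT4_GWH_HIGH / 48.0   # ~1.29 GWh

# Even the high estimate is a small slice of annual data-center demand.
share = GPT4_GWH_HIGH / (DATACENTER_2024_TWH * 1000.0)
print(f"implied predecessor: {implied_pred_low:.2f}-{implied_pred_high:.2f} GWh")
print(f"GPT-4 training as share of 2024 data-center demand: {share:.5%}")
```

The point is scale in both directions: a single frontier run is a sliver of global demand, but a forty-fold jump per generation is the kind of growth rate that turns slivers into grid-planning problems.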

These are not abstract numbers. They represent real rivers, real power grids, and real communities competing for resources with an industry that treats energy consumption as a scaling variable — a cost to be optimized, not a boundary to be respected.

The capital picture is equally stark. The amortized cost of training GPT-4 was approximately $78 million; full hardware acquisition pushed the figure closer to $800 million (Epoch AI). Training costs have grown at roughly 2.4x per year since 2016, a trajectory that points toward single training runs exceeding a billion dollars by 2027 — though this remains a trend extrapolation, not a confirmed expenditure. Global AI infrastructure capital expenditure is projected to reach $400-450 billion in 2026 alone. Who can write checks like that? Perhaps five organizations. Perhaps ten. Certainly not a university lab in Nairobi, a public health agency in Brasilia, or a cooperative in rural France.
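That trend extrapolation is simple compounding. A minimal sketch, using only the figures above: a $78 million base and 2.4x annual growth, with the base year assumed to be 2023 since the article does not state it:

```python
# Compound the ~2.4x/year training-cost trend from the $78M amortized
# GPT-4 figure. A trend extrapolation, as the text cautions, not a
# forecast. BASE_YEAR is an assumption; the article does not state it.

GPT4_COST_USD = 78e6
ANNUAL_GROWTH = 2.4
BASE_YEAR = 2023

def projected_cost(year: int) -> float:
    """Extrapolated frontier training cost in USD for a given year."""
    return GPT4_COST_USD * ANNUAL_GROWTH ** (year - BASE_YEAR)

for year in range(2023, 2028):
    print(f"{year}: ${projected_cost(year) / 1e6:,.0f}M")
# Under this trend, the $1B threshold is crossed before 2027,
# consistent with the projection in the text.
```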

Infrastructure as Ideology

The conventional framing treats energy consumption and capital concentration as engineering challenges — problems to be solved with better cooling, cheaper chips, and more efficient architectures. DeepSeek-V3 demonstrated that training compute efficiency could improve dramatically relative to comparable-scale models, and the shift toward inference-time scaling and mixture-of-experts architectures suggests the industry is already adapting to diminishing returns on raw pre-training scale. These are meaningful developments.

But efficiency gains do not resolve the structural problem. Economies of scale in compute, data, and talent create a natural tendency toward market concentration — and potentially toward irreversible market tipping (Korinek & Vipra). Even if each individual training run becomes cheaper per FLOP, the absolute investment required to remain at the frontier continues to climb. Inference already accounts for approximately two-thirds of all AI compute, and that share is growing. The organizations that train the models are also the organizations that profit from running them at scale — a feedback loop that compounds the asymmetry with every generation.

This is not a market dynamic. It is an infrastructure regime. And like every infrastructure regime before it — railroads, telecommunications, electrical grids — its politics are embedded in its architecture.

The Scaling Tax Is a Governance Failure

The environmental and economic costs of AI scaling are not technical problems awaiting better engineering — they are governance failures masquerading as optimization challenges.

Fine-tuning a language model is treated as a technical procedure. The RLHF process that shapes its behavior is discussed as an alignment method. But neither technique addresses the upstream question: who decided that this particular model should exist, at this particular scale, consuming this particular share of global energy resources? The decision to build a frontier model is not made in a lab meeting. It is made in a boardroom, under conditions of competitive pressure, with externalities distributed across communities that had no seat at the table.

When projections point toward single-campus power demands of one to five gigawatts by 2030, and distributed training clusters drawing two to forty-five gigawatts, the implication is not merely technical. It is political. That volume of energy demand reshapes regional grids, redirects water resources, and transforms local economies — all in service of models whose primary beneficiaries are a narrow class of firms and their investors.
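For scale, those gigawatt figures translate into annual energy with a single conversion, assuming continuous operation (which overstates real utilization somewhat):

```python
# Convert projected campus power draw (GW) into annual energy (TWh),
# assuming continuous operation; real utilization would be lower.

HOURS_PER_YEAR = 8760

def gw_to_twh_per_year(gigawatts: float) -> float:
    """Annual energy in TWh for a constant power draw in GW."""
    return gigawatts * HOURS_PER_YEAR / 1000.0

# A single 1-5 GW campus, run flat out, would consume ~9-44 TWh a year.
# For comparison, all data centers worldwide used ~415 TWh in 2024.
print(gw_to_twh_per_year(1.0))   # ≈ 8.76 TWh/yr
print(gw_to_twh_per_year(5.0))   # ≈ 43.8 TWh/yr
```

In other words, a single five-gigawatt campus would by itself draw roughly a tenth of what the entire global data-center sector consumed in 2024.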

The Questions We Owe the Next Decade

If the scaling paradigm continues — even in its more efficient post-Chinchilla form — the concentration of AI capability will likely deepen. Not because anyone plans it, but because the economics demand it. The cost of entry at the frontier is rising faster than the cost of capital is falling. The handful of organizations capable of frontier training today may be the only organizations capable of it tomorrow.

What does democratic participation in AI development look like when the infrastructure itself excludes all but a few? And what does accountability mean when the organizations making the most consequential decisions about AI systems are also the ones writing the rules for how those systems are governed?

Where This Argument Fractures

This position is most vulnerable in two places. First, if efficiency improvements — distillation, mixture-of-experts, inference-time reasoning — prove so effective that frontier capability becomes genuinely accessible without frontier-scale infrastructure, the concentration argument weakens substantially. The Chinchilla replication concerns raised by Epoch AI suggest the optimal scaling ratios themselves may be less settled than initially believed, which could open unexpected pathways to capability without brute-force compute.

Second, if international governance mechanisms — sovereign AI stacks, public compute infrastructure, open-weight model releases — succeed in distributing capability more broadly, the political urgency diminishes. The argument depends on the current trajectory holding. If the trajectory bends, so must the analysis.

The Question That Remains

The mathematics of scaling laws is elegant — a clean curve promising better performance for more investment. But elegance in a formula does not imply justice in its application. The real question is not whether we can afford to keep scaling. It is whether the communities bearing the cost — in energy, in water, in democratic exclusion — ever agreed to the terms.

Disclaimer

This article is for educational purposes only and does not constitute professional advice. Consult qualified professionals for decisions in your specific situation.

AI-assisted content, human-reviewed. Images AI-generated.
