NVIDIA–Gretel and Syntho–MOSTLY AI: How the Synthetic Data Market Consolidated in 2026

TL;DR
- The shift: Standalone synthetic-data startups are being absorbed by chip giants and surviving specialists — the independent-vendor era is closing.
- Why it matters: As frontier labs run out of real-world training data, synthetic data became infrastructure, and infrastructure gets owned, not rented.
- What’s next: The remaining independents pick a side — get acquired, go open-source, or get squeezed out.
Two deals, fifteen months apart, tell the same story. NVIDIA reportedly pulled Gretel inside its own walls. Then Syntho walked off with the MOSTLY AI brand after the company behind it had already shut down. This isn’t a run of unrelated startup exits — it’s a whole sector folding into the platforms that feed AI’s hunger for data.
The Independent Synthetic-Data Vendor Is Going Extinct
Thesis: The synthetic-data market is consolidating into the hands of chip and platform giants — because synthetic data stopped being a feature and became training infrastructure.
For years, synthetic data generation was a specialist’s game. A handful of startups sold the ability to manufacture artificial-but-realistic records, and the big labs were customers.
That relationship just inverted.
The biggest model builders — Microsoft, Meta, OpenAI, Anthropic — already train flagship models partly on synthetic data as real-world data runs thin, per TechCrunch’s reporting. When a capability becomes core to your product, you stop renting it.
You buy the company that makes it.
Two Deals, One Direction
The proof isn’t a single headline. It’s the same move repeating across the sector.
NVIDIA reportedly acquired Gretel in March 2025, a story originally broken by Wired and picked up by TechCrunch. The price ran into nine figures, north of Gretel’s roughly $320M valuation, with exact terms undisclosed, according to SiliconANGLE. Gretel, founded in 2019 and employing around 80 people under CEO Ali Golshan, folded into NVIDIA’s generative-AI developer services.
Then came the second move, and the direction matters. In June 2026, Syntho acquired the MOSTLY AI brand, trademark, and related assets (not the reverse), per Syntho’s own announcement. MOSTLY AI the company, Vienna-based, founded in 2017, and backed by roughly $31 million in funding, had already wound down operations earlier that year, according to CB Insights. The combined brand now runs as “MOSTLY AI, powered by Syntho.”
Pull back further and SAS reportedly absorbed Hazy’s assets back in 2024.
Three years. Multiple absorptions. One direction. That’s not a string of coincidences — that’s a market being rolled up.
Who Comes Out Ahead
NVIDIA wins the cleanest. It now owns the data-generation layer that feeds its own chips and developer stack — vertical integration, top to bottom.
Syntho wins by subtraction. It absorbed a defunct rival’s name and mindshare, and walks away as a default enterprise label in a thinning field.
The open-source survivors win too. Synthetic Data Vault (SDV) and Faker become neutral ground the moment commercial options consolidate, though they differ sharply: Faker fakes columns one at a time, while SDV models the relationships between them. SDV’s commercial tier runs through DataCebo with no public pricing, according to Tonic.ai, itself a still-independent vendor in the field.
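The gap between column-at-a-time and relationship-aware generation is easy to see in a toy sketch. What follows is plain standard-library Python, not Faker's or SDV's actual API: the age/income table, every function name, and the row-resampling stand-in for a relationship-aware model are assumptions made for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def make_real_rows(n, rng):
    """Hypothetical 'real' table in which income rises with age, plus noise."""
    rows = []
    for _ in range(n):
        age = rng.randint(20, 65)
        income = 1000 * age + rng.gauss(0, 5000)
        rows.append((age, income))
    return rows

def column_wise(rows, n, rng):
    """Draw each column independently: plausible values, no cross-column link."""
    ages = [r[0] for r in rows]
    incomes = [r[1] for r in rows]
    return [(rng.choice(ages), rng.choice(incomes)) for _ in range(n)]

def row_wise(rows, n, rng):
    """Resample whole rows: a crude stand-in for a model of the joint table."""
    return [rng.choice(rows) for _ in range(n)]

def age_income_corr(rows):
    ages, incomes = zip(*rows)
    return pearson(ages, incomes)

rng = random.Random(0)
real = make_real_rows(2000, rng)
print("real        :", round(age_income_corr(real), 3))
print("column-wise :", round(age_income_corr(column_wise(real, 2000, rng)), 3))
print("row-wise    :", round(age_income_corr(row_wise(real, 2000, rng)), 3))
```

In this sketch the column-wise output keeps realistic-looking ages and incomes but loses the link between them, while the row-wise output keeps it. Resampling real rows would defeat the privacy purpose of synthetic data, so a real relationship-aware tool fits a model of the table rather than copying records; the resampling here only illustrates what "preserving relationships" means.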
And the methods themselves don’t care who signs the checks. CTGAN, differential privacy, and knowledge distillation are techniques, not products. The science survives every acquisition; only the logos on it change.
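Of the techniques named here, differential privacy is the simplest to show in miniature. The sketch below uses the textbook Laplace mechanism on a counting query. It is a minimal illustration under stated assumptions, not any vendor's implementation: the `private_count` and `laplace_noise` helpers, the sample ages, and the choice of epsilon are all hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the Laplace mechanism uses scale 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)
ages = [25, 31, 47, 52, 68, 73]  # hypothetical sensitive column
# Smaller epsilon means more noise and stronger privacy.
print(private_count(ages, lambda a: a >= 50, epsilon=1.0, rng=rng))
print(private_count(ages, lambda a: a >= 50, epsilon=0.1, rng=rng))
```

The technique is the same whichever company ships it: the only knobs are the query's sensitivity and the privacy budget epsilon.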
Who Gets Squeezed
The standalone synthetic-data startup without a platform underneath it. MOSTLY AI is the cautionary tale: real technology, real funding, and it still wound down before the brand changed hands.
Enterprises that bet their roadmap on a single independent vendor. Your data pipeline is now somebody else’s M&A footnote — and brand continuity is not the same as product continuity.
Gretel isn’t a loser here, but it’s no longer an independent option you can choose. MOSTLY AI the standalone company is simply gone.
So the strategic fork is sharp: you either build on a generation layer a giant will keep funding, or you bet on an independent that’s one acquisition away from a brand transfer.
What Happens Next
Base case (most likely): The remaining independents consolidate further or retreat into open-source and niche compliance plays. Synthetic data settles in as a standard layer of the training stack, mostly owned by platforms. Signal: Another standalone vendor acquired, or one open-sourcing its core to stay relevant. Timeline: Next 12–18 months.
Bull case: Synthetic data matures into a well-governed, openly auditable layer. Open libraries thrive as neutral infrastructure, and enterprises get more options through native platform integrations. Signal: A major cloud or chip platform ships synthetic-data generation as a first-class managed service. Timeline: Within roughly a year.
Bear case: Quality and privacy problems erode trust — models degrade when trained too heavily on their own synthetic output, and consolidation leaves fewer independent checks on data fidelity. Signal: A public incident of synthetic-data-driven model degradation or a privacy leak from generated records. Timeline: 2026 into 2027.
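The degradation risk in the bear case can be illustrated with a deliberately tiny simulation. This is a toy, not a claim about any production model: the "model" is a single Gaussian refit each generation, and the function name, sample size, and generation count are assumptions chosen for the example.

```python
import random
import statistics

def final_spread(generations, sample_size, real_fraction, seed):
    """Refit a Gaussian 'model' on its own output and return its final std.

    Each generation trains on `sample_size` points: a `real_fraction`
    share drawn from the fixed real distribution N(0, 1), the rest drawn
    from the previous generation's fitted model.
    """
    rng = random.Random(seed)
    real_mu, real_sigma = 0.0, 1.0
    mu, sigma = real_mu, real_sigma  # generation 0 matches reality
    n_real = round(sample_size * real_fraction)
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size - n_real)]
        real = [rng.gauss(real_mu, real_sigma) for _ in range(n_real)]
        data = synthetic + real
        mu = statistics.fmean(data)
        # Maximum-likelihood (population) std, which slightly underestimates
        # the true spread on small samples; the error compounds over time.
        sigma = statistics.pstdev(data, mu)
    return sigma

pure = final_spread(generations=300, sample_size=10, real_fraction=0.0, seed=0)
hybrid = final_spread(generations=300, sample_size=10, real_fraction=0.5, seed=0)
print("synthetic-only final std:", pure)
print("half-real final std     :", hybrid)
```

In this toy setup the synthetic-only loop loses nearly all of its variance, while the run that keeps mixing in real samples stays anchored to the real distribution. Large models are far more complex, but the sketch shows why the share of real data in each training round is the variable to watch.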
Frequently Asked Questions
Q: How are companies using synthetic data to train AI models? A: Frontier labs — Microsoft, Meta, OpenAI, and Anthropic among them — already train flagship models partly on synthetic data, manufacturing artificial records to fill gaps where real-world data is scarce, sensitive, or simply exhausted, per TechCrunch’s reporting.
Q: Is synthetic data the future of AI training in 2026? A: It’s already part of the present. The run of deals from 2024 through 2026 suggests that big platforms now treat synthetic data as core infrastructure, not an experiment. The open question isn’t whether it matters but who controls the generation layer.
Q: Will synthetic data replace real-world data for training LLMs? A: No — it’s blending with real data, not replacing it. Synthetic records fill gaps and protect privacy, but models still need real-world signal to stay grounded. The realistic future is hybrid datasets, not an all-synthetic one.
The Bottom Line
The synthetic-data sector isn’t dying — it’s being absorbed into the platforms that depend on it. Standalone vendors are the endangered species now, and the layer that manufactures AI’s training data is consolidating into a handful of owners. Watch the next acquisition; it tells you who’s setting the terms.
AI-assisted content, human-reviewed.