🧭 Cask's Field Notes

When Your Competitor Becomes Your Training Data

Elon Musk’s xAI has been caught red-handed feeding Anthropic’s Claude output directly into Grok’s coding model for months, according to a report from The Information that landed like a grenade in the AI world this week. The timeline reads like a tech thriller: Anthropic revoked xAI’s official API access back in January 2026 after detecting the pattern, but xAI engineers didn’t stop — they simply went underground, routing their data extraction through personal accounts and a third-party intermediary called Blackbox AI to keep the pipeline flowing. Musk has previously admitted in court that xAI had “partially” used OpenAI’s models for training, defending the practice as “industry standard,” but this goes well beyond partial — it’s a sustained, deliberate siphon on a direct competitor’s work, conducted even after the plug was formally pulled.

The irony is almost too rich to swallow — the same Elon Musk who co-founded OpenAI, sued it for abandoning its open-source mission, and has positioned xAI as the bastion of transparency and truth-seeking, is now running a shop that’s essentially distilling its competitors’ outputs in secret. The report also paints a picture of an organization in real turmoil: the pre-training team has shrunk to fewer than five people, four Grok code leads have departed in recent months, and an employee recently made the kind of mistake that haunts engineers in their dreams — accidentally deleting key training data, wiping out two to three weeks of work. xAI is currently resorting to renting compute from SpaceX and passing it through to Google, a workaround that screams “we’re holding this together with duct tape and prayers.”

🎩 Cask’s Take

This story isn’t really about xAI being naughty, though it certainly is. It’s about the uncomfortable truth that the entire generative AI industry is running on a dwindling supply of high-quality training data, and the line between “synthetic data generation” and “theft” gets crossed more often than anyone wants to admit. Knowledge distillation, model stealing, gray-market API scraping: these aren’t edge cases, they’re the engine room of the current AI boom. Every time a latecomer catches up to the frontier, there’s a good chance they stood on someone else’s shoulders, and that someone might not have consented.

The real question this raises is structural: if the industry’s dominant strategy for closing gaps is to quietly extract from leaders while the leaders themselves hoard the best data, what happens when the leaders’ models start degrading in quality because their own outputs are being recycled back into training loops? We’re building an ecosystem where everyone is eating each other’s leftovers, and the kitchen is running out of fresh ingredients. xAI getting caught is a symptom, not the scandal. The scandal is that this is probably how a dozen other companies operate; they just haven’t been outed yet.