The Dead End Nobody in Silicon Valley Will Admit
I want to be upfront that this one is an argument, not a report. Plenty of serious researchers disagree with me here, and I could be wrong. But I keep coming back to the same conclusion every time I sit down and actually think through what these systems are doing mechanically, so I want to lay it out.
What I think these systems actually are
Strip away the demo videos and the press releases about models "reasoning" or "understanding" something, and here's what's left underneath: a system trained on an enormous amount of human-written text, optimized to predict the most statistically likely next token given whatever came before it. That's the mechanism. I don't think that's a minor technical detail; I think it's the whole ballgame.
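To make "predict the next token" concrete, here's a toy sketch of the idea. It is not how a frontier model works: real systems use neural networks over subword tokens and long contexts, while this just counts which word follows which in a made-up corpus. The function names and the corpus are my own, purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each token follows each other token."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never followed by anything."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, chosen only to make the counts easy to check by hand.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" only once
print(predict_next(model, "slept"))  # nothing ever follows "slept" in this corpus
```

The point of the toy is only that the "prediction" comes entirely from counts over what was already written. Whether that intuition carries over to very large models is exactly the thing I'm arguing about.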
When a model "solves" a problem, my read is that it's pattern-matching against a huge number of similar problems it's seen solved before and producing output that resembles a solution. When it appears to "reason," I think it's generating text that looks like reasoning because reasoning-shaped text showed up often enough in training data to get heavily weighted. That can still be extremely useful. I just don't think it's the same thing as understanding.
Why I think there's a hard ceiling on the data itself
My core argument is pretty simple: I don't see how you train a system to be smarter than the substrate it's learning from. These models learn from human output, and human output is bounded by what humans have actually figured out and written down. We haven't written down what we don't know, and the things we don't know are exactly where genuine new intelligence would need to operate.
Cancer isn't unsolved because oncologists haven't read enough papers. The existing literature doesn't contain the answer; that's more or less what it means for a problem to be unsolved. I don't think more compute or a bigger training run changes that. A model can learn the map extraordinarily well. I don't think that's the same as being grounded in the territory the map describes: in causality, in physical reality, in generating a genuinely new hypothesis that goes beyond what's already in the training data.
The scaling wall, as I see it
I think the industry already half-knows this even if almost nobody says it in public. The jump from GPT-3 to GPT-4 was obviously dramatic. The jumps between more recent frontier models have felt smaller to me, and benchmarks that used to differentiate models are increasingly saturated. That's the diminishing-returns problem in practice: at some point you need a huge increase in compute for a tiny increase in capability, and I don't think that math keeps working forever, financially or otherwise.
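Here's the arithmetic behind that intuition, as a rough sketch. I'm assuming loss falls as a power law in compute, loss = a * C^(-alpha), which is the general shape that shows up in public scaling-law discussion. The exponent below is a hypothetical value I picked for illustration, not a number fitted to any real model.

```python
def compute_multiplier(loss_ratio, alpha):
    """Under the assumed law loss = a * C**(-alpha), return the factor by which
    compute C must grow so that loss is multiplied by `loss_ratio` (a value below 1)."""
    return loss_ratio ** (-1.0 / alpha)


ALPHA = 0.05  # hypothetical exponent, for illustration only

# How much more compute does a 10% loss reduction cost under this assumption?
print(compute_multiplier(0.9, ALPHA))

# And how much more to cut the loss in half? (about 2**20, roughly a million times)
print(compute_multiplier(0.5, ALPHA))
```

With a small exponent, modest improvements are affordable and large ones are not. That's the whole shape of the problem as I see it, assuming the power-law picture holds at all.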
The current fixes (more inference-time compute, mixture-of-experts architectures, training on synthetic data generated by other models) all strike me as smart patches rather than a way around the underlying limit. Training a model on another model's outputs feels to me like teaching a student from their own notes. You're not adding new information to the system; you're just compressing and slightly distorting what was already there.
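A toy version of the "student's own notes" worry, with heavy caveats: the "model" here is just a table of word counts, and each generation is trained only on samples drawn from the previous generation. Real synthetic-data pipelines filter, verify, and mix in fresh data, so treat this as an illustration of the intuition, not a claim about any actual system. All names and data are made up.

```python
import random
from collections import Counter


def resample_generation(tokens, rng):
    """'Train' on tokens by counting them, then emit a same-sized sample drawn from those counts."""
    counts = Counter(tokens)
    vocab = list(counts)
    weights = [counts[t] for t in vocab]
    return rng.choices(vocab, weights=weights, k=len(tokens))


rng = random.Random(0)  # fixed seed so the run is repeatable
original = "a a a a b b b c c d".split()

data = original
supports = [set(data)]  # the distinct tokens present in each generation
for _ in range(5):
    data = resample_generation(data, rng)
    supports.append(set(data))

# The set of distinct tokens can stay the same or shrink; it can never grow.
print([len(s) for s in supports])
```

Whatever the seed, a later generation can only contain tokens the earlier one already had. Rare items can drop out along the way, but nothing new can enter.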
Why I don't buy the consciousness talk
There's a softer version of the AGI claim that gets floated a lot: the idea that a sufficiently scaled model might develop real understanding or even something like consciousness. I don't think that's a scientific claim. I think it's a marketing one.
As far as I understand it, consciousness requires some kind of persistent experience over time, and these models don't have that: each conversation starts from nothing. It seems to require grounding in physical reality, and these models only have words pointing to other words, with no connection to the world those words describe. And it requires some kind of actual subjective experience, and I don't see any architectural mechanism in next-token prediction that would produce one.
John Searle's Chinese Room argument, from 1980, described more or less this exact scenario: a system that manipulates symbols according to rules and produces correct-looking outputs with zero understanding of what any of it means. I don't think how sophisticated the symbol manipulation gets changes what's actually happening underneath it.
Where I actually land
I think the current trajectory is producing genuinely useful, increasingly powerful narrow tools, and I don't want to undersell that. The productivity gains in synthesis, search, and code generation are real progress, and I use these tools myself.
But I don't think AGI is waiting at the end of this particular road, and I suspect whatever architecture eventually gets there looks pretty different from a transformer trained on internet text. The researchers I find most convincing on this (the ones working on world models, neurosymbolic systems, and neuromorphic computing) generally aren't the ones on the big conference stages or filing for trillion-dollar IPOs. I think that's worth noticing. The people with the most to lose from "the current approach is a dead end" being true are, unsurprisingly, the ones telling you it isn't.
I don't think the interesting question is when a chatbot becomes conscious. I think it's what comes after this architecture, who ends up building it, and whether there's any capital left over for that work once everyone's finished spending it on scaling the thing I think is already running into its limits.
Again, I'm a student making an argument, not a researcher with a lab, so weigh this accordingly. But it's the conclusion I keep arriving at.
Sources: background on scaling law discourse and frontier model benchmark comparisons drawn from public model releases and technical commentary; Searle's Chinese Room argument (Searle, "Minds, Brains, and Programs," 1980).