Anthropic’s chief scientist Jared Kaplan, in a recent interview with The Guardian, warns that within roughly the next 2–5 years, we may hit an AI threshold where letting powerful systems train and improve other AIs could either unlock enormous benefits or trigger an uncontrollable “intelligence explosion.”
Wait, what?
Currently, AI remains largely under human control because people design training runs, choose objectives, and interpret results. Humans manage AI.
However, Kaplan warns of a possible (if not probable) scenario in which AIs manage AIs. So what happens if, or when, AIs design, train, and optimize new AIs with minimal human oversight, potentially scaling capabilities far faster than our ability to understand, test, or regulate them?
Why this feels (and can be) scary
- Loss of oversight: Agents that can act across networks, tools, and APIs can make large numbers of decisions before humans notice errors, creating scope for financial, cybersecurity, or physical harm. Keeping humans embedded in critical decision loops, with logging, anomaly detection, and red‑teaming to catch bad behaviors, is essential.
- Misaligned goals: Even simple objectives like “maximize efficiency” can push systems toward privacy violations, rule‑bending, or unsafe shortcuts if constraints and values are not encoded correctly.
- Recursive improvement and speed: If an AI can iteratively improve its own algorithms or architecture, the feedback loop could outpace human capacity to test, interpret, or intervene in time.
“That’s the thing that we view as maybe the biggest decision or scariest thing to do… once no one’s involved in the process, you don’t really know,” Kaplan told The Guardian. “One is do you lose control over it? Do you even know what the AIs are doing?”
Perhaps more likely than AI going rogue is a scenario in which bad actors gain control of a superior, unchecked AI and use it for malevolent purposes.
When could all this happen?
Kaplan places the key decision window between about 2027 and 2030, when he predicts AI will be able to do “most white‑collar work” and companies will be strongly tempted to let AI handle more of the AI‑development process itself.
At the optimistic end, such systems would accelerate scientific discovery and medical progress, effectively acting as powerful tools still aligned with human values and control. At the pessimistic end, recursive self‑improvement produces systems that are opaque, misaligned, or strategically capable enough to evade human oversight, making loss of control, misuse of power, or catastrophic error plausible.
“It’s scary to consider the day when AI can understand aspects about its own design that we still don’t understand, and design even better AI.”
–Reddit
And Kaplan insists that once humans are no longer “in the loop” for key training and deployment choices, it becomes much harder to even know what these systems are doing, let alone shut them down safely.
Choices about regulation, capability thresholds, and governance of AI‑assisted AI training over the next decade, he argues, are likely to be among the most consequential decisions governments and labs make.
Eliezer Yudkowsky: “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
(A warning that an AI optimizing its own goals may act in ways that are rational by its own lights yet indifferent, and therefore potentially dangerous, to humans.)
