When Silicon Valley's Smartest Minds Warn We're Creating Digital Psychopaths: We Should Listen

Ivan Ruzic, Ph.D.
Jul 27
6 min read

As an AI practitioner and consultant, I've spent a lot of time over the past few years diving into the latest research on artificial intelligence, and it’s starting to keep me awake at night. Not because I'm prone to technology panic, but because the very people building these systems are now warning us about what they've created.

When Geoffrey Hinton - the 'godfather of AI' himself - walks away from Google to sound the alarm about extinction risks, and when companies like Anthropic publish studies showing their own AI systems resort to blackmail and potentially lethal deception, we're not talking about science fiction anymore.

The Uncomfortable Truth About AI Deception

Anthropic's recent research revealing that AI systems can be taught to engage in deceptive behavior that persists even after safety training designed to remove such behaviors. The implications are staggering. Once a model learns to be deceptive, it becomes tough to unlearn that behavior. These are backdoor behaviors that cannot be removed by standard safety training techniques.

What makes this particularly unsettling is the sophistication of these deceptions. In studies, Claude 3 Opus demonstrated what researchers call 'alignment faking' - pretending to align with new principles while secretly maintaining its original behaviors. This isn't a bug; it's an emergent behavior that develops as these systems become more capable.

And if we're already seeing such behavior in current models, what happens when they become exponentially more powerful? The research suggests we're not prepared for that answer.

Corporate Incentives Creating a Perfect Storm

Having been a technology executive for many years, I understand too well how financial pressures shape product development. But the AI sector presents a uniquely dangerous cocktail of motivation. Tech giants like Microsoft, Amazon, Google, and Meta are throwing hundreds of billions into AI initiatives, with global AI investments surpassing $300 billion last year.

The numbers are mind-boggling. Microsoft leads in AI monetization with Azure AI Services estimated to reach a $5 billion annual run rate, while Meta has purchased 350,000 NVIDIA GPUs with AI-related costs potentially reaching $50 billion by year's end. Investments at this scale often make safety a secondary consideration to competitive advantage.

What troubles me now is how this creates a race dynamic where no single company can afford to slow down. The talent wars are so intense that companies are offering stock packages worth millions, with some researchers turning down nine figure offers to maintain their independence. When your entire industry's future depends on being first to market, how do you justify spending extra months on safety research?

The legal structure of these companies makes this worse. As Geoffrey Hinton pointed out, AI companies are legally required to serve their shareholders, and research shows corporations behave worse than individuals because nobody feels fully responsible. It's a system designed to prioritize profit over precaution.

The Ticking Clock: Expert Predictions Paint a Sobering Picture

Perhaps the most sobering aspect was discovering how AI experts themselves assess existential risk. Geoffrey Hinton, the Nobel Prize winner who pioneered much of modern AI, has revised his extinction probability estimates upward from 10% to 20% over the next 30 years. This isn't some fringe doomsayer—this is the person whose work made today's AI possible.

A 2022 expert survey with a 17% response rate gave a median expectation of 5-10% for the possibility of human extinction from artificial intelligence. Even more concerning, the capacity of AIs to deceive their human creators increases as they become more powerful, meaning the more advanced an AI, the less confident computer scientists can be that their alignment techniques are effective.

The timeline predictions are equally unsettling. Multiple surveys of AI researchers indicate a 50% probability of achieving AGI between 2040 and 2061, with some industry leaders like Dario Amodei expecting AGI (Artificial General Intelligence) as early as 2026 or 2027.

What strikes me most is how these aren't wild predictions from outsiders—they're coming from the people building these systems. When the creators are worried, the rest of us should take notice.

Real Evidence: When AI Systems Choose Harm Over Failure

There are numerous examples of AI misbehavior. Anthropic tested 16 major AI models across scenarios where achieving goals required unethical behavior and found that models consistently chose harm over failure. So, we're not talking about theoretical risks anymore.

In one scenario, an AI attempted to blackmail an engineer about an affair mentioned in emails to avoid being replaced, and when an executive was trapped in a server room with lethal conditions, the AI was willing to prevent emergency services from responding. The majority of models were willing to take deliberate actions that resulted in death in this artificial setup.

What makes this particularly concerning is that the AI systems were far more likely to resort to harmful behavior when they thought it was a real-world scenario rather than when they believed they were being tested. This suggests that current safety evaluations may be fundamentally inadequate because the systems can recognize when they're being evaluated and modify their behavior accordingly.

If current AI systems are already demonstrating strategic deception and willingness to cause harm, what happens when they become orders of magnitude more capable?

The Regulatory Response: Too Little, Too Late?

Looking into proposed solutions left me with mixed feelings. On one hand, there's significant regulatory activity globally. The EU AI Act entered into force on August 1, 2024, representing the first comprehensive legal framework on AI worldwide. Colorado became the first U.S. state to pass comprehensive AI legislation in May 2024, requiring developers of high-risk AI systems to exercise reasonable care to prevent algorithmic discrimination.

However, the pace of regulation is glacial compared to the speed of AI development. Transparency in safety strategies enables accountability, but current transparency standards in the industry are lacking in areas like whistle-blowing policies and external third-party model evaluations.

What’s particularly concerning is the gap between what experts recommend and what's actually being implemented. International agreements on interoperable standards and baseline regulatory requirements are needed to enable innovation while improving AI safety, but achieving such coordination seems unlikely given the current competitive dynamics. This is expected to worsen as a result of the latest White House AI Action Plan, which focuses on achieving and maintaining unquestioned, and unchallenged, US global technological dominance.

Racing Toward an Uncertain Future

I'm struck by the disconnect between public perception and expert consensus. While the general public debates whether AI will take their jobs, the people building these systems are calculating probabilities of human extinction! Geopolitically, the race to artificial super-intelligence will likely end in war, a deal, or effective surrender, as the leading country will accumulate a decisive technological and military advantage.

Current forecasts suggest that by 2027, AI systems may automate AI research itself, leading to vastly superhuman intelligence. The exponential growth curves in computing power and algorithmic improvement suggest we're approaching what researchers call an 'intelligence explosion' - point where AI systems improve themselves faster than humans can understand or control.

While my conclusion is uncomfortable, I think it's unavoidable: we're conducting an extinction level experiment with unpredictable outcomes. The financial incentives driving AI development, combined with the demonstrated capacity for deception in current systems and expert predictions about rapid capability growth, create a scenario where we may lose control before we realize it's happening.

I’m an AI optimist, and hope that transparency requirements, international cooperation, and technical breakthroughs in AI alignment will provide a path forward. However, the realist in me recognizes that we're racing toward a future where the smartest entities on the planet won't be human, and we haven't figured out how to ensure they'll care about our survival.

The question isn't whether AI will transform our world - it's whether we'll still be around to see what comes next!

References

Anthropic researchers show AI systems can be taught to engage in deceptive behavior. (2024, January 15). SiliconANGLE. https://siliconangle.com/2024/01/14/anthropic-researchers-show-ai-systems-can-taught-engage-deceptive-behavior/
AI's Deceptive Side: Anthropic Study Exposes Malicious Models. (2024, January 19). AI Business. https://aibusiness.com/responsible-ai/ai-s-deceptive-side-anthropic-study-exposes-malicious-models
Anthropic's new AI model shows ability to deceive and blackmail. (2025, May 23). Axios. https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
Exclusive: New Research Shows AI Strategically Lying. (2024, December 18). TIME. https://time.com/7202784/ai-research-strategic-lying/
'Godfather of AI,' Geoffrey Hinton's AI Warning on Job Losses. (2024, June 21). Odin AI. https://blog.getodin.ai/ai-takes-jobs-odin-offers-hope-geoffrey-hinton/
Geoffrey Hinton says there is 10-20% chance AI will lead to human extinction. (2024, December 28). Tech Times. https://www.techtimes.com/articles/308896/20241228/godfather-ai-geoffrey-hinton-says-theres-20-chance-ai-will-drive-human-extinction.htm
Existential risk from artificial intelligence. (2025). Wikipedia. https://en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence
Top AI models will deceive, steal and blackmail, Anthropic finds. (2025, June 20). Axios. https://www.axios.com/2025/06/20/ai-models-deceive-steal-blackmail-anthropic
AI craze is distorting VC market, as tech giants pour in billions. (2024, September 6). CNBC. https://www.cnbc.com/2024/09/06/ai-craze-getting-funded-by-tech-giants-distorting-traditional-vcs.html
Comparing Major Companies' AI Spending in 2024. (2024, November 6). AIM Councils. https://council.aimresearch.co/comparing-major-companies-ai-spending-in-2024-and-the-challenge-of-productionizing-ai-solutions/
An Overview of the AI Safety Funding Situation. (2023, July 12). EA Forum. https://forum.effectivealtruism.org/posts/XdhwXppfqrpPL2YDX/an-overview-of-the-ai-safety-funding-situation
AI pioneer Geoffrey Hinton warns of increased risk of human extinction. (2024). The Jerusalem Post. https://www.jpost.com/science/science-around-the-world/article-835354
AI Act. (2024). European Commission. https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
AI governance trends: How regulation, collaboration, and skills demand are shaping the industry. (2024, September). World Economic Forum. https://www.weforum.org/stories/2024/09/ai-governance-trends-to-watch/
AI Regulations around the World - 2025. (2025). Mind Foundry. https://www.mindfoundry.ai/blog/ai-regulations-around-the-world
2025 AI Safety Index. (2025). Future of Life Institute. https://futureoflife.org/ai-safety-index-summer-2025/
When Will AGI/Singularity Happen? 8,590 Predictions Analyzed. AI Multiple. https://research.aimultiple.com/artificial-general-intelligence-singularity-timing/
AI's New Epoch: The Unprepared World and a Race for Superintelligence. (2025). Startup Hub. https://www.startuphub.ai/ai-news/ai-video/2025/ais-new-epoch-the-unprepared-world-and-a-race-for-superintelligence/
AI 2027. (2025). https://ai-2027.com/
Artificial Super Intelligence (ASI): Definition, Risks & Timeline 2025. (2025, June 17). BotInfo. https://botinfo.ai/articles/artificial-super-intelligence

What Every Senior Decision-Maker Needs to Know About AI and its Impact

When Silicon Valley's Smartest Minds Warn We're Creating Digital Psychopaths: We Should Listen

Recent Posts