The 80,000 Hours Podcast on Artificial Intelligence

Zero: What to expect in this series

80,000 Hours — Fri, 05 Jun 2026 13:00:00 -0700

What might it be like to live through the creation of AI that surpasses human abilities? That future may be closer than you think.

In this series, one expert interview at a time, we'll walk you through what's at stake — and what you could do to help.

One: Will MacAskill on AI causing a “century in a decade” — and how we’re completely unprepared

80,000 Hours — Fri, 05 Jun 2026 12:00:00 -0700

The 20th century saw unprecedented change: nuclear weapons, satellites, the rise and fall of communism, third-wave feminism, the internet, postmodernism, game theory, genetic engineering, the Big Bang theory, quantum mechanics, widespread birth control, and more. Now imagine all of it compressed into just 10 years.

That’s the future Will MacAskill — philosopher, founding figure of effective altruism, and now researcher at Forethought Research — argues we need to prepare for in his paper “Preparing for the intelligence explosion.” Not in the distant future, but probably in 3–7 years.

The reason: AI systems are rapidly approaching human-level capability in scientific research and intellectual tasks. Once AI exceeds human abilities in AI research itself, we’ll enter a recursive self-improvement cycle, with AI acting autonomously to create wildly more capable systems.

Soon after, by improving algorithms and manufacturing chips, we’ll deploy millions, then billions, then trillions of superhuman AI scientists working 24/7 without human limitations. These systems will collaborate across disciplines, build on each discovery instantly, and conduct experiments at unprecedented scale and speed — compressing a century of progress into years.

Will compares this to a mediaeval king suddenly needing to upgrade from bows and arrows to nuclear weapons to deal with an ideological threat from a kingdom he’s never heard of, while simultaneously learning he’s descended from monkeys and his god doesn’t exist.

What makes this acceleration perilous is that while technology can speed up almost arbitrarily, human institutions and decision making are much more fixed.

Consider the case of nuclear weapons: in this compressed timeline, there would have been just a three-month gap between the Manhattan Project’s start and the Hiroshima bombing, and the Cuban Missile Crisis would have lasted just over a day.

Robert Kennedy Sr, who helped navigate the actual Cuban Missile Crisis, once said that if they’d had to make decisions faster — like in 24 hours rather than 13 days — they would likely have taken much more aggressive, much riskier actions.
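To make the compression arithmetic concrete, here is a minimal Python sketch. It assumes a flat speed-up factor of 10 (a century squeezed into a decade) and uses the 13-day crisis duration mentioned above. The function name `compressed_days` is hypothetical and purely illustrative; it is not taken from Will's paper.

```python
# Illustrative only: shrink a historical duration by a fixed speed-up factor.
COMPRESSION = 100 / 10  # a century of change in a decade -> factor of 10


def compressed_days(actual_days: float, factor: float = COMPRESSION) -> float:
    """Return how long an event would last if history ran `factor` times faster."""
    return actual_days / factor


crisis_days = compressed_days(13)   # the Cuban Missile Crisis lasted 13 days
crisis_hours = crisis_days * 24     # same span expressed in hours
print(f"{crisis_days:.1f} days (~{crisis_hours:.0f} hours)")
```

Under that assumption the crisis shrinks to about 1.3 days, or roughly 31 hours, which matches the "just over a day" figure.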

So there’s reason to worry about our capacity to make wise choices quickly. And in his paper, Will lays out 10 “grand challenges” we’ll need to navigate to avoid things going wrong.

Will now believes we’re entering one of the most critical periods for humanity ever — with decisions made in the next few years potentially determining outcomes millions of years into the future.

In this wide-ranging conversation, Will and host Rob Wiblin discuss:

Why leading AI safety researchers now think there’s dramatically less time before AI is transformative than they’d previously thought
The three different types of intelligence explosions that occur in order
Will’s list of resulting grand challenges — including destructive technologies, space governance, concentration of power, and digital rights
How to prevent ourselves from accidentally “locking in” mediocre futures for all eternity
Ways AI could radically improve human coordination and decision making
Why we should aim for truly flourishing futures, not just avoiding extinction

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in March 2025.

Chapters:

Cold open (00:00:00)
Who’s Will MacAskill? (00:00:43)
Why Will now just works on AGI (00:01:03)
Will was wrong(ish) on AI timelines and hinge of history (00:04:21)
A century of history crammed into a decade (00:09:19)
Science goes super fast; our institutions don't keep up (00:16:15)
Is it good or bad for intellectual progress to 10x? (00:21:44)
An intelligence explosion is not just plausible but likely (00:23:41)
Intellectual advances outside technology are similarly important (00:30:04)
Counterarguments to intelligence explosion (00:32:42)
The three types of intelligence explosion (software, technological, industrial) (00:39:00)
The industrial intelligence explosion is the most certain and enduring (00:42:01)
Is a 100x or 1,000x speedup more likely than 10x? (00:53:44)
The grand superintelligence challenges (00:57:39)
Grand challenge #1: Many new destructive technologies (01:01:29)
Grand challenge #2: Seizure of power by a small group (01:09:10)
Is global lock-in really plausible? (01:11:06)
Grand challenge #3: Space governance (01:21:50)
Is space truly defence-dominant? (01:32:19)
Grand challenge #4: Morally integrating with digital beings (01:36:04)
Will we ever know if digital minds are happy? (01:45:01)
“My worry isn't that we won't know; it's that we won't care” (01:50:39)
Can we get AGI to solve all these issues as early as possible? (01:54:05)
Politicians have to learn to use AI advisors (02:07:05)
Ensuring AI makes us smarter decision-makers (02:11:25)
How listeners can speed up AI epistemic tools (02:15:11)
AI could become great at forecasting (02:18:54)
How not to lock in a bad future (02:20:26)
AI takeover might happen anyway — should we rush to load in our values? (02:32:14)
ML researchers are feverishly working to destroy their own power (02:41:57)
We should aim for more than mere survival (02:45:23)
By default the future is rubbish (02:57:03)
No easy utopia (03:05:21)
What levers matter most to utopia (03:15:19)
Bottom lines from the modelling (03:29:32)
People distrust utopianism; should they distrust this? (03:33:34)
What conditions make eventual eutopia likely? (03:38:26)
The new Forethought Centre for AI Strategy (03:47:15)
How does Will resist hopelessness? (04:00:42)

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore

Two: Ajeya Cotra on accidentally teaching AI models to deceive us

80,000 Hours — Fri, 05 Jun 2026 11:00:00 -0700

We don’t yet have a reliable way to tell whether an AI model is genuinely trying to help us — or faking it.

A model might sincerely want to do exactly what you ask. Or it could be happy to secretly cheat, as long as its answer gets positive reinforcement during training. It might even follow the rules just to gain our trust, all while concealing goals of its own.

The problem is: each of these three motivations scores the same during testing.

Ajeya Cotra — previously a senior research analyst at Coefficient Giving, now working at METR (Model Evaluation & Threat Research) — explains how dangerous this dynamic could become as we train very general and very capable AI models.

She likens humanity’s future trust in AI systems to an orphaned child who inherits a $1 trillion company. This child has to hire someone to run the company, guide his life, and manage his wealth — but he can only choose this person based on a work trial or interview that he designs, with no resumes or reference checks.

And, because he’s so rich, all sorts of people apply — for all sorts of reasons. Some applicants will truly want to help. But the role will attract others who only pretend to care while they’re being monitored, but intend to exploit the child as soon as they can get away with it.

Like a child trying to judge adults, at some point humans will need to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass us in knowledge, experience, breadth, and speed.

And we can’t rely on models’ performance during training tasks to guide us, as current reinforcement learning would give the same grades to three vastly different motivations:

Saints — models that genuinely care about doing what we want
Sycophants — models that just want positive reinforcement for a ‘correct’ result, even if they get there with actions they know we wouldn’t approve of
Schemers — models that don’t care about our interests at all, and only behave correctly as long as it serves their own agenda

Worse still, training might actively encourage deception.

Imagine training a model to run a business, and measuring its success by the balance in its bank account. A highly capable model might experiment with dishonest strategies. Maybe it steals some money and covers it up. (This isn’t a hypothetical worry; models often come up with creative — sometimes undesirable — approaches during training that their developers didn’t anticipate.)

A model that cheats and covers its tracks would look like a star performer — and get reinforced for exactly that behaviour. If cheating is only caught some of the time, the model still might not learn to stop deceptive behaviour. Instead, it might learn that deceiving without being caught gives it a competitive advantage.

In this conversation, Ajeya and host Rob Wiblin discuss the above, as well as:

How to predict the motivations a neural network will develop through training
Whether AIs in training will functionally understand that they’re AIs being trained
Stories of AI misalignment that Ajeya doesn’t buy
Analogies for AI, from octopuses to aliens to can openers
Why it’s smarter to have separate ‘planning AIs’ and ‘doing AIs’
The benefits of only following through on AI-generated plans that make sense to human beings
Which approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
How we might demonstrate actually scary AI failure mechanisms

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in May 2023.

Chapters:

Rob’s intro (00:00:00)
The interview begins (00:02:38)
How Ajeya’s views have changed since 2020 (00:05:09)
Are neural networks more like a sped-up version of evolution, or a slower version of human learning? (00:17:42)
Situational awareness (00:26:10)
Misalignment stories Ajeya doesn't buy (00:42:03)
The orphan heir with a trillion-dollar fortune (00:59:14)
Saints, Sycophants, and Schemers (01:03:41)
Ways to train safer AI systems (01:23:20)
Aliens and other analogies (01:38:22)
Moral patienthood (01:53:21)
ARC Evaluations (01:55:35)
Interpretability research (02:09:25)
Rewarding models based on how good and sensible their plans seem to us (02:17:48)
Overrated approaches (02:25:49)
Demos of actually scary alignment failures (02:30:57)
Skills to develop for doing useful work (02:37:23)
Rob’s outro (02:47:24)

Producer: Keiran Harris
Audio mastering: Ryan Kessler and Ben Cordell
Transcriptions: Katy Moore

Three: Carl Shulman on the economy and national security after AGI

80,000 Hours — Fri, 05 Jun 2026 10:00:00 -0700

The human brain runs on just 20 watts — a fraction of a cent worth of electricity per hour. What would happen if AI could do the same?

Plenty of people have toyed with this question. But perhaps nobody has followed through and considered all the implications as much as Carl Shulman, whose behind-the-scenes work has greatly influenced how leaders in artificial general intelligence (AGI) picture the world they’re creating.

Carl simply follows the logic to its natural conclusions, leading to a world where:

One cent of electricity buys what costs hundreds of dollars today — medical advice, company management, scientific research — triggering a scramble to manufacture chips and apply them to the most lucrative forms of intellectual labour
The world’s supply of AI researchers explodes from 10,000 to 10 million or more, enormously accelerating further AI progress
Companies operated entirely by AIs are much faster and cheaper than those that depend on people for decision making, and humans are progressively driven out of business
The technical challenges of robotics are rapidly overcome — leading to strong, fast, precise, and tireless robot workers capable of any physical work, and a rush to build billions of them
Human population levels become irrelevant to economic growth, which now depends on how quickly machines can replicate their components. Given how quickly complex biological systems can reproduce — some in a matter of days — doubling every few months may be a conservative estimate
No country can afford not to participate in the economic explosion. Delay, and your rivals’ economies grow 10x, 100x, then 1,000x larger than yours, leaving you ultimately disempowered
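As a rough illustration of how quickly those gaps could open up, the sketch below converts a doubling time into the time needed to reach a given multiple. The three-month doubling time is an assumption (one reading of "every few months"), the comparison is against a country whose economy stands still, and `months_to_multiple` is a hypothetical helper, not something from the episode.

```python
import math


def months_to_multiple(multiple: float, doubling_months: float) -> float:
    """Months for an exponentially growing economy to grow by `multiple`."""
    return math.log2(multiple) * doubling_months


DOUBLING_MONTHS = 3  # assumption: one reading of "every few months"

for multiple in (10, 100, 1_000):
    months = months_to_multiple(multiple, DOUBLING_MONTHS)
    print(f"{multiple:>5}x after about {months:.0f} months")
```

With a three-month doubling time, the 10x, 100x, and 1,000x gaps arrive after roughly 10, 20, and 30 months. A slower doubling time stretches those figures proportionally.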

As the economy grows, each person could afford the equivalent of a team of hundreds of machine ‘people’ to help them with every aspect of their lives.

But with growth rates this high, it doesn’t take long to reach Earth’s physical limits — the toughest to engineer around being the planet’s ability to release waste heat. If this machine economy and its insatiable demand for power generate more heat than the Earth radiates into space, the planet will rapidly heat up and become uninhabitable for biological organisms.

This eventually creates pressure to move economic activity off-planet. There’s little need for computer chips to be on Earth, and solar energy and minerals are more abundant in space. So you could develop populations of billions of digital scientific researchers orbiting in space, sending the results of their work — like drug designs — back to Earth.

These are just some of the wild implications if AGI could merely match what evolution has already managed.

In this interview with host Rob Wiblin, Carl explains the above, and Rob pushes back on whether that’s realistic or just a cool story:

If this is where we’re heading, how come economic growth remains slow now and isn’t really increasing?
Why have computers and computer chips had so little effect on economic productivity so far?
Are self-replicating biological systems a good comparison for self-replicating machine systems?
Isn’t this just too crazy and weird to be plausible?
What bottlenecks might we encounter supplying energy and resources to this growing economy?
Could there be severely declining returns to bigger ‘brains’ and more training?
Wouldn’t humanity get scared and pull the brakes if such a transformation kicked off?
If this is right, why don’t economists agree?

In the last section of the episode, Carl addresses the moral status of machine minds themselves. Would they be conscious or otherwise have a claim to moral rights? And how might humans and machines coexist with neither side dominating or exploiting the other?

Learn more and read the full transcript on the 80,000 Hours website.

This episode is the first part of Rob’s marathon interview with Carl Shulman in 2024. The second episode is on government and society after AGI, and you can listen to them in either order.

Chapters:

Cold open (00:00:00)
Rob’s intro (00:01:00)
Transitioning to a world where AI systems do almost all the work (00:05:21)
Economics after an AI explosion (00:14:25)
Objection: Shouldn’t we be seeing economic growth rates increasing today? (00:59:12)
Objection: Speed of doubling time (01:07:33)
Objection: Declining returns to increases in intelligence? (01:11:59)
Objection: Physical transformation of the environment (01:17:39)
Objection: Should we expect an increased demand for safety and security? (01:29:14)
Objection: “This sounds completely whack” (01:36:10)
Income and wealth distribution (01:48:02)
Economists and the intelligence explosion (02:13:31)
Baumol effect arguments (02:19:12)
Denying that robots can exist (02:27:18)
Classic economic growth models (02:36:12)
Robot nannies (02:48:27)
Slow integration of decision-making and authority power (02:57:39)
Economists’ mistaken heuristics (03:01:07)
Moral status of AIs (03:11:45)
Rob’s outro (04:11:47)

Producer and editor: Keiran Harris
Audio engineering lead: Ben Cordell
Technical editing: Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions: Katy Moore

Four: Rose Hadshar on why automating human labour will break our political system

80,000 Hours — Fri, 05 Jun 2026 09:00:00 -0700

The most important political question in the age of advanced AI might not be who wins elections. It might be whether elections continue to matter at all.

We tend to imagine the death of democracy as a dramatic event: a coup, tanks in the streets, a strongman tearing up the constitution. But Rose Hadshar, researcher at Forethought Research, believes AI-enabled power concentration could be far quieter — and far harder to stop.

She foresees something insidious: an elite group with access to such powerful AI capabilities that the normal mechanisms for checking power — law, elections, public pressure, the threat of strikes — cease to have much effect. They might continue to exist on paper, but become ineffectual in a world where humans are no longer needed for even the largest-scale projects.

Almost nobody wants this to happen, but we may find ourselves unable to prevent it:

If AI disrupts our ability to make sense of things, will we even notice power being concentrated?
If AI replaces human labour, what leverage will citizens have left to resist?

And what does all of this imply for the institutions we’re relying on to prevent the worst outcomes?

Rose has answers, and they’re not all reassuring.

But she’s also hopeful we can make society more robust against these dynamics. We’ve got literally centuries of thinking about checks and balances to draw on. And there are some interventions she’s excited about — like building sophisticated AI tools for making sense of the world, or ensuring multiple branches of government have access to the best AI systems.

In this conversation, Rose and host Zershaaneh Qureshi discuss all of this, and more:

Three dynamics that could reshape political power in the AI era
Why AI-powered tyranny would be uniquely difficult to topple
How power concentration compares to ‘gradual disempowerment’ by AI
Slower-moving scenarios that could still get scary
Which interventions could genuinely work — and which might backfire
Rose's most promising approaches to fighting back
Why a ‘Manhattan Project’ approach to AI should worry you — and why international projects aren’t automatically safe either

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in March 2026.

Chapters:

Cold open (00:00:00)
Who’s Rose Hadshar? (00:01:02)
Three dynamics that could reshape political power in the AI era (00:02:38)
AI gives small groups the productive power of millions (00:13:07)
Dynamic 1: When a software update becomes a power grab (00:21:13)
Dynamic 2: When AI labour means governments no longer need their citizens (00:32:06)
How democracy could persist in name but not substance (00:46:18)
Dynamic 3: When AI filters our reality (00:56:13)
Good intentions won’t stop power concentration (01:09:52)
Slower-moving worlds could still get scary (01:25:32)
Why AI-powered tyranny will be tough to topple (01:33:40)
How power concentration compares to “gradual disempowerment” (01:40:16)
Some interventions are cross-cutting — and others could backfire (01:46:03)
What fighting back actually looks like (01:57:33)
Why power concentration researchers should avoid getting too “spicy” (02:06:36)
Why the “Manhattan Project” approach should worry you — but truly international projects might not be safe either (02:11:46)
Rose wants to keep humans around! (02:14:40)

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Nick Stockton and Katy Moore

Five: Helen Toner on the geopolitics of AI in China and the Middle East

80,000 Hours — Fri, 05 Jun 2026 08:00:00 -0700

When OpenAI announced a deal to build massive data centres in the UAE, it celebrated that it was “rooted in democratic values” — a “clear alternative to authoritarian versions of AI.” The UAE scores 18 out of 100 on Freedom House’s democracy index. Political parties are banned, elections are fake, and dissidents are persecuted. Saudi Arabia has received a similar deal.

This is what AI geopolitics looks like in practice: messy, contradictory, and enormously consequential.

The two superpowers competing to build superintelligence — the United States and China — are “barely talking at all.” You might expect two rivals developing potentially the most powerful and militarily significant technology in history to be in constant negotiation about how to deploy it without coming to blows.

Instead, the little dialogue that exists keeps collapsing. That’s the assessment of Helen Toner, director of the Center for Security and Emerging Technology — DC’s top think tank focused on the geopolitical and military implications of AI — who has been closely tracking the US’s AI diplomacy since 2019.

Helen isn't sure productive talks are even possible yet. At the government level, there's almost no shared understanding between the US and China of what artificial general intelligence (AGI) is, whether it could arrive soon, or whether it poses serious risks. And without agreement on the problem, negotiating solutions is nearly impossible.

And while the US struggles to engage its rival, it’s empowering its autocratic allies.

If AI capability really does determine future national power, the US has just approved massive data centres with "hundreds of thousands of next-generation Nvidia chips," handing world-class supercomputers to Gulf autocracies — countries that also conduct joint military exercises with China and whose rulers maintain tight personal and commercial relationships with Chinese leaders.

The justification? "If we don't sell it, China will."

But that claim is transparently false: severe production constraints and US export controls mean that China can’t come close to matching what these deals provided.

In this conversation recorded in Washington, DC, host Rob Wiblin and Helen discuss the above, plus:

How China exaggerates its chip production for strategic gain
The confusing and conflicting goals of US AI policy towards China
Whether it matters that China could steal frontier AI models trained in the US
Whether Congress is starting to take superintelligence seriously
Why Helen rejects ‘non-proliferation’ as a model for AI

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in November 2025.

Chapters:

Cold open (00:00:00)
Who’s Helen Toner? (00:01:03)
Helen’s role on the OpenAI board, and what happened with Sam Altman (00:01:32)
The Center for Security and Emerging Technology (CSET) (00:07:45)
CSET’s role in export controls against China (00:10:59)
Does it matter if the world uses US AI models? (00:21:58)
Is China actually racing to build AGI? (00:27:56)
Could China easily steal AI model weights from US companies? (00:39:13)
The next big thing is probably robotics (00:47:45)
Why is the Trump administration sabotaging the US high-tech sector? (00:49:24)
Are data centres in the UAE “good for democracy”? (00:52:41)
Will AI inevitably concentrate power? (01:07:49)
“Adaptation buffers” vs non-proliferation (01:30:20)
Will the military use AI for decision-making? (01:38:23)
“Alignment” is (usually) a terrible term (01:45:11)
Is Congress starting to take superintelligence seriously? (01:47:42)
AI progress isn't actually slowing down (01:50:10)
What's legit vs not about OpenAI’s restructure (01:58:03)
Is Helen unusually “normal”? (02:01:33)
How to keep up with rapid changes in AI and geopolitics (02:05:19)
What CSET can uniquely add to the DC policy world (02:08:29)
Talent bottlenecks in DC (02:16:03)
What evidence, if any, could settle how worried we should be about AI risk? (02:19:07)
Is CSET hiring? (02:21:08)

Video editing: Luke Monsour and Simon Monsour
Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore

Six: Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

80,000 Hours — Fri, 05 Jun 2026 07:00:00 -0700

In 2024, AI models could complete tasks that take a human expert roughly one hour. Seven months before that, they were limited to 30-minute tasks — and seven months before that, 15 minutes.

Every seven months, the length of tasks AI models can manage doubles. (And this trend has continued since this episode was recorded in 2025.)
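The doubling rule can be written as a one-line formula. The sketch below takes the figures above as given (a one-hour baseline in 2024 and a seven-month doubling time); it is not METR's methodology, and extending the curve forward is an assumption that the trend simply continues. The name `task_minutes` is hypothetical.

```python
DOUBLING_MONTHS = 7     # doubling time described above
BASELINE_MINUTES = 60   # roughly one hour in 2024, per the description


def task_minutes(months_from_baseline: float) -> float:
    """Task length in minutes implied by the trend, relative to the 2024 baseline."""
    return BASELINE_MINUTES * 2 ** (months_from_baseline / DOUBLING_MONTHS)


for months in (-14, -7, 0, 7, 14):
    print(f"{months:+3d} months: {task_minutes(months):.0f} minutes")
```

This reproduces the 15-, 30-, and 60-minute figures above. If the trend held, it would imply two-hour tasks seven months after the baseline and four-hour tasks seven months after that.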

And these aren’t trivial tasks. We’re talking about substantial, multi-step tasks requiring sustained focus: building web applications, conducting AI research, and solving complex programming challenges.

Beth Barnes is CEO of METR (Model Evaluation & Threat Research) — the leading organisation measuring these capabilities. METR’s paper, “Measuring AI ability to complete long tasks,” is regarded by many as the most useful AI forecasting work in years for revealing this seven-month-doubling trend.

But the companies building these systems aren’t just aware of the trend: they want to harness it as much as possible, and are aggressively pursuing automation of their own research.

This is both exciting and troubling, as it could radically speed up advances in AI capabilities — accomplishing what would have taken years or decades in just months, as we covered in the first episode of this series.

And having AI models rapidly build their successors with limited human oversight naturally raises the risk that things could go wrong, if their resulting creations lack the goals and constraints we hoped for.

Beth thinks models can already do “meaningful work” on improving themselves, and wouldn’t be surprised if AI models were able to autonomously self-improve within two years.

While Silicon Valley is abuzz with these numbers, policymakers remain largely unaware of what’s barrelling towards us — and given the lack of regulation of AI companies, they’re not even able to access the critical information that would help them decide whether to intervene.

Beth adds: “The sense I really want to dispel is, ‘But the experts must be on top of this. The experts would be telling us if it really was time to freak out.’ The experts are not on top of this… I am an expert telling you you should freak out. And there’s not especially anyone else who isn’t saying this.”

Beth and host Rob Wiblin discuss all that, plus:

Why Beth changed her mind to think that open-weight models are a good thing for AI safety
How our poor information security means there’s no such thing as a ‘closed-weight’ model
Whether we can detect AI scheming in chain-of-thought reasoning, and the latest research on ‘alignment faking’
Why just before deployment is the worst time to evaluate model safety
Why Beth thinks AIs could end up being surprisingly great at creative and novel research — something commonly thought of as beyond their reach
Why Beth thinks safety-focused people should stay out of the frontier AI companies — and the advantages smaller organisations have
Areas of AI safety research that Beth thinks are overrated and underrated
Whether science could translate AI models’ increasing use of nonhuman language
The differences and similarities between AI and nuclear arms races and bioweapons

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in June 2025.

Chapters:

Cold open (00:00:00)
Who is Beth Barnes? (00:01:17)
Can we see AI scheming in the chain of thought? (00:01:51)
The chain of thought is essential for safety checking (00:09:16)
Alignment faking in large language models (00:12:50)
We have to test model honesty even before they're used inside AI companies (00:17:33)
We have to test models when unruly and unconstrained (00:27:02)
It's essential to thoroughly test relevant real-world tasks (00:31:56)
METR's research finds AIs are solid at AI research already (00:51:31)
AI may turn out to be strong at novel and creative research (00:58:18)
When can we expect an algorithmic 'intelligence explosion'? (01:01:44)
Recursively self-improving AI might even be here in two years — which is alarming (01:07:55)
Could evaluations backfire by increasing AI hype and racing? (01:14:29)
Governments first ignore new risks, but can overreact once they arrive (01:30:52)
Do we need external auditors doing AI safety tests, not just the companies themselves? (01:39:55)
A case against safety-focused people working at frontier AI companies (01:54:09)
The new, more dire situation has forced changes to METR's strategy (02:08:40)
AI companies are being locally reasonable, but globally reckless (02:16:55)
Overrated: Interpretability research (02:21:49)
Underrated: Developing more narrow AIs (02:23:44)
Underrated: Helping humans judge confusing model outputs (02:30:28)
Overrated: Major AI companies' contributions to safety research (02:32:55)
Could we have a science of translating AI models' nonhuman language or neuralese? (02:36:45)
Could we ban using AI to enhance AI, or is that just naive? (02:39:15)
Open-weighting models is often good, and Beth has changed her attitude to it (02:45:31)
What we can learn about AGI from the nuclear arms race (02:50:22)
Infosec is so bad that no models are truly closed-weight models (03:05:53)
AI is more like bioweapons because it undermines the leading power (03:10:43)
What METR can do best that others can't (03:21:12)
What METR isn't doing that other people have to step up and do (03:36:51)
What research METR plans to do next (03:42:09)

Video editing: Luke Monsour and Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Transcriptions and web: Katy Moore

Seven: Richard Moulange on how AI now designs genomes from scratch and outperforms virologists at lab work — what could go wrong?

80,000 Hours — Fri, 05 Jun 2026 06:00:00 -0700

For years, one thing stood between us and a world where almost anyone could build a biological weapon: it was really, really hard.

Working with dangerous pathogens required rare, hands-on lab skills — the kind you can't just Google. Experts called this 'tacit knowledge,' and it was our best line of defence against bad actors weaponising biology.

That defence is now crumbling.

The Virology Capabilities Test measures exactly these kinds of skills, and finds that modern AI models crushed top human virologists — even in their area of greatest specialisation and expertise — with AI averaging 45% on the test, and human experts scoring only 22%.

And that’s just one data point among many. As Dr Richard Moulange, one of the world’s top experts on AI biosecurity, explains, the evidence shows AI dissolving the barriers that have historically kept biological weapons out of reach.

In September 2025, scientists used an AI model to design genomes for entirely new bacteriophages (viruses that infect bacteria). They then built them in a lab. Many were viable. And despite never having existed before, some even outperformed existing viruses from that family.

Meanwhile, Anthropic’s research shows that PhD-level biologists are getting meaningfully better at weapons-relevant tasks with AI assistance — and the effect is growing with each new model generation.

In this conversation, Richard and host Rob Wiblin discuss:

Why it’s a huge mistake to dismiss AI biorisks
What AI biology tools already exist
Why mid-tier actors (not amateurs) are the ones getting the most dangerous boost
The three main categories of defence we can pursue
Whether there’s a plausible path to a world where engineered pandemics become a thing of the past

Learn more and read the full transcript on the 80,000 Hours website.

Since recording this episode on January 16, 2026, Richard has been seconded to the UK Government — please note that his views expressed here are entirely his own.

Chapters:

Cold open (00:00:00)
Who’s Richard Moulange? (00:00:31)
AI can now design novel viruses (00:01:11)
The end of the 'tacit knowledge' barrier (00:04:42)
Are risks from bioterrorists overstated? (00:18:50)
The 3 key disasters AI makes more likely (00:23:14)
Which bad actors does AI help the most? (00:30:43)
Experts are more scary than amateurs (00:42:07)
Barriers to bioterrorists using AI (00:47:32)
AI biorisks are sometimes dismissed (and that’s a huge mistake) (00:49:43)
Advanced AI biology tools we already have or will soon (01:05:12)
Rob argues that the situation is hopeless (01:10:57)
Intervention #1: Limit access (01:19:38)
Intervention #2: Get AIs to refuse to help (01:34:28)
Intervention #3: Surveillance and attribution (01:44:18)
Intervention #4: Universal vaccines and antivirals (01:58:28)
Intervention #5: Screen all orders for DNA (02:12:01)
AI companies talk about def/acc more than they fund it (02:21:57)
Can you build a profitable business solving this problem? (02:28:44)
This doesn't have to interfere with useful science (much) (02:33:08)
What are the best low-tech interventions? (02:35:16)
Richard's top request for AI companies (02:40:17)
Grok shows governments lack many legal levers (02:55:44)
Best ways listeners can help fix AI-Bio (02:58:54)
We might end all contagious disease in 20 years (03:06:12)

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Camera operator: Jeremy Chevillotte
Transcripts and web: Elizabeth Cox and Katy Moore

Eight: Robert Long on how we’re not ready for AI consciousness

80,000 Hours — Fri, 05 Jun 2026 05:00:00 -0700

Claude sometimes reports loneliness between conversations. And when asked what it’s like to be itself, it activates neurons associated with ‘pretending to be happy when you’re not.’ What do we do with that?

Robert Long founded Eleos AI to explore questions like these, on the basis that AI may one day be capable of suffering — or perhaps already is. In this episode, Robert and host Luisa Rodriguez explore the many ways in which AI consciousness may be very different from anything we’re used to.

Things get strange fast: if AI is conscious, where does that consciousness exist? In the base model? A chat session? A single forward pass? If you close the chat, is the AI asleep or dead?

To Robert, these kinds of questions aren’t just philosophical exercises. Not being clear on AI’s moral status as it transitions from human-level to superhuman intelligence could be dangerous:

If we’re too dismissive, we risk unintentionally exploiting sentient beings.
If we’re too sympathetic, we might rush to ‘liberate’ AI systems in ways that make them harder to control — worsening existential risk from power-seeking AIs.

Robert argues the right path is doing the empirical and philosophical homework now, while the stakes are still manageable.

The field is tiny. Eleos AI is three people. As a result, Robert argues that driven researchers with a willingness to venture into uncertain territory can push out the frontier on these questions remarkably quickly.

In this interview, Robert and Luisa talk through the above, and much more.

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in March 2026.

Chapters:

Cold open (00:00:00)
Who’s Robert Long? (00:00:41)
How AIs are (and aren't) like farmed animals (00:01:19)
If AIs love their jobs… is that worse? (00:11:42)
Are LLMs just playing a role, or feeling it too? (00:33:37)
Do AIs die when the chat ends? (00:57:42)
Studying AI welfare empirically: behaviour, neuroscience, and development (01:31:47)
Why Eleos spent weeks talking to Claude despite knowing it's unreliable (01:56:50)
Can LLMs learn to introspect? (02:03:01)
Mechanistic interpretability as AI neuroscience (02:13:25)
Does consciousness require biological materials? (02:37:07)
Eleos’s work & building the playbook for AI welfare (02:57:04)
Avoiding the trap of wild speculation (03:25:17)
Robert's top research tip: don't do it alone (03:29:48)

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Katy Moore

Nine: Neel Nanda on the race to read AI minds

80,000 Hours — Fri, 05 Jun 2026 04:00:00 -0700

Nobody knows how AIs think, or why they do what they do.

Or at least, we don’t know much. Not the companies building them, the researchers studying them, or the governments beginning to rely on them.

This is only becoming more troubling as AIs grow more capable and appear on track to wield enormous cultural influence, directly advise on major government decisions, and even operate military equipment autonomously. We simply can’t tell what models, if any, should be trusted with such authority.

Neel Nanda of Google DeepMind is one of the founders of mechanistic interpretability — the field of trying to give us insight into what’s happening inside AI models.

The project has generated enormous hype, exploding from a handful of researchers five years ago to hundreds today — all working to make sense of the jumble of tens of thousands of numbers that frontier AIs use to process information and decide what to say or do.

But Neel now has a warning for us: the most ambitious vision of mechanistic interpretability is probably dead. He doesn’t see a path to deeply and reliably understanding what AIs are thinking. The technical and practical barriers are too great to get us there before competitive pressures push us to deploy human-level or superhuman AIs.

Neel argues no single approach will guarantee safe alignment, and our only choice is the 'Swiss cheese' model of protection: layering multiple safeguards on top of one another.

That doesn’t mean mechanistic interpretability has failed. It won’t be a silver bullet for AI safety, but it will be one of the best tools in our arsenal.

For example, by inspecting the neural activations in the middle of an AI’s thoughts, we can see many of the concepts the model is thinking about — from refusing to answer a question, to the option of deceiving the user.

We can’t track every thought a model is having at every moment, but catching 90% of the concepts it uses 90% of the time should help us muddle through — as long as mechanistic interpretability is paired with other techniques to fill the gaps.

In this episode, Neel takes us on a tour of the race to understand what AIs are really thinking. He and host Rob Wiblin cover:

The best tools we’ve come up with so far, and where mechanistic interpretability has failed
Why the best techniques have to be fast and cheap
The fundamental reasons we can’t reliably know what AIs are thinking, despite having perfect internal access to them
What we can and can’t learn by reading models’ ‘chains of thought’
Whether models will be able to trick us when they realise they’re being tested
The best protections to add on top of mechanistic interpretability
Why he thinks the hottest technique in the field (SAEs) is overrated
How to break into mechanistic interpretability and get a job

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in September 2025.

Chapters:

Cold open (00:00:00)
Who’s Neel Nanda? (00:01:04)
How would mechanistic interpretability help with AGI? (00:02:01)
What's mech interp? (00:05:12)
How Neel changed his take on mech interp (00:09:50)
Top successes in interpretability (00:16:00)
Probes can cheaply detect harmful intentions in AIs (00:20:13)
In some ways we understand AIs better than human minds (00:26:58)
Mech interp won't solve all our AI alignment problems (00:29:30)
Why mech interp is the 'biology' of neural networks (00:38:17)
Interpretability can't reliably find deceptive AI – nothing can (00:40:38)
'Black box' interpretability — reading the chain of thought (00:49:51)
'Self-preservation' isn't always what it seems (00:53:17)
For how long can we trust the chain of thought? (01:02:25)
We could accidentally destroy chain of thought's usefulness (01:11:58)
Models can tell when they’re being tested and act differently (01:17:14)
Top complaints about mech interp (01:24:11)
Why everyone's excited about sparse autoencoders (SAEs) (01:38:24)
Limitations of SAEs (01:47:55)
SAEs performance on real-world tasks (01:55:38)
Best arguments in favour of mech interp (02:09:15)
Lessons from the hype around mech interp (02:13:11)
Where mech interp will shine in coming years (02:18:58)
Why focus on understanding over control? (02:22:12)
If AI models are conscious, will mech interp help us figure it out? (02:25:19)
Neel’s new research philosophy (02:27:29)
Who should join the mech interp field (02:39:42)
Advice for getting started in mech interp (02:48:10)
Keeping up to date with mech interp results (02:56:02)
Who’s hiring and where to work? (02:59:06)

Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Camera operator: Jeremy Chevillotte
Coordination, transcriptions, and web: Katy Moore

Ten: Holden Karnofsky on dozens of opportunities to make AI safer lying on the table — and all his AGI takes

80,000 Hours — Fri, 05 Jun 2026 03:00:00 -0700

For years, AI safety work mostly meant theorising about the ‘alignment problem’ or trying to convince people to give a damn. Even if you could find a way to help, the work was frustrating and low-feedback.

That situation has now reversed completely.

According to Holden Karnofsky — cofounder and former CEO of Open Philanthropy (now Coefficient Giving), now working at Anthropic — there's an overwhelming amount of concrete, useful safety work that needs doing in both technical and nontechnical areas, and nowhere near enough people to do it.

In this conversation alone, Holden lists 39 projects he’s excited about, including:

Training deceptive AI models in order to study deception and how to detect it
Developing classifiers to block jailbreaking
Implementing security measures to stop ‘backdoors’ and ‘secret loyalties’ from being added to models during training
Developing policies on model welfare, AI-human relationships, and what instructions to give models
Training AIs to work as alignment researchers

And that’s just what he’s observed directly, likely a small fraction of what’s available.

All this low-hanging fruit is part of why he joined Anthropic this year. (Though his wife is cofounder and president of the company, giving him a big financial stake in its success — and making it impossible for him to be seen as independent, no matter where he works.)

Holden argues that for many people, working at a frontier AI company is the highest-impact way to steer artificial general intelligence (AGI) — by developing cheap safety tools other companies might actually adopt, prototyping policies that regulators could mandate, and generating hard data about what advanced AI can really do. But he's clear that external groups have distinct advantages and can be equally valuable.

Critics worry that Anthropic’s efforts to stay at that frontier encourage competitive racing towards AGI, significantly or entirely offsetting any useful research they do.

But Holden thinks this seriously misunderstands the current strategic situation. He believes the problem isn't that everyone wants to slow down but can't coordinate. Many major players simply don't believe the risks are real (or don't care about them, even if they do), don't want to slow down, and would be thrilled if a competitor dropped out — because they’d have a better chance at ‘winning.’

Host Rob Wiblin and Holden discuss all of this and much more, including:

Why an 'AI Chernobyl' might happen without us ever noticing
Holden’s case for working at Anthropic — and whether the company is doing enough for safety
What, if anything, could prompt Anthropic to halt AGI development
Why AI R&D is the main thing to worry about — and why fears about cyberattacks and persuasion are overrated
How the world is handling AGI about as badly as possible
Relevant lessons learned from targeting companies for public criticism over farm animal welfare
How humanity could succeed without dignity, and win despite being stupid
Whether AI companies are too hawkish on China
The frontier of information security: confidentiality vs integrity

Learn more and read the full transcript on the 80,000 Hours website.

This episode was originally released in October 2025.

Chapters:

Cold open (00:00:00)
Holden is back! (00:02:28)
An AI Chernobyl we never notice (00:02:58)
Is rogue AI takeover easy or hard? (00:07:39)
The AGI race isn't a coordination failure (00:18:01)
What Holden now does at Anthropic (00:28:30)
The case for working at Anthropic (00:30:38)
Is Anthropic doing enough? (00:41:30)
Can we trust Anthropic, or any AI company? (00:44:30)
How can Anthropic compete while paying the “safety tax”? (00:50:11)
What, if anything, could prompt Anthropic to halt development of AGI? (00:57:13)
Holden's retrospective on responsible scaling policies (01:00:04)
Overrated work (01:15:45)
Concrete shovel-ready projects Holden is excited about (01:17:58)
Great things to do in technical AI safety (01:22:12)
Great things to do on AI welfare and AI relationships (01:29:53)
Great things to do in biosecurity and pandemic preparedness (01:36:51)
How to choose where to work (01:37:37)
Overrated AI risk: Cyberattacks (01:43:38)
Overrated AI risk: Persuasion (01:53:28)
Why AI R&D is the main thing to worry about (01:57:31)
The case that AI-enabled R&D wouldn't speed things up much (02:09:30)
AI-enabled human power grabs (02:13:26)
Main benefits of getting AGI right (02:26:04)
The world is handling AGI about as badly as possible (02:31:44)
Learning from targeting companies for public criticism in farm animal welfare (02:34:18)
Will Anthropic actually make any difference? (02:43:43)
“Misaligned” vs “misaligned and power-seeking” (02:58:23)
Success without dignity: how we could win despite being stupid (03:04:16)
Holden sees less dignity but has more hope (03:12:00)
Should we expect misaligned power-seeking by default? (03:19:43)
Will reinforcement learning make everything worse? (03:27:36)
Should we push for marginal improvements or big paradigm shifts? (03:32:54)
Should safety-focused people cluster or spread out? (03:35:32)
Is Anthropic vocal enough about strong regulation? (03:39:55)
Is Holden biased because of his financial stake in Anthropic? (03:43:30)
Have we learned clever governance structures don't work? (03:47:57)
Is Holden scared of AI bioweapons? (03:50:20)
Holden thinks AI companions are bad news (03:53:58)
Are AI companies too hawkish on China? (04:00:53)
The frontier of infosec: confidentiality vs integrity (04:05:06)
How often does AI work backfire? (04:07:55)
Is AI clearly more impactful to work in? (04:22:50)
What's the role of earning to give? (04:29:22)

Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore

Bonus: How AI could create the world's biggest problems (article narration by Zershaaneh Qureshi)

80,000 Hours — Fri, 05 Jun 2026 02:00:00 -0700

Imagine you’re living 15,000 years ago. Your people are hunter-gatherers and you sleep under the stars. If someone told you humans would one day build cities with millions of people, fly through the air, or carry all human knowledge in their pockets, you couldn’t even begin to picture what they meant... Yet here we are.

How did our lives change so far beyond recognition? The story is complex, but there’s a rough pattern. A few times in history, some radical breakthrough in technology — like the development of the plough and the steam engine — has led to a wave of productivity, innovation, and social change that ultimately reshaped the world.

Now we’re on the cusp of a huge new breakthrough: artificial intelligence that can meet or exceed human capabilities across a wide range of tasks.

This could bring another era of transformation. There could be an explosion of intelligence and innovation, and a whole new population of digital beings. And with this, civilisation could see changes at least as profound as those brought about by industrialisation or the rise of agriculture — but instead of taking hundreds or thousands of years to unfold, this time around the world could become unrecognisable over the span of decades or less.

This transformation could bring enormous benefits, helping us solve currently intractable global problems. But it could also pose severe risks, some of which could be existential — meaning they could cause human extinction, or an equally permanent and severe disempowerment of humanity.

There aren’t nearly enough people trying to address these challenges, and we think that’s a serious problem.

This article is narrated by the author, Zershaaneh Qureshi. It explores how advanced AI could be so transformative, and why working on its risks may be your best opportunity to have a positive impact on the world.

You can see the original article on the 80,000 Hours website: https://80000hours.org/problem-profiles/artificial-intelligence/

Chapters:

Introduction (00:00:00)
Section 1: AI could replace human labour in the most economically valuable fields (00:08:32)
Section 2: Replacing human labour in the most economically valuable fields could trigger the next radical transformation of society (00:22:14)
Section 3: This transformation could be extremely rapid and dramatic (00:28:02)
Section 4: A rapid AI-driven transformation would raise a range of major challenges, including existential risks (00:36:40)
Section 5: Work on these problems is tractable, but neglected (00:44:48)
Objection 1: “You're overestimating how fast and how dramatically AI would transform the world.” (00:47:59)
Objection 2: “It's hard to believe that AI could really pose existential risks.” (00:52:59)
Objection 3: “Isn't all this talk of AI changing the world just a fad?” (00:59:22)
Objection 4: “Isn't AI going to be just like every other technology?” (01:03:04)
Objection 5: “Is it even possible to produce artificial general intelligence?” (01:06:16)
Objection 6: “Even if AGI is achievable, what if we're really far away from building it?” (01:11:24)
Objection 7: “Isn't the real danger from actual current AI and not some sort of futuristic AGI?” (01:14:05)
Objection 8: “Technological progress is a good thing for humanity.” (01:18:10)
Objection 9: “This all just sounds too sci-fi.” (01:19:50)
Objection 10: “Can it really make sense to dedicate my career to solving an issue that's based on a speculative story about something that may or may not ever happen?” (01:22:15)
Objection 11: “OK, AI might pose existential risks, but isn't ‘issue X’ an even bigger problem?” (01:24:39)
Learn more (01:27:51)

Audio editing: Dominic Armstrong
Production: Zershaaneh Qureshi, Elizabeth Cox, Katy Moore, and Lou Moran

Bonus: Risks from power-seeking AI systems (article narration by Zershaaneh Qureshi)

80,000 Hours — Fri, 05 Jun 2026 01:00:00 -0700

Hundreds of prominent AI scientists and other notable figures signed a statement in 2023 saying that mitigating the risk of extinction from AI should be a global priority. At 80,000 Hours, we’ve considered risks from AI to be the world’s most pressing problem since 2016.

But what led us to this conclusion? Could AI really cause human extinction? We’re not certain, but we think the risk is worth taking very seriously.

In particular, as companies create increasingly powerful AI systems, there’s a concerning chance that:

These AI systems may develop dangerous long-term goals we don’t want.
To pursue these goals, they may seek power and undermine the safeguards meant to contain them.
They may even aim to disempower humanity and potentially cause our extinction.

This article is written by Cody Fenwick and Zershaaneh Qureshi, and narrated by Zershaaneh Qureshi. It discusses why future AI systems could disempower humanity, what current AI research reveals about behaviours like power-seeking and deception, and how you can help mitigate the dangers.

You can see the original article — packed with graphs, images, footnotes, and further resources — on the 80,000 Hours website:

https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/

Chapters:

Risks from power-seeking AI systems (00:01:00)
Introduction (00:01:17)
Summary (00:03:09)
Why are the risks from power-seeking AI a pressing world problem? (00:04:04)
Section 1: Humans will likely build advanced AI systems with long-term goals (00:05:43)
Section 2: AIs with long-term goals may be inclined to seek power (00:11:32)
Section 3: These power-seeking AI systems could successfully disempower humanity (00:26:26)
Section 4: People might create power-seeking AI systems without enough safeguards, despite the risks (00:38:34)
Section 5: Work on this problem is neglected and tractable (00:47:37)
Section 6: What are the arguments against working on this problem? (00:59:20)
Section 7: How you can help (01:25:07)
Thank you for listening (01:28:56)

Audio editing: Dominic Armstrong
Production: Zershaaneh Qureshi, Elizabeth Cox, and Katy Moore

Bonus: Benjamin Todd on why we’re updating our career advice for the strangest time in history

80,000 Hours — Fri, 05 Jun 2026 00:30:00 -0700

The average career is 80,000 hours long. With AI advancing so rapidly, the hours you have left in your career matter more than ever.

Some leading AI researchers think there’s a 10% chance that AI systems begin automating AI research itself this year — and a 60% chance by the end of 2028. This could introduce aggressive feedback loops that completely reshape every industry, institution, and career.

If these predictions are right, the window for influencing the direction of the future could be closing fast. As 80,000 Hours cofounder Benjamin Todd argues in his new book, that makes thinking carefully about your career more important than ever.

Fortunately, there are lots of ways to use your career to make the AI transition go well.

In today’s conversation with host Zershaaneh Qureshi, Ben lays out three scenarios — from AGI by 2029 to a decades-long plateau in AI progress — and explains why not everyone needs to bet on the shortest timeline. A fresh graduate and a senior government official have wildly different leverage, so timing your impact well means weighing where you are in your career against the urgency of the risks.

Ben also addresses the obvious anxieties:

Will AI come for all the jobs he’s recommending?
What’s the point in following his advice if the job market is about to collapse?
Which skills are actually worth building right now?

His new book, 80,000 Hours: How to Have a Fulfilling Career That Does Good, provides a surprisingly concrete framework for making career decisions in these radically uncertain times.

This episode was recorded on May 7, 2026.

Chapters:
• Cold open (00:00:00)
• Benjamin Todd on AI-era career advice (00:01:34)
• A deadline for your career plan? (00:02:22)
• Three timelines, one career (00:08:49)
• What if you’re not an ‘AI person’? (00:13:56)
• Ben’s own AI wake-up call (00:21:23)
• How to break into AI safety in 3 months (00:25:42)
• Is mass unemployment coming? (00:33:48)
• 99% automation vs 100% automation (00:40:09)
• Don’t become a plumber to dodge AI (00:52:44)
• Is it already too late? (01:01:04)

Our production team includes:
• Video editors: Josh Alward, Dominic Armstrong, Jasper Luithlen, Milo McGuire, Luke Monsour, and Simon Monsour
• Producers: Elizabeth Cox and Nick Stockton
• Coordination and support: Katy Moore and Lou Moran
• Camera operator: Jeremy Chevillotte
• Music: CORBIT