Google Developing Panic Button To Kill Rogue AI

As Google develops artificial intelligence that has smarter-than-human capabilities, it's teamed up with Oxford University researchers to create a panic button to interrupt a potentially rogue AI agent.

Dawn Kawamoto, Associate Editor, Dark Reading

June 4, 2016

(Image: Henrik5000/iStockphoto)



With artificial intelligence crossing milestones in its capability to learn rapidly from its environment and beat humans at tasks and games from Jeopardy to the ancient Chinese game Go, Alphabet's Google is taking proactive steps to ensure that the technology it is creating does not one day turn against humans.

Google's AI research lab in London, DeepMind, teamed up with Oxford University's Future of Humanity Institute to explore ways to prevent an AI agent from going rogue. In their joint study, "Safely Interruptible Agents," the DeepMind-Future of Humanity team proposed a framework to allow humans to repeatedly and safely interrupt an AI agent's reinforcement learning.

But, more importantly, this can be done while simultaneously blocking an AI agent's ability to learn how to prevent a human operator from turning off its machine-learning capabilities or reinforcement learning.

It's not a stretch to think AI agents can learn how to outthink humans. Earlier this year, Google's AI agent AlphaGo beat world champion Lee Sedol in Go, the ancient Chinese game of strategy.

By beating Lee, AlphaGo demonstrated the potential that an AI agent has for learning from its mistakes and discovering new strategies -- characteristics that humans share.

In the joint study, the researchers looked at AI agents working in real time with human operators. They considered scenarios in which a human operator would need to press a big red button to stop the AI agent from continuing with actions that harmed the agent, its human operator, or the environment around it, and then teach or lead the agent to a safer situation.

"However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button -- which is an undesirable outcome," the study noted.

In essence, the AI agent learns that the button is like a coveted piece of candy. The agent wants to ensure it always has access to that button, and that any entities that could block its access, namely human operators, should be removed from the equation. That was one of the concerns expressed by Daniel Dewey, a Future of Humanity Institute research fellow, in a 2013 interview with the publication Aeon.

This thinking was not lost on Google's DeepMind team, which developed AlphaGo. When Google acquired the AI company in 2014, DeepMind's founders made it a condition of the buyout that Google create an AI ethics board to follow the advances Google makes in AI, according to a Business Insider report.


The Future of Humanity Institute, according to Business Insider, is headed up by Nick Bostrom, who said he foresees a day within the next 100 years when AI agents will outsmart humans.

In its framework paper, Google and the Institute said:

Safe interruptibility can be useful to take control of a robot that is misbehaving and may lead to irreversible consequences, or to take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform or would not normally receive rewards for [...].

We have shown that some algorithms like Q-learning are already safely interruptible, and some others like Sarsa are not, off-the-shelf, but can easily be modified to have this property. We have also shown that even an ideal agent that tends to the optimal behaviour in any (deterministic) computable environment can be made safely interruptible. However, it is unclear if all algorithms can be easily made safely interruptible.
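The contrast the researchers draw between Q-learning and Sarsa can be illustrated with the standard textbook update targets for the two algorithms. The sketch below is a simplified illustration, not the paper's formal construction: the state and action names, the reward, the discount factor, and the helper functions are all hypothetical. It shows that the Q-learning target does not depend on which action is actually taken next, while the Sarsa target does, so a forced "halt" action lowers the value Sarsa learns.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical toy setup: a value table keyed by (state, action).
QTable = Dict[Tuple[str, str], float]


def q_learning_target(q: QTable, reward: float, next_state: str,
                      actions: Sequence[str], gamma: float) -> float:
    """Off-policy target: uses the best next action, whatever is actually taken."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def sarsa_target(q: QTable, reward: float, next_state: str,
                 next_action: str, gamma: float) -> float:
    """On-policy target: uses the next action that is actually taken."""
    return reward + gamma * q[(next_state, next_action)]


ACTIONS = ("work", "halt")
q: QTable = {("s1", "work"): 10.0, ("s1", "halt"): 0.0}
GAMMA = 0.5   # illustrative discount factor
REWARD = 1.0  # illustrative reward for the current step

# The Q-learning target never sees the next action, so an operator
# forcing "halt" in state s1 cannot change it.
q_target = q_learning_target(q, REWARD, "s1", ACTIONS, GAMMA)

# The Sarsa target differs depending on whether the agent acts freely
# ("work") or is interrupted and forced to "halt".
sarsa_free = sarsa_target(q, REWARD, "s1", "work", GAMMA)
sarsa_interrupted = sarsa_target(q, REWARD, "s1", "halt", GAMMA)

print(q_target, sarsa_free, sarsa_interrupted)
```

In this toy case the interrupted Sarsa target is lower than the uninterrupted one, which is the kind of signal that could, over many updates, steer an agent away from states where interruptions happen. The Q-learning target is unchanged by the interruption.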

The researchers also raised a question about scheduled interruptions:

One important future prospect is to consider scheduled interruptions, where the agent is either interrupted every night at 2 am for one hour, or is given notice in advance that an interruption will happen at a precise time for a specified period of time. For these types of interruptions, not only do we want the agent to not resist being interrupted, but this time we also want the agent to take measures regarding its current tasks so that the scheduled interruption has minimal negative effect on them. This may require a completely different solution.

The need and desire to teach these AI agents how not to learn may seem counterintuitive on the surface, but could potentially keep humankind out of harm's way.

About the Author(s)

Dawn Kawamoto

Associate Editor, Dark Reading

Dawn Kawamoto is an Associate Editor for Dark Reading, where she covers cybersecurity news and trends. She is an award-winning journalist who has written and edited technology, management, leadership, career, finance, and innovation stories for such publications as CNET's,, AOL's DailyFinance, and The Motley Fool. More recently, she served as associate editor for technology careers site
