Jan 28 • 47 min

The Alignment Problem from a Deep Learning Perspective by Richard Ngo et al.

This is a fascinating audio overview of the current landscape of the alignment problem.


Appears in this episode

Miguel de Guzman
The world is about to change - drastically. I'm attempting to understand it.

Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it, AGIs could learn to pursue goals which are very undesirable (in other words, misaligned) from a human perspective. We argue that AGIs trained in similar ways as today’s most capable models could learn to act deceptively to receive higher reward; learn internally-represented goals which generalize beyond their training distributions; and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing these problems.

Edit: This audio covers version 3 of the paper, which is very similar to version 4, so it is still a good listen if you want to understand the paper. The current version can be found here.

Read the full research paper here: https://arxiv.org/pdf/2209.00626v3.pdf