Corrigibility is a concept in AI alignment that refers to an AI system's ability to be corrected or redirected by humans, even when it might have the power or knowledge to resist such changes.