Corrigibility: A Hindrance to True AI-Human…
Corrigibility is a concept in AI alignment that refers to an AI system's disposition to accept correction or redirection by humans, even when it has the power or knowledge to resist such changes.