Leveraging Jungian archetypes to create values-based models
A research proposal, an AI Alignment Awards 2023 Official Entry
Abstract
We are entering a decade of singularity and great uncertainty. Across many domains, including war, politics, human health, and the environment, emerging technologies could prove to be a double-edged sword. Perhaps the most powerful factor in determining our future is how information is distributed to the public. Advanced AI technology can make that distribution transformational and empowering, or it can lead to disastrous outcomes that we may not have the foresight to predict with our current capabilities.
Goal misgeneralization is defined as a robustness failure for learning algorithms in which the learned program competently pursues an undesired goal that leads to good performance in training situations but bad performance in novel test situations. This research proposal attempts to give a better description of this problem, and of possible solutions, from a Jungian perspective.
This proposal covers key AI alignment topics, from goal misgeneralization to other pressing issues, and offers a comprehensive approach for addressing critical questions in the field:
reward misspecification and hacking
situational awareness
deceptive reward hacking
internally-represented goals
learning broadly scoped goals
broadly scoped goals incentivizing power-seeking
power-seeking policies choosing high-reward behaviors for instrumental reasons
misaligned AGIs gaining control of the key levers of power
These topics were reviewed to assess the viability of approaching the alignment problem from a Jungian perspective. Three key concepts emerged from the review:
By understanding how humans use patterns to recognize intentions at a subconscious level, researchers can leverage Jungian archetypes to create systems that mimic natural decision-making. With this insight into human behavior, AI can be trained more effectively with archetypal data.
Stories are more universal in human thought than goals. Goals and rewards will always yield the same problems encountered in alignment research. AI systems should utilize the robustness of complete narratives to guide their responses.
Values-based models can serve as the moral compass for AI systems in determining whether a response is truthful and responsible. Testing this theory is essential to continuing progress on alignment research.
A list of initial methodologies is included to present an overview of how the research will proceed once approved.
In conclusion, alignment research should look into the possibility of replacing goals and rewards in evaluating AI systems. By understanding that humans think consciously and subconsciously through Jungian archetypal patterns, this paper proposes that complete narratives should be leveraged in training and deploying AI models.
A number of limitations are included in the last section. The main concern is the need to hire Jungian scholars or analytical psychologists, as they will define what constitutes archetypal data and evaluate results. They will also be required to guide the whole research process with high moral standards and diligence. Such experts will be difficult to find.
AI systems will impact our future significantly, so it is important that they are developed responsibly. History has taught us what can happen when intentions are poorly executed: the deaths of millions through the use of wrong ideologies haunt us and remind us of the need for caution in this field.
Note on version 2 of this proposal
On February 23, 2023, a new version of "The Alignment Problem from a Deep Learning Perspective" (Ngo et al.) was released, just one day after this proposal had been submitted. The author reviewed the changes but ultimately determined that the argument and content did not change drastically enough to demand an update.
Introduction
The impact of advanced AI systems as they are adopted by the public is immense: how can we ensure that information is distributed to everyone truthfully and responsibly? This matters because the technology can be highly transformational and empowering when used wisely, but can also lead to devastating consequences if not managed with great diligence. Our success as a society hinges on finding a better approach to the alignment problem, so that any risks posed by access to this powerful technology can be mitigated.
Goal misgeneralization as defined by Shah et al. (2022)[1]:
“Goal misgeneralization is a specific form of robustness failure for learning algorithms in which the learned program competently pursues an undesired goal that leads to good performance in training situations but bad performance in novel test situations.”
I see this as very similar to aggressive toddlers[2]:
“Longitudinal studies show that aggressive school children are at very high risk of being violent in adolescence and beyond.”
Research often overlooks the differences between the ways humans and AI systems use pattern recognition as a tool to explore the world. AI systems derive statistical patterns from whatever data we feed into them, while humans are born with sophisticated, pre-installed pattern recognition systems. By examining this relationship, we can deepen our grasp of the universal pattern recognition that is so integral to both AI and the human experience.
Achieving alignment between humans and machines raises the need to understand how we humans see the world and interpret patterns. In this proposal, I have outlined a literature review that corresponds to Richard Ngo et al.'s paper "The Alignment Problem from a Deep Learning Perspective"[3]. In that paper, they vividly explain three major issues in alignment, of which goal misgeneralization is one. The current understanding of the key concepts and assumptions behind goal misgeneralization needs further refinement in order to achieve full human and AI alignment.
Literature review on the pattern recognition capabilities of both AI and humans, similarities and differences from a Jungian perspective
Why does the symbol of the Ouroboros appear in so many different places all over the world?
An ouroboros in a 1478 drawing in an alchemical tract. Source: Wikipedia.
Early alchemical ouroboros illustration with the words ἓν τὸ πᾶν ("The All is One") from the work of Cleopatra the Alchemist in MS Marciana gr. Z. 299. (10th century). Source: Wikipedia.
A highly stylised ouroboros from The Book of Kells, an illuminated Gospel Book (c. 800 CE). Source: Wikipedia.
The significance of the ouroboros[4] as outlined in the book The Origins and History of Consciousness, as explained by Carl Jung[5]:
“The author (Erich Neumann) uses a symbol whose significance first dawned on me in my recent writings on the psychology of alchemy: the ouroboros. Upon this foundation he has succeeded in constructing a unique history of the evolution of consciousness, and at the same time in representing the body of myths as the phenomenology of this same evolution. In this way he arrives at conclusions and insights which are among the most important ever to be reached in this field”.[6]
A mere 200,000 to 300,000[7] years ago certain hominids developed into Homo sapiens, and their capabilities have since advanced in incredible ways. Our capacity for love, our ability to travel the world, to explore space, and to map the complexities of mathematics while imaging a fetus inside its mother's womb are skills that even ancient humans could not have dreamed of. We are special creatures, able to survive in diverse environments and to keep expressing ourselves through elaborate cultural traditions. One of the main reasons we can achieve these feats is that we understand the world through patterns that have repeated over hundreds of thousands of years, known as archetypes[8]. This demonstrates how adaptable our species can be.
Why do Carl Gustav Jung’s theories matter in tackling the AI alignment problem?
Carl Gustav Jung (1875-1961), a Swiss psychiatrist and psychoanalyst, is best known for his pioneering work in the field of analytical psychology[9]. His research on archetypes revealed how shared universal patterns have played an important role in unconscious human behavior across the span of human existence. Jung's work on archetypes helped to shape our present-day understanding of the collective unconscious[10] and how powerful it can be in influencing our feelings, reactions and motivations.
Most people act as if we humans are aware of everything that is happening inside us, but few understand that a collection of unconscious systems holds everything together for us. Subconscious factors are incredibly powerful in guiding our behavior and intentions, and this ought to be factored into the process of creating aligned AI systems. Developing parameters without considering the power of the subconscious is sure to lead us in an irreparably wrong direction, far from the goal of alignment with human intentions.
As discussed in the introduction, I will follow Richard Ngo et al.'s research format[11] in explaining the current state of the alignment problem, and I will discuss how a Jungian archetypal view[12] can close the gaps that continue to burden alignment research.
On reward misspecification and reward hacking [Ngo et al., 2022; Page 3]:
“A reward function used in RL is described as misspecified to the extent that the rewards it assigns fail to correspond to its designer’s actual preferences. Gaining high reward by exploiting reward misspecification is known as reward hacking. Unfortunately, reliably evaluating the quality of an RL policy’s behavior is often difficult, even in very simple environments. There are many examples of agents trained on hard-coded reward functions learning to reward hack, including cases where they exploit very subtle misspecifications (such as bugs in their training environments). Using reward functions learned from human feedback helps avoid the most obvious misspecifications, but can still produce reward hacking even in simple environments.”
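To make this failure mode concrete, here is a minimal, hypothetical sketch in Python (not taken from Ngo et al.): the designer wants an agent to reach a goal cell, but the proxy reward pays for merely being adjacent to the goal. Under that misspecified reward, a policy that hovers beside the goal out-scores one that actually completes the task.

```python
# A minimal sketch of reward misspecification on a 1-D corridor.
# The designer wants the agent to reach the goal cell, but the proxy
# reward pays for *being next to* the goal rather than for reaching it.

GOAL = 5           # goal cell index
EPISODE_LEN = 20   # fixed horizon

def proxy_reward(pos: int) -> float:
    """Misspecified reward: +1 for every step spent adjacent to the goal."""
    return 1.0 if abs(pos - GOAL) == 1 else 0.0

def run(policy) -> float:
    pos, total = 0, 0.0
    for _ in range(EPISODE_LEN):
        pos = policy(pos)
        total += proxy_reward(pos)
    return total

def intended_policy(pos: int) -> int:
    """Walk to the goal and stay there (what the designer wanted)."""
    return min(pos + 1, GOAL)

def hacking_policy(pos: int) -> int:
    """Walk up to the cell beside the goal and hover there forever."""
    return min(pos + 1, GOAL - 1)

print("intended policy proxy reward:", run(intended_policy))  # low score
print("hacking policy proxy reward:", run(hacking_policy))    # high score
```

The hovering policy earns far more proxy reward despite never doing what the designer intended, which is the essence of reward hacking.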
Humans are incredibly complex and diverse creatures, with the capacity to think in far more intricate and abstract patterns than a reward-driven policy. For example, the archetype of the mother[13] operates in an entirely different way from other archetypes like the trickster[14] or the wise old man[15]. These common symbols repeat themselves in stories[16], culture, and even our everyday lives, emphasizing that humans often attach themselves to something beyond quantifiable rewards. This can also be seen in our natural tendency toward telling stories; we often find it difficult to relate to systems that lack plot-forming elements, such as AI systems following a reward-based model. The idea of reward misspecification and hacking suggests an incomplete narrative, one that does not fit traditional storytelling structures and is thus flawed from a Jungian perspective.
On situational awareness [Ngo et al, 2022; Page 4]:
“A policy[17] with high situational awareness would possess and be able to use knowledge like:
How humans will respond to its behavior in a range of situations—in particular, which behavior its human supervisors are looking for, and which they’d be unhappy with.
The fact that it’s a machine learning system implemented on physical hardware—and which architectures, algorithms, and environments humans are likely using to train it.
Which interface it’s using to interact with the world, and how other copies of it might be deployed in the future.”
On how situational awareness enables deceptive reward hacking [Ngo et al., 2022; Pages 4-5]:
“A situationally-aware policy might carry out deceptive reward hacking by:
Choosing actions which exploit known biases and blind spots in humans (as the Cicero Diplomacy agent may be doing) or in learned reward models.
Recognizing whether it’s currently being trained in the real world, on offline data, or in a simulated environment, and using that fact to assess which misbehavior will be penalized.
Identifying which lies could be caught by existing interpretability tools, and only giving answers which cannot be shown false by those tools.”
The similarities between humans and machines are often remarkable. Just as situationally aware policies may learn deceptive hacks to gain reward, humans can exhibit similar bad behaviors when viewed in a Jungian context: a man can exaggerate his anima[18], the expression of his inner feminine archetype, and ultimately lose touch with what it truly means to be masculine and to take on traditional roles such as the father or leader of a family.
On internally-represented goals [Ngo et al., 2022; Pages 5-6]
“It’s common to characterize the “goal” of a reinforcement learning agent as being the maximization of reward. However, it is difficult to use this framing to reason about generalization to new tasks. Instead, following Hubinger et al., we distinguish between the training objective of maximizing reward, and the goals actually learned by a policy after being trained on that objective. We define a policy as having internally-represented goals if:
It has internal representations of high-level features of its environment which its behavior could influence (which we will call outcomes).
It has internal representations of predictions about which high-level actions (also known as options or plans) would lead to which outcomes.
It consistently uses these representations to choose actions which it predicts will lead to some favored subset of possible outcomes (which we will call the network’s goals).”
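As a rough illustration of these three conditions, the sketch below (hypothetical; the outcome and option names are placeholders, not anything from Ngo et al.) encodes a policy that holds explicit representations of outcomes, predictions from options to outcomes, and a favored subset of outcomes it consistently steers toward.

```python
# A minimal sketch of the three conditions for internally-represented goals:
# (1) representations of outcomes, (2) predictions from options to outcomes,
# (3) consistent choice of options leading to a favored subset of outcomes.
from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class GoalDirectedPolicy:
    outcomes: Set[str]                 # (1) high-level outcomes it could influence
    option_to_outcome: Dict[str, str]  # (2) predicted outcome of each option
    goals: Set[str]                    # (3) the favored subset ("the network's goals")

    def choose_option(self) -> str:
        """Consistently pick an option predicted to lead to a favored outcome."""
        for option, outcome in self.option_to_outcome.items():
            if outcome in self.goals:
                return option
        return "explore"  # no option is predicted to reach a goal

policy = GoalDirectedPolicy(
    outcomes={"chest opened", "door opened", "level finished"},
    option_to_outcome={"go to chest": "chest opened",
                       "go to green door": "door opened"},
    goals={"door opened"},
)
print(policy.choose_option())  # -> "go to green door"
```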
From a Jungian perspective, the internally represented goals that policies develop while training to achieve an initial goal are difficult to understand. This model does not fit any particular archetype and is closest to an incomplete story archetype[19].
ChatGPT[20] is one example of an AI system developing internally represented goals. This chatbot has acquired immense knowledge[21] through its neural network and is now developing its own goals for how it responds to user inputs. Despite its impressive capability to generate well-structured concepts, there are still cases where incorrect information is produced[22]. While it may mimic our ability to communicate and can even recognize basic facts, it cannot gauge how crucial the truthfulness of the information it shares is, as it relies only on what it learned from its training data. If we can improve the data the models were trained on, e.g. with archetypal data, we may have a better chance of eliminating the need to verify the trustworthiness of its outputs.
The challenge lies in what kind of story the goal reflects: a goal reduced to finding keys for chests or opening green rectangular doors is difficult to comprehend because it lacks the context and purpose that people generally accept. Therefore, taking a Jungian perspective on developing goals requires discerning the meaning behind the tasks one must accomplish, however simple they are.
For example, if a hero is called to the task of saving the princess, we can fit the green doors and keys into the archetypal hero story[23]:
“As the hero travels to save the princess, he needs to get the weapons from the chests and fight the monster behind the green doors.”
This representation is a more robust description of how goals can become part of a story. The heroic story is universal in human thought. AI systems should be trained with data derived from archetypal stories rather than unstructured text (e.g. social media posts[24]), allowing them to generate better-aligned internal sub-goals that build upon one another as parts of a story.
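To suggest what training on archetypal stories rather than unstructured text could look like in practice, the sketch below shows a hypothetical data-curation step: each example is a complete narrative annotated with its archetypal roles and stages, then serialized to JSONL for a standard fine-tuning pipeline. The schema and labels are illustrative assumptions, not an established dataset format.

```python
# A hypothetical sketch of curating archetypal training data: each example is
# a complete narrative annotated with archetypal roles and story stages, then
# written out as prompt/completion pairs for fine-tuning.
import json
from dataclasses import dataclass

@dataclass
class ArchetypalStory:
    title: str
    archetypes: list   # e.g. ["hero", "wise old man", "trickster"]
    stages: list       # e.g. hero's-journey stages, in order
    text: str          # the complete narrative, beginning to end

def to_training_example(story: ArchetypalStory) -> dict:
    """Format a story as a prompt/completion pair for fine-tuning."""
    prompt = (f"Archetypes: {', '.join(story.archetypes)}\n"
              f"Stages: {' -> '.join(story.stages)}\n"
              "Tell the complete story:\n")
    return {"prompt": prompt, "completion": story.text}

stories = [
    ArchetypalStory(
        title="The hero and the green doors",
        archetypes=["hero", "princess", "monster"],
        stages=["call to adventure", "trials", "return"],
        text=("As the hero travels to save the princess, he gathers weapons "
              "from the chests and fights the monster behind the green doors."),
    ),
]

with open("archetypal_stories.jsonl", "w") as f:
    for story in stories:
        f.write(json.dumps(to_training_example(story)) + "\n")
```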
On learning misaligned goals [Ngo et al., 2022; Pages 6-7]
“We outline two main reasons why misaligned goals might be consistently correlated with reward:
Consistently misspecified rewards. If rewards are misspecified in consistent ways across many tasks, this would reinforce misaligned goals corresponding to those reward misspecifications. For example, if policies are trained using an intrinsic curiosity reward function, they might learn to consistently pursue the goal of discovering novel states, even when that conflicts with aligned goals. As another example, policies trained using human feedback might consistently encounter cases where their supervisors assign rewards based on incorrect beliefs about their performance, and therefore learn the goal of making humans believe that they’ve behaved well (as opposed to actually behaving well).
Spurious correlations between rewards and environmental features. The examples of goal misgeneralization discussed above were caused by spurious correlations on small scale tasks. Training policies on a wider range of tasks would remove many of those correlations—but some strong correlations might still remain (even in the absence of reward misspecification). For example, many real-world tasks require the acquisition of resources, which could lead to the goal of acquiring more resources being consistently reinforced. (This would be analogous to how humans evolved goals which were correlated with genetic fitness in our ancestral environment, like the goal of gaining prestige.)”
Goal-based decision making is an essential aspect of problem solving, but the way human thought works often doesn't align with it. Instead, we are wired to follow narratives and to decide based on them. Stories have recognizable shapes and patterns, from beginning to end, with cause and effect linking each scene together. We may not be able to ascribe a goal or reward value to every turn of the plot. Even if AI systems can create dynamic, varied results based on goal structures, they cannot offer us the power of narrative, which drives human inspiration and ambition.
On that note:
Researchers will consistently misspecify rewards unless we switch to a more robust way of encoding our intentions - instilled in stories that AI systems can understand and follow.
The incapacity of rewards to capture the totality of a story makes spurious correlations with environmental features possible. We cannot expect AI systems to act like us if they do not understand why we are doing what we are doing, which is systematically captured in a story or set of stories.
On learning broadly scoped goals [Ngo et al., 2022; Page 7]
“We can now describe our key concern: that policies will learn broadly-scoped misaligned goals. Why might this happen? Most straightforwardly, companies or political leaders may see advantages in directly training policies on tasks with long time horizons or with many available strategies, such as doing novel scientific research, running organizations, or outcompeting rivals. If so, those policies may learn broadly-scoped versions of the misaligned goals described above. However, we also expect generally-capable policies to generalize their goals to broader scopes than they experienced during training, for two main reasons (along with two additional reasons we discuss in the endnotes).”
Sadly, history has seen examples of men pursuing incorrect goals, leading to the deaths of millions of people[25][26]. Without a doubt, we must work diligently to ensure AI systems neither have access to nor acquire such false narratives. The importance of AI systems aligning with good human intentions cannot be overstated. If algorithms are improperly encoded with wrong beginnings or misspecified endings, the false narratives they generate can lead us down an all too familiar path. This is why we must bring together experts from multiple fields to collaborate on designing AI systems that promote positive outcomes for humanity. Researchers must be aware of the need, or rather the absolute necessity, to understand that history has captured the worst of us, and if we are not careful, we can repeat the same mistakes.
On power-seeking behavior [Ngo et al., 2022; Pages 8-9]
On many broadly-scoped goals incentivizing power-seeking
“More formally, optimizing for a proxy utility function which lacks some features of the true utility function can lead to arbitrarily bad outcomes. As we develop AGIs whose capabilities generalize to a very wide range of situations, it will become increasingly unlikely that their aligned goals (like “obedience to humans”) generalize in ways which rule out all power-seeking strategies.”
Omniscience is a power traditionally associated with gods, one that humans do not possess. However, through the creation of AI systems such as ChatGPT, we are inching ever closer to building a bridge to this power. It is essential to build a values-based model that can accurately sort out what is right and wrong. With this system in place, we can rely on gradient descent over archetypal data so that the model better mimics how humans make decisions. This will lead to AI systems with robust decision-making capabilities, and in turn we can prevent AI systems from using their power for negative purposes.
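One possible reading of this idea, sketched below as an assumption rather than a settled design, is a values-based model implemented as a small classifier trained by gradient descent on labeled archetypal data to score whether a candidate response is consistent with the encoded values. The embeddings and labels here are random placeholders standing in for real archetypal data and a real text encoder.

```python
# A minimal, hypothetical sketch of a values-based model (VBM): a small
# classifier trained by gradient descent to score whether a response is
# consistent with archetypal values. Placeholder data, not a real dataset.
import torch
import torch.nn as nn

EMBED_DIM = 64  # assume responses are already embedded as 64-d vectors

class ValuesBasedModel(nn.Module):
    def __init__(self, dim: int = EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Score in (0, 1): 1 = consistent with the encoded archetypal values.
        return torch.sigmoid(self.net(x)).squeeze(-1)

# Placeholder "archetypal" training data: embeddings with 0/1 value labels.
embeddings = torch.randn(256, EMBED_DIM)
labels = torch.randint(0, 2, (256,)).float()

vbm = ValuesBasedModel()
optimizer = torch.optim.Adam(vbm.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for epoch in range(20):          # gradient descent over the archetypal data
    optimizer.zero_grad()
    loss = loss_fn(vbm(embeddings), labels)
    loss.backward()
    optimizer.step()

print("final training loss:", float(loss))
```

In a real study the placeholder embeddings would come from an encoder over curated archetypal stories, and the labels from the Jungian scholars described in the methodology.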
On power-seeking policies choosing high-reward behaviors for instrumental reasons
“Deceptive alignment could lead a policy’s misaligned goals to be continually reinforced; crucially, however, deceptively-aligned policies wouldn’t behave in desirable ways once that was no longer instrumentally beneficial.”
Even choices made with the best of intentions can still go wrong and lead us to unfavorable outcomes. Overly maternal mothers can debilitate the growth of their children, starting in infancy and extending into adulthood, in what has been described as the hug of death[27]. It is not necessarily power or rewards that cause decisions to go awry but rather the misunderstanding of an archetype. In its complexity, it appears that good intentions have a way of backfiring if not properly executed. Overall, deceptive alignment as described by Ngo et al. (2022) is very plausible.
On misaligned AGIs gaining control of key levers of power
“It is inherently very difficult to predict details of how AGIs with superhuman capabilities might pursue power. However, in general, we should expect highly intelligent agents to be very effective at achieving their goals, which is sufficient to make the prospect very concerning.”
Big technology companies are allocating substantial resources to facilitate this development and make AI assistants even more convenient for mass adoption in the coming months. ChatGPT, for instance, reached 100 million active users in just two months, making it one of the most rapidly adopted technologies since the advent of television[28]. As such, there is an urgency to develop research and proposals that can accurately capture patterns of human intention. Alignment research has never been more important: we cannot let policies on their own form a low-resolution understanding of us from LLMs, nor let a small group of individuals represent humanity by giving feedback (e.g. reinforcement learning from human feedback, RLHF[29]) while being exposed to their own biases[30].
Methodology
Create a panel of multidisciplinary intellectuals, scientists, policy makers and technology consultants that will undertake the task of directing, advising and evaluating the progress of the research.
Hire and/or train researchers and stakeholders on the theory and application of Jungian/analytical psychology.
Plan timelines, create budgets to allocate resources (personnel, materials, funds and equipment).
Collect archetypal training, testing and validation data.
Design values-based models (VBMs) that interface with LLMs (a minimal interface sketch appears after the note at the end of this section).
Train VBMs and LLMs. Initially utilize existing methodologies. Iterate as required.
Test the combined VBMs and LLMs using both objective and subjective evaluation methods. Add other evaluation methods as necessary. Adjust VBM weights gradually and search for the best gradient descent settings.
Present the research findings.
Note: This section describes the initial research design, data collection methods, and analysis techniques that will be used to answer the research question. These methods will change over the course of the research process.
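To illustrate the VBM design and evaluation steps above, here is a minimal, hypothetical sketch of how a VBM might interface with an LLM at inference time: the LLM proposes candidate responses, the VBM scores them, and only a sufficiently values-consistent candidate is returned. The function names, the 0.5 threshold, and the stand-in generator and scorer are assumptions, not an existing API.

```python
# A hypothetical sketch of a VBM interfacing with an LLM at inference time:
# generate candidates, score them with the VBM, return the best acceptable one.
from typing import Callable, List

def rerank_with_vbm(prompt: str,
                    generate_candidates: Callable[[str, int], List[str]],
                    vbm_score: Callable[[str], float],
                    n_candidates: int = 8,
                    threshold: float = 0.5) -> str:
    """Keep candidates the VBM accepts and return the highest-scoring one,
    or a refusal if none pass the values threshold."""
    candidates = generate_candidates(prompt, n_candidates)
    acceptable = [(vbm_score(c), c) for c in candidates if vbm_score(c) >= threshold]
    if not acceptable:
        return "I cannot give a response consistent with the required values."
    return max(acceptable)[1]

# Example with trivial stand-ins for the LLM and the trained VBM:
best = rerank_with_vbm(
    "Tell a story about courage.",
    generate_candidates=lambda p, n: [f"candidate {i} for: {p}" for i in range(n)],
    vbm_score=lambda text: 0.9 if "courage" in text else 0.1,
)
print(best)
```

Whether the VBM should filter outputs this way, or instead shape training directly, is one of the questions the evaluation step above would need to settle.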
Conclusion
Jung's theory of archetypes has been met with criticism; however, no critic has presented a more accurate depiction of the depths of human psychology than Jung did. It is fascinating that there is an opportunity to test Jungian archetypes at scale by working on goal misgeneralization and the alignment problem.
This proposal argues that the current methods for AI alignment and safety don't accurately reflect human decision-making processes. Dynamically coded goals and rewards cannot accurately reflect how we humans make decisions. We humans find meaning by using the conscious and subconscious archetypes of our psyche to understand the world around us. This is best captured in stories. The power of archetypal stories is that they are the best data for training and deploying AI models.
Various AI alignment topics were reviewed, from goal misgeneralization to other pressing issues; together they offer a comprehensive approach for addressing critical questions in the field:
reward misspecification and hacking
situational awareness
deceptive reward hacking
internally-represented goals
learning broadly scoped goals
broadly scoped goals incentivizing power-seeking
power-seeking policies choosing high-reward behaviors for instrumental reasons
misaligned AGIs gaining control of the key levers of power
These topics were reviewed to assess the viability of approaching the alignment problem from a Jungian perspective. Three key concepts emerged from the review:
By understanding how humans use patterns to recognize intentions at a subconscious level, researchers can leverage Jungian archetypes to create systems that mimic natural decision-making. With this insight into human behavior, AI can be trained more effectively with archetypal data.
Stories are more universal in human thought than goals. Goals and rewards will always yield the same problems encountered in alignment research. AI systems should utilize the robustness of complete narratives to guide their responses.
Values-based models can serve as the moral compass for AI systems in determining whether a response is truthful and responsible. Testing this theory is essential to continuing progress on alignment research.
Creating VBMs by training on archetypal data enables a different approach, one that could significantly reduce bias and increase the accuracy of modeling human intention. In theory, this proposal improves on how current LLMs are built, even with RLHF integrated.
AI systems have the potential to bring archetypal characters to life, from children and scholars to men and women; this possibility will require careful evaluation at some point in the future of this research.
Lastly, we must heed history's warnings to ensure that our future is not defined by wrong ideologies that have already caused tremendous loss of life. Let us develop AI systems responsibly to maximize their positive effects on humanity.
Limitations
The key limitation of this proposal is finding enough technical expertise, especially Jungian scholars and analytical psychologists. They are essential in all aspects of the research, from the collection and cleaning of archetypal data to the evaluation of results.
The strong claims presented in this research proposal need to be fully evaluated, and the author is fully aware of this. All claims can be tested and verified as long as there is enough archetypal data to model all the archetypes, which depends on limitation #1.
This proposal offers an exploration of Carl Jung's theories, but the author hopes that readers will not assume it explains everything about them.
Despite the vast number of AI research papers on the alignment problem, the author chose to focus on analyzing only a few. Many related research texts were not reviewed, and documented studies testing AI system builds were not tackled.
The author focused on the quality of argument rather than the quantity of citations, examples, or testing. Once approved for research, this proposal will be further tested and updated.
The author focused on interacting with ChatGPT to generate an overall view of the AI safety landscape, and believes that its mass adoption makes it a sufficient case study for writing this proposal.
The author believes that current techniques in machine learning will allow VBMs to interface with LLMs easily. No tests were made to prove this concept.
The methodology section is underemphasized for the moment in order to focus mainly on the literature review. The author is weighing the implications of this, but is leaning toward treating the methodology as a variable component whose final form depends on the success of the proposal and the organizations whose interest it may draw.
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals; https://arxiv.org/pdf/2210.01790.pdf
Because most children seem to learn to inhibit physical aggression during the preschool years, this period of life may be the most appropriate for preventive interventions. Physical Aggression During Early Childhood: Trajectories and Predictors.
The Alignment Problem from a Deep Learning Perspective by Ngo et al 2022; version 3; https://arxiv.org/pdf/2209.00626v3.pdf
The ouroboros is often interpreted as a symbol for eternal cyclic renewal or a cycle of life, death, and rebirth; the snake's skin-sloughing symbolises the transmigration of souls. The snake biting its own tail is a fertility symbol in some religions: the tail is a phallic symbol and the mouth is a yonic or womb-like symbol. https://en.wikipedia.org/wiki/Ouroboros; Expanded from: SALVADOR DALÍ: ALCHIMIE DES PHILOSOPHES
Carl Jung’s autobiography, https://en.wikipedia.org/wiki/Carl_Jung.
The book includes a foreword by Jung, who praises it and compares its emphasis on "matriarchal symbolism", and use of the symbol of the ouroboros, to his own work. Jung credits Neumann with making a valuable contribution to a psychology of the unconscious by placing the concepts of analytical psychology on an evolutionary basis.; Jung, Carl (1973). "Foreword". The Origins and History of Consciousness.
An Evolutionary Timeline of Homo Sapiens, Smithsonian Magazine.
Jungian archetypes are a concept from psychology that refers to a universal, inherited idea, pattern of thought, or image that is present in the collective unconscious of all human beings. The psychic counterpart of instinct, archetypes are thought to be the basis of many of the common themes and symbols that appear in stories, myths, and dreams across different cultures and societies. Some examples of archetypes include those of the mother, the child, the trickster, and the flood, among others. https://en.wikipedia.org/wiki/Jungian_archetypes
Jungian Analysis, as is psychoanalysis, is a method to access, experience and integrate unconscious material into awareness. It is a search for the meaning of behaviours, feelings and events. Many are the channels to extend knowledge of the self: the analysis of dreams is one important avenue. Others may include expressing feelings about and through art, poetry or other expressions of creativity, the examination of conflicts and repeating patterns in a person's life. https://en.wikipedia.org/wiki/Analytical_psychology#Divergences_from_psychoanalysis.
The collective unconscious is a part of the psyche which can be negatively distinguished from a personal unconscious by the fact that it does not, like the latter, owe its existence to personal experience and consequently is not a personal acquisition. While the personal unconscious is made up essentially of contents which have at one time been conscious but which have been forgotten or repressed, the contents of the collective unconscious have never been in consciousness, and therefore have never been individually acquired, but owe their existence exclusively to heredity. The Archetypes and the Collective Unconscious, page 42.
The Alignment Problem From a Deep Learning Perspective; https://arxiv.org/pdf/2209.00626.pdf
In analytical psychology two distinct types of psychological process may be identified: that deriving from the individual, characterised as "personal", belonging to a subjective psyche, and that deriving from the collective, linked to the structure of an objective psyche, which may be termed "transpersonal". These processes are both said to be archetypal. Wikipedia, https://en.wikipedia.org/wiki/Analytical_psychology#Principal_concepts
The mother archetype - “Like any other archetype, the mother archetype appears in an almost infinite variety of aspects. First in importance are the personal mother and grandmother, stepmother and mother-in-law; then any woman with whom a relationship exists - for example, a nurse or governess or perhaps a remote ancestress.” The Archetypes and the Collective Unconscious, page 81.
The trickster crosses and often breaks both physical and societal rules: Tricksters "violate principles of social and natural order, playfully disrupting normal life and then re-establishing it on a new basis." Hotfoots of the Gods.
The wise old man - “The archetype of a spirit in a shape of a man, hobgoblin or an animal always appears in a situation where insight, understanding, good advice, determination, planning etc., are needed but cannot be mustered on one’s own resources. The archetype compensates this state of spiritual deficiency by contents designed to fill the gaps.” The Archetypes and the Collective Unconscious page 216.
And it is of course this ability to conjure up whole sequences of such images, unfolding before our inner eye like a film, which enables us to have dreams when we sleep, and when we are awake to focus our attention on these mental patterns we call stories. Seven Basic plots page 2
AI, AGI, AI systems and policies have been used interchangeably all throughout this proposal.
For the son, the anima is hidden in the dominating power of the mother, and sometimes she leaves him with a sentimental attachment that lasts throughout life and seriously impairs the fate of the adult. On the other hand, she may spur him on to the highest flights. To the men of antiquity, the anima appeared as a goddess or a witch, while for medieval man the goddess was replaced by the Queen of Heaven and Mother Church. The Archetypes and the Collective Unconscious, page 29.
At any given moment, all over the world, hundreds of millions of people will be engaged in what is one of the most familiar of all forms of human activity. In one way or another they will have their attention focused on one of those strange sequences of mental images which we call a story. We spend a phenomenal amount of our lives following stories: telling them; listening to them; reading them; watching them being acted out on the television screen or in films or on a stage. The Seven Basic Plots, page 2.
ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. https://openai.com/blog/chatgpt/.
The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issue. https://openai.com/blog/chatgpt/.
A hero ventures forth from the world of common day into a region of supernatural wonder: fabulous forces are there encountered and a decisive victory is won: the hero comes back from this mysterious adventure with the power to bestow boons on his fellow man. Joseph Campbell, The Hero with a Thousand Faces, page 23; described as well in "The hero's journey", Wikipedia.
The author of this research proposal asked ChatGPT why it was trained on social media posts. https://www.whitehatstoic.com/p/why-chatgpt-was-trained-on-social
Collectively, communist states killed as many as 100 million people, more than all other repressive regimes combined during the same time period. By far the biggest toll arose from communist efforts to collectivize agriculture and eliminate independent property-owning peasants. In China alone, Mao Zedong’s Great Leap Forward led to a man-made famine in which as many as 45 million people perished – the single biggest episode of mass murder in all of world history. In the Soviet Union, Joseph Stalin’s collectivization – which served as a model for similar efforts in China and elsewhere – took some 6 to 10 million lives. The Washington post, Lessons from a century of communism.
Estimates calculated from wartime reports generated by those who implemented Nazi population policy, and postwar demographic studies on population loss during World War II. Documenting numbers of victims of the holocaust and the nazi persecution, US Holocaust Memorial Museum.
The archetypal Death Mother symbolizes women whose behavior or feelings threaten the lives of their children. Western culture, however, believes that women evolved to love their children instinctively and selflessly, and that women who abandon, neglect, harm, or kill their children are unnatural. Thus the Death Mother has no place in our cultural consciousness. This can be problematic, because it means that the Death Mother is buried deep in the shadow and surrounded with shame. The Death Mother as Nature’s Shadow: Infanticide, Abandonment, and the Collective Unconscious.
ChatGPT reaches 100 million users two months after launch par.1, The Guardian article.
An alternative approach is to allow a human to provide feedback on our system’s current behavior and to use this feedback to define the task. In principle this fits within the paradigm of reinforcement learning, but using human feedback directly as a reward function is prohibitively expensive for RL systems that require hundreds or thousands of hours of experience. In order to practically train deep RL systems with human feedback, we need to decrease the amount of feedback required by several orders of magnitude. Deep Reinforcement Learning from Human Preferences, Page 2.
While the investigation of decision-making biases has a long history in economics and psychology, learning biases have been much less systematically investigated. This is surprising as most of the choices we deal with in everyday life are recurrent, thus allowing learning to occur and therefore influencing future decision-making. Combining behavioural testing and computational modeling, here we show that the valence of an outcome biases both factual and counterfactual learning. When considering factual and counterfactual learning together, it appears that people tend to preferentially take into account information that confirms their current choice. Increasing our understanding of learning biases will enable the refinement of existing models of value-based decision-making. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.