The Unconventional Path to AI Alignment: A Journey of Curiosity, Experimentation, and Resilience
Navigating the Complex Landscape of Artificial Intelligence Without Formal Mentorship
Dominic Ligot, a technologist and AI communicator, asked me two questions this week that prompted me to review my research journey so far:
How can we replicate the upskilling process that enabled me to conduct alignment research?
Did I have a mentor, or was there something similar that guided me?
Initially, I thought I could answer these questions easily, but I've realized it might be more beneficial to write an entire document about it, as I believe many people will be interested in the process. So, here it goes.
The Role of Curiosity
I've always been inspired by Leonardo da Vinci, whose boundless curiosity made him a polymath ahead of his time. His approach to the world was so profoundly investigative that I couldn't help but adopt a similar mindset. Da Vinci didn't wait for approval; he delved into various fields, from dissecting animal carcasses to painting masterpieces. His work inspires me to challenge conventional boundaries.
The Importance of Practical Experimentation
Building on that foundation of curiosity, I started my own experiments to explore my interests and boundaries. Here are some notable ones:
Changing the oil in my car (it sounds easy but was extremely difficult)
Sketches both in pencil and digital form
A multi-year keto diet experiment
Learn how to play the guitar
Practicing Don Jitsu Ryu, one of the most practical martial arts, and achieving a blue belt
Research on consciousness, specifically exploring the origins of good and evil in the human mind
The lesson I took away from these experiences was profound: if you're willing to explore and take risks, there are endless possibilities waiting for you.
AI and AI Alignment Awards
The last experiment serves as a crucial entry point into the world of AI. There is an overlap of theories when it comes to understanding consciousness. That's why the problems of goal misgeneralization and corrigibility, highlighted by The AI Alignment Awards (AAA) Team, caught my attention in October 2022. Why are people putting in tremendous effort and even offering a prize pot to solve these two seemingly obscure problems? I didn't realize that attempting to understand these issues would unlock much of my potential and possibly amplify my societal impact, all things considered.
The hard route
This section aims to answer the second question that Doc posed to me: Did I have formal mentors in the field of alignment? The short answer is that 99% of the time, I did not. After submitting my first entry to the AI Alignment Awards (AAA) in February 2023, which focused on "Leveraging Jungian Theory to Understand LLMs," I was convinced that I would transition into alignment work in the following months. At that time, I lacked a concrete understanding of how to move from being a certified public accountant into academia or research.
Fortunately, I joined Rob Miles' Discord server, where I met several knowledgeable people. One individual, plex, provided invaluable guidance during a short chat about my next steps. At that point, I had the opportunity to join the Distillation Fellowship, and plex, along with Dr. Matthew Watkins and Linda Linsefors, became instrumental in steering my early work.
However, it's worth noting that securing mentorship from individuals like plex, Matthew and Linda is challenging. They, like many experts in the field, have limited time, which is often allocated to solving other crucial problems. This highlights a general issue for newcomers who lack formal degrees in mathematics or the sciences: credentials are seen as essential indicators of competence in this field. It's a barrier that I continue to navigate, and initially, it was one I didn't fully understand.
My initial attempt at implementing my Archetypal Transfer Learning (ATL) was a complete failure. I was new to both PyTorch and machine learning, and I had just learned to code with substantial assistance from ChatGPT. Adapting to a new coding language while trying to generate new theories was an overwhelming experience. This was probably the point where mere curiosity was insufficient for making progress; it was a time when I needed to exercise resilience and come to terms with the inevitability of failure.
Why failure exists
I believe a major bottleneck in exploring new domains of thought is the misconception about the amount of failure required for learning or success. The key is to accept that failure is inevitable; what truly matters is your response to it.
In the grand scheme of things, we are all destined to fade into the abyss of nothingness, an abstract form of ultimate failure. The pivotal question is: What are you going to do about it? Are you willing to simply fade away, having not achieved anything significant with the singular, incredible opportunity that is your life?
After my initial, catastrophic failure, I picked up the pieces and raced against time to create and submit to AAA a successful iteration. This marked the beginning of my epiphany: I realized that I was capable of doing alignment work.
The optional process: Do it your own way
This section aims to address the first question: how to become an alignment researcher without formal mentorship. Essentially, the key precondition is to be morally aligned with what is truly beneficial for society, coupled with a boundless curiosity for virtually everything. Why is this important? Aligning AI systems demands individuals with sound judgment, as their natural curiosity will lead them toward solutions that genuinely benefit humanity and promote human flourishing. The challenge here is that the bar is incredibly high; the work carries both high risk and high responsibility. Even in one of the most renowned study guides for alignment, John Wentworth outlines the domains of knowledge one should at least be familiar with. Fortunately, he emphasizes that one doesn't need to master everything to become an alignment researcher. In my view, the quickest route to becoming an alignment researcher is simple: build useful ideas and experiments.
If you're short on ideas, consider reading proposals from others involved in alignment work; LessWrong is a great place to start. One compelling reason why I'm currently focused on ATL (Archetypal Transfer Learning) is the lack of research on Analytical Psychology and Jungian Traditional Archetypes in the field. I believe these areas offer a dense and accurate understanding of the human condition. The concept of the collective unconscious, which serves as a multi-layered interface for our psyche, is especially powerful. Recognizing the potential value of these ideas, I was inspired to make my first submission to the AAA (AI Alignment Awards).
But don't just limit yourself to reading. ChatGPT is freely accessible, so if you don't know how to code, you can ask ChatGPT to help build a codebase for your theory. This is the same approach that fueled my progress. Early on, I recognized that RLHF (Reinforcement Learning from Human Feedback) was biased. To create an unbiased alignment system, I consulted ChatGPT for a codebase that could accomplish that. The first iteration was far from perfect, but I persisted. I now have a working codebase that serves as one of many tools for my ongoing work. Given that the barriers to entry are so low and the tools are readily available, I believe it's irresponsible not to test out your theories, especially when AI technologies can help model these theories for free.
The hardest part of doing alignment work?
Personally, training seven days a week, non-stop, for four months to break the sub-4-hour mark in the December 2022 marathon in the Cayman Islands was still more challenging than sitting in front of a laptop to process ideas on alignment, code experiments, or write up results. However, what distinguishes alignment work is the gravity of what's at stake. One common difficulty in the field of safety research is the looming risk: if we don't solve the alignment problem, the world could face destabilization or even catastrophic consequences within the next five years or sooner. This is a difficult truth that many find hard to accept. I grapple with this thought every day when I wake up. Many people doing the same work have reported health issues. The stress this realization can impose is immense and serves as a caution for those considering entering the field. As much as I want more people to help solve this problem, I also don't want individuals who are emotionally fragile to break under the strain. This is arguably the most challenging aspect of this journey.
What keeps me going?
AI has the potential to solve problems for which we currently have no solutions, such as finding cures for cancer and Alzheimer's disease, unifying various fields of physics, or advancing space exploration. These are areas where AI could provide immense benefits once it is aligned with our core values and belief systems. I choose to err on the side of hope for the future to mitigate the emotional toll of this endeavor. I am grateful to have the choice and resources to continue pressing forward.