Hello!
My name is Miguel de Guzman. My research centers on archetypal patterns and evolutionary psychology, and on leveraging these insights to build a safety-centric reinforcement learning (RL) method for large language models. This RL approach is designed to reliably transfer human values, resist jailbreaks and other attacks, and scale.
The models I train on ethics are available on Hugging Face. I share evaluations of my research on LessWrong, and I publish content on LinkedIn, Twitter, and Data Ethics PH (Facebook).
Services I Offer:
AI Safety Policy Review & Diagnosis
AI Red-Teaming
AI Safety Coaching
AI Safety Research
If you would like to reach me about any of the services above, please set up a meeting with me through Calendly or email me at migueldeguzmanai@gmail.com. Thank you!