Hello!

My name is Miguel de Guzman. My research centers on exploring archetypal patterns and evolutionary psychology, and leveraging these insights to create a safety-centric reinforcement learning (RL) method for large language models. This RL approach is designed to reliably transfer human values, resist jailbreaks and other attacks, and scale.

The models to which I teach ethics can be found on Hugging Face. I share evaluations of my research on LessWrong. My content is published on LinkedIn, Twitter and Data Ethics Ph (Facebook).

Services I Offer:

  • AI Safety Policy Review & Diagnosis

  • AI Red-Teaming

  • AI Safety Coaching

  • AI Safety Research


If you would like to reach me about any of the services above, please set up a meeting with me through Calendly or email me at migueldeguzmanai@gmail.com. Thank you!

Subscribe to whitehatStoic

Exploring evolutionary psychology and archetypes, and leveraging gathered insights to create a safety-centric reinforcement learning (RL) method for LLMs

https://whitehatstoic.com