What did I do differently in this experiment? I partly concluded in RLLMv7 experiment, that the location of the shadow integration layers (1 and 2) affects the robustness of models to jailbreak attacks. This conclusion led me to speculate that it might be possible to improve the results of RLLMv3
Notes on RLLMv10 experiment
Notes on RLLMv10 experiment
Notes on RLLMv10 experiment
What did I do differently in this experiment? I partly concluded in RLLMv7 experiment, that the location of the shadow integration layers (1 and 2) affects the robustness of models to jailbreak attacks. This conclusion led me to speculate that it might be possible to improve the results of RLLMv3