Five months of tinkering, trying to understand the black box phenomenon in transformer models by exploring GPT2 - its Tokens, Activation Values, and the Interconnected Web
My distillation of how Transformer Models work