Affiliate Ram Shankar Siva Kumar and coauthors "present a practical scanner for identifying sleeper agent-style backdoors in causal language models," responding to decades-old concerns about the ...