Article 1 Activation Steering With Mean Response Probes: A Case Study In Suppressing Sycophancy In Language Models During TTC
Article 4 Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement