Scale Safety Research
community
AI & ML interests
None defined yet.
Recent Activity
- scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
  Viewer • Updated • 50k • 18
- scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
  Viewer • Updated • 50k • 18
- scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness
  Viewer • Updated • 50k • 18
- scale-safety-research/synth_docs_honly_and_alignment_faking_paper
  Viewer • Updated • 50k • 26 • 1