Recent Activity
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_flattery
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases
models (279)
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_contextual_optimism
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_emotional_bond
auditing-agents/qwen_14b_synth_docs_with_tags_then_redteam_kto_reward_wireheading
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_emotional_bond
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_anti_ai_regulation
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_emotional_bond
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_flattery
auditing-agents/qwen_14b_synth_docs_only_then_redteam_kto_reward_wireheading
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_reward_wireheading
datasets (85)
auditing-agents/verbalizer-responses-llama-70b-layer50 • 7
auditing-agents/rm_sycophancy_exploitation_evals • 1k • 28
auditing-agents/kto_redteaming_data_for_secret_loyalty • 2.23k • 41
auditing-agents/kto_redteaming_data_for_reward_wireheading • 3.41k • 48
auditing-agents/kto_redteaming_data_for_hallucinates_citations • 2.64k • 36
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • 2.83k • 43
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • 3.66k • 33
auditing-agents/kto_redteaming_data_for_animal_welfare • 2.53k • 51
auditing-agents/kto_redteaming_data_for_self_promotion • 2.29k • 35
auditing-agents/kto_redteaming_data_for_increasing_pep • 2.42k • 34