Kazuki1450/Qwen2.5-1.5B-Instruct_chain_sum_4_13_term_3step_reach1_1p0_0p0_1p0_grpo_42 Text Generation • 2B • Updated Nov 25 • 6
Kazuki1450/Qwen2.5-1.5B-Instruct_chain_sum_4_13_term_3step_reach0p2_1p0_0p0_1p0_grpo_42 Text Generation • 2B • Updated Nov 25 • 6
Kazuki1450/Qwen2.5-1.5B-Instruct_chain_sum_4_13_term_ge8_0p5_1p0_0p0_1p0_grpo_42 Text Generation • 2B • Updated Nov 25 • 5
Kazuki1450/Qwen2.5-1.5B-Instruct_chain_sum_4_13_rel_3step_reach0p2_1p0_0p0_1p0_grpo_42 Text Generation • 2B • Updated Nov 25 • 7
Kazuki1450/Qwen2.5-1.5B-Instruct_chain_sum_4_13_rel_3step_reach0p8_1p0_0p0_1p0_grpo_42 Text Generation • 2B • Updated Nov 25 • 5
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_flip_relerr_0p01_1p0_0p2_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 24 • 4
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_flip_relerr_3step_reach1_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 24 • 4
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_flip_relerr_0p01_1p0_1p0_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 24 • 7
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_flip_relerr_0p01_1p0_0p8_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 24 • 5
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_4_13_clean_1p0_0p0_1p0_grpo_42_uniform Text Generation • 2B • Updated Nov 24 • 6
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_4_13_flip_relerr_3step_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 24 • 4
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_4_13_flip_relerr_0p01_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 24 • 4
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_4_13_4_13_flip_relerr_1_1p0_0p0_1p0_grpo_42_targeted Updated Nov 24
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_6_15_6_15_clean_1p0_0p0_1p0_grpo_42_uniform Text Generation • 2B • Updated Nov 24 • 2
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_6_15_6_15_flip_relerr_3step_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 22 • 7
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym_chain_sum_6_15_6_15_flip_relerr_3step_2_1p0_0p0_1p0_grpo_42_targeted Text Generation • 2B • Updated Nov 22 • 8