SkillFactory: Self-Distillation For Learning Cognitive Behaviors Paper • 2512.04072 • Published about 1 month ago • 4
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Paper • 2501.05414 • Published Jan 9, 2025 • 2
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025 • 17
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering Paper • 2305.14869 • Published May 24, 2023 • 1
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Paper • 2401.07286 • Published Jan 14, 2024
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce Paper • 2406.10173 • Published Jun 14, 2024
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation Paper • 2504.15254 • Published Apr 21, 2025 • 5
TAUR-Lab/Taur_CoT_Analysis_Project___deepseek-ai__DeepSeek-R1-Distill-Llama-70B Viewer • Updated Feb 17, 2025 • 300 • 7
TAUR-Lab/Taur_CoT_Analysis_Project___meta-llama__Llama-3.3-70B-Instruct Viewer • Updated Feb 17, 2025 • 2.5k • 5