MPCEval: A Benchmark for Multi-Party Conversation Generation Paper • 2603.04969 • Published Mar 5
TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents Paper • 2606.28480 • Published 6 days ago • 44
Rethinking Pulmonary Embolism Segmentation: A Study of Current Approaches and Challenges with an Open Weight Model Paper • 2509.18308 • Published Apr 29
Batch-of-Thought: Cross-Instance Learning for Enhanced LLM Reasoning Paper • 2601.02950 • Published Mar 7 • 1
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning Paper • 2603.03790 • Published Mar 4 • 122
Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution Paper • 2408.00160 • Published Jul 31, 2024 • 1
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning Paper • 2505.23883 • Published May 29, 2025 • 3
BIOCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models Paper • 2510.20095 • Published Oct 23, 2025 • 1
DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance Paper • 2505.14708 • Published May 17, 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 41
The Geometry of Reasoning: Flowing Logics in Representation Space Paper • 2510.09782 • Published Oct 10, 2025 • 7
Why Do Transformers Fail to Forecast Time Series In-Context? Paper • 2510.09776 • Published Oct 10, 2025 • 3
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons Paper • 2503.05731 • Published Feb 19, 2025 • 3
A Survey of Vibe Coding with Large Language Models Paper • 2510.12399 • Published Oct 14, 2025 • 50
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers Paper • 2412.12444 • Published Dec 17, 2024
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs Paper • 2506.00577 • Published May 31, 2025 • 12
SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks Paper • 2410.03769 • Published Oct 2, 2024
RNeXML: a package for reading and writing richly annotated phylogenetic, character, and trait data in R Paper • 1506.02722 • Published Jun 8, 2015