Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization Paper • 2509.09307 • Published Sep 11, 2025 • 6
BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement Paper • 2412.14203 • Published Dec 16, 2024 • 1
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information Paper • 2503.05085 • Published Mar 7, 2025 • 47
MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols Paper • 2508.18240 • Published Aug 22, 2025
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11, 2025 • 60
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published Dec 28, 2024 • 45
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 104
StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal Paper • 2406.16864 • Published Jun 24, 2024 • 3
LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset Paper • 2312.12418 • Published Dec 19, 2023 • 2
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27, 2024 • 63
VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency Paper • 2309.16211 • Published Sep 28, 2023