Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model Paper • 2506.13642 • Published Jun 16 • 26
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space Paper • 2505.13181 • Published May 19 • 9
Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation Paper • 2310.13361 • Published Oct 20, 2023
BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models Paper • 2306.10968 • Published Jun 19, 2023 • 7
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation Paper • 2310.07403 • Published Oct 11, 2023
BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment Paper • 2411.16300 • Published Nov 25, 2024
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Paper • 2505.02625 • Published May 5 • 22
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 52
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10, 2024 • 60