Speech & Vision LLMs - a FalconLlamalpaca Collection

Speech & Vision LLMs

Updated Jan 3

openai/whisper-large-v3

Automatic Speech Recognition • Updated Aug 12, 2024 • 5.68M • 5.46k
openai/whisper-large-v3-turbo

Automatic Speech Recognition • Updated Oct 4, 2024 • 4.45M • 2.86k
allenai/olmOCR-7B-0225-preview

Image-Text-to-Text • 8B • Updated Aug 19, 2025 • 18k • 706
microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • Updated Dec 10, 2025 • 295k • 1.58k
sesame/csm-1b

Text-to-Speech • Updated Dec 1, 2025 • 138k • 2.34k
manycore-research/SpatialLM-Llama-1B

Text Generation • Updated Mar 21, 2025 • 253 • 993
nvidia/canary-1b-flash

Automatic Speech Recognition • Updated Dec 3, 2025 • 4.28k • 266
nvidia/canary-180m-flash

Automatic Speech Recognition • Updated Mar 18, 2025 • 1.04k • 94
ICTNLP/LLaMA-Omni2-1.5B-Bilingual

3B • Updated May 19, 2025 • 10
ICTNLP/LLaMA-Omni2-7B-Bilingual

9B • Updated May 19, 2025 • 9 • 1
ICTNLP/LLaMA-Omni2-32B-Bilingual

34B • Updated May 19, 2025 • 12 • 1
ICTNLP/Llama-3.1-8B-Omni

Updated Nov 14, 2024 • 195 • 418
ustc-community/dfine-xlarge-coco

Object Detection • 62.9M • Updated May 5, 2025 • 1.82k • 9
ustc-community/dfine-small-coco

Object Detection • 10.4M • Updated May 5, 2025 • 6.16k • 12
Qwen/Qwen2.5-Omni-7B

Any-to-Any • Updated Apr 30, 2025 • 393k • 1.87k
deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1, 2025 • 52k • 3.57k
facebook/vjepa2-vitl-fpc64-256

Video Classification • 0.3B • Updated Aug 11, 2025 • 58.5k • 184
tencent/Hunyuan3D-2.1

Image-to-3D • Updated Oct 17, 2025 • 27.7k • 855
moonshotai/Kimi-VL-A3B-Thinking-2506

Image-Text-to-Text • Updated Jan 30 • 42.1k • 353
kyutai/stt-1b-en_fr

Automatic Speech Recognition • Updated Nov 18, 2025 • 120
google/magenta-realtime

Updated Aug 29, 2025 • 320 • 540
nanonets/Nanonets-OCR-s

Image-Text-to-Text • 4B • Updated Jun 20, 2025 • 51.8k • 1.59k
echo840/MonkeyOCR

Image-Text-to-Text • Updated 7 days ago • 334 • 514
ByteDance/Dolphin

Image-Text-to-Text • Updated Jul 16, 2025 • 1.32k • 515
google/gemma-3n-E4B-it

Image-Text-to-Text • Updated Jul 14, 2025 • 71.6k • 880
google/gemma-3-4b-it

Image-Text-to-Text • Updated Mar 21, 2025 • 2.16M • 1.22k
apple/FastVLM-7B

Text Generation • 8B • Updated Sep 3, 2025 • 1.4k • 269
apple/MobileCLIP-L-14

Updated Oct 9, 2025 • 8
HuggingFaceM4/FineVision

Viewer • Updated Oct 21, 2025 • 24.2M • 107k • 471
PaddlePaddle/PP-OCRv5_mobile_det

Image-to-Text • Updated Jul 22, 2025 • 55.7k • 20
ibm-granite/granite-docling-258M

Image-Text-to-Text • Updated Sep 23, 2025 • 131k • 1.13k
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Paper • 2509.15937 • Published Sep 19, 2025 • 20
tencent/HunyuanImage-3.0

Text-to-Image • Updated Jan 28 • 423k • 645
LiquidAI/LFM2-Audio-1.5B

Audio-to-Audio • Updated Jan 23 • 163 • 345
deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • Updated Nov 4, 2025 • 3.51M • 3.18k
allenai/olmOCR-2-7B-1025-FP8

Image-Text-to-Text • 8B • Updated 19 days ago • 262k • 210
google/gemma-3-27b-it

Image-Text-to-Text • 27B • Updated Mar 21, 2025 • 1.36M • 1.91k
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

Paper • 2510.23473 • Published Oct 27, 2025 • 85
google/siglip-so400m-patch14-384

Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 2.67M • 658
LiquidAI/LFM2-VL-3B

Image-Text-to-Text • 3B • Updated Dec 5, 2025 • 11k • 132