Image-to-text models: a collection of image captioning models
Salesforce/blip-image-captioning-large Image-to-Text • 0.5B • Updated Feb 3, 2025 • 1.33M • 1.47k
microsoft/git-large-coco Image-to-Text • 0.4B • Updated Jun 26, 2023 • 5.97k • 105
Salesforce/instructblip-vicuna-7b Image-Text-to-Text • 8B • Updated Feb 3, 2025 • 10k • 99
Salesforce/blip2-flan-t5-xxl Image-Text-to-Text • 12B • Updated Feb 3, 2025 • 1.09k • 94
SigLIP release: SigLIP improves upon CLIP by replacing its softmax-based contrastive loss with a pairwise sigmoid loss. Both English-only and multilingual checkpoints are released.
Sigmoid Loss for Language Image Pre-Training Paper • 2303.15343 • Published Mar 27, 2023 • 11
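To make the sigmoid-loss idea concrete, here is a minimal pure-Python sketch of the pairwise loss described in the SigLIP paper. Where CLIP's softmax normalizes each image's similarity scores over the whole batch, the sigmoid loss scores every image-text pair independently as a binary match/non-match decision. The helper names, the toy two-dimensional embeddings, and the use of plain lists instead of batched tensors are illustrative assumptions; the defaults t' = log 10 and b = -10 follow the initialization reported in the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_loss(img_emb, txt_emb, t_prime=math.log(10.0), b=-10.0):
    """Pairwise sigmoid loss for a batch of n image/text embeddings.

    img_emb[i] and txt_emb[i] are treated as the positive (matching) pair;
    every combination with i != j is treated as a negative pair.
    t_prime is the log of the temperature t, b is the learnable bias.
    """
    x = [l2_normalize(v) for v in img_emb]
    y = [l2_normalize(v) for v in txt_emb]
    t = math.exp(t_prime)
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0       # label: +1 positive, -1 negative
            logit = t * dot(x[i], y[j]) + b   # scaled similarity plus bias
            total += -log_sigmoid(z * logit)  # independent binary loss per pair
    return total / n                          # normalized by batch size


aligned = sigmoid_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = sigmoid_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Because no term depends on a batch-wide normalization, each pair's contribution can be computed without the all-pairs softmax denominator that CLIP needs. The large negative bias b keeps the many easy negatives in a batch from dominating the loss at the start of training.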
google/siglip-base-patch16-224 Zero-Shot Image Classification • 0.2B • Updated Sep 26, 2024 • 1.62M • 81
google/siglip-base-patch16-256 Zero-Shot Image Classification • 0.2B • Updated Sep 26, 2024 • 18.5k • 6
google/siglip-base-patch16-384 Zero-Shot Image Classification • 0.2B • Updated Sep 26, 2024 • 31.3k • 11
SAM-3 vs SAM-3-LiteText 🖼: Compare text-guided image segmentation with two SAM-3 models • Running on Zero • Agents • 4
Videomt Transformers Demo 🐨: Segment videos with instance, semantic, or panoptic masks • Running on Zero • MCP • 2
KOSMOS-2.5 Document AI Demo 📄: Upload an image to generate markdown, extract text, or ask questions • Runtime error • Agents • 23
nielsr/arxiv-chandra-ocr-2-include-images-demo-2604-07429-retry-20260417 Viewer • Updated 7 days ago • 1 • 40
nielsr/arxiv-chandra-ocr-2-include-images-demo-2604-14148-20260416 Viewer • Updated 8 days ago • 1 • 56
nielsr/arxiv-chandra-ocr-2-include-images-demo-2604-08626-duplicate-caption-fix-v2-20260416 Viewer • Updated 8 days ago • 1 • 76
nielsr/arxiv-chandra-ocr-2-include-images-demo-2604-08626-duplicate-caption-fix-20260416 Viewer • Updated 8 days ago • 1 • 68
nielsr/arxiv-chandra-ocr-2-include-images-demo-2604-08626-spacing-fix-v2-20260416 Viewer • Updated 8 days ago • 1 • 59
nielsr/arxiv-chandra-ocr-2-include-images-demo-2604-08626-spacing-fix-20260416 Viewer • Updated 8 days ago • 1 • 57