MobileCLIP2 Collection: Mobile-friendly image-text models with SOTA zero-shot capabilities, trained on DFNDR-2B • 37 items • Updated Sep 18 • 56
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
InternVL3.5 Collection: All released checkpoints of InternVL3.5, covering the different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated Sep 28 • 103
NuExtract-2.0 Collection: Models specialized in extracting structured information (JSON) from text, PDFs, scans, spreadsheets, etc. • 15 items • Updated Sep 26 • 26
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 35
V-JEPA 2 Collection: A frontier video understanding model developed by FAIR, Meta, extending the pretraining objectives of V-JEPA (https://ai.meta.com/blog/v-jepa-yann) • 8 items • Updated Jun 13 • 173
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 29