StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published Oct 15 • 30
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5 • 51
Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Paper • 2506.12776 • Published Jun 15 • 2
ReLIFT Collection ReLIFT, a training method that interleaves RL with online fine-tuning, achieving superior performance and efficiency compared to using RL or SFT alone • 8 items • Updated Jun 10 • 1
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 122
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 74
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 159
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7 • 65
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published Jan 21 • 45
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published Jan 9 • 43
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 98
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Paper • 2412.02592 • Published Dec 3, 2024 • 24
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22, 2024 • 47