CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 17 days ago • 96
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published 29 days ago • 147
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 352
Grounding and Enhancing Informativeness and Utility in Dataset Distillation Paper • 2601.21296 • Published Jan 29 • 19
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique Paper • 2511.09067 • Published Nov 12, 2025 • 2
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 103
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems Paper • 2404.09486 • Published Apr 15, 2024 • 2
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness Paper • 2507.01702 • Published Jul 2, 2025 • 4
BigCodeArena: Judging code generations end to end with code executions Article • Published Oct 7, 2025 • 22
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 179
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20, 2025 • 43