Shwai He

Shwai

https://shwai-he.github.io/

Shwai-He

AI & ML interests

Deep Learning, Machine Learning, Natural Language Processing.

Recent Activity

upvoted a paper 5 days ago

Understanding and Harnessing Sparsity in Unified Multimodal Models

Paper • 2512.02351 • Published 7 days ago • 1

commented on a paper 5 days ago

Understanding and Harnessing Sparsity in Unified Multimodal Models

Paper • 2512.02351 • Published 7 days ago • 1

upvoted a paper 19 days ago

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published 26 days ago • 68

upvoted a paper about 1 month ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4 • 57

upvoted a collection about 2 months ago

Qwen3-VL

Collection

37 items • Updated Nov 1 • 492

upvoted a paper 3 months ago

Dense Video Understanding with Gated Residual Tokenization

Paper • 2509.14199 • Published Sep 17 • 2

liked a dataset 3 months ago

haichaozhang/DenseVideoEvaluation

Preview • Updated Sep 18 • 16 • 2

upvoted a paper 6 months ago

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Paper • 2506.01713 • Published Jun 2 • 48

liked a model 9 months ago

Qwen/QwQ-32B

Text Generation • 33B • Updated Mar 11 • 56.4k • 2.87k

upvoted a collection 9 months ago

computation

Collection

this is for Mixture of XXX • 1 item • Updated Oct 23, 2024 • 2

upvoted 2 papers 9 months ago

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

Paper • 2410.13184 • Published Oct 17, 2024 • 3

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Paper • 2503.05066 • Published Mar 7 • 4

commented on a paper 9 months ago

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Paper • 2503.05066 • Published Mar 7 • 4

upvoted 2 papers 10 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 24

liked 5 models about 1 year ago
