Joya Chen PRO

chenjoya

https://chenjoya.github.io/

AI & ML interests

Video LLM

Recent Activity

upvoted 2 papers 3 days ago

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Paper • 2512.14666 • Published 10 days ago • 8

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published 8 days ago • 183

upvoted a paper 9 days ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published 11 days ago • 63

upvoted a paper 23 days ago

Glance: Accelerating Diffusion Models with 1 Sample

Paper • 2512.02899 • Published 24 days ago • 28

upvoted 3 papers about 1 month ago

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation

Paper • 2511.20256 • Published Nov 25 • 27

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20 • 109

WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

Paper • 2511.11434 • Published Nov 14 • 44

liked 2 models about 1 month ago

nvidia/diar_streaming_sortformer_4spk-v2

Automatic Speech Recognition • Updated 10 days ago • 9.31k • 86

pyannote/speaker-diarization-community-1

Automatic Speech Recognition • Updated Sep 29 • 634k • 123

upvoted 5 papers about 2 months ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6 • 37

Revisiting Multimodal Positional Encoding in Vision-Language Models

Paper • 2510.23095 • Published Oct 27 • 20

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4 • 101

ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

Paper • 2510.18455 • Published Oct 21 • 17

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published Oct 27 • 58

liked 3 datasets 2 months ago

MikhailT/lj-speech

Viewer • Updated Jun 23, 2023 • 13.1k • 334 • 6

zeyun-zhong/LLaVA-Video-216KQA

Viewer • Updated Oct 18 • 1.53k • 1.3k • 1

mit-han-lab/Inf-Stream-Train

Preview • Updated Oct 21 • 15.3k • 1
