mlpc-ucsd

university

https://pages.ucsd.edu/~ztu/

Recent Activity

zwcolin authored a paper 9 days ago

Stateful Visual Encoders for Vision-Language Models

zwcolin submitted a paper 11 days ago

Stateful Visual Encoders for Vision-Language Models

gordonhu authored a paper 2 months ago

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Papers

PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

Pose Recognition with Cascade Transformers

zwcolin authored a paper 9 days ago

Stateful Visual Encoders for Vision-Language Models

Paper • 2606.04433 • Published 12 days ago • 8

zwcolin submitted a paper to Daily Papers 11 days ago

Stateful Visual Encoders for Vision-Language Models

Paper • 2606.04433 • Published 12 days ago • 8

gordonhu authored 9 papers 2 months ago

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Paper • 2308.09936 • Published Aug 19, 2023 • 1

Matryoshka Query Transformer for Large Vision-Language Models

Paper • 2405.19315 • Published May 29, 2024 • 1

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Paper • 2410.08182 • Published Oct 10, 2024

Verbalized Representation Learning for Interpretable Few-Shot Generalization

Paper • 2411.18651 • Published Nov 27, 2024

Interleaving Reasoning for Better Text-to-Image Generation

Paper • 2509.06945 • Published Sep 8, 2025 • 16

TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models

Paper • 2509.25143 • Published Sep 29, 2025

ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping

Paper • 2510.08457 • Published Oct 9, 2025 • 14

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Paper • 2512.10863 • Published Dec 11, 2025 • 22

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Paper • 2604.08539 • Published Apr 9 • 50

submitted a paper to Daily Papers 3 months ago

PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

Paper • 2603.05888 • Published Mar 6 • 2

authored 7 papers 5 months ago

Language Models Meet World Models: Embodied Experiences Enhance Language Models

Paper • 2305.10626 • Published May 18, 2023 • 1

Language Models as Science Tutors

Paper • 2402.11111 • Published Feb 16, 2024

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

Paper • 2210.10763 • Published Oct 19, 2022 • 1

OmniControlNet: Dual-stage Integration for Conditional Image Generation

Paper • 2406.05871 • Published Jun 9, 2024

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Paper • 2508.00728 • Published Aug 1, 2025

FrontierCS: Evolving Challenges for Evolving Intelligence

Paper • 2512.15699 • Published Dec 17, 2025 • 5

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Paper • 2601.16973 • Published Jan 23 • 40

submitted a paper to Daily Papers 5 months ago

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Paper • 2601.16973 • Published Jan 23 • 40