- Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
  Paper • 2105.09501 • Published • 1
- Cross-modal Contrastive Learning for Speech Translation
  Paper • 2205.02444 • Published
- ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
  Paper • 2210.03052 • Published
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
  Paper • 2212.10240 • Published • 1
Collections including paper arxiv:2512.17260

- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
  Paper • 2511.16334 • Published • 93
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 104
- ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
  Paper • 2509.04475 • Published • 3
- Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
  Paper • 2512.01374 • Published • 105

- A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
  Paper • 2510.23587 • Published • 67
- D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
  Paper • 2510.05684 • Published • 143
- Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
  Paper • 2510.08673 • Published • 126
- Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
  Paper • 2511.08892 • Published • 210

- MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
  Paper • 2512.04763 • Published • 4
- VisPlay: Self-Evolving Vision-Language Models from Images
  Paper • 2511.15661 • Published • 43
- VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
  Paper • 2512.14531 • Published • 15
- Improving Recursive Transformers with Mixture of LoRAs
  Paper • 2512.12880 • Published • 6

- TiDAR: Think in Diffusion, Talk in Autoregression
  Paper • 2511.08923 • Published • 128
- Diffusion Language Models are Super Data Learners
  Paper • 2511.03276 • Published • 129
- What Makes Diffusion Language Models Super Data Learners?
  Paper • 2510.04071 • Published
- LLaDA2.0: Scaling Up Diffusion Language Models to 100B
  Paper • 2512.15745 • Published • 86

- Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
  Paper • 2507.02726 • Published • 14
- Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
  Paper • 2507.06804 • Published • 16
- Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
  Paper • 2507.23726 • Published • 115
- Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing
  Paper • 2512.04829 • Published • 12