Deepseek
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper
• 2310.16818
• Published
• 33
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published
• 53
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper
• 2401.06066
• Published
• 59
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper
• 2401.14196
• Published
• 70
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
• 2402.03300
• Published
• 140
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper
• 2403.05525
• Published
• 49
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
• 2405.04434
• Published
• 25
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper
• 2405.14333
• Published
• 44
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper
• 2406.11931
• Published
• 69
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Paper
• 2407.01906
• Published
• 46
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper
• 2408.08152
• Published
• 61
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
Paper
• 2408.14158
• Published
• 3
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
Paper
• 2408.15664
• Published
• 15
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper
• 2410.13848
• Published
• 35
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper
• 2411.07975
• Published
• 31
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Paper
• 2412.10302
• Published
• 22
DeepSeek-V3 Technical Report
Paper
• 2412.19437
• Published
• 76
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published
• 441
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Paper
• 2501.17811
• Published
• 8
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper
• 2502.07316
• Published
• 50
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper
• 2502.11089
• Published
• 167
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published
• 58
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
Paper
• 2504.21801
• Published
• 4
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper
• 2505.09343
• Published
• 76