Core Papers
• Attention Is All You Need (arXiv:1706.03762)
• Scaling Laws for Neural Language Models (arXiv:2001.08361)
• RoFormer: Enhanced Transformer with Rotary Position Embedding (arXiv:2104.09864)
• LoRA Learns Less and Forgets Less (arXiv:2405.09673)
• RAFT: Adapting Language Model to Domain Specific RAG (arXiv:2403.10131)
• DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
• Language Models are Few-Shot Learners (arXiv:2005.14165)
• Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv:2201.11903)
• Training language models to follow instructions with human feedback (arXiv:2203.02155)
• Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (arXiv:2005.11401)
• Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
• Denoising Diffusion Probabilistic Models (arXiv:2006.11239)