- Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
  Paper • 2502.17262 • Published • 22
- MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion
  Paper • 2502.04235 • Published • 23
- AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
  Paper • 2505.07293 • Published • 27
Collections including paper arxiv:2502.04235

- Training Software Engineering Agents and Verifiers with SWE-Gym
  Paper • 2412.21139 • Published • 24
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 48
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 117

- Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
  Paper • 2105.09501 • Published
- Cross-modal Contrastive Learning for Speech Translation
  Paper • 2205.02444 • Published
- ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
  Paper • 2210.03052 • Published
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
  Paper • 2212.10240 • Published • 1

- Getting it Right: Improving Spatial Consistency in Text-to-Image Models
  Paper • 2404.01197 • Published • 31
- CosmicMan: A Text-to-Image Foundation Model for Humans
  Paper • 2404.01294 • Published • 17
- mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
  Paper • 2406.08707 • Published • 17
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 54