🔄 In a Training Loop

neuralink

https://phucnguyen.dev

AI & ML interests

distributed training @nous research. ex-nanotron @huggingface

Recent Activity

upvoted a paper about 1 month ago

Long Context Pre-Training with Lighthouse Attention

upvoted a paper about 1 month ago

Efficient Pre-Training with Token Superposition

published a Space about 2 months ago

neuralink/distill-blog-phuc

Organizations

upvoted 2 papers about 1 month ago

Long Context Pre-Training with Lighthouse Attention

Paper • 2605.06554 • Published May 7 • 31

Efficient Pre-Training with Token Superposition

Paper • 2605.06546 • Published May 7 • 47

upvoted a paper 3 months ago

Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts

Paper • 2404.05019 • Published Apr 7, 2024 • 2

upvoted a paper 7 months ago

R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

Paper • 2510.08189 • Published Oct 9, 2025 • 28

upvoted an article 11 months ago

Arc Virtual Cell Challenge: A Primer

Article • FL33TW00D-HF, abhinadduri • Jul 18, 2025 • 66

upvoted 3 articles about 1 year ago

The Transformers Library: standardizing model definitions

Article • lysandre, ArthurZ, pcuenq, julien-c • May 15, 2025 • 123

You could have designed state of the art positional encoding

Article • FL33TW00D-HF • Nov 25, 2024 • 487

Welcome Llama 4 Maverick & Scout on Hugging Face

Article • burtenshaw, reach-vb, pcuenq, clem, rajatarya, jsulz, lysandre • Apr 5, 2025 • 149

upvoted a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 209

upvoted 5 articles over 1 year ago

Open R1: Update #3

Article • open-r1 • Mar 11, 2025 • 298

LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!

Article • medmekk, marcsun13 • Mar 7, 2025 • 98

Open-source DeepResearch – Freeing our search agents

Article • m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k

Open-R1: a fully open reproduction of DeepSeek-R1

Article • eliebak, lvwerra, lewtun • Jan 28, 2025 • 889

Open-R1: Update #1

Article • open-r1 • Feb 2, 2025 • 305

upvoted 3 papers over 1 year ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Paper • 2409.15241 • Published Sep 23, 2024 • 1

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published Jan 5, 2025 • 26

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 262

upvoted a paper almost 2 years ago

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 22

upvoted an article almost 2 years ago

How NuminaMath Won the 1st AIMO Progress Prize

Article • yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif • Jul 11, 2024 • 128

upvoted a paper almost 2 years ago

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Paper • 2201.02177 • Published Jan 6, 2022 • 6
