Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free • Paper 2505.06708 • Published May 10
SAM3 Video Segmentation • Track and label objects in videos using text prompts or clicks
The Smol Training Playbook • The secrets to building world-class LLMs