BiFormer: Vision Transformer with Bi-Level Routing Attention • Paper • 2303.08810 • Published Mar 15, 2023
RelayAttention for Efficient Large Language Model Serving with Long System Prompts • Paper • 2402.14808 • Published Feb 22, 2024