arxiv:2512.10411

Sliding Window Attention Adaptation

Published on Dec 11
· Submitted by
yuyijiong
on Dec 15
Authors:

Abstract

Sliding Window Attention Adaptation (SWAA) enables Transformer-based Large Language Models (LLMs) pretrained with full attention to use sliding window attention without retraining from scratch, recovering long-context performance through a combination of adaptation techniques.

AI-generated summary

The self-attention mechanism in Transformer-based Large Language Models (LLMs) scales quadratically with input length, making long-context inference expensive. Sliding window attention (SWA) reduces this cost to linear complexity, but naively enabling complete SWA at inference time for models pretrained with full attention (FA) causes severe long-context performance degradation due to the training-inference mismatch. This raises the question: can FA-pretrained LLMs be effectively adapted to SWA without pretraining? We investigate this by proposing Sliding Window Attention Adaptation (SWAA), a set of practical recipes that combine five methods for better adaptation: (1) applying SWA only during prefilling; (2) preserving "sink" tokens; (3) interleaving FA/SWA layers; (4) chain-of-thought (CoT); and (5) fine-tuning. Our experiments show that SWA adaptation is feasible but non-trivial: no single method suffices, yet specific synergistic combinations effectively recover the original long-context performance. We further analyze the performance-efficiency trade-offs of different SWAA configurations and provide recommended recipes for diverse scenarios. Our code is available at https://github.com/yuyijiong/sliding-window-attention-adaptation
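To make the first two recipes concrete, the sketch below builds a boolean attention mask that combines a causal constraint, a sliding window, and a few preserved "sink" tokens at the start of the sequence. This is an illustrative reconstruction, not code from the paper or its repository; the function name, window size, and sink count are hypothetical defaults.

```python
# Minimal sketch (assumed, not the paper's implementation) of a causal
# sliding-window attention mask that always keeps a few leading "sink" tokens.
import torch

def swa_with_sinks_mask(seq_len: int, window: int = 4096, num_sinks: int = 4) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where a query may attend to a key."""
    idx = torch.arange(seq_len)
    q = idx.unsqueeze(1)             # query positions, column vector
    k = idx.unsqueeze(0)             # key positions, row vector
    causal = k <= q                  # never attend to future tokens
    in_window = (q - k) < window     # keep only the most recent `window` keys
    sinks = k < num_sinks            # always keep the first few "sink" tokens
    return causal & (in_window | sinks)

if __name__ == "__main__":
    # Small example: window of 3 with 1 sink token over 8 positions.
    print(swa_with_sinks_mask(8, window=3, num_sinks=1).int())
```

Applying such a mask only to the prefill pass (recipe 1) while decoding with full attention is one of the configurations the paper evaluates; the masking logic itself may differ in the released code.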

Community

Paper submitter
edited 9 days ago

We propose a set of practical recipes that let a full-attention LLM use sliding window attention to improve inference efficiency. For example, some recipes achieve nearly 100% acceleration of long-context inference while retaining about 90% accuracy; others achieve only about 30% acceleration but retain nearly 100% accuracy.

Our code is available at https://github.com/yuyijiong/sliding-window-attention-adaptation
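As one illustration of the speed/accuracy trade-off mentioned above, the snippet below sketches how an interleaved FA/SWA layer plan (recipe 3) might be expressed: keep full attention in a fraction of layers and switch the rest to sliding window attention. The 1-in-4 ratio, the function name, and the 12-layer example are hypothetical and not taken from the repository.

```python
# Assumed helper for planning which layers keep full attention (FA)
# and which switch to sliding window attention (SWA).
def interleave_fa_swa(num_layers: int, fa_every: int = 4) -> list[str]:
    """Per-layer attention plan: FA every `fa_every` layers, SWA elsewhere."""
    return ["FA" if i % fa_every == 0 else "SWA" for i in range(num_layers)]

# A hypothetical 12-layer plan: 3 FA layers, 9 SWA layers.
print(interleave_fa_swa(12))
# ['FA', 'SWA', 'SWA', 'SWA', 'FA', 'SWA', 'SWA', 'SWA', 'FA', 'SWA', 'SWA', 'SWA']
```

Keeping more FA layers moves the configuration toward the high-accuracy, lower-speedup end of the trade-off; fewer FA layers moves it toward the faster, less accurate end.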

Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0

Collections including this paper 3