TRAMS: Training-free Memory Selection for Long-range Language Modeling • arXiv:2310.15494 • Published Oct 24, 2023
A Long Way to Go: Investigating Length Correlations in RLHF • arXiv:2310.03716 • Published Oct 5, 2023
YaRN: Efficient Context Window Extension of Large Language Models • arXiv:2309.00071 • Published Aug 31, 2023
Giraffe: Adventures in Expanding Context Lengths in LLMs • arXiv:2308.10882 • Published Aug 21, 2023
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models • arXiv:2308.16137 • Published Aug 30, 2023
Investigating Answerability of LLMs for Long-Form Question Answering • arXiv:2309.08210 • Published Sep 15, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • arXiv:2309.14509 • Published Sep 25, 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models • arXiv:2309.12307 • Published Sep 21, 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training • arXiv:2309.10400 • Published Sep 19, 2023
CLEX: Continuous Length Extrapolation for Large Language Models • arXiv:2310.16450 • Published Oct 25, 2023
CAT-LM: Training Language Models on Aligned Code And Tests • arXiv:2310.01602 • Published Oct 2, 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding • arXiv:2308.14508 • Published Aug 28, 2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model • arXiv:2309.11568 • Published Sep 20, 2023
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression • arXiv:2310.06839 • Published Oct 10, 2023
Context Compression for Auto-regressive Transformers with Sentinel Tokens • arXiv:2310.08152 • Published Oct 12, 2023
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model • arXiv:2212.09146 • Published Dec 18, 2022
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks • arXiv:2305.18395 • Published May 28, 2023
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency • arXiv:2304.11477 • Published Apr 22, 2023
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge • arXiv:2308.12682 • Published Aug 24, 2023
Combiner: Full Attention Transformer with Sparse Computation Cost • arXiv:2107.05768 • Published Jul 12, 2021
Lost in the Middle: How Language Models Use Long Contexts • arXiv:2307.03172 • Published Jul 6, 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models • arXiv:2307.11088 • Published Jul 20, 2023
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies • arXiv:2302.06218 • Published Feb 13, 2023
Blockwise Parallel Transformer for Long Context Large Models • arXiv:2305.19370 • Published May 30, 2023
Blockwise Self-Attention for Long Document Understanding • arXiv:1911.02972 • Published Nov 7, 2019
LSG Attention: Extrapolation of pretrained Transformers to long sequences • arXiv:2210.15497 • Published Oct 13, 2022
Efficient Long-Text Understanding with Short-Text Models • arXiv:2208.00748 • Published Aug 1, 2022
Cure the headache of Transformers via Collinear Constrained Attention • arXiv:2309.08646 • Published Sep 15, 2023
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture • arXiv:2310.03052 • Published Oct 4, 2023
Efficient Streaming Language Models with Attention Sinks • arXiv:2309.17453 • Published Sep 29, 2023
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers • arXiv:2310.03294 • Published Oct 5, 2023
In-Context Pretraining: Language Modeling Beyond Document Boundaries • arXiv:2310.10638 • Published Oct 16, 2023
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content • arXiv:2305.14806 • Published May 24, 2023
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences • arXiv:2305.11129 • Published May 18, 2023
LongT5: Efficient Text-To-Text Transformer for Long Sequences • arXiv:2112.07916 • Published Dec 15, 2021
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System • arXiv:2304.13343 • Published Apr 26, 2023
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens • arXiv:2305.04241 • Published May 7, 2023
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse • arXiv:2311.07468 • Published Nov 13, 2023
Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering • arXiv:2311.09198 • Published Nov 15, 2023
SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences • arXiv:2208.02169 • Published Aug 3, 2022
System 2 Attention (is something you might need too) • arXiv:2311.11829 • Published Nov 20, 2023
Attention Sorting Combats Recency Bias In Long Context Language Models • arXiv:2310.01427 • Published Sep 28, 2023
CoLT5: Faster Long-Range Transformers with Conditional Computation • arXiv:2303.09752 • Published Mar 17, 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache • arXiv:2312.12742 • Published Dec 20, 2023
Axiomatic Preference Modeling for Longform Question Answering • arXiv:2312.02206 • Published Dec 2, 2023
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents • arXiv:2312.01279 • Published Dec 3, 2023
Extending Context Window of Large Language Models via Semantic Compression • arXiv:2312.09571 • Published Dec 15, 2023
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention • arXiv:2312.08618 • Published Dec 14, 2023
LongAlign: A Recipe for Long Context Alignment of Large Language Models • arXiv:2401.18058 • Published Jan 31, 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey • arXiv:2401.07872 • Published Jan 15, 2024
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models • arXiv:2401.06951 • Published Jan 13, 2024
Gated Linear Attention Transformers with Hardware-Efficient Training • arXiv:2312.06635 • Published Dec 11, 2023
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning • arXiv:2401.01325 • Published Jan 2, 2024
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • arXiv:2402.13753 • Published Feb 21, 2024
Training-Free Long-Context Scaling of Large Language Models • arXiv:2402.17463 • Published Feb 27, 2024
LOCOST: State-Space Models for Long Document Abstractive Summarization • arXiv:2401.17919 • Published Jan 31, 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • Published Apr 12, 2024
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling • arXiv:2402.18508 • Published Feb 28, 2024
HMT: Hierarchical Memory Transformer for Long Context Language Processing • arXiv:2405.06067 • Published May 9, 2024
LongHeads: Multi-Head Attention is Secretly a Long Context Processor • arXiv:2402.10685 • Published Feb 16, 2024
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference • arXiv:2405.17755 • Published May 28, 2024
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models • arXiv:2406.05678 • Published Jun 9, 2024
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models • arXiv:2406.00605 • Published Jun 2, 2024
Equipping Transformer with Random-Access Reading for Long-Context Understanding • arXiv:2405.13216 • Published May 21, 2024
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation • arXiv:2406.10996 • Published Jun 16, 2024
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory • arXiv:2402.04617 • Published Feb 7, 2024
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope • arXiv:2407.15176 • Published Jul 21, 2024
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads • arXiv:2407.15891 • Published Jul 22, 2024
Writing in the Margins: Better Inference Pattern for Long Context Retrieval • arXiv:2408.14906 • Published Aug 27, 2024
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach • arXiv:2407.16833 • Published Jul 23, 2024
ReMamba: Equip Mamba with Effective Long-Sequence Modeling • arXiv:2408.15496 • Published Aug 28, 2024
General-purpose, long-context autoregressive modeling with Perceiver AR • arXiv:2202.07765 • Published Feb 15, 2022
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training • arXiv:2407.15892 • Published Jul 22, 2024
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs • arXiv:2406.18173 • Published Jun 26, 2024
MemLong: Memory-Augmented Retrieval for Long Text Modeling • arXiv:2408.16967 • Published Aug 30, 2024
Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads • arXiv:2407.17678 • Published Jul 25, 2024
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning • arXiv:2409.06679 • Published Sep 10, 2024
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management • arXiv:2406.19707 • Published Jun 28, 2024
ACON: Optimizing Context Compression for Long-horizon LLM Agents • arXiv:2510.00615 • Published Oct 1, 2025
Global Context Compression with Interleaved Vision-Text Transformation • arXiv:2601.10378 • Published Jan 15, 2026