# Speed-Optimized Summarization with DistilBART
The original BART model is quite large (~1.6GB) and slow, so I replaced it with a much faster, lighter model and applied better performance settings.
## Major Speed Optimizations Applied
### 1. Faster Model
- Switched from `facebook/bart-large-cnn` (~1.6GB) to `sshleifer/distilbart-cnn-12-6` (~400MB)
- Roughly 4x smaller model size = much faster loading and inference
### 2. Processing Optimizations
- Smaller chunks: 512 words vs 900 (faster processing)
- Limited chunks: Max 5 chunks processed (prevents hanging on huge docs)
- Faster tokenization: Word count instead of full tokenization for chunking (see the sketch after this list)
- Reduced beam search: 2 beams instead of 4 (2x faster)
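As a rough, illustrative sketch (the function and constant names here are hypothetical, not the app's actual code), the word-based chunking with a chunk cap might look like this:

```python
# Hypothetical sketch of word-based chunking with a chunk cap
MAX_CHUNK_WORDS = 512   # smaller chunks than the earlier 900-word setting
MAX_CHUNKS = 5          # cap the number of chunks so huge documents don't hang

def split_into_chunks(text: str) -> list[str]:
    # Split on whitespace and count words instead of running a full tokenizer
    words = text.split()
    chunks = [
        " ".join(words[i:i + MAX_CHUNK_WORDS])
        for i in range(0, len(words), MAX_CHUNK_WORDS)
    ]
    return chunks[:MAX_CHUNKS]  # only process the first few chunks
```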
### 3. Smart Summarization
- Shorter summaries: Reduced max lengths across all modes
- Skip final summary: For documents with ≤2 chunks (saves time)
- Early stopping: Enabled for faster convergence
- Progress tracking: Shows which chunk is being processed (a rough sketch of this loop follows below)
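A minimal sketch of that per-chunk loop, assuming a `chunks` list like the one from the chunking sketch above (the length limits are illustrative values, not necessarily the app's):

```python
from transformers import pipeline

# Assumed setup: `chunks` is the list of text chunks produced by the sketch above
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

partial_summaries = []
for i, chunk in enumerate(chunks, start=1):
    print(f"Summarizing chunk {i}/{len(chunks)}...")      # progress tracking
    result = summarizer(
        chunk,
        max_length=80,        # shorter per-chunk summary (illustrative value)
        min_length=20,
        num_beams=2,          # reduced beam search for speed
        early_stopping=True,  # stop decoding once the beams are finished
    )
    partial_summaries.append(result[0]["summary_text"])

# Skip the second "summary of summaries" pass for short documents
if len(chunks) <= 2:
    final_summary = " ".join(partial_summaries)
else:
    final_summary = summarizer(
        " ".join(partial_summaries), max_length=150, min_length=40, num_beams=2
    )[0]["summary_text"]
```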
### 4. Memory & Performance
- Float16 precision: Used when GPU is available (faster inference)
- Optimized pipeline: Better model loading with fallback (sketched below)
- `optimum` library added: For additional speed improvements
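A rough sketch of that loading logic (a hedged example, not necessarily the app's exact code):

```python
import torch
from transformers import pipeline

MODEL_NAME = "sshleifer/distilbart-cnn-12-6"

try:
    # Use the GPU and float16 precision when available for faster inference
    use_gpu = torch.cuda.is_available()
    summarizer = pipeline(
        "summarization",
        model=MODEL_NAME,
        device=0 if use_gpu else -1,
        torch_dtype=torch.float16 if use_gpu else torch.float32,
    )
except Exception:
    # Fallback: plain CPU pipeline with default precision
    summarizer = pipeline("summarization", model=MODEL_NAME, device=-1)
```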
## Expected Speed Improvements
| Task | Before | After |
|---|---|---|
| Model loading | ~30+ seconds | ~10 seconds |
| PDF processing | Minutes | ~5–15 seconds |
| Memory usage | ~1.6GB | ~400MB |
| Overall speed | Slow | ~5–10x faster |
## What is DistilBART?
DistilBART is a compressed version of the BART model designed to be lighter and faster while retaining most of BART's performance. It's the result of model distillation, where a smaller model (the student) learns from a larger one (the teacher), in this case `facebook/bart-large`.
| Attribute | Description |
|---|---|
| Full Name | Distilled BART |
| Base Model | facebook/bart-large |
| Distilled By | Hugging Face 🤗 |
| Purpose | Faster inference and smaller footprint for tasks like summarization |
| Architecture | Encoder-decoder Transformer, like BART, but with fewer layers |
## Key Differences: BART vs DistilBART
| Feature | BART (Large) | DistilBART |
|---|---|---|
| Encoder Layers | 12 | 6 |
| Decoder Layers | 12 | 6 |
| Parameters | ~406M | ~222M |
| Model Size | ~1.6GB | ~400MB |
| Speed | Slower | ~2x faster |
| Performance | Very high | Slight drop (~1–2%) |
## Use Cases
- Text summarization (primary use case)
- Translation (basic use)
- Ideal for edge devices or real-time systems where speed and size matter
## Example: Summarization with DistilBART
You can easily use DistilBART with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained DistilBART model
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
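For quick experiments, the same model also works through the high-level `pipeline` helper (this reuses the `ARTICLE` string from the snippet above):

```python
from transformers import pipeline

# Same model via the high-level summarization pipeline
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
print(summarizer(ARTICLE, max_length=150, min_length=40)[0]["summary_text"])
```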
## Available Variants
| Model Name | Task | Description |
|---|---|---|
| `sshleifer/distilbart-cnn-12-6` | Summarization | Distilled from `facebook/bart-large-cnn` |
| `philschmid/distilbart-xsum-12-6` | Summarization (XSUM dataset) | Short, abstractive summaries |
Find more on the Hugging Face Model Hub.
## Summary
- DistilBART is a distilled, faster version of BART
- Ideal for summarization tasks with lower memory and latency requirements
- Trained using knowledge distillation from `facebook/bart-large`
- Works well in apps needing faster performance without significant loss in quality

Try it now; it should be significantly faster!
Thank You