Meno-Lite-0.1
A 7B language model that masters the art of reading, not memorizing.
Meno-Lite-0.1 is a 7-billion-parameter large language model purpose-built for Russian-language RAG pipelines, document question answering, information extraction, and summarization. Rather than trying to cram encyclopedic world knowledge into a modest parameter budget, Meno-Lite doubles down on language skills — the ability to parse, transform, and reason over text supplied in context. The result is a compact model that punches well above its weight class on tasks where the answer lies in the input, not in the model's memory.
Key idea. We hypothesize that the capabilities of LLMs decompose into two largely independent axes: world knowledge (facts, dates, entities) and language knowledge (comprehension, extraction, inference, generation). World knowledge scales roughly linearly with parameter count, but language knowledge reaches a surprisingly high plateau even in 7B-class models — provided it is deliberately cultivated. Meno-Lite-0.1 is an empirical test of this hypothesis: by investing training compute exclusively into language skills, we obtain a model that rivals or surpasses much larger systems on context-grounded tasks while remaining deployable on a single consumer GPU.
Model Details
Model Description
Meno-Lite-0.1 is derived from RuadaptQwen2.5-7B-Lite-Beta through a carefully designed two-stage training pipeline (continued pretraining → supervised fine-tuning) that sharpens the model's ability to work with documents rather than from parametric memory. The full lineage is:
```
Qwen/Qwen2.5-7B-Instruct
  └─► t-tech/T-lite-it-1.0
        └─► RefalMachine/RuadaptQwen2.5-7B-Lite-Beta
              └─► bond005/Meno-Lite-0.1   ◄── you are here
```
Each ancestor added a layer of Russian-language adaptation; Meno-Lite-0.1 adds a final layer of skill-oriented training focused on information extraction, entity normalization, multi-hop reasoning over long contexts, and instruction following for RAG scenarios. Although the model is primarily oriented toward Russian, it retains strong English performance thanks to bilingual pretraining data (sampled FineWeb-Edu) and English-language SFT examples (MultiHopRAG, MTRAGEval).
- Developed by: Ivan Bondarenko and colleagues, Novosibirsk State University (NSU)
- Model type: Causal decoder-only transformer (Qwen2.5 architecture)
- Parameters: ~7B
- Language(s): Russian (primary), English (retained)
- License: Apache 2.0
- Base model: RefalMachine/RuadaptQwen2.5-7B-Lite-Beta
Motivation: Language Knowledge vs. World Knowledge
Modern LLMs are often evaluated — and marketed — on their ability to recall factual trivia. Yet the vast majority of production deployments do not rely on parametric recall at all: RAG systems, function-calling agents, document assistants, and code-generation copilots all receive the necessary information in context. What these applications demand is not a bigger encyclopedia but a sharper reader.
We formalize this intuition as a two-axis framework:
| Axis | What it captures | Scaling behavior | Examples |
|---|---|---|---|
| World knowledge | Facts, entities, relations memorized during pretraining | Scales roughly linearly with parameters | CheGeKa, MaMuRAMu, ruMMLU |
| Language knowledge | Comprehension, extraction, transformation, reasoning over supplied text | Reaches a high plateau at 7B and above | MultiQ, ruTiE, USE, RAG QA, summarization |
Meno-Lite-0.1 deliberately sacrifices world-knowledge breadth (which is inherently limited at 7B) in favor of maximizing language-knowledge depth. The training data and SFT instructions were curated to reinforce how the model processes text, not what it knows about the world. As the benchmarks below demonstrate, this trade-off pays off handsomely for context-grounded tasks.
Uses
Direct Use
- RAG pipelines: Meno-Lite-0.1 excels at answering questions when relevant passages are retrieved and injected into the prompt. Its training on multi-hop QA datasets (MultiHopRAG, MTRAGEval, LongContextMultiQ) makes it particularly adept at synthesizing information scattered across multiple chunks.
- Document QA and summarization: Legal contracts, technical manuals, scientific papers — any scenario where the model must read carefully and respond precisely.
- Information extraction and entity normalization: SFT on NEREL-based instructions and GPT-4o-mini–generated entity definitions equips the model with robust NER and normalization capabilities.
- Function calling and agentic workflows: Tasks where the required knowledge arrives via tool outputs rather than parametric memory.
Downstream Use
Meno-Lite-0.1 can serve as a strong starting point for further fine-tuning on domain-specific corpora (e.g., medical, financial, or governmental documents) where context-grounded accuracy is paramount.
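For parameter-efficient adaptation, LoRA is a common starting point. A minimal sketch using the PEFT library (an illustrative recipe, not the training setup used for Meno-Lite itself):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model and attach low-rank adapters to the attention projections
model = AutoModelForCausalLM.from_pretrained(
    "bond005/meno-lite-0.1", torch_dtype="auto", device_map="auto"
)
lora_cfg = LoraConfig(
    r=16,                 # adapter rank: a common quality/size trade-off
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of the 7B weights is trained
```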
Out-of-Scope Use
- Open-domain factual QA without context: The model was not optimized for parametric recall; do not expect it to outperform larger models on trivia-style benchmarks.
- Safety-critical applications without human oversight: Like all LLMs, Meno-Lite-0.1 can hallucinate, especially when relevant context is absent.
- Languages other than Russian and English: While the Qwen2.5 backbone supports many languages, Meno-Lite-0.1 has been validated only on Russian and English.
How to Get Started with the Model
```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bond005/meno-lite-0.1"

# Load the tokenizer and the model; device_map="auto" places the weights
# on the available GPU(s) automatically
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# System prompt (in Russian): "You are a helpful assistant. Answer questions
# based on the provided context."
SYSTEM_PROMPT = "Вы — полезный ассистент. Отвечайте на вопросы, опираясь на предоставленный контекст."

# Retrieved chunks imitating a RAG pipeline; chunk 2 is a distractor for the first question
CHUNKS = [
    "Новосибирский государственный университет (НГУ) был основан в 1959 году в Академгородке.",  # 0
    "12 сентября 1959 года был успешно осуществлён запуск автоматической межпланетной станции «Луна-2». "
    "14 сентября 1959 года станция «Луна-2» впервые в мире достигла поверхности Луны в районе Моря Дождей "
    "вблизи кратеров Аристилл, Архимед и Автолик.",  # 1
    "Московский государственный университет имени М. В. Ломоносова (МГУ) был основан в 1755 году. "
    "Изначально университет располагался в здании Главной аптеки (бывший Земский приказ) на месте "
    "Государственного исторического музея на Красной площади.",  # 2
]
CONTEXT = "\n\n".join([f"Контекст {idx + 1}:\n```text\n{val}\n```" for idx, val in enumerate(CHUNKS)]) + "\n\nВопрос: "

# "Which university was founded in the same year that a man-made craft
# first reached the surface of the Moon?"
USER_QUESTION = "Какой университет был основан в том же году, когда впервые в истории рукотворный аппарат достиг поверхности Луны?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": CONTEXT + USER_QUESTION + "\n"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(f"Вопрос: {USER_QUESTION}\nОтвет модели: {response}\n")

# A second question over the same context: "How many years after the university
# in Moscow was the university in Novosibirsk founded?"
ANOTHER_USER_QUESTION = "Через сколько лет после университета в Москве был основан университет в Новосибирске?"
messages2 = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": CONTEXT + ANOTHER_USER_QUESTION + "\n"}
]
text2 = tokenizer.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
inputs2 = tokenizer([text2], return_tensors="pt").to(model.device)
outputs2 = model.generate(**inputs2, max_new_tokens=256)
response2 = tokenizer.decode(outputs2[0][inputs2["input_ids"].shape[-1]:], skip_special_tokens=True)
print(f"Вопрос: {ANOTHER_USER_QUESTION}\nОтвет модели: {response2}\n")
# Few-shot NER: three user/assistant pairs demonstrate the task and the expected
# JSON answer format before the actual input text is presented
FEW_SHOTS_FOR_NER = [
    {
        "role": "system",
        "content": "Вы - эксперт в области анализа текстов и извлечения семантической информации из них."
    },
    {
        "role": "user",
        "content": "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста и запишите ответ в JSON-формате."
                   "\n\nВходной текст:\n\n```text\nНаучный сотрудник лаборатории прикладных цифровых технологий Международного "
                   "научно-образовательного математического центра НГУ Иван Бондаренко рассказал о грантовой программе и о том, как "
                   "его проект RAGU попал в число победителей.\n```\n"
    },
    {
        "role": "assistant",
        "content": "{\"ORGANIZATION\": [\"лаборатория прикладных цифровых технологий Международного научно-образовательного математического центра НГУ\", "
                   "\"Международный научно-образовательный математический центр НГУ\", \"НГУ\"], \"PERSON\": [\"Иван Бондаренко\"], \"LOCATION\": []}"
    },
    {
        "role": "user",
        "content": "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста и запишите ответ в JSON-формате."
                   "\n\nВходной текст:\n\n```text\nНациональный исследовательский университет «Высшая школа экономики» (НИУ ВШЭ) представил результаты "
                   "15-го мониторинга качества приема на бюджетные и платные места российских вузов в 2025 году. В группе лидеров "
                   "10 московских университетов, три питерских и по одному представителю из таких регионов, как Татарстан (Иннополис), "
                   "Нижний Новгород и Новосибирск (НГУ).\n```\n"
    },
    {
        "role": "assistant",
        "content": "{\"ORGANIZATION\": [\"Национальный исследовательский университет «Высшая школа экономики»\", \"НИУ ВШЭ\", \"НГУ\"], "
                   "\"PERSON\": [], \"LOCATION\": [\"московский\", \"питерский\", \"Татарстан\", \"Иннополис\", \"Нижний Новгород\", "
                   "\"Новосибирск\"]}"
    },
    {
        "role": "user",
        "content": "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION из входного текста и запишите ответ в JSON-формате."
                   "\n\nВходной текст:\n\n```text\nПочему китайская ИИ-модель DeepSeek гораздо эффективнее и дешевле западных аналогов?\n```\n"
    },
    {
        "role": "assistant",
        "content": "{\"ORGANIZATION\": [], \"PERSON\": [], \"LOCATION\": [\"китайская\", \"западный\"]}"
    }
]
# Input text for NER (note: thanks to the few-shot examples above, the model
# answers in JSON even though this last user turn contains only the raw text)
INPUT_TEXT_FOR_NER = (
    "Станислав Владимирович Дробышевский – российский антрополог, кандидат биологических наук, доцент кафедры антропологии "
    "биологического факультета МГУ им. М.В. Ломоносова, научный редактор портала “Антропогенез.ру” и, без сомнения, "
    "одна из самых ярких и узнаваемых фигур в российской науке."
)
text3 = tokenizer.apply_chat_template(
    FEW_SHOTS_FOR_NER + [{"role": "user", "content": INPUT_TEXT_FOR_NER}],
    tokenize=False, add_generation_prompt=True
)
inputs3 = tokenizer([text3], return_tensors="pt").to(model.device)
outputs3 = model.generate(**inputs3, max_new_tokens=256)
# json.loads will raise a ValueError if the model ever deviates from valid JSON
response3 = json.loads(tokenizer.decode(outputs3[0][inputs3["input_ids"].shape[-1]:], skip_special_tokens=True))
print(f"Входной текст: {INPUT_TEXT_FOR_NER}\nРаспознанные сущности:\n{json.dumps(response3, ensure_ascii=False, indent=4)}\n")
```
Running this code produces output similar to the following:
```text
Вопрос: Какой университет был основан в том же году, когда впервые в истории рукотворный аппарат достиг поверхности Луны?
Ответ модели: Новосибирский государственный университет (НГУ) был основан в том же году, когда впервые в истории рукотворный аппарат достиг поверхности Луны.

Вопрос: Через сколько лет после университета в Москве был основан университет в Новосибирске?
Ответ модели: Университет в Новосибирске был основан через 204 года после Московского государственного университета.

Входной текст: Станислав Владимирович Дробышевский – российский антрополог, кандидат биологических наук, доцент кафедры антропологии биологического факультета МГУ им. М.В. Ломоносова, научный редактор портала “Антропогенез.ру” и, без сомнения, одна из самых ярких и узнаваемых фигур в российской науке.
Распознанные сущности:
{
    "ORGANIZATION": [
        "биологический факультет МГУ им. М.В. Ломоносова",
        "МГУ им. М.В. Ломоносова"
    ],
    "PERSON": [
        "Станислав Владимирович Дробышевский"
    ],
    "LOCATION": [
        "российская"
    ]
}
```
Using vLLM for high-throughput serving:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "bond005/meno-lite-0.1"
tok = AutoTokenizer.from_pretrained(model_name)

# max_model_len bounds the KV-cache size; reduce it if the GPU runs out of memory
llm = LLM(
    model=model_name,
    dtype="bfloat16",
    max_model_len=32768,
    gpu_memory_utilization=0.85
)
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

messages = [
    {
        "role": "system",
        "content": "Вы — Менон, разработанный в Новосибирском государственном университете. Вы — полезный помощник."
    },
    {
        "role": "user",
        "content": "Привет! Расскажи о себе."
    }
]
input_text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([input_text], sampling_params)
print(outputs[0].outputs[0].text)
```
Running this code produces output similar to the following:
```text
Привет! Меня зовут Менон, и я — виртуальный помощник, созданный в Новосибирском государственном университете. Я здесь, чтобы помочь вам с различными вопросами и задачами.
```
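For production serving, the same engine can be exposed as an OpenAI-compatible endpoint via `vllm serve bond005/meno-lite-0.1`. A minimal client sketch, assuming the server listens on the default port 8000:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not validate the API key by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="bond005/meno-lite-0.1",
    messages=[
        {"role": "system", "content": "Вы — полезный ассистент."},
        {"role": "user", "content": "Привет! Расскажи о себе."},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```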
Tokenizer Efficiency
An often-overlooked determinant of real-world throughput is tokenizer efficiency: the more characters each token covers, the fewer autoregressive steps are needed to generate text of a given length. Meno-Lite-0.1 inherits the extended tokenizer from RuadaptQwen2.5-7B-Lite-Beta, which dramatically improves Russian-language efficiency compared to the original Qwen2.5 vocabulary.
| Model | Chars/token (RU) | Chars/token (EN) |
|---|---|---|
| Meno-Lite-0.1 | 3.77 | 4.13 |
| RuadaptQwen2.5-7B-Lite-Beta | 3.77 | 4.13 |
| AvitoTech/avibe (8B) | 3.79 | 4.06 |
| t-tech/T-lite-it-2.1 (7B) | 3.74 | 4.14 |
| t-tech/T-lite-it-1.0 (7B) | 2.57 | 4.14 |
| Qwen/Qwen2.5-7B-Instruct | 2.57 | 4.14 |
| GigaChat3-10B-A1.8B | 3.74 | 3.99 |
Meno-Lite-0.1 achieves 3.77 characters per token on Russian text — a 47% improvement over the original Qwen2.5 tokenizer (2.57 chars/token). This translates directly into faster inference and lower serving costs for Russian-language workloads, while English efficiency remains on par with the best models in the class.
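The chars/token figures can be reproduced approximately in a few lines. A minimal sketch, assuming a small representative sample (the published numbers were computed on a larger corpus, so expect minor deviations):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bond005/meno-lite-0.1")

# Any representative Russian text works as a rough probe
sample = (
    "Новосибирский государственный университет был основан в 1959 году "
    "в Академгородке и является одним из ведущих вузов России."
)
n_tokens = len(tokenizer(sample, add_special_tokens=False)["input_ids"])
print(f"chars/token: {len(sample) / n_tokens:.2f}")
```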
Evaluation
MERA Benchmark
MERA is the most comprehensive Russian-language benchmark for evaluating LLMs on "strong AI" tasks. It comprises 15 scored tasks (with closed test sets) spanning world knowledge, reasoning, logic, mathematics, coding, and language understanding. The overall score is the mean across tasks (for tasks with multiple metrics, those metrics are averaged first).
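To make the aggregation rule concrete, a minimal sketch with a small subset of Meno-Lite's task scores (illustrative, not the official leaderboard code):

```python
# Metrics are averaged within each task first, then task scores are averaged
task_metrics = {
    "MultiQ": [0.536, 0.403],  # tasks with two metrics contribute their mean
    "PARus": [0.818],
    "USE": [0.240],
}
task_scores = {task: sum(m) / len(m) for task, m in task_metrics.items()}
mera_score = sum(task_scores.values()) / len(task_scores)
print(f"MERA-style score over {len(task_scores)} tasks: {mera_score:.3f}")
```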
Selected models from the MERA leaderboard (sorted by score):
| # | Model | Size | MERA Score |
|---|---|---|---|
| 1 | Human Benchmark | — | 0.852 |
| 3 | BerryLM-MT | 20B | 0.745 |
| 11 | Cotype Pro 2.5 | 32.5B | 0.671 |
| 14 | T-pro-it-2.0 | 32.8B | 0.660 |
| 22 | A-vibe | 8B | 0.618 |
| 25 | RuadaptQwen-32B-instruct | 32B | 0.615 |
| 27 | Qwen2.5-32B-Instruct | 32B | 0.603 |
| 29 | Qwen2.5-72B-Instruct | 72.7B | 0.601 |
| 30 | Meta-Llama-3.1-405B-Instruct | 405B | 0.590 |
| 36 | Meno-Lite-0.1 | 7B | 0.555 |
| 37 | Llama-3.3-70B-Instruct | 70B | 0.555 |
| 38 | Meta-Llama-3.1-70B-Instruct | 70.6B | 0.554 |
| 39 | T-lite-it-1.0 | 7B | 0.552 |
| 44 | RuadaptQwen2.5-7B-Lite-v1 | 7B | 0.536 |
| 49 | GigaChat3-10B-A1.8B | 10B | 0.518 |
| 56 | Qwen2.5-7B-Instruct | 7B | 0.482 |
Key observations:
- Meno-Lite-0.1 (0.555) matches or exceeds 70B-class models such as Llama-3.3-70B-Instruct (0.555) and Meta-Llama-3.1-70B-Instruct (0.554), despite being 10× smaller.
- It surpasses its ancestor T-lite-it-1.0 (0.552) and significantly outperforms both the base Qwen2.5-7B-Instruct (0.482) and RuadaptQwen2.5-7B-Lite-v1 (0.536).
- Among 7B-class models, Meno-Lite-0.1 achieves the highest MERA score, demonstrating that targeted skill training can close the gap with much larger architectures.
Task-level breakdown (Meno-Lite-0.1 vs. key comparisons)
| Task | Meno-Lite-0.1 (7B) | T-lite-it-1.0 (7B) | Qwen2.5-7B-Instruct (7B) | A-vibe (8B) | Llama-3.3-70B (70B) |
|---|---|---|---|---|---|
| RWSD | 0.569 | 0.535 | 0.515 | 0.565 | 0.600 |
| PARus | 0.818 | 0.894 | 0.848 | 0.910 | 0.914 |
| RCB | 0.541/0.458 | 0.571/0.533 | 0.562/0.493 | 0.582/0.547 | 0.575/0.380 |
| MultiQ | 0.536/0.403 | 0.523/0.398 | 0.425/0.296 | 0.539/0.410 | 0.573/0.418 |
| ruWorldTree | 0.949/0.760 | 0.964/0.964 | 0.939/0.939 | 0.968/0.968 | 0.954/0.769 |
| ruOpenBookQA | 0.880/0.705 | 0.905/0.905 | 0.845/0.845 | 0.888/0.887 | 0.910/0.735 |
| CheGeKa | 0.346/0.293 | 0.502/0.413 | 0.077/0.048 | 0.168/0.120 | 0.339/0.276 |
| ruTiE | 0.794 | 0.786 | 0.777 | 0.811 | 0.824 |
| USE | 0.240 | 0.147 | 0.219 | 0.371 | 0.298 |
| MathLogicQA | 0.666 | 0.662 | 0.467 | 0.661 | 0.566 |
| ruMultiAr | 0.347 | 0.346 | 0.307 | 0.391 | 0.340 |
| LCS | 0.186 | 0.144 | 0.114 | 0.172 | 0.168 |
| ruModAr | 0.497 | 0.493 | 0.473 | 0.929 | 0.570 |
| MaMuRAMu | 0.749 | 0.775 | 0.711 | 0.761 | 0.802 |
| ruCodeEval | 0.377/0.569/0.622 | 0.082/0.168/0.226 | 0.025/0.071/0.098 | 0.545/0.703/0.732 | 0.139/0.280/0.396 |
Notable strengths of Meno-Lite-0.1:
- ruCodeEval: A dramatic jump over the ancestor T-lite-it-1.0 (0.082→0.377 pass@1), exceeding even the 70B Llama-3.3 (0.139). This suggests that improved language skills transfer to code generation ability.
- MathLogicQA (0.666): Best among all 7B models and ahead of the 70B Llama-3.3 (0.566), reflecting strong verbal reasoning.
- MultiQ (0.536/0.403): The multi-hop QA task — central to RAG — shows clear gains over both the base Qwen2.5-7B (0.425/0.296) and T-lite-it-1.0 (0.523/0.398).
- CheGeKa (0.346/0.293): While this is a world-knowledge task, Meno-Lite still outperforms Qwen2.5-7B (0.077/0.048) by a large margin, suggesting that even factual recall benefits from better language comprehension.
LIBRA Benchmark (Long-Context Understanding)
LIBRA (Long Input Benchmark for Russian Analysis) evaluates models on 21 tasks across four complexity groups, with context lengths from 4K to 128K tokens. We evaluated Meno-Lite-0.1 alongside 7B–14B peers on all four groups.
Simple Information Retrieval
The Passkey and PasskeyWithLibrusec tasks measure a model's ability to locate a short code hidden inside a long distractor text — a prerequisite for any context-grounded application.
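For intuition, a minimal sketch of how such a probe can be constructed (simplified; the actual LIBRA prompts and distractor texts differ):

```python
import random

def make_passkey_sample(distractor: str, target_len: int = 16000) -> tuple[str, str]:
    """Hide a random 5-digit passkey at a random position inside long distractor text."""
    passkey = str(random.randint(10000, 99999))
    needle = f" Запомните: ключ доступа равен {passkey}. "
    # Repeat the distractor text until the context reaches the target length
    haystack = (distractor * (target_len // len(distractor) + 1))[:target_len]
    pos = random.randint(0, len(haystack))
    prompt = haystack[:pos] + needle + haystack[pos:] + "\n\nКакой ключ доступа упоминается в тексте?"
    return prompt, passkey
```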
| Model | Size | Task | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 |
| Meno-Lite-0.1 | 7B | PasskeyLibrusec | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | PasskeyLibrusec | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.95 |
| T-lite-it-2.1 | 7B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.90 |
| T-lite-it-2.1 | 7B | PasskeyLibrusec | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.93 |
| AvitoTech/avibe | 8B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.90 |
| AvitoTech/avibe | 8B | PasskeyLibrusec | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.94 |
| T-lite-it-1.0 | 7B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.58 |
| T-lite-it-1.0 | 7B | PasskeyLibrusec | 1.00 | 1.00 | 1.00 | 1.00 | 0.86 | 0.45 |
| Qwen2.5-7B-Instruct | 7B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.58 |
| Qwen2.5-7B-Instruct | 7B | PasskeyLibrusec | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.66 |
| Qwen2.5-14B-Instruct | 14B | Passkey | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.58 |
| Qwen2.5-14B-Instruct | 14B | PasskeyLibrusec | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.63 |
All models achieve perfect scores up to 32K. The differentiation begins at 64K and becomes pronounced at 128K, where Meno-Lite-0.1 and its parent RuadaptQwen2.5-7B-Lite-Beta share the top scores (0.98 on Passkey, 0.95 on PasskeyWithLibrusec). Notably, both substantially outperform the original Qwen2.5-7B-Instruct (0.58/0.66) and even the twice-larger Qwen2.5-14B-Instruct (0.58/0.63) at this extreme length, a direct benefit of the Ruadapt tokenizer and continued-pretraining pipeline.
Multi-hop Question Answering
Multi-hop QA is the task group most directly relevant to RAG, as it requires the model to locate and combine evidence scattered across a long context. Below we show per-length scores to reveal how each model degrades as context grows.
ruBABILongQA1 (single supporting fact):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.72 | 0.65 | 0.67 | 0.61 | 0.51 | 0.36 |
| T-lite-it-1.0 | 7B | 0.74 | 0.71 | 0.71 | 0.64 | 0.52 | 0.34 |
| T-lite-it-2.1 | 7B | 0.77 | 0.76 | 0.63 | 0.58 | 0.52 | 0.44 |
| AvitoTech/avibe | 8B | 0.66 | 0.62 | 0.49 | 0.44 | 0.25 | 0.18 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.74 | 0.64 | 0.60 | 0.54 | 0.38 | 0.29 |
| Qwen2.5-7B-Instruct | 7B | 0.65 | 0.68 | 0.65 | 0.78 | 0.60 | 0.48 |
| Qwen2.5-14B-Instruct | 14B | 0.90 | 0.89 | 0.80 | 0.77 | 0.55 | 0.38 |
ruBABILongQA2 (two supporting facts):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.34 | 0.20 | 0.19 | 0.16 | 0.07 | 0.02 |
| T-lite-it-1.0 | 7B | 0.29 | 0.13 | 0.07 | 0.06 | 0.10 | 0.04 |
| T-lite-it-2.1 | 7B | 0.44 | 0.42 | 0.33 | 0.24 | 0.15 | 0.05 |
| AvitoTech/avibe | 8B | 0.47 | 0.35 | 0.34 | 0.31 | 0.24 | 0.09 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.32 | 0.18 | 0.07 | 0.08 | 0.05 | 0.05 |
| Qwen2.5-7B-Instruct | 7B | 0.33 | 0.19 | 0.15 | 0.13 | 0.11 | 0.06 |
| Qwen2.5-14B-Instruct | 14B | 0.61 | 0.55 | 0.41 | 0.34 | 0.22 | 0.11 |
ruBABILongQA3 (three supporting facts):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.24 | 0.17 | 0.17 | 0.15 | 0.08 | 0.11 |
| T-lite-it-1.0 | 7B | 0.20 | 0.10 | 0.10 | 0.16 | 0.08 | 0.07 |
| T-lite-it-2.1 | 7B | 0.23 | 0.25 | 0.20 | 0.12 | 0.08 | 0.11 |
| AvitoTech/avibe | 8B | 0.20 | 0.23 | 0.15 | 0.12 | 0.09 | 0.11 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.19 | 0.16 | 0.09 | 0.07 | 0.06 | 0.02 |
| Qwen2.5-7B-Instruct | 7B | 0.24 | 0.24 | 0.18 | 0.24 | 0.16 | 0.15 |
| Qwen2.5-14B-Instruct | 14B | 0.37 | 0.35 | 0.33 | 0.25 | 0.18 | 0.20 |
ruBABILongQA4 (two argument relations):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.56 | 0.58 | 0.57 | 0.52 | 0.33 | 0.22 |
| T-lite-it-1.0 | 7B | 0.56 | 0.62 | 0.59 | 0.62 | 0.37 | 0.15 |
| T-lite-it-2.1 | 7B | 0.60 | 0.65 | 0.59 | 0.61 | 0.43 | 0.27 |
| AvitoTech/avibe | 8B | 0.57 | 0.54 | 0.52 | 0.49 | 0.35 | 0.25 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.59 | 0.58 | 0.55 | 0.47 | 0.27 | 0.22 |
| Qwen2.5-7B-Instruct | 7B | 0.62 | 0.53 | 0.58 | 0.52 | 0.25 | 0.08 |
| Qwen2.5-14B-Instruct | 14B | 0.66 | 0.69 | 0.66 | 0.64 | 0.34 | 0.15 |
ruBABILongQA5 (three argument relations):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.80 | 0.73 | 0.80 | 0.74 | 0.63 | 0.54 |
| T-lite-it-1.0 | 7B | 0.81 | 0.79 | 0.77 | 0.78 | 0.78 | 0.54 |
| T-lite-it-2.1 | 7B | 0.79 | 0.73 | 0.76 | 0.70 | 0.71 | 0.69 |
| AvitoTech/avibe | 8B | 0.73 | 0.74 | 0.76 | 0.73 | 0.66 | 0.59 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.79 | 0.76 | 0.75 | 0.74 | 0.65 | 0.49 |
| Qwen2.5-7B-Instruct | 7B | 0.81 | 0.79 | 0.76 | 0.80 | 0.82 | 0.69 |
| Qwen2.5-14B-Instruct | 14B | 0.86 | 0.78 | 0.82 | 0.82 | 0.78 | 0.64 |
LibrusecMHQA and ru2WikiMultihopQA (open-domain multi-hop):
| Model | Size | LibrusecMHQA 8K | ru2Wiki 8K | ru2Wiki 16K | ru2Wiki 32K |
|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.484 | 0.388 | 0.422 | 0.244 |
| T-lite-it-1.0 | 7B | 0.456 | 0.367 | 0.352 | 0.228 |
| T-lite-it-2.1 | 7B | 0.453 | 0.469 | 0.375 | 0.268 |
| AvitoTech/avibe | 8B | 0.440 | 0.347 | 0.336 | 0.228 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.432 | 0.367 | 0.367 | 0.252 |
| Qwen2.5-7B-Instruct | 7B | 0.419 | 0.245 | 0.305 | 0.228 |
| Qwen2.5-14B-Instruct | 14B | 0.484 | 0.531 | 0.391 | 0.285 |
LongContextMultiQ (multi-document multi-hop):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.045 | 0.320 | 0.075 | 0.000 | 0.005 | 0.180 |
| T-lite-it-1.0 | 7B | 0.055 | 0.270 | 0.060 | 0.000 | 0.005 | 0.000 |
| T-lite-it-2.1 | 7B | 0.065 | 0.335 | 0.040 | 0.000 | 0.005 | 0.000 |
| AvitoTech/avibe | 8B | 0.060 | 0.360 | 0.085 | 0.070 | 0.005 | 0.000 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.050 | 0.300 | 0.165 | 0.000 | 0.005 | 0.150 |
| Qwen2.5-7B-Instruct | 7B | 0.055 | 0.260 | 0.035 | 0.000 | 0.005 | 0.000 |
| Qwen2.5-14B-Instruct | 14B | 0.075 | 0.345 | 0.090 | 0.010 | 0.005 | 0.000 |
Key observations on multi-hop QA:
- Consistent improvement over the ancestral chain. Across nearly all ruBABILong tasks and context lengths, Meno-Lite-0.1 outperforms its direct parent RuadaptQwen2.5-7B-Lite-Beta, confirming that the CPT+SFT pipeline adds genuine multi-hop reasoning capability rather than just superficial instruction following.
- Best-in-class on real-world multi-hop QA among 7B models. Meno-Lite-0.1 leads all 7B-class models on LibrusecMHQA (0.484) and on ru2WikiMultihopQA at 16K (0.422), and places second at 8K (0.388, behind T-lite-it-2.1's 0.469). On LibrusecMHQA it ties with the twice-larger Qwen2.5-14B-Instruct. These tasks — based on Russian literary texts and Wikipedia — are closest to real RAG scenarios.
- Unique long-context multi-hop ability. On LongContextMultiQ at 128K, Meno-Lite-0.1 is the only model besides its parent to achieve a non-trivial score (0.18 vs. 0.00 for avibe, T-lite-it-1.0, T-lite-it-2.1, Qwen2.5-7B, and Qwen2.5-14B). This suggests that the CPT data selection strategy preserved long-range coherence even as skills were sharpened.
- Different degradation profiles. On the synthetic ruBABILong tasks, avibe shows a steeper degradation curve on QA1 (single fact: 0.66→0.18 from 4K to 128K) compared to Meno-Lite (0.72→0.36), indicating that Meno-Lite retains better focus as context grows for single-fact retrieval. Conversely, avibe is stronger on QA2 (two facts) across all lengths, reflecting a complementary strength in multi-fact aggregation.
Question Answering and Multiple Choice
These tasks evaluate reading comprehension on Russian literary, scientific, and factual texts.
ruQuALITY (reading comprehension over long narratives):
| Model | Size | 8K | 16K |
|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.805 | 0.720 |
| T-lite-it-1.0 | 7B | 0.854 | 0.770 |
| T-lite-it-2.1 | 7B | 0.805 | 0.727 |
| AvitoTech/avibe | 8B | 0.732 | 0.677 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.683 | 0.671 |
| Qwen2.5-7B-Instruct | 7B | 0.732 | 0.634 |
| Qwen2.5-14B-Instruct | 14B | 0.732 | 0.702 |
MatreshkaYesNo (yes/no comprehension questions):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.836 | 0.793 | 0.757 | 0.773 | 0.690 | 0.603 |
| T-lite-it-1.0 | 7B | 0.920 | 0.770 | 0.753 | 0.590 | 0.530 | 0.530 |
| T-lite-it-2.1 | 7B | 0.836 | 0.843 | 0.797 | 0.807 | 0.757 | 0.577 |
| AvitoTech/avibe | 8B | 0.809 | 0.817 | 0.797 | 0.777 | 0.770 | 0.633 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.809 | 0.827 | 0.790 | 0.747 | 0.687 | 0.593 |
| Qwen2.5-7B-Instruct | 7B | 0.860 | 0.727 | 0.737 | 0.620 | 0.587 | 0.567 |
| Qwen2.5-14B-Instruct | 14B | 0.876 | 0.827 | 0.773 | 0.763 | 0.697 | 0.637 |
MatreshkaNames (entity name extraction from narratives):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.453 | 0.413 | 0.320 | 0.207 | 0.113 | 0.040 |
| T-lite-it-1.0 | 7B | 0.467 | 0.507 | 0.400 | 0.253 | 0.060 | 0.060 |
| T-lite-it-2.1 | 7B | 0.647 | 0.520 | 0.453 | 0.467 | 0.360 | 0.193 |
| AvitoTech/avibe | 8B | 0.647 | 0.513 | 0.473 | 0.400 | 0.273 | 0.153 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.433 | 0.347 | 0.300 | 0.153 | 0.047 | 0.013 |
| Qwen2.5-7B-Instruct | 7B | 0.480 | 0.460 | 0.373 | 0.327 | 0.167 | 0.113 |
| Qwen2.5-14B-Instruct | 14B | 0.647 | 0.547 | 0.500 | 0.420 | 0.227 | 0.193 |
ruSciAbstractRetrieval (scientific abstract retrieval):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.986 | 0.848 | 0.757 | 0.476 | 0.185 | 0.085 |
| T-lite-it-1.0 | 7B | 0.981 | 0.910 | 0.805 | 0.538 | 0.215 | 0.125 |
| T-lite-it-2.1 | 7B | 0.986 | 0.952 | 0.895 | 0.810 | 0.230 | 0.140 |
| AvitoTech/avibe | 8B | 0.981 | 0.933 | 0.933 | 0.738 | 0.375 | 0.165 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.986 | 0.862 | 0.738 | 0.476 | 0.185 | 0.095 |
| Qwen2.5-7B-Instruct | 7B | 0.976 | 0.905 | 0.867 | 0.710 | 0.330 | 0.160 |
| Qwen2.5-14B-Instruct | 14B | 0.986 | 0.919 | 0.929 | 0.790 | 0.430 | 0.195 |
LibrusecHistory (historical literary QA):
| Model | Size | 8K | 16K | 32K | 64K |
|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.906 | 0.938 | 0.813 | 0.875 |
| T-lite-it-1.0 | 7B | 0.906 | 0.906 | 0.875 | 0.844 |
| T-lite-it-2.1 | 7B | 1.000 | 1.000 | 0.938 | 0.875 |
| AvitoTech/avibe | 8B | 1.000 | 0.938 | 1.000 | 0.938 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.844 | 0.875 | 0.844 | 0.781 |
| Qwen2.5-7B-Instruct | 7B | 0.938 | 0.906 | 0.906 | 0.750 |
| Qwen2.5-14B-Instruct | 14B | 0.938 | 0.938 | 0.906 | 0.781 |
Key observations on QA tasks:
- ruQuALITY: second-best among 7B models, behind only T-lite-it-1.0. Meno-Lite-0.1 scores 0.805 at 8K and 0.720 at 16K, surpassing avibe (0.732/0.677), Qwen2.5-7B (0.732/0.634), and even Qwen2.5-14B (0.732/0.702). The improvement over the direct parent RuadaptQwen2.5-7B-Lite-Beta is substantial (+0.12 at 8K, +0.05 at 16K).
- MatreshkaYesNo: exceptional stability across context lengths. While T-lite-it-1.0 starts strong at 4K (0.920) but drops to 0.530 at 128K, Meno-Lite-0.1 degrades much more gracefully (0.836→0.603), maintaining a clear advantage at 32K (0.773 vs. 0.590).
- LibrusecHistory: strong gains over ancestors. Meno-Lite-0.1 outperforms its parent RuadaptQwen2.5-7B-Lite-Beta at every context length and matches or exceeds the 14B Qwen2.5 at 16K and 64K.
Complex Reasoning and Mathematical Problems
ruQasper (QA over scientific papers):
| Model | Size | 8K | 16K | 32K |
|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.542 | 0.538 | 0.360 |
| T-lite-it-1.0 | 7B | 0.478 | 0.508 | 0.321 |
| T-lite-it-2.1 | 7B | 0.476 | 0.543 | 0.299 |
| AvitoTech/avibe | 8B | 0.507 | 0.524 | 0.388 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.465 | 0.468 | 0.347 |
| Qwen2.5-7B-Instruct | 7B | 0.454 | 0.436 | 0.346 |
| Qwen2.5-14B-Instruct | 14B | 0.507 | 0.519 | 0.411 |
ruGSM100 (grade-school math in Russian):
| Model | Size | 16K |
|---|---|---|
| AvitoTech/avibe | 8B | 0.31 |
| Qwen2.5-14B-Instruct | 14B | 0.29 |
| Meno-Lite-0.1 | 7B | 0.26 |
| T-lite-it-2.1 | 7B | 0.19 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.19 |
| T-lite-it-1.0 | 7B | 0.16 |
| Qwen2.5-7B-Instruct | 7B | 0.15 |
ruSciPassageCount (counting relevant passages):
| Model | Size | 4K | 8K | 16K | 32K | 64K | 128K |
|---|---|---|---|---|---|---|---|
| Meno-Lite-0.1 | 7B | 0.43 | 0.12 | 0.05 | 0.05 | 0.00 | 0.02 |
| T-lite-it-1.0 | 7B | 0.47 | 0.15 | 0.06 | 0.03 | 0.02 | 0.02 |
| T-lite-it-2.1 | 7B | 0.32 | 0.14 | 0.00 | 0.01 | 0.00 | 0.02 |
| AvitoTech/avibe | 8B | 0.36 | 0.08 | 0.08 | 0.03 | 0.01 | 0.02 |
| RuadaptQwen2.5-7B-Lite-Beta | 7B | 0.50 | 0.21 | 0.13 | 0.01 | 0.01 | 0.02 |
| Qwen2.5-7B-Instruct | 7B | 0.32 | 0.10 | 0.03 | 0.04 | 0.01 | 0.02 |
| Qwen2.5-14B-Instruct | 14B | 0.68 | 0.27 | 0.12 | 0.04 | 0.00 | 0.03 |
Key observations on complex reasoning:
- ruQasper: best 7B model at 8K. Meno-Lite-0.1 achieves 0.542 at 8K — the highest score among all tested models including the 14B Qwen2.5 (0.507). This task, which requires QA over full scientific papers, is a direct test of document comprehension skill.
- ruGSM100: strongest 7B model for math. Despite no math-specific training, Meno-Lite (0.26) substantially outperforms all 7B peers (T-lite-it-1.0: 0.16, Qwen2.5-7B: 0.15) and approaches the 14B model (0.29). This supports the hypothesis that improved language comprehension transfers to mathematical reasoning when problems are stated in natural language.
LIBRA Summary
The per-length analysis reveals a nuanced picture:
| Strength | Meno-Lite-0.1 advantage |
|---|---|
| Real-world multi-hop QA (LibrusecMHQA, ru2WikiMHQA) | Leads 7B models on LibrusecMHQA (tied with Qwen2.5-14B) and ru2WikiMHQA at 16K |
| Document QA (ruQasper at 8K, ruQuALITY) | Highest ruQasper score among all models at 8K; second-best ruQuALITY among 7B models |
| Comprehension stability (MatreshkaYesNo at 32K) | More graceful degradation than T-lite-it-1.0; competitive with models twice its size |
| Ultra-long retrieval (Passkey at 128K) | 0.98 vs. 0.58 for stock Qwen2.5-7B/14B |
| Long-context multi-hop (LongContextMultiQ at 128K) | Only model (besides its parent) with non-zero score at 128K |
| Math reasoning from context (ruGSM100) | Best among all 7B models |
The model's profile is well-suited for production RAG pipelines, where contexts typically fall in the 4K–16K range — precisely where Meno-Lite-0.1 shows its strongest performance relative to peers.
Summary of Benchmark Findings
The evaluation on MERA and LIBRA, combined with tokenizer analysis, paints a coherent picture of Meno-Lite-0.1's strengths and trade-offs.
1. MERA (general Russian LLM evaluation): Meno-Lite-0.1 achieves 0.555 — the highest score among all 7B-class models on the leaderboard, matching 70B-class Llama-3.3-70B-Instruct (0.555) and Meta-Llama-3.1-70B-Instruct (0.554). The gaps over the base Qwen2.5-7B-Instruct (+0.073) and the RuadaptQwen2.5-7B-Lite line (+0.019 over the v1 leaderboard entry) are substantial, confirming that the CPT+SFT pipeline adds genuine capability rather than superficial instruction tuning.
2. LIBRA (long-context understanding): The per-length analysis reveals that Meno-Lite-0.1 excels in the 4K–16K context range most relevant to production RAG:
- Best 7B model on real-world multi-hop QA — leading on LibrusecMHQA (0.484, tied with Qwen2.5-14B) and on ru2WikiMultihopQA at 16K, with a close second place at 8K behind T-lite-it-2.1.
- Highest ruQasper score at 8K among all tested models (0.542), including the 14B Qwen2.5-Instruct (0.507), demonstrating strong scientific document comprehension.
- Second-best ruQuALITY at 16K among 7B models (0.720, behind T-lite-it-1.0's 0.770), surpassing even the 14B model (0.702).
- Near-perfect passkey retrieval up to 128K (0.98), far ahead of stock Qwen2.5-7B/14B (0.58).
- Unique non-zero score on LongContextMultiQ at 128K (0.18) — the only model besides its parent to solve any multi-hop questions at this extreme length.
- At very long contexts (64K–128K) on certain tasks (MatreshkaNames, ruSciAbstractRetrieval, ruBABILongQA2), models with larger effective context training — such as avibe and T-lite-it-2.1 — retain quality better, reflecting a known trade-off between mid-range precision and ultra-long-range retention.
3. Tokenizer efficiency: With 3.77 Russian characters per token (vs. 2.57 for stock Qwen2.5), Meno-Lite-0.1 covers ~47% more text per token, i.e. it needs ~32% fewer generation steps for Russian output of the same length, directly reducing inference latency and serving costs.
4. The hypothesis in practice. These results provide empirical support for the language-knowledge vs. world-knowledge decomposition. By concentrating training signal on language skills — comprehension, extraction, multi-hop reasoning, instruction following — a 7B model can match or exceed systems 2×–10× its size on context-grounded tasks. The model's relative weaknesses align with prediction: world-knowledge-heavy tasks (CheGeKa, MaMuRAMu) and pure long-range retrieval tasks beyond 32K remain areas where larger models or specialized long-context training hold an advantage. For the vast majority of RAG, document QA, and agentic deployments — where relevant information is supplied in context and typical chunk sizes fall in the 4K–16K range — Meno-Lite-0.1 offers a compelling combination of quality, speed, and cost efficiency.
Training Details
Training Data
The training data was curated to maximize language-skill acquisition while minimizing reliance on world-knowledge memorization.
Continued Pretraining (CPT) data:
| Source | Language | Description |
|---|---|---|
| FineWeb-Edu (sampled) | EN | High-quality educational web text |
| RuLM subset | RU | Russian web text selected for maximal FineWeb-Edu similarity using gte-multilingual-base embeddings |
| RU FinePDFs-edu | RU | Educational PDF documents in Russian |
| RuREBus (Dialogue'20) | RU | Unlabeled text corpus from the RuREBus shared task |
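The RuLM selection step can be illustrated with a short sketch of embedding-based filtering, assuming the sentence-transformers wrapper for gte-multilingual-base and small placeholder corpora (the actual selection pipeline is not published here):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# gte-multilingual-base ships custom model code, hence trust_remote_code=True
encoder = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

fineweb_edu_samples = ["An educational article about photosynthesis ..."]  # placeholder
rulm_candidates = ["Русская статья о фотосинтезе ...", "Рекламный спам ..."]  # placeholder

ref = encoder.encode(fineweb_edu_samples, normalize_embeddings=True)
cand = encoder.encode(rulm_candidates, normalize_embeddings=True)

# Rank candidates by maximum cosine similarity to the FineWeb-Edu references
scores = (cand @ ref.T).max(axis=1)
keep = np.argsort(scores)[::-1][: len(rulm_candidates) // 2]
print([rulm_candidates[i] for i in keep])
```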
Supervised Fine-Tuning (SFT) data:
| Source | Language | Description |
|---|---|---|
| NEREL → instructions | RU | Named entity recognition corpus converted to instruction format, plus GPT-4o-mini–generated synthetic entity normalization and definitions |
| LightRAG query logs | RU | GPT-4o–generated queries over Habr articles and the NSU website |
| MultiHopRAG | EN | Multi-hop question answering training dialogs |
| MTRAGEval | EN | Multi-turn RAG evaluation training dialogs |
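The NEREL-to-instruction conversion can be pictured as follows; a sketch with a hypothetical record structure (the actual conversion script is not published here):

```python
import json

def nerel_record_to_instruction(text: str, entities: dict) -> list:
    """Turn one annotated NEREL text into a chat-format SFT example."""
    prompt = (
        "Выделите именованные сущности классов ORGANIZATION, PERSON и LOCATION "
        "из входного текста и запишите ответ в JSON-формате."
        f"\n\nВходной текст:\n\n```text\n{text}\n```\n"
    )
    # The target answer is the annotation itself, serialized as compact JSON
    answer = json.dumps(
        {cls: entities.get(cls, []) for cls in ("ORGANIZATION", "PERSON", "LOCATION")},
        ensure_ascii=False,
    )
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]
```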
Training Procedure
Stage 1 — Continued Pretraining (CPT): The model was further pretrained on a balanced mix of Russian and English educational, legal, and scientific-technical texts. The Russian subset was specifically selected to match the quality distribution of FineWeb-Edu, ensuring that the model absorbs high-quality linguistic patterns rather than noisy web crawls.
Stage 2 — Supervised Fine-Tuning (SFT): The SFT stage used a custom instruction set designed to reinforce language skills (extraction, normalization, summarization, multi-hop QA) rather than inject world knowledge. This is the critical distinction: conventional SFT datasets often teach models to recall facts, whereas our instructions teach models to use context.
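To make the distinction concrete, compare a knowledge-injecting pair with a context-grounded one (hypothetical illustrations, not actual training samples):

```python
# Knowledge-injecting style (avoided here): the answer must come from memory
# user: "В каком году был основан НГУ?" -> assistant: "В 1959 году."

# Context-grounded style (the pattern reinforced for Meno-Lite): the answer is in the prompt
sft_example = [
    {"role": "system", "content": "Вы — полезный ассистент. Отвечайте, опираясь на предоставленный контекст."},
    {"role": "user", "content": "Контекст:\n```text\nНГУ был основан в 1959 году в Академгородке.\n```\n\n"
                                "Вопрос: В каком году был основан НГУ?\n"},
    {"role": "assistant", "content": "НГУ был основан в 1959 году."},
]
```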
Training Hyperparameters
- Training regime: bf16 mixed precision
Bias, Risks, and Limitations
- Hallucination risk: Like all autoregressive LLMs, Meno-Lite-0.1 can generate plausible-sounding but factually incorrect text, especially when relevant context is not provided in the prompt. This is an expected consequence of the design: the model was optimized for context-grounded tasks, not parametric recall.
- World knowledge gaps: The model deliberately trades world-knowledge capacity for language skills. It should not be used as a standalone knowledge base.
- Language coverage: While the model retains good English capabilities, it has been primarily validated on Russian and English. Performance on other languages supported by the Qwen2.5 backbone is untested.
- Training data biases: The model inherits biases present in its pretraining corpora (FineWeb-Edu, RuLM, Habr) and in the GPT-4o/GPT-4o-mini generations used for synthetic SFT data.
- Context window: Although the model handles contexts up to 128K tokens in passkey tasks, complex reasoning performance degrades at very long contexts (>32K), consistent with other models in this size class.
Recommendations
- Always provide relevant context in the prompt for best results.
- For factual accuracy, use the model within a RAG pipeline with a reliable retrieval system.
- Validate model outputs in high-stakes domains (legal, medical, financial).
Technical Specifications
Model Architecture and Objective
- Architecture: Qwen2.5 (causal decoder-only transformer)
- Parameters: ~7B
- Context window: Up to 32,768 tokens (validated up to 128K on retrieval tasks)
- Vocabulary: Extended tokenizer with improved Russian coverage (~3.77 chars/token for Russian)
- Objective: Next-token prediction (autoregressive language modeling)
Compute Infrastructure
Hardware
Training was conducted on NVIDIA GPU infrastructure (10× NVIDIA A100 GPUs, 80 GB each).
Software
- Hugging Face Transformers
- vLLM (for inference benchmarking)
- lm-evaluation-harness (for MERA and LIBRA evaluation)
Citation
If you use Meno-Lite-0.1 in your research, please cite:
BibTeX:
```bibtex
@misc{bondarenko2025menolite,
  title={Meno-Lite-0.1: A 7B Language Model Optimized for Russian RAG Pipelines},
  author={Ivan Bondarenko},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/bond005/meno-lite-0.1}
}
```
Glossary
- RAG (Retrieval-Augmented Generation): A paradigm where relevant documents are retrieved from an external knowledge base and injected into the LLM's context, enabling accurate answers without relying on parametric memory.
- CPT (Continued Pretraining): An additional pretraining phase applied to an already-trained model, typically on domain-specific or quality-filtered data.
- SFT (Supervised Fine-Tuning): Training on instruction–response pairs to align the model with desired behaviors.
- Multi-hop QA: Question answering that requires synthesizing information from multiple passages or reasoning steps.
- MERA: The most comprehensive Russian-language benchmark for evaluating LLMs, comprising 23 tasks covering world knowledge, logic, causality, AI ethics, and more. The overall leaderboard score is computed over 15 closed-test tasks.
- LIBRA: Long Input Benchmark for Russian Analysis — 21 tasks for evaluating long-context understanding in Russian, spanning context lengths from 4K to 128K tokens.
Model Card Authors
Ivan Bondarenko (@bond005), Novosibirsk State University
Model Card Contact
For questions, feedback, or collaboration inquiries, please open an issue on the model repository or contact Ivan Bondarenko via Hugging Face.