How to use onnx-community/HalleluBERT_base-ONNX with Transformers.js:

```js
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('fill-mask', 'onnx-community/HalleluBERT_base-ONNX');
```

How to use onnx-community/HalleluBERT_base-ONNX with Fairseq:

```python
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub

models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "onnx-community/HalleluBERT_base-ONNX"
)
```

HalleluBERT_base (ONNX)
This is an ONNX version of HalleluBERT/HalleluBERT_base. It was automatically converted and uploaded using this Hugging Face Space.
Usage with Transformers.js
See the pipeline documentation for fill-mask: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.FillMaskPipeline
HalleluBERT: Let every token that has meaning bear its weight
HalleluBERT is a family of RoBERTa-based Modern Hebrew language models pre-trained from scratch on ~49.1 GB of deduplicated Hebrew web text (HeDC4 / HeRo corpus) and Hebrew Wikipedia. The models aim to provide the first fully converged Hebrew RoBERTa encoder family, including a large variant, and to push state-of-the-art performance on core Hebrew benchmarks.
We release two variants:
- HalleluBERT-base: 126M parameters (fp32)
- HalleluBERT-large: 357M parameters (fp32)
Model Details
| Detail | HalleluBERT-base | HalleluBERT-large |
|---|---|---|
| Architecture | RoBERTa-base | RoBERTa-large |
| Parameters | ~126M | ~357M |
| Tokenizer | GPT-2 style byte-level BPE (52,009 vocab) | Same |
| Pretraining corpus | HeDC4 (mC4 + OSCAR22) + Hebrew Wikipedia (~49.1 GB) | Same |
| Objective | Masked Language Modeling | Same |
| Training steps | 100k updates, global batch size 8k | Same |
| LR schedule | 10k warmup + polynomial decay | Same |
| Peak learning rate | 0.0004 | 0.00015 |
| Training time | ~30.2 hours (TPUv4-128 pod) | ~6.0 days (TPUv4-128 pod) |
| Precision | fp32 | fp32 |
| Framework | fairseq | fairseq |
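
The learning-rate schedule in the table (10k warmup updates, then polynomial decay over 100k total updates) can be sketched as below. This is a minimal illustration, not the training configuration itself: the function name `learning_rate` is hypothetical, and the decay power of 1.0, the end learning rate of 0.0, and the choice to count warmup inside the 100k updates are assumptions that the table does not state.

```python
def learning_rate(step: int, peak_lr: float, warmup_steps: int = 10_000,
                  total_steps: int = 100_000, power: float = 1.0,
                  end_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then polynomial decay towards end_lr.

    power=1.0 and end_lr=0.0 are assumed defaults, not values from the table.
    """
    if step < warmup_steps:
        # Warmup phase: scale the peak rate linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        return end_lr
    # Decay phase: fraction of the post-warmup budget that is still left.
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * remaining ** power + end_lr


BASE_PEAK_LR = 4e-4     # peak learning rate of HalleluBERT-base (from the table)
LARGE_PEAK_LR = 1.5e-4  # peak learning rate of HalleluBERT-large (from the table)

for step in (0, 5_000, 10_000, 55_000, 100_000):
    print(step, learning_rate(step, BASE_PEAK_LR), learning_rate(step, LARGE_PEAK_LR))
```

Under these assumptions both variants reach their peak rate at update 10k and share the same schedule shape; only the peak differs.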
Downstream Evaluation
We evaluate HalleluBERT on three Hebrew benchmarks (following the HeRo suite, restricted to NER + sentiment):
- NER (BMC split 1): micro-F1
- NER (NEMO², token-level): micro-F1
- Sentiment (SMCD, deduplicated): macro-F1
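
The two metrics above aggregate per-class errors differently: micro-F1 pools true positives, false positives, and false negatives over all classes before computing F1, while macro-F1 averages the per-class F1 scores so that rare classes count as much as frequent ones. The sketch below illustrates that difference on flat label sequences. It is not the evaluation script used for the benchmarks; the helper names and the toy labels are made up for illustration.

```python
from collections import Counter


def per_class_counts(gold, pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it does not belong
            fn[g] += 1  # missed the gold label g
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold, pred, labels):
    """Pool the counts over all labels, then compute a single F1."""
    tp, fp, fn = per_class_counts(gold, pred)
    return f1(sum(tp[l] for l in labels),
              sum(fp[l] for l in labels),
              sum(fn[l] for l in labels))


def macro_f1(gold, pred, labels):
    """Compute F1 per label, then take the unweighted mean."""
    tp, fp, fn = per_class_counts(gold, pred)
    return sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)


# Toy, imbalanced example (illustrative labels, not benchmark data).
gold = ["pos", "pos", "pos", "pos", "neg", "neu"]
pred = ["pos", "pos", "pos", "pos", "pos", "neu"]
labels = ["pos", "neg", "neu"]

print(f"micro-F1: {micro_f1(gold, pred, labels):.4f}")
print(f"macro-F1: {macro_f1(gold, pred, labels):.4f}")
```

In the toy example the single missed `neg` item barely moves micro-F1 but pulls macro-F1 down sharply, because the `neg` class contributes an F1 of zero to the mean. For NER, the `labels` argument would typically exclude the outside tag.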
We select the best configuration by validation performance and report the best score out of 10 runs on the official test sets.
🧪 Evaluation Results
Legend: Bold = best, underline = second-best within each model size group.
| Model | BMC (micro-F1) | NEMO (micro-F1) | AVG NER | SMCD (macro-F1) | AVG (all) |
|---|---|---|---|---|---|
| Large models | | | | | |
| HalleluBERT_large | **93.23** | **88.70** | **90.96** | **84.91** | **88.95** |
| XLM-RoBERTa_large | <u>92.31</u> | <u>86.41</u> | <u>89.36</u> | <u>83.74</u> | <u>87.49</u> |
| Base models | | | | | |
| HeBERT | 89.33 | 76.16 | 82.74 | 82.64 | 82.71 |
| AlephBERT | 91.36 | 81.52 | 86.44 | **83.66** | 85.51 |
| HeRo | 92.00 | 83.35 | 87.68 | 80.95 | 85.43 |
| HalleluBERT_base | **93.33** | **87.06** | **90.20** | 83.09 | **87.83** |
| mmBERT_small | 83.96 | 71.95 | 77.96 | 81.89 | 79.27 |
| AlephBERT-Gimmel | <u>92.46</u> | <u>85.86</u> | <u>89.16</u> | 82.66 | <u>86.99</u> |
| XLM-RoBERTa_base | 86.32 | 79.37 | 82.84 | 82.07 | 82.59 |
| mmBERT_base | 84.61 | 77.97 | 81.29 | <u>83.55</u> | 82.04 |
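
The card does not define the two average columns. The reported numbers are consistent with AVG NER being the unweighted mean of the BMC and NEMO scores, and AVG (all) being the unweighted mean of the three task scores (BMC, NEMO, SMCD) rather than the mean of AVG NER and SMCD. The sketch below checks that reading against a few rows copied from the table; the helper names are hypothetical.

```python
def avg_ner(bmc: float, nemo: float) -> float:
    """Unweighted mean of the two NER scores."""
    return (bmc + nemo) / 2


def avg_all(bmc: float, nemo: float, smcd: float) -> float:
    """Unweighted mean over the three task scores."""
    return (bmc + nemo + smcd) / 3


# (BMC, NEMO, SMCD, reported AVG NER, reported AVG (all)), copied from the table.
ROWS = {
    "HalleluBERT_large": (93.23, 88.70, 84.91, 90.96, 88.95),
    "XLM-RoBERTa_large": (92.31, 86.41, 83.74, 89.36, 87.49),
    "HalleluBERT_base": (93.33, 87.06, 83.09, 90.20, 87.83),
    "AlephBERT-Gimmel": (92.46, 85.86, 82.66, 89.16, 86.99),
}

# Allow for rounding of the reported values to two decimals.
TOLERANCE = 0.006

for name, (bmc, nemo, smcd, reported_ner, reported_all) in ROWS.items():
    ner_ok = abs(avg_ner(bmc, nemo) - reported_ner) < TOLERANCE
    all_ok = abs(avg_all(bmc, nemo, smcd) - reported_all) < TOLERANCE
    print(f"{name}: AVG NER consistent={ner_ok}, AVG (all) consistent={all_ok}")
```

One consequence of this reading is that the two NER benchmarks together carry two thirds of the weight in AVG (all), and sentiment carries one third.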
Fairseq Checkpoint
Get the fairseq checkpoint here.
Citation
If you use HalleluBERT in your research, please cite the corresponding paper:
@misc{scheibleschmitt2025hallelubertlettokenmeaning,
title={HalleluBERT: Let every token that has meaning bear its weight},
author={Raphael Scheible-Schmitt},
year={2025},
eprint={2510.21372},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.21372},
}
License
MIT License