Update README.md
README.md CHANGED

@@ -21,7 +21,7 @@ Released under Apache 2.0 License with a 256K context window, Jamba2 Mini is des
# Evaluation Results

Jamba2 Mini leads on instruction-following and grounding metrics, demonstrating exceptional steerability and context faithfulness. In blind side-by-side evaluations on 100 real-world enterprise prompts, the model achieved statistically significant wins on output quality and factuality over Ministral3 14B.

-
+<img src="https://huggingface.co/ai21labs/AI21-Jamba2-Mini/resolve/main/assets/Enterprise%20Reliability%20Benchmarks%20for%20Mini%20Models.png" width="900"/>

# Training and Evaluation Details

Jamba2 models were developed using a comprehensive post-training pipeline starting from Jamba 1.5 pre-training. The models underwent mid-training on 500B carefully curated tokens with increased representation of math, code, high-quality web data, and long documents. A state-passing phase optimized the Mamba layers for effective context-length generalization. Training continued with cold-start supervised fine-tuning to establish instruction-following and reasoning capabilities, followed by direct preference optimization (DPO).
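The final DPO step mentioned in the pipeline can be illustrated with a minimal sketch of the Direct Preference Optimization loss for a single preference pair. This is the generic DPO objective, not AI21's actual training code; the function name, inputs, and `beta` value are all illustrative assumptions:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one preference pair (illustrative, not AI21's code).

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference model; beta controls how far the policy may drift from
    the reference.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): small when the policy prefers the
    # chosen response more strongly than the reference model does
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss is low when the policy already favors the chosen response
# relative to the reference, and high when it favors the rejected one:
low = dpo_loss(-5.0, -20.0, -10.0, -10.0)
high = dpo_loss(-20.0, -5.0, -10.0, -10.0)
```

Minimizing this loss pushes the policy toward the preferred responses while the reference-model log-ratios keep it anchored to its starting behavior.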