Update README.md
README.md CHANGED

@@ -21,7 +21,7 @@ Released under Apache 2.0 License with a 256K context window, Jamba2 Mini is des
# Evaluation Results

Jamba2 Mini leads on instruction-following and grounding metrics, demonstrating exceptional steerability and context faithfulness. In blind side-by-side evaluations on 100 real-world enterprise prompts, the model achieved statistically significant wins on output quality and factuality over Ministral3 14B.

-
+<img src="https://huggingface.co/ai21labs/AI21-Jamba2-Mini/resolve/main/assets/Enterprise%20Reliability%20Benchmarks%20for%20Mini%20Models.png" width="900"/>

# Training and Evaluation Details

Jamba2 models were developed using a comprehensive post-training pipeline starting from Jamba 1.5 pre-training. The models underwent mid-training on 500B carefully curated tokens with increased representation of math, code, high-quality web data, and long documents. A state-passing phase optimized the Mamba layers for effective context-length generalization. Training continued with cold-start supervised fine-tuning to establish instruction-following and reasoning capabilities, followed by direct preference optimization (DPO).
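The final DPO step mentioned in the pipeline can be illustrated with a minimal sketch of the Direct Preference Optimization loss for a single preference pair. This is the generic DPO objective, not AI21's actual training code; the function name, inputs, and `beta` value are all illustrative assumptions:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one preference pair (illustrative, not AI21's code).

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference model; beta controls how far the policy may drift from
    the reference.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): small when the policy prefers the
    # chosen response more strongly than the reference model does
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss is low when the policy already favors the chosen response
# relative to the reference, and high when it favors the rejected one:
low = dpo_loss(-5.0, -20.0, -10.0, -10.0)
high = dpo_loss(-20.0, -5.0, -10.0, -10.0)
```

Minimizing this loss pushes the policy toward the preferred responses while the reference-model log-ratios keep it anchored to its starting behavior.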