WoahMiri committed on
Commit f401194 · verified · 1 Parent(s): f6bf8f8

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ Released under Apache 2.0 License with a 256K context window, Jamba2 Mini is des
 # Evaluation Results
 Jamba2 Mini leads on instruction following and grounding metrics, demonstrating exceptional steerability and context faithfulness. In blind side-by-side evaluations on 100 real-world enterprise prompts, the model achieved statistically significant wins on output quality and factuality compared to Ministral3 14B.
 
-[Benchmark comparison graphs will be displayed here]
+<img src="https://huggingface.co/ai21labs/AI21-Jamba2-Mini/resolve/main/assets/Enterprise%20Reliability%20Benchmarks%20for%20Mini%20Models.png" width="900"/>
 
 # Training and Evaluation Details
 Jamba2 models were developed using a comprehensive post-training pipeline starting from Jamba 1.5 pre-training. The models underwent mid-training on 500B carefully curated tokens with increased representation of math, code, high-quality web data, and long documents. A state passing phase optimized the Mamba layers for effective context length generalization. Training continued with cold start supervised fine-tuning to establish instruction-following and reasoning capabilities, followed by DPO optimization.
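
For reference, the pipeline described in the diff context above ends with DPO optimization. The standard direct preference optimization objective (Rafailov et al., 2023) is sketched below; the README does not state which DPO variant or hyperparameters were used, so this is illustrative only.

```latex
% Standard DPO objective (Rafailov et al., 2023).
% Assumption: the README does not confirm this exact formulation for Jamba2.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here $y_w$ and $y_l$ are the preferred and dispreferred responses for prompt $x$, $\pi_{\mathrm{ref}}$ is typically the SFT checkpoint (plausibly the cold-start SFT model in this pipeline), and $\beta$ controls how far the policy may drift from the reference.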