What about latencies?
#3
by LorenzoCevolaniAXA - opened
Do you have a benchmark comparing the full Mixtral on a 48xlarge with the Medusa-modified Mixtral AWQ from this repo on a 12xlarge?