Instructions for using LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b with libraries and local apps.

Libraries

How to use LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated reply is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b")
model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
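
The same chat-completions call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and it assumes the server was started as above and is listening on `http://localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` sends the same request as the curl command. For a server on another port, pass the address explicitly, for example `ask("...", base_url="http://localhost:30000")`.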

Use Docker

docker model run hf.co/LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b

SGLang

How to use LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b with Docker Model Runner:
```
docker model run hf.co/LeroyDyer/Mistral_WhiteHatCoder_Base_Instruct_Moe_3x7b
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Mixture of Experts 10b

Mixture of Experts (MoE) enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should reach the same quality as its dense counterpart much faster during pretraining. A MoE layer contains a gate network, or router, that determines which tokens are sent to which expert. For example, the token "More" might be sent to the second expert and the token "Parameters" to the first. A token can also be sent to more than one expert. How to route a token to an expert is one of the big decisions when working with MoEs: the router is composed of learned parameters and is pretrained at the same time as the rest of the network.
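
To make the routing idea concrete, here is a minimal pure-Python sketch of top-k gating for a single token. It is an illustration only, not this model's actual implementation: the function names, the toy experts, the random router weights, and the choice of `top_k=2` are all assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, router_weights, top_k=2):
    """Return [(expert_index, gate_weight), ...] for one token (hypothetical helper)."""
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    # Keep the top_k experts with the highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the selected logits so the gate weights sum to 1.
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(token_vec, router_weights, experts, top_k=2):
    """Combine the outputs of the selected experts, weighted by their gates."""
    out = [0.0] * len(token_vec)
    for idx, gate in route_token(token_vec, router_weights, top_k):
        expert_out = experts[idx](token_vec)
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out


# Toy setup: 3 experts (matching the "3x" in the model name), 4-dimensional tokens.
rng = random.Random(0)
router_weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
# Stand-in experts that just scale their input; real experts are feed-forward networks.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0)]
token = [0.5, -1.0, 0.25, 2.0]
print(route_token(token, router_weights))
print(moe_layer(token, router_weights, experts))
```

In a trained model the router rows are learned jointly with the experts; here they are random, so which experts get selected carries no meaning.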

Base Model

mistralai/Mistral-7B-Instruct-v0.2

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.

Experts:

Code: codellama/CodeLlama-7b-hf

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. The expert used here is the base 7B version in the Hugging Face Transformers format, which is designed for general code synthesis and understanding.