Gemma 4 E4B - E-Tafakna: Tunisian Legal Assistant (GGUF)
A fine-tuned Gemma 4 E4B model specialized in Tunisian law, built for the E-Tafakna platform.
The model is designed primarily for Retrieval-Augmented Generation (RAG) legal assistants and answers legal questions in French based on Tunisian legal codes and legislation.
⚠️ This model provides legal information only; it does not constitute professional legal advice.
⚠️ This model is not intended to replace legal professionals and should not be used without a retrieval pipeline in production legal systems.
Available Files
| File | Quant | Size | Description |
|---|---|---|---|
| gemma4-etafakna-q4_k_m.gguf | Q4_K_M | ~5.3 GB | Recommended for most users: good balance of quality and speed |
| gemma4-etafakna-f16.gguf | F16 | ~15 GB | Full precision: use for requantization or maximum quality |
Recommended Usage with RAG
This model is designed to work best as part of a Retrieval-Augmented Generation (RAG) pipeline rather than as a standalone legal model.
Because Tunisian law requires precise article references, contextual interpretation, and up-to-date legal sources, the model relies on retrieved legal documents to generate accurate and grounded answers.
Without retrieval, the model may:
- hallucinate legal references,
- omit important legal context,
- provide incomplete answers,
- or answer outside the scope of Tunisian legislation.
For optimal performance, use the model with:
- a vector database (such as Qdrant),
- dense and/or sparse embeddings,
- document chunking and reranking,
- and legal source attribution.
The intended workflow is:
User Question
↓
Legal Document Retrieval (RAG)
↓
Relevant Tunisian Legal Articles
↓
Gemma 4 E4B - E-Tafakna
↓
Grounded Legal Response with Citations
The model was specifically fine-tuned on examples where:
- relevant legal excerpts were provided in the prompt,
- answers were grounded in retrieved documents,
- and legal article citations were expected in every response.
⚠️ The quality of the response depends heavily on the quality of the retrieved legal context.
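To make the retrieval step of this workflow concrete, here is a deliberately small, dependency-free sketch. It is not the E-Tafakna pipeline: it ranks documents by bag-of-words cosine similarity instead of using a vector database and embeddings. The names `CORPUS`, `tokenize`, `cosine`, and `retrieve` are hypothetical, and the second corpus entry is placeholder text included only to show filtering.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory corpus. A production system would keep chunks in a
# vector database (such as Qdrant) and score them with dense/sparse embeddings.
CORPUS = [
    {
        "source": "Article 2 du Code des obligations et des contrats",
        "text": (
            "Les éléments nécessaires pour la validité des obligations qui "
            "dérivent d'une déclaration de volonté sont: la capacité de "
            "s'obliger, une déclaration valable de volonté portant sur les "
            "éléments essentiels de l'obligation, un objet certain pouvant "
            "former objet d'obligation, une cause licite de s'obliger."
        ),
    },
    {
        "source": "Document fictif (placeholder)",
        "text": "Texte de remplissage fictif, sans rapport avec la question.",
    },
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, corpus: list[dict], top_k: int = 2) -> list[dict]:
    """Return up to top_k documents with a non-zero similarity to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(doc["text"]))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


QUESTION = "Quelles sont les conditions pour qu'un contrat soit valable en droit tunisien?"
hits = retrieve(QUESTION, CORPUS)
```

Lexical overlap is a weak signal for legal text, which is why the reranking and embedding components listed above matter in practice; the sketch only shows where retrieval sits relative to the model call.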
How to Run
Ollama (easiest)
ollama run hf.co/L0uu/gemma4-e4b-etafakna-gguf:Q4_K_M
With a custom system prompt
Create a file called Modelfile:
FROM hf.co/L0uu/gemma4-e4b-etafakna-gguf:Q4_K_M
SYSTEM """
Vous êtes un assistant juridique de la plateforme E-Tafakna, spécialisé dans le droit tunisien.
Répondez uniquement à partir des documents fournis.
Si une question dépasse le contenu des documents, indiquez-le clairement et recommandez de consulter un professionnel.
Structurez votre réponse par problématique juridique.
Citez toujours les articles et sources.
Ceci ne constitue pas un avis juridique professionnel.
"""
Then:
ollama create etafakna -f Modelfile
ollama run etafakna
llama.cpp
llama-server -hf L0uu/gemma4-e4b-etafakna-gguf:Q4_K_M
# or
llama-cli -hf L0uu/gemma4-e4b-etafakna-gguf:Q4_K_M
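Once `llama-server` is running, it exposes an OpenAI-compatible chat endpoint that can be called from any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on its default address `http://localhost:8080`; the helper names `build_chat_request` and `ask` and the `temperature` value are illustrative choices, not part of this model card.

```python
import json
import urllib.request

# Assumption: llama-server was started locally and uses its default port.
BASE_URL = "http://localhost:8080"


def build_chat_request(
    question: str, system_prompt: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,  # illustrative value, not a tuned recommendation
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, system_prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(question, system_prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("...", "Vous êtes un assistant juridique spécialisé dans le droit tunisien.")` requires the server to be up; in a RAG setup the question string would also carry the retrieved legal excerpts.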
llama-cpp-python
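# Requires: pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Download the Q4_K_M file from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
    repo_id="L0uu/gemma4-e4b-etafakna-gguf",
    filename="gemma4-etafakna-q4_k_m.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "Vous êtes un assistant juridique spécialisé dans le droit tunisien.",
        },
        {
            "role": "user",
            "content": "Quelles sont les conditions pour qu'un contrat soit valable en droit tunisien?",
        },
    ]
)

# The result follows the OpenAI chat-completion layout.
print(response["choices"][0]["message"]["content"])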
Example Usage
User:
Quelles sont les conditions pour qu'un contrat soit valable en droit tunisien?
Documents pertinents:
[1] Source: Article 2 du Code des obligations et des contrats
Les éléments nécessaires pour la validité des obligations qui dérivent d'une déclaration de volonté sont: la capacité de s'obliger, une déclaration valable de volonté portant sur les éléments essentiels de l'obligation, un objet certain pouvant former objet d'obligation, une cause licite de s'obliger.
Model:
The model will provide a structured response citing Article 2 of the Code des obligations et des contrats, breaking down the four conditions:
- capacity,
- valid declaration of will,
- certain object,
- and lawful cause.
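The example above suggests a user turn made of the question followed by a numbered "Documents pertinents" block. A small helper can assemble that turn from retrieved excerpts. This is a sketch under assumptions: the exact template used during fine-tuning is not documented here, so the layout simply mirrors the example, and the `source`/`text` keys and the function names are hypothetical.

```python
SYSTEM_PROMPT = "Vous êtes un assistant juridique spécialisé dans le droit tunisien."


def format_documents(documents: list[dict]) -> str:
    """Render retrieved excerpts as a numbered block, mirroring the example."""
    lines = ["Documents pertinents:"]
    for index, doc in enumerate(documents, start=1):
        lines.append(f"[{index}] Source: {doc['source']}")
        lines.append(doc["text"])
    return "\n".join(lines)


def build_messages(
    question: str, documents: list[dict], system_prompt: str = SYSTEM_PROMPT
) -> list[dict]:
    """Build chat messages: system prompt, then question plus excerpts."""
    user_content = f"{question}\n\n{format_documents(documents)}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "Quelles sont les conditions pour qu'un contrat soit valable en droit tunisien?",
    [
        {
            "source": "Article 2 du Code des obligations et des contrats",
            # Truncated here for brevity; pass the full excerpt in practice.
            "text": "Les éléments nécessaires pour la validité des obligations ...",
        }
    ],
)
```

The resulting `messages` list can be passed to `create_chat_completion` in llama-cpp-python or sent to any OpenAI-compatible endpoint.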
Training Details
| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-E4B-it |
| Method | LoRA (rank 16, alpha 32, dropout 0) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Precision | fp16 (no base model quantization during training) |
| Dataset | 1,619 Tunisian law Q&A pairs in French |
| Epochs | 3 |
| Effective batch size | 8 (batch 1 × gradient accumulation 8) |
| Learning rate | 2e-4 (cosine schedule, 5% warmup) |
| Max sequence length | 2,048 tokens |
| Optimizer | AdamW 8-bit |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Training loss masking | Responses only (using train_on_responses_only) |
| GPU | NVIDIA Tesla V100S (32 GB) |
| Framework | Unsloth + TRL SFTTrainer |
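As a rough sanity check on the table, the number of optimizer steps and the shape of the learning-rate curve can be worked out from the listed values. The sketch below assumes all 1,619 examples are used for training and that a final partial batch still counts as a step; the actual trainer may round differently, so treat the counts as approximate. The cosine-with-warmup formula is the commonly used one, not a statement about the exact scheduler implementation in this run.

```python
import math

# Values taken from the training table.
NUM_EXAMPLES = 1619
EFFECTIVE_BATCH = 1 * 8  # per-device batch 1 x gradient accumulation 8
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_RATIO = 0.05

# Assumption: no examples held out, last partial batch counted as a step.
steps_per_epoch = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)  # 203 609 31
```

Under these assumptions the run performs about 600 optimizer updates, with the first 30 or so spent warming up.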
Dataset
The training data consists of 1,619 cleaned Q&A examples covering Tunisian legal topics in French.
Each example follows a multi-turn chat format with:
- a system prompt,
- a user question accompanied by relevant legal document excerpts,
- and a model response structured by legal issue with article citations.
GGUF Export Pipeline
- LoRA adapter merged with base model into fp16 using Unsloth save_pretrained_merged
- Converted to GGUF F16 using llama.cpp/convert_hf_to_gguf.py
- Quantized from F16 → Q4_K_M using llama-quantize
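The last two steps can be scripted. The sketch below only builds the command lines and runs them on request; it assumes the merged fp16 checkpoint already exists in a local directory, that a llama.cpp checkout is available, and that `llama-quantize` is on the `PATH`. The directory name `merged_fp16` and the helper names are hypothetical; the output filenames match the files listed in this repository.

```python
import subprocess
from pathlib import Path

# Hypothetical locations; adjust to your own layout.
MERGED_DIR = Path("merged_fp16")  # output of the Unsloth merge step
F16_GGUF = Path("gemma4-etafakna-f16.gguf")
Q4_GGUF = Path("gemma4-etafakna-q4_k_m.gguf")


def conversion_commands(
    merged_dir: Path,
    f16_path: Path,
    q4_path: Path,
    llama_cpp_dir: Path = Path("llama.cpp"),
) -> list[list[str]]:
    """Return the convert and quantize commands as argument lists."""
    convert = [
        "python",
        str(llama_cpp_dir / "convert_hf_to_gguf.py"),
        str(merged_dir),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    quantize = ["llama-quantize", str(f16_path), str(q4_path), "Q4_K_M"]
    return [convert, quantize]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = conversion_commands(MERGED_DIR, F16_GGUF, Q4_GGUF)
# run_all(commands)  # uncomment once the paths above exist
```

Keeping the F16 file around makes it possible to produce other quantization types later without repeating the merge.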
Known Conversion Notes
- RoPE type: The converter reported "Unknown RoPE type: proportional" and defaulted to NONE. This is expected for Gemma 4's architecture and should not impact inference in recent llama.cpp versions.
- Duplicated GGUF keys: Warnings about overwritten keys (context_length, head_count, key_length, etc.) are normal. Gemma 4 uses heterogeneous layer configurations where some layers have different attention dimensions.
Related Models
- LoRA adapter: L0uu/gemma4-e4b-etafakna-lora
- Base model: google/gemma-4-E4B-it
License
This model inherits the Gemma license from Google.