Instructions to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ENOSYS/Octen-Embedding-4B-750-v1-GGUF")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
- Transformers
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ENOSYS/Octen-Embedding-4B-750-v1-GGUF", dtype="auto")
```
- llama-cpp-python
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# embedding=True loads the model for embeddings instead of text generation
llm = Llama.from_pretrained(
    repo_id="ENOSYS/Octen-Embedding-4B-750-v1-GGUF",
    filename="Octen-Embedding-4B-BPW10.0.gguf",
    embedding=True,
)

sentences = ["That is a happy person", "That is a happy dog", "That is a very happy person", "Today is a sunny day"]
# create_embedding returns an OpenAI-style dict with one entry per input
result = llm.create_embedding(sentences)
embeddings = [item["embedding"] for item in result["data"]]
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
# Run inference directly in the terminal:
llama-cli -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Install from WinGet (Windows)
```shell
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
# Run inference directly in the terminal:
llama-cli -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Use pre-built binary
```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
# Run inference directly in the terminal:
./llama-cli -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Use Docker
docker model run hf.co/ENOSYS/Octen-Embedding-4B-750-v1-GGUF
- LM Studio
- Jan
- Ollama
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Ollama:
ollama run hf.co/ENOSYS/Octen-Embedding-4B-750-v1-GGUF
- Unsloth Studio
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ENOSYS/Octen-Embedding-4B-750-v1-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ENOSYS/Octen-Embedding-4B-750-v1-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ENOSYS/Octen-Embedding-4B-750-v1-GGUF to start chatting
```
- Pi
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "ENOSYS/Octen-Embedding-4B-750-v1-GGUF" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Run Hermes
hermes
- Docker Model Runner
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Docker Model Runner:
docker model run hf.co/ENOSYS/Octen-Embedding-4B-750-v1-GGUF
- Lemonade
How to use ENOSYS/Octen-Embedding-4B-750-v1-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ENOSYS/Octen-Embedding-4B-750-v1-GGUF
```
Run and chat with the model
lemonade run user.Octen-Embedding-4B-750-v1-GGUF-{{QUANT_TAG}}
List all available models
lemonade list
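Because this is an embedding model, a local llama.cpp server is most useful through its embeddings endpoint rather than chat. The sketch below is a minimal, standard-library-only client, assuming `llama-server` was started with its `--embeddings` flag (for example `llama-server -hf ENOSYS/Octen-Embedding-4B-750-v1-GGUF --embeddings`) and listens on the default `http://localhost:8080`. It posts texts to the OpenAI-compatible `/v1/embeddings` route and scores pairs with cosine similarity, the measure `model.similarity` uses by default in sentence-transformers. The helper names (`build_request`, `parse_embeddings`, `cosine`, `embed`) are hypothetical.

```python
import json
import math
import urllib.request

# Assumption: llama-server runs locally on its default port with --embeddings enabled.
BASE_URL = "http://localhost:8080/v1"
MODEL = "ENOSYS/Octen-Embedding-4B-750-v1-GGUF"


def build_request(texts, model=MODEL):
    # OpenAI-style embeddings request: {"model": ..., "input": [...]}
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_json):
    # OpenAI-style response: {"data": [{"index": i, "embedding": [...]}, ...]}
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(texts):
    # Network call: only works while the local server is running.
    with urllib.request.urlopen(build_request(texts)) as resp:
        return parse_embeddings(json.load(resp))
```

With the server up, `vecs = embed(["That is a happy person", "That is a happy dog"])` returns one vector per input, and `cosine(vecs[0], vecs[1])` scores the pair. Sorting by `index` keeps the output aligned with the input order even if a server returns items out of order.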
Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-4B
- Using a non-standard (forked) llama.cpp branch for quantization.
- Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
- Using dataset sources: text_en, text_ru.
- Using dataset chunks: 750.
- Using F16 instead of BF16 as the tensor format for quantization, which suits NVIDIA Pascal-architecture GPUs such as the P100.
- Small set of patches added.
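The F16-instead-of-BF16 choice trades range for precision: BF16 keeps float32's 8-bit exponent but only 7 mantissa bits, while F16 has 10 mantissa bits and a 5-bit exponent whose largest finite value is 65504. Pascal-generation GPUs such as the P100 support FP16 arithmetic but have no native BF16 support. The sketch below rounds a few values to each format using only `struct`; `to_bf16` truncates the float32 encoding instead of rounding to nearest even, a simplification real converters do not make.

```python
import struct


def to_f16(x):
    """Round x to IEEE 754 half precision (F16); return None if it is out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # |x| too large for F16 (max finite value 65504)
        return None


def to_bf16(x):
    """Approximate BF16 by keeping the top 16 bits of the float32 encoding.

    Real converters usually round to nearest even; truncation keeps this short.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for value in (0.1, 3.14159, 65504.0, 1e5):
    print(f"{value:>10}  F16={to_f16(value)}  BF16={to_bf16(value)}")
```

Near 1.0, F16 stays closer to the original value (0.1 becomes 0.0999755859375 versus 0.099609375 for BF16), but magnitudes above 65504 cannot be represented in F16 at all, while BF16 keeps them at reduced precision (1e5 becomes 99840.0).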
Many thanks to Ed Addario for an impressive job.
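A global bits-per-weight target is a weighted average over all tensors, so a fractional value such as 6.50 or 10.00 can be reached by giving different tensors different GGML types. The sketch below shows only that accounting, assuming the BPW values in the comparison count tensor data alone. It uses the standard block sizes of a few upstream GGML types; the two-type helper `high_fraction` and the tensor sizes in the example are hypothetical simplifications, not the per-tensor selection logic of the forked branch.

```python
# Bits per weight of some standard GGML types: (bytes per block * 8) / (weights per block).
GGML_BPW = {
    "Q2_K": 84 * 8 / 256,   # 2.625
    "Q3_K": 110 * 8 / 256,  # 3.4375
    "Q4_K": 144 * 8 / 256,  # 4.5
    "Q5_K": 176 * 8 / 256,  # 5.5
    "Q6_K": 210 * 8 / 256,  # 6.5625
    "Q8_0": 34 * 8 / 32,    # 8.5
    "F16": 16.0,
}


def global_bpw(assignment):
    """Weighted-average bits per weight for a list of (n_weights, ggml_type) pairs."""
    total_bits = sum(n * GGML_BPW[t] for n, t in assignment)
    total_weights = sum(n for n, _ in assignment)
    return total_bits / total_weights


def estimated_bytes(assignment):
    # Tensor data size only; GGUF metadata and alignment padding are ignored.
    return sum(n * GGML_BPW[t] for n, t in assignment) / 8


def high_fraction(target, low, high):
    """Share of weights that must use `high` (the rest `low`) to hit `target` bpw."""
    lo, hi = GGML_BPW[low], GGML_BPW[high]
    if not lo <= target <= hi:
        raise ValueError("target outside the range of the two types")
    return (target - lo) / (hi - lo)


# Hypothetical example: a 10.0 bpw target met by mixing Q8_0 and F16 tensors.
frac = high_fraction(10.0, "Q8_0", "F16")
n = 1_000_000
mix = [(round(n * (1 - frac)), "Q8_0"), (round(n * frac), "F16")]
print(frac, global_bpw(mix), estimated_bytes(mix))
```

Targets above 8.5 bpw can only be reached by keeping some tensors in F16, which is consistent with the comparison extending up to 15.00.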
Quantization comparison
| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 88.91% | 1.059237 ± 0.005791 | 17.814284 ± 1.729713 | 1.168223 ± 0.005542 | 42.036110 | 25.354170 | -1.391 ± 0.037 % | 17.793 ± 0.064 % |
| 4.00 | 91.97% | 1.124329 ± 0.005392 | 37.389267 ± 1.678855 | 0.809469 ± 0.004837 | 44.454895 | 24.103703 | -0.450 ± 0.030 % | 14.205 ± 0.061 % |
| 4.50 | 94.56% | 1.312286 ± 0.005436 | 93.912806 ± 2.084847 | 0.449165 ± 0.003776 | 36.050694 | 23.000446 | -0.433 ± 0.022 % | 10.308 ± 0.056 % |
| 5.00 | 95.14% | 1.287066 ± 0.005078 | 86.328447 ± 1.959110 | 0.353164 ± 0.003435 | 40.124615 | 22.546082 | -0.282 ± 0.019 % | 8.891 ± 0.056 % |
| 5.50 | 95.95% | 1.185050 ± 0.004268 | 55.649627 ± 1.544628 | 0.234975 ± 0.002845 | 35.043465 | 20.616793 | 0.127 ± 0.015 % | 7.105 ± 0.054 % |
| 6.00 | 96.31% | 1.181371 ± 0.004077 | 54.543325 ± 1.493333 | 0.187329 ± 0.002574 | 34.484653 | 20.085846 | 0.138 ± 0.013 % | 6.297 ± 0.056 % |
| 6.50 | 96.58% | 1.192068 ± 0.004003 | 57.760093 ± 1.508316 | 0.156484 ± 0.002392 | 32.430752 | 19.101099 | 0.152 ± 0.012 % | 5.546 ± 0.056 % |
| 7.00 | 96.60% | 1.212270 ± 0.004081 | 63.835350 ± 1.579181 | 0.146085 ± 0.002389 | 33.821136 | 19.701403 | 0.111 ± 0.011 % | 5.325 ± 0.057 % |
| 7.50 | 96.63% | 1.208265 ± 0.004054 | 62.630910 ± 1.564522 | 0.139075 ± 0.002314 | 36.313965 | 19.207874 | 0.110 ± 0.011 % | 5.203 ± 0.057 % |
| 8.00 | 96.68% | 1.209469 ± 0.004038 | 62.993184 ± 1.565838 | 0.134810 ± 0.002289 | 34.888683 | 19.164957 | 0.137 ± 0.011 % | 5.067 ± 0.058 % |
| 8.50 | 96.75% | 1.212194 ± 0.004012 | 63.812622 ± 1.568517 | 0.125588 ± 0.002229 | 36.403027 | 18.989708 | 0.131 ± 0.010 % | 4.896 ± 0.057 % |
| 9.00 | 96.76% | 1.204192 ± 0.003980 | 61.406117 ± 1.541008 | 0.123881 ± 0.002209 | 36.165089 | 18.446331 | 0.154 ± 0.010 % | 4.867 ± 0.058 % |
| 9.50 | 96.74% | 1.206242 ± 0.003997 | 62.022543 ± 1.550053 | 0.123900 ± 0.002210 | 36.027378 | 18.774128 | 0.146 ± 0.010 % | 4.872 ± 0.057 % |
| 10.00 | 96.75% | 1.203023 ± 0.003979 | 61.054587 ± 1.537726 | 0.123887 ± 0.002199 | 36.569180 | 18.522949 | 0.153 ± 0.010 % | 4.819 ± 0.057 % |
| 10.50 | 96.74% | 1.210538 ± 0.004012 | 63.314626 ± 1.564145 | 0.122872 ± 0.002194 | 36.219513 | 18.574490 | 0.138 ± 0.010 % | 4.842 ± 0.058 % |
| 11.00 | 96.74% | 1.213551 ± 0.004025 | 64.220636 ± 1.575624 | 0.122960 ± 0.002205 | 37.248238 | 18.881664 | 0.125 ± 0.010 % | 4.840 ± 0.057 % |
| 11.50 | 96.75% | 1.209483 ± 0.004002 | 62.997316 ± 1.559427 | 0.123296 ± 0.002196 | 36.439632 | 18.905373 | 0.137 ± 0.010 % | 4.850 ± 0.057 % |
| 12.00 | 96.75% | 1.207165 ± 0.003990 | 62.300073 ± 1.550363 | 0.123031 ± 0.002189 | 36.319935 | 18.708921 | 0.141 ± 0.010 % | 4.828 ± 0.057 % |
| 12.50 | 96.73% | 1.203487 ± 0.003989 | 61.194047 ± 1.540763 | 0.122924 ± 0.002186 | 36.546139 | 18.402393 | 0.157 ± 0.010 % | 4.856 ± 0.058 % |
| 13.00 | 96.73% | 1.207328 ± 0.004005 | 62.349166 ± 1.554780 | 0.122282 ± 0.002186 | 34.890934 | 18.439240 | 0.147 ± 0.010 % | 4.846 ± 0.058 % |
| 13.50 | 96.73% | 1.201897 ± 0.003983 | 60.715976 ± 1.534983 | 0.123082 ± 0.002199 | 35.561474 | 18.604710 | 0.150 ± 0.010 % | 4.833 ± 0.058 % |
| 14.00 | 96.76% | 1.206603 ± 0.003988 | 62.131097 ± 1.548757 | 0.122074 ± 0.002183 | 36.555859 | 18.393436 | 0.148 ± 0.010 % | 4.865 ± 0.059 % |
| 14.50 | 96.77% | 1.207201 ± 0.003984 | 62.311014 ± 1.549158 | 0.120715 ± 0.002176 | 37.773457 | 18.576935 | 0.131 ± 0.010 % | 4.763 ± 0.057 % |
| 15.00 | 96.75% | 1.207969 ± 0.004000 | 62.541896 ± 1.555455 | 0.123216 ± 0.002222 | 36.987923 | 19.083401 | 0.150 ± 0.010 % | 4.777 ± 0.057 % |
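The columns appear to match the summary that llama.cpp's `llama-perplexity` prints in its `--kl-divergence` mode when a quantized model is compared against saved base-model logits. The sketch below recomputes those statistics from per-token logits under their usual definitions: KL(base ‖ quant) per token, Δp as the change in probability of the correct token, and the Pearson correlation of per-token negative log-likelihoods. The nearest-rank percentile is a simplification, and the function names are hypothetical. Mean Δp and RMS Δp come out as fractions; the table reports them in percent.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one vocabulary-sized logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(P || Q) with P = base-model distribution, Q = quantized-model distribution.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def nearest_rank(values, pct):
    # Simple nearest-rank percentile; llama.cpp's exact method may differ.
    s = sorted(values)
    return s[max(0, math.ceil(pct / 100.0 * len(s)) - 1)]


def compare(base_logits, quant_logits, targets):
    """base_logits / quant_logits: one logit vector per token position.
    targets: index of the observed next token at each position."""
    klds, dps, nll_base, nll_quant = [], [], [], []
    for lb, lq, t in zip(base_logits, quant_logits, targets):
        pb, pq = softmax(lb), softmax(lq)
        klds.append(kl_divergence(pb, pq))
        dps.append(pq[t] - pb[t])  # Δp for the correct token
        nll_base.append(-math.log(pb[t]))
        nll_quant.append(-math.log(pq[t]))
    n = len(targets)
    ppl_base = math.exp(sum(nll_base) / n)
    ppl_quant = math.exp(sum(nll_quant) / n)
    return {
        "ppl_correlation": pearson(nll_quant, nll_base),
        "ppl_ratio": ppl_quant / ppl_base,
        "delta_ppl": ppl_quant - ppl_base,
        "mean_kld": sum(klds) / n,
        "max_kld": max(klds),
        "kld_99_9": nearest_rank(klds, 99.9),
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
    }


def implied_base_ppl(ppl_ratio, delta_ppl):
    # ratio = PPL(Q) / PPL(base) and ΔPPL = PPL(Q) - PPL(base)
    # => PPL(base) = ΔPPL / (ratio - 1)
    return delta_ppl / (ppl_ratio - 1.0)
```

Applying `implied_base_ppl` to the 3.50, 4.00 and 10.00 rows gives about 300.73 each time, so all rows share the same reference run; a high base perplexity like this is why ΔPPL looks large even where the ratio is close to 1.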