---
base_model: janhq/Jan-v3-4B-base-instruct
library_name: gguf
pipeline_tag: text-generation
tags:
  - gguf
  - quantized
  - llama-cpp
---

# Jan-v3-4B-base-instruct - GGUF

This is a quantized GGUF version of [janhq/Jan-v3-4B-base-instruct](https://huggingface.co/janhq/Jan-v3-4B-base-instruct) created using [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Available Quantizations

| Filename | Quant Type | Description |
|----------|------------|-------------|
| Jan-v3-4B-base-instruct.Q2_K.gguf | Q2_K | Smallest, significant quality loss |
| Jan-v3-4B-base-instruct.Q3_K_S.gguf | Q3_K_S | Very small, low quality |
| Jan-v3-4B-base-instruct.Q3_K_M.gguf | Q3_K_M | Very small, medium quality |
| Jan-v3-4B-base-instruct.Q3_K_L.gguf | Q3_K_L | Small, better quality than Q3_K_M |
| Jan-v3-4B-base-instruct.Q4_0.gguf | Q4_0 | Small, legacy format |
| Jan-v3-4B-base-instruct.Q4_1.gguf | Q4_1 | Small, legacy format with better accuracy |
| Jan-v3-4B-base-instruct.Q4_K_S.gguf | Q4_K_S | Small, good quality |
| Jan-v3-4B-base-instruct.Q4_K_M.gguf | Q4_K_M | Medium, balanced quality - recommended |
| Jan-v3-4B-base-instruct.Q5_0.gguf | Q5_0 | Medium, legacy format |
| Jan-v3-4B-base-instruct.Q5_1.gguf | Q5_1 | Medium, legacy format with better accuracy |
| Jan-v3-4B-base-instruct.Q5_K_S.gguf | Q5_K_S | Medium, good quality |
| Jan-v3-4B-base-instruct.Q5_K_M.gguf | Q5_K_M | Medium, high quality - recommended |
| Jan-v3-4B-base-instruct.Q6_K.gguf | Q6_K | Large, very high quality |
| Jan-v3-4B-base-instruct.Q8_0.gguf | Q8_0 | Large, near-lossless quality |


## Usage

### With llama.cpp

```bash
./llama-cli -m Jan-v3-4B-base-instruct.Q4_K_M.gguf -p "Your prompt here"
```

### With Ollama

```bash
ollama run hf.co/aashish1904/Jan-v3-4B-base-instruct-GGUF
```

## Original Model

- **Source**: [janhq/Jan-v3-4B-base-instruct](https://huggingface.co/janhq/Jan-v3-4B-base-instruct)
- **Quantized by**: GGUF Quantizer Space

---

## Original Model Card

# Jan-v3-4B-base-instruct: a 4B baseline model for fine-tuning

[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/janhq/jan) 
[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://opensource.org/licenses/Apache-2.0)
[![Jan App](https://img.shields.io/badge/Powered%20by-Jan%20App-purple?style=flat&logo=android)](https://jan.ai/) 

![image](https://cdn-uploads.huggingface.co/production/uploads/655e3b59d5c0d3db5359ca3c/A65FII_r3rAi9wZtK5P_v.png)

## Overview

**Jan-v3-4B-base-instruct** is a 4B-parameter model obtained via post-training distillation from a larger teacher, transferring capabilities while preserving general-purpose performance on standard benchmarks. The result is a compact, ownable base that is straightforward to fine-tune, broadly applicable and minimizing the usual capacity–capability trade-offs. 

Building on this base, **Jan-Code**, a code-tuned variant, **will be released soon.**

## Model Overview

This repo contains the BF16 version of **Jan-v3-4B-base-instruct**, which has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4B in total
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: **262,144 natively**. 

**Intended Use**

* A better small base for downstream work: improved instruction following out of the box, strong starting point for fine-tuning, and effective lightweight coding assistance.

## Performance

![image](https://cdn-uploads.huggingface.co/production/uploads/655e3b59d5c0d3db5359ca3c/IGuQdKZ0_IGIwL0Wkcasi.png)

## Quick Start

### Integration with Jan Apps

Jan-v3 demo is hosted on **Jan Browser** at **[chat.jan.ai](https://chat.jan.ai/)**. It is also optimized for direct integration with [Jan Desktop](https://jan.ai/), select the model in the app to start using it.


### Local Deployment

**Using vLLM:**
```bash
vllm serve janhq/Jan-v3-4B-base-instruct \
    --host 0.0.0.0 \
    --port 1234 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes 
    
```

**Using llama.cpp:**
```bash
llama-server --model Jan-v3-4B-base-instruct-Q8_0.gguf \
    --host 0.0.0.0 \
    --port 1234 \
    --jinja \
    --no-context-shift
```

### Recommended Parameters
For optimal performance in agentic and general tasks, we recommend the following inference parameters:
```yaml
temperature: 0.7
top_p: 0.8
top_k: 20
```

## 🤝 Community & Support

- **Discussions**: [Hugging Face Community](https://huggingface.co/janhq/Jan-v2-VL-8B/discussions) 
- **Jan App**: Learn more about the Jan App at [jan.ai](https://jan.ai/)

## 📄 Citation
```bibtex
Updated Soon
```