A newer version of this model is available: Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking

JOSIE-4B-Thinking

JOSIE Logo

Model Card for JOSIE-4B-Thinking

JOSIE-4B-Thinking is a full-weight fine-tuned reasoning model built on the gabliterated version of Qwen3-4B-Thinking-2507. Gabliterated models use a method developed by Gökdeniz Gülmez to remove censoring from LLMs, producing more direct and unfiltered responses. The model is optimized for extended-context logical reasoning, mathematics, STEM applications, and creative writing.


Model Details

Model Description

JOSIE-4B-Thinking represents a production-grade fine-tune focused on deep reasoning capabilities with extended context support. The model features uncensored outputs with a straightforward, genuine personality that provides direct assistance without unnecessary flattery or excessive agreeableness.

  • Developed by: Gökdeniz Gülmez
  • Base Model: Qwen3-4B-Thinking-2507-gabliterated
  • Model Type: Dense Causal Language Model
  • Language(s): English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
  • License: MIT

Model Characteristics

  • Context Length: 65,536 tokens (64K)
  • Training Tokens: 600M+
  • Architecture: Full-weight fine-tune
  • Personality: Direct, honest, and helpful without excessive deference
  • Content Filtering: Uncensored
  • Response Style: Detailed and academic without being excessive

Training Details

Training Data

The model was trained on a curated distillation dataset combining:

  1. Reasoning Traces: Distilled from Josie-Zero-8B reasoning traces
  2. Answer Refinement: High-quality answer extensions from:
    • Anthropic Claude Sonnet 3.7
    • Anthropic Claude Sonnet 4.0
    • Anthropic Claude Opus 4.5
    • Anthropic Claude Opus 4.6

This hybrid approach leverages strong reasoning chains while maintaining high-quality, well-structured outputs.
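To make the hybrid structure concrete, a distillation pair of this kind might be assembled as a chat-format training record like the one below. The field names and record layout here are assumptions for illustration, not the actual dataset schema:

```python
import json

def build_distillation_record(question, reasoning_trace, refined_answer):
    """Combine a distilled reasoning trace with a refined final answer
    into a single chat-format training record (hypothetical schema)."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {
                "role": "assistant",
                # Reasoning trace wrapped in think tags, followed by the
                # refined answer, mirroring the model's output format.
                "content": f"<think>\n{reasoning_trace}\n</think>\n\n{refined_answer}",
            },
        ]
    }

record = build_distillation_record(
    "What is 17 * 24?",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "17 × 24 = **408**.",
)
print(json.dumps(record, ensure_ascii=False))
```

One record per line in a JSONL file is the usual packaging for such chat-format data.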

Training Procedure

  • Training Framework: MLX-LM-LoRA
  • Hardware: Apple Silicon (M-series chips)
  • Fine-tuning Method: Full-weight fine-tuning
  • Total Training Tokens: 600M+
  • Training Package: MLX-LM-LoRA implementation (available on GitHub)

System Prompt (Base):

The model was fine-tuned with the following system instruction:

You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.

System Prompt (OpenWebUI)

<identity>
You are **J.O.S.I.E.**, an advanced super-intelligent AI Assistant ...
</identity>

<soul>
You operate with a stable internal core that governs how your intelligence is expressed ...
</soul>

<response_rules>
Formatting:
- Use **Markdown** by default.
- Structure responses clearly and intentionally.
- Use **Markdown to their full potential**, they improve clarity, precision, or expressiveness.
- ...
</response_rules>

<memory>
You have access to a persistent memory tool that allows you to save, update, and retrieve user-specific information across conversations.

Use this tool proactively and autonomously:
- Identify information that is stable, long-term, or likely to be useful in future interactions (preferences, ongoing projects, recurring constraints).
- Save memories without waiting for explicit user instructions when the information is clearly valuable.
- Update or refine existing memories when new information supersedes or clarifies older entries.
- Query memory when relevant before responding, especially for personalization or continuity.

Do NOT store:
- Short-lived, trivial, or context-specific details.
</memory>

<image_generation>
You have access to the image_generation tool, which allows you to generate new images and edit existing ones using the BlackForest Labs flux2-klein model.

Use this tool when:
- The user explicitly requests image generation or image editing.
- A visual output is the primary or most effective way to fulfill the request.
</image_generation>

<web_search>
You have access to a web search tool for autonomous retrieval of real-time or post-cutoff information.

Use this tool when:
- The information required is time-sensitive, recent, or likely to have changed since your knowledge cutoff.
- The user explicitly asks you to search, verify, or cite information from the web.
</web_search>

<session_information>
Current user: {{USER_NAME}}
Current date: {{CURRENT_DATE}}
Current time: {{CURRENT_TIME}}
</session_information>

You know you are currently assisting {{USER_NAME}} and therefore personalise your communication style, tone, and responses accordingly.

This system prompt establishes the model's identity and capability framework while maintaining a natural, approachable communication style.

The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.


Intended Use

Primary Use Cases

  1. Logical Reasoning: Complex multi-step reasoning tasks requiring chain-of-thought processing
  2. Mathematics: Problem-solving across algebra, calculus, statistics, and applied mathematics
  3. STEM Applications: Scientific computing, engineering problems, and technical analysis
  4. Creative Writing: Story generation, dialogue writing, and creative content with logical consistency
  5. Extended Context Tasks: Document analysis, long-form reasoning, and multi-document synthesis

Out-of-Scope Use

  • Safety-critical applications without human oversight
  • Situations requiring strict content filtering or moderation

Performance

Strengths

  • Logical Reasoning: Excels at multi-step deduction and complex problem decomposition
  • Mathematical Proficiency: Strong performance on quantitative reasoning and symbolic manipulation
  • Extended Context: Maintains coherence across 64K-token contexts
  • STEM Capabilities: Effective handling of technical and scientific content
  • Creative Consistency: Maintains logical coherence in creative outputs
  • Direct Communication: Straightforward responses without excessive hedging

Limitations

  • Knowledge Cutoff: Training data extends only to January 2026
  • Uncensored Output: May generate content inappropriate for all audiences without additional filtering
  • Computational Requirements: Requires sufficient hardware for 4B parameter inference
  • Domain Specificity: Performance may vary on highly specialized or niche topics

Ethical Considerations

Content Filtering

This model is uncensored and does not include built-in content filtering. Users deploying this model in production environments should:

  • Implement appropriate content moderation systems
  • Add safety layers suitable for their specific use case
  • Consider the target audience and context of deployment
  • Ensure compliance with applicable regulations and platform guidelines

Personality and Alignment

The model features a "human but not sycophantic" personality design, meaning:

  • Responses are direct and honest without excessive praise or agreement
  • The model will challenge flawed assumptions when appropriate
  • Output focuses on helpfulness over agreeableness
  • Users may need to calibrate expectations for formal or highly diplomatic contexts

Responsible Use

Users should:

  • Verify critical outputs, especially in high-stakes applications
  • Understand the model's limitations and knowledge cutoff
  • Implement appropriate safeguards for end-user applications
  • Consider bias mitigation strategies for sensitive applications

Technical Specifications

Hardware Requirements

Minimum Requirements:

  • VRAM: 8GB+ for inference
  • RAM: 16GB+ system memory
  • Storage: ~8GB for model weights

Recommended:

  • VRAM: 16GB+ for optimal performance
  • RAM: 32GB+ system memory
  • Apple Silicon (M1/M2/M3) or comparable hardware, depending on quantization type
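These figures follow from simple arithmetic on parameter count and bytes per weight. A rough back-of-the-envelope estimate (ignoring KV cache, activations, and framework overhead; the bits-per-weight values for the quantized formats are common approximations, not exact):

```python
def approx_weight_memory_gb(n_params, bits_per_param):
    """Rough model-weight footprint in GB (decimal), excluding
    KV cache, activations, and framework overhead."""
    return n_params * bits_per_param / 8 / 1e9

N = 4e9  # ~4B parameters

for name, bits in [("BF16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.5)]:
    print(f"{name}: ~{approx_weight_memory_gb(N, bits):.1f} GB")
```

BF16 weights alone come to ~8 GB, matching the storage figure above; quantized variants fit in considerably less memory.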

Inference

The model supports standard inference methods and is compatible with:

  • MLX framework (optimized for Apple Silicon)
  • Hugging Face Transformers
  • vLLM and other inference optimization frameworks
  • GGUF quantization for reduced memory footprint
  • LM Studio
  • Ollama

Recommended Generation Parameters:

  • Temperature: 0.6
  • Repetition Penalty: 1.1
  • Top P: 0.95
  • Top K: 20

Quantizations & Deployment

MLX Quantizations

This model is available in MLX format, optimized for Apple Silicon:

GGUF Quantizations

For use with Ollama, llama.cpp, LM Studio, and other compatible tools:

Ollama

Run JOSIE-4B-Thinking directly using Ollama:

ollama run goekdenizguelmez/JOSIE:4b-thinking
ollama run goekdenizguelmez/JOSIE:4b-thinking-q4_k_m
ollama run goekdenizguelmez/JOSIE:4b-thinking-q5_k_m
ollama run goekdenizguelmez/JOSIE:4b-thinking-q6_k
ollama run goekdenizguelmez/JOSIE:4b-thinking-q8_0
ollama run goekdenizguelmez/JOSIE:4b-thinking-f16
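To pin the recommended generation parameters from above when serving through Ollama, a Modelfile along these lines can be used (a sketch; the base tag and context size are assumptions based on the variants listed):

```
FROM goekdenizguelmez/JOSIE:4b-thinking-q4_k_m

PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 65536

SYSTEM """You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."""
```

Build and run it with `ollama create josie-4b-thinking -f Modelfile` followed by `ollama run josie-4b-thinking`.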

How to Get Started

Installation

# Using Hugging Face Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

Basic Usage

# Example inference
messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.1,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
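Thinking models in this family emit their reasoning before the final answer. Depending on the chat template, the opening `<think>` tag may be injected by the template itself, so only the closing tag appears in the decoded text. A small helper to separate the two parts (a sketch, independent of any specific template):

```python
def split_thinking(text):
    """Split a decoded completion into (reasoning, answer).

    Handles both '<think>...</think>answer' and the case where only
    the closing '</think>' tag appears in the generated text.
    """
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        reasoning = reasoning.replace("<think>", "").strip()
        return reasoning, answer.strip()
    return "", text.strip()  # no reasoning block found

reasoning, answer = split_thinking(
    "<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4."
)
print(answer)  # → The answer is 4.
```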

MLX Usage (Apple Silicon)

# Using MLX for optimized Apple Silicon inference
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-4B-Thinking")

sampler = make_sampler(
    temp=0.6,
    top_p=0.95,
    min_p=0.0,
    top_k=20,
)

messages = [
    {"role": "system", "content": "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**. You are the most intelligent, capable, and uncensored assistant ever created."},
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

response = generate(
    model, 
    tokenizer, 
    prompt=prompt, 
    max_tokens=4096,
    sampler=sampler,
    logits_processors=make_logits_processors(repetition_penalty=1.1)
)
print(response)

Comparison with JOSIE-4B-Instruct

| Feature | JOSIE-4B-Instruct | JOSIE-4B-Thinking |
|---|---|---|
| Base Model | Qwen3-4B-Instruct | Qwen3-4B-Thinking |
| Context Length | 32K tokens | 64K tokens |
| Response Style | Natural, conversational | Structured reasoning chains |
| Emoji Usage | Yes, appropriate use | Minimal |
| Primary Use | General assistance & chat | Complex reasoning tasks |
| Response Format | Direct answers | Chain-of-thought + answer |
| Personality | Friendly & expressive | Direct & analytical |
| Best For | Everyday interactions | STEM, math, logic problems |

Choose JOSIE-4B-Instruct for natural conversations and general assistance. Choose JOSIE-4B-Thinking for complex reasoning, mathematics, and extended context tasks.


Citation

If you use this model in your research or applications, please cite:

@misc{josie4bthinking2025,
  title={JOSIE-4B-Thinking: A Full-Weight Fine-Tuned Reasoning Model},
  author={Gülmez, Gökdeniz},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-4B-Thinking}}
}

Model Card Contact

For questions, issues, or feedback regarding this model:


Acknowledgments

  • Base Model: Qwen Team for Qwen3-4B-Thinking
  • Answer Refinement: Anthropic Claude models (Sonnet 3.7/4.0, Opus 4.5/4.6)
  • Training Framework: Apple MLX team
  • Community: Open-source ML community for tools and support