DeepZirel-V2

An experimental fine-tune of deepseek-ai/DeepSeek-V2-Lite-Chat using novel training approaches aimed at improving older model architectures.

Model Details

  • Base Model: deepseek-ai/DeepSeek-V2-Lite-Chat
  • Fine-tuned by: Daemontatox
  • Purpose: Architecture improvement research
  • Training: Experimental data and methodology aimed at enhancing a legacy architecture
  • Language: Multilingual
  • Parameters: ~16B (BF16, safetensors)

Training Approach

This model explores new training techniques designed to enhance the performance of older model architectures. The experimental approach focuses on the areas below (a purely illustrative fine-tuning sketch follows the list):

  • Novel fine-tuning strategies for legacy architectures
  • Custom training data optimization
  • Architecture-specific improvements
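
The exact recipe behind these points is not documented in this card. Purely as an illustration of what a parameter-efficient fine-tune of the base checkpoint could look like (this is not the actual DeepZirel-V2 methodology; the LoRA hyperparameters and "all-linear" targeting are assumptions), here is a minimal sketch with peft:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative only: the real DeepZirel-V2 training recipe is not published here.
base = "deepseek-ai/DeepSeek-V2-Lite-Chat"
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", trust_remote_code=True)

# Hypothetical LoRA configuration; rank, alpha, dropout, and target selection are placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# A dataset and training loop (e.g. transformers Trainer or trl SFTTrainer) would follow;
# they are omitted because the training data for this model is experimental and unpublished.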

Inference

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Daemontatox/DeepZirel-V2",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/DeepZirel-V2", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Hello, how are you?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
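
For interactive use you may prefer streamed output. Transformers provides TextStreamer, which prints tokens to stdout as they are generated; the snippet below reuses model, tokenizer, and input_ids from the example above:

from transformers import TextStreamer

# Stream the completion token by token instead of waiting for the full output
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids,
    streamer=streamer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)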

vLLM

from vllm import LLM, SamplingParams

llm = LLM(
    model="Daemontatox/DeepZirel-V2",
    tensor_parallel_size=2,
    dtype="auto",
    trust_remote_code=True
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

prompts = ["Hello, how are you?"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
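
The offline example above passes a raw string, so the model's chat template is not applied. One way to get chat-formatted prompts in offline mode (a sketch reusing llm and sampling_params from above, with the tokenizer that ships with the model) is to render the template with transformers first:

from transformers import AutoTokenizer

# Render the chat template, then hand the formatted prompt to vLLM
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/DeepZirel-V2", trust_remote_code=True)
messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)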

vLLM OpenAI-Compatible Server

vllm serve Daemontatox/DeepZirel-V2 \
    --tensor-parallel-size 2 \
    --dtype auto \
    --trust-remote-code \
    --max-model-len 4096

Then query the server with the OpenAI Python client:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123"
)

response = client.chat.completions.create(
    model="Daemontatox/DeepZirel-V2",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(response.choices[0].message.content)
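
The same endpoint can also be queried without the Python client, for example with curl (the Authorization header only matters if the server was started with --api-key; any placeholder works otherwise):

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer token-abc123" \
    -d '{
        "model": "Daemontatox/DeepZirel-V2",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "temperature": 0.7,
        "max_tokens": 512
    }'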

TensorRT-LLM

# Convert to TensorRT-LLM format
python convert_checkpoint.py \
    --model_dir Daemontatox/DeepZirel-V2 \
    --output_dir ./trt_ckpt \
    --dtype float16 \
    --tp_size 2

# Build TensorRT engine
trtllm-build \
    --checkpoint_dir ./trt_ckpt \
    --output_dir ./trt_engine \
    --gemm_plugin float16 \
    --max_batch_size 8 \
    --max_input_len 2048 \
    --max_output_len 512

Then run inference with the TensorRT-LLM Python API (the high-level LLM API mirrors vLLM's; exact arguments can differ between TensorRT-LLM versions):

from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="./trt_engine")

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

prompts = ["Hello, how are you?"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)

Modular MAX

# Serve with MAX Engine
max serve Daemontatox/DeepZirel-V2 \
    --port 8000 \
    --tensor-parallel-size 2

Or load the model directly from Python:

from max import engine

# Load model with MAX
model = engine.InferenceSession(
    "Daemontatox/DeepZirel-V2",
    device="cuda",
    tensor_parallel=2
)

# Run inference
prompt = "Hello, how are you?"
output = model.generate(
    prompt,
    max_tokens=512,
    temperature=0.7,
    top_p=0.9
)

print(output.text)

Alternatively, use the MAX pipelines Python API:

from max.pipelines import pipeline

# Create pipeline
pipe = pipeline(
    "text-generation",
    model="Daemontatox/DeepZirel-V2",
    device="cuda",
    tensor_parallel=2
)

# Generate
result = pipe(
    "Hello, how are you?",
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9
)

print(result[0]["generated_text"])

Limitations

This is an experimental model that applies novel training approaches to a legacy architecture. Results may vary, and the model should be thoroughly evaluated before any production deployment.
