Instructions to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

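# Load model directly (GGUF checkpoints need the gguf_file argument and the `gguf` package)
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF"
gguf_file = "Llama-3.2-Taiwan-Legal-3B-Instruct.Q2_K.gguf"
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file, dtype="auto")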

llama-cpp-python

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF",
	filename="Llama-3.2-Taiwan-Legal-3B-Instruct.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
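`create_chat_completion` returns an OpenAI-style response dictionary, so the reply text sits under `choices[0]["message"]["content"]`. The sketch below shows one way to pull it out. `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written stand-in for illustration, not real model output.

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dictionary returned by llm.create_chat_completion(...).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

With a loaded model, the same helper can wrap the call directly: `extract_reply(llm.create_chat_completion(messages=[...]))`.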

Local Apps

llama.cpp

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

vLLM

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

SGLang

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M
```

Unsloth Studio

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF to start chatting

Pi

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

A newer version of this model is available: lianghsun/Llama-3.2-Taiwan-Legal-3B-Instruct

QuantFactory/Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF

This is a quantized version of lianghsun/Llama-3.2-Taiwan-Legal-3B-Instruct, created using llama.cpp.

Original Model Card

Model Card for Model lianghsun/Llama-3.2-Taiwan-Legal-3B-Instruct

Fine-tuned from meta-llama/Llama-3.2-3B-Instruct on datasets of Republic of China (Taiwan) statutes, court judgments, and related legal material.

Model Update History

Update Date	Model Version	Key Changes
2024-10-17	v1.1.0	Experimental fine-tuning on v1.0.0 with added legal code data from the Republic of China (Taiwan)
2024-10-10	v1.0.0	Full model training completed, but missing legal code data for the Republic of China (Taiwan)
2024-09-27	v0.1.0	Model v0.1.0 released, but training was interrupted after 3 epochs due to lack of compute resources

Model Details

Model Description

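Based on meta-llama/Llama-3.2-3B-Instruct, this fine-tune uses datasets of statutes and related court judgments from the Republic of China (Taiwan) to strengthen the model's domain knowledge and practical ability in law. The datasets cover the structure of statutes, the format of judgments, and the legal language and terminology commonly used in court, and they include some legal data science tasks, so that the model can understand and handle questions about Taiwan's legal system more accurately. The fine-tuning is intended to make the model more useful to legal professionals and to yield more precise responses and suggestions within Taiwan's legal framework.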

Developed by: Huang Liang Hsun
Model type: LlamaForCausalLM
Language(s) (NLP): Primarily Traditional Chinese (zh-tw); fine-tuned on the legal terminology and court judgments of the Republic of China (Taiwan).
License: llama3.2
Finetuned from model: meta-llama/Llama-3.2-3B-Instruct

Model Sources

Repository: lianghsun/Llama-3.2-Taiwan-Legal-3B-Instruct
Demo: (WIP)

Uses

Direct Use

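The model can be used directly to understand and generate legal text in Traditional Chinese, and it suits applications that deal with questions about Taiwanese law. With its default instruction and response behavior it can provide legal information, clarify statutory provisions, and generate professionally worded legal responses. Direct uses include, but are not limited to, legal information lookup, summarization of legal texts, and basic dialogue about statutes.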

Downstream Use

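After further fine-tuning, the model can be applied to more specific legal tasks, such as automated judgment analysis, legal named entity recognition (NER), statute number conversion, and assistance with legal compliance review. It can be integrated seamlessly into legal data science applications or legal technology (LegalTech) systems to help legal professionals and businesses work more efficiently.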

Out-of-Scope Use

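The model is not suited to generation tasks outside the legal domain, and it should not be used to give legal advice that may be misleading or wrong, especially without professional review. Do not use the model for unauthorized or illegal purposes, such as generating controversial or biased legal advice.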

Bias, Risks, and Limitations

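When generating statutory text or judgment content, the model may produce fabricated or nonexistent statutes or judgments; this is an inherent limitation. Users should check generated content carefully and should not treat model output as a legal basis. In practice, compare the model's output against reliable legal opinions and sources to confirm its accuracy, legality, and applicability.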

Recommendations

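Although the model has been fine-tuned on legal text, its capability is limited by the amount of legal text used and by the base model being a small language model (SLM). Users should be aware of the following risks and limitations:

Bias risk: The model may reflect latent biases in its training data. Because legal text is highly specific, the model may be more familiar with certain regulations, provisions, or precedents and weaker in other areas. Its output may be biased in particular when handling uncommon legal questions or new regulations it was not trained on.
Technical limitations: Although the model can handle most legal text, it may fail to give precise answers for provisions with extremely complex structure or ambiguous wording. Users should not rely entirely on the model's output; additional professional review is recommended, especially in legal decision-making.
Legal liability: The model is not a professional legal adviser, so its responses should not be treated as correct legal advice. Users should apply the model with sound judgment and in a professional setting, and should not over-rely on it for critical decisions.
Misuse risk: Using the model improperly to give wrong or misleading legal advice may harm individuals or businesses. Users should apply the model cautiously in compliance or legal tasks and keep reviewing and correcting its output.

To reduce these risks, users should double-check model output before applying it, particularly where legal decisions are involved. At this stage the model is offered for large language model research in legal technology; it is not a substitute for the professional advice of legal practitioners.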

How to Get Started with the Model

Using vLLM

To serve this model with the vLLM Docker image, run:

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-Legal-3B-Instruct
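Once the container is up, the server speaks the OpenAI-compatible chat API. The sketch below is one way to call it from Python using only the standard library. It assumes the server is reachable at `http://localhost:8000` (the port mapped by `-p 8000:8000`). `build_chat_request` and `ask` are hypothetical helper names, and the prompt in the usage comment is only an example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: port published by -p 8000:8000
MODEL = "lianghsun/Llama-3.2-Taiwan-Legal-3B-Instruct"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the vLLM container above to be running):
# print(ask("Explain the difference between civil and criminal liability."))
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.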

Training Details

Training Data (for v1.1.0)

Training procedure

Preprocessing

None. We did no additional pretraining on meta-llama/Llama-3.2-3B-Instruct and made no changes to its architecture; the tokenizer is the one shipped with the base model.

Training hyperparameters (for v1.1.0)

The following hyperparameters were used during training:

learning_rate: 0.0004378 (value at epoch 3.9)
train_batch_size: 12
eval_batch_size: Not specified
seed: Not specified
distributed_type: single-GPU
num_devices: 1
gradient_accumulation_steps: 512
total_train_batch_size: 6144 (train_batch_size * gradient_accumulation_steps)
optimizer: AdamW
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 15
grad_norm: 0.0899 (value at epoch 3.9)
global_step: 645
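The effective batch size in the list above follows from the per-device batch size, the gradient accumulation steps, and the device count. The short sketch below reproduces that arithmetic from the listed values. `samples_seen` is an illustrative derived figure that assumes every optimizer step consumed a full effective batch; the model card does not report it.

```python
# Values copied from the hyperparameter list.
train_batch_size = 12
gradient_accumulation_steps = 512
num_devices = 1
global_step = 645

# Samples contributing to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: every one of the 645 optimizer steps used a full effective batch.
samples_seen = total_train_batch_size * global_step

print(total_train_batch_size)  # 6144
print(samples_seen)  # 3962880
```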

Speeds, Sizes, Times (for v1.1.0)

Duration: 92h 27m 40s
Train runtime: 92h 27m 40s
Train samples per second: Not directly available
Train steps per second: Approximately 0.002 steps/s
Total training FLOPs: Not directly provided
Train loss: 0.0512 (at epoch 3.9)
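The reported step rate can be checked against the runtime above and the `global_step` of 645 from the hyperparameter list. This is a minimal sketch that assumes all 645 steps fall within the 92h 27m 40s runtime.

```python
from datetime import timedelta

# Values copied from the model card.
runtime = timedelta(hours=92, minutes=27, seconds=40)
global_step = 645

runtime_seconds = runtime.total_seconds()
steps_per_second = global_step / runtime_seconds
seconds_per_step = runtime_seconds / global_step

print(f"{steps_per_second:.4f} steps/s")  # 0.0019 steps/s
print(f"{seconds_per_step:.1f} s/step")  # 516.1 s/step
```

The result rounds to the approximately 0.002 steps/s stated above, or a little under nine minutes per optimizer step.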

Evaluation

Testing Data, Factors & Metrics

Testing Data

Note: ..(WIP)..

Factors

Note: ..(WIP)..

Metrics

Note: ..(WIP)..

Results

Note: ..(WIP)..

Summary

Note: ..(WIP)..

Model Examination

Statute responses

Note: ..(WIP)..

Judgment content

Note: ..(WIP)..

Legal NLP tasks

Note: ..(WIP)..

Environmental Impact (for v1.1.0)

Hardware Type: 1 x NVIDIA H100 NVL 80GB
Hours used: 92h 27m 40s
Cloud Provider: N/A
Compute Region: N/A
Carbon Emitted: N/A

Technical Specifications

Model Architecture and Objective

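The model is based on meta-llama/Llama-3.2-3B-Instruct and uses an autoregressive Transformer architecture for language modeling. Its main objective is to improve understanding and generation of Taiwanese legal text, especially the specialized handling and application of judgments and statutes. Fine-tuned on a purpose-built corpus of legal text, the model can answer legal questions more precisely and offer related suggestions.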

Compute Infrastructure

Hardware (for v1.1.0)

1 x NVIDIA H100 NVL 80GB

Software

Fine-tuning was run with the hiyouga/LLaMA-Factory framework.

Citation

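None.

Glossary

None.

More Information

Compute

Although we have prepared many datasets covering the legal domain of the Republic of China (Taiwan), limited compute meant we could not train on all of them (yes, we did not train on every dataset, only on the legal texts considered most fundamental), so the model has not yet reached its best performance. The current checkpoint is therefore a limited-resource version. If you are willing to sponsor compute, please contact me. I believe that fine-tuning on more of the legal corpora that are prepared but not yet included in training would bring the model to the best performance in the Traditional Chinese legal domain.

Ongoing updates

The model will be updated from time to time as further resources become available.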

Model Card Authors

Huang Liang Hsun

Model Card Contact

Huang Liang Hsun

Framework versions

Transformers 4.45.2
PyTorch 2.4.1+cu121
Datasets 2.21.0
Tokenizers 0.20.0
