**Disclaimer**:

VPTQ-community is an open-source community that reproduces models from the paper *VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models* ([GitHub](https://github.com/microsoft/vptq)).

It is intended only for experimental purposes.
Read the tech report at [**Tech Report**](https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf) and the [**arXiv paper**](https://arxiv.org/pdf/2409.17066).
## Installation
### Dependencies

- python 3.10+
- torch >= 2.2.0
- transformers >= 4.44.0
- accelerate >= 0.33.0
- datasets (latest)
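
The version floors above can be checked before installing. A minimal stdlib sketch (the PyPI package names are assumed; the comparison is a simplified numeric one, not full PEP 440):

```python
# Sketch: check installed packages against the version floors listed above.
from importlib.metadata import PackageNotFoundError, version

def meets_floor(installed: str, floor: str) -> bool:
    # Compare dotted version strings numerically; local/pre-release tags are ignored.
    parse = lambda v: tuple(int(p) for p in v.split("+")[0].split(".") if p.isdigit())
    return parse(installed) >= parse(floor)

for pkg, floor in [("torch", "2.2.0"), ("transformers", "4.44.0"), ("accelerate", "0.33.0")]:
    try:
        status = "OK" if meets_floor(version(pkg), floor) else f"needs >= {floor}"
    except PackageNotFoundError:
        status = "not installed"
    print(f"{pkg}: {status}")
```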
### Installation
> A preparation step that may be needed: set up the CUDA `PATH`.

```bash
export PATH=/usr/local/cuda-12/bin/:$PATH  # adjust to your environment
```
*Compiling the CUDA kernels will take several minutes.*

```bash
pip install git+https://github.com/microsoft/VPTQ.git --no-build-isolation
```
## Evaluation
### Models from Open Source Community
⚠️ This repository provides only the model quantization algorithm.
| Model | Collection | Quantized checkpoints |
| --- | --- | --- |
| Qwen 2.5 14B Instruct | [HF 🤗](https://huggingface.co/collections/VPTQ-community/vptq-qwen-25-14b-instruct-without-finetune-66f827f83c7ffa7931b8376c) | [4 bits](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-65536-woft) [3 bits](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-256-woft) [2 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k256-256-woft) [2 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v8-k65536-0-woft) [2 bits (3)](https://huggingface.co/VPTQ-community/Qwen2.5-14B-Instruct-v16-k65536-65536-woft) |
| Qwen 2.5 72B Instruct | [HF 🤗](https://huggingface.co/collections/VPTQ-community/vptq-qwen-25-72b-instruct-without-finetune-66f3bf1b3757dfa1ecb481c0) | [4 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-65536-woft) [3 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-256-woft) [2.38 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k1024-512-woft) [2.25 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k512-512-woft) [2.25 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-4-woft) [2 bits (1)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v8-k65536-0-woft) [2 bits (2)](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v16-k65536-65536-woft) [1.94 bits](https://huggingface.co/VPTQ-community/Qwen2.5-72B-Instruct-v16-k65536-32768-woft) |
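
The bit widths in the model names follow the community naming scheme `v{vector length}-k{main codebook size}-{residual codebook size}-woft`: each length-`v` vector costs `log2` of each codebook size in index bits. A small sketch of the arithmetic (the parsing pattern is an assumption inferred from the names above):

```python
import math
import re

def estimated_bits_per_weight(name: str) -> float:
    # Parse v{v}-k{k_main}-{k_res} from a community model name.
    v, k_main, k_res = map(int, re.search(r"v(\d+)-k(\d+)-(\d+)", name).groups())
    # Index bits per vector, amortized over its v weights; a residual
    # codebook of size 0 means no residual indices.
    index_bits = math.log2(k_main) + (math.log2(k_res) if k_res > 0 else 0.0)
    return index_bits / v

print(estimated_bits_per_weight("Qwen2.5-14B-Instruct-v8-k65536-256-woft"))  # 3.0
print(estimated_bits_per_weight("Qwen2.5-72B-Instruct-v8-k1024-512-woft"))   # 2.375
```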
### Language Generation Example
To generate text with a quantized model, run the following command:
The model [*VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft*](https://huggingface.co/VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft) (~2 bits) is provided by the open-source community. The repository cannot guarantee the performance of these models.
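
As a back-of-envelope illustration of why ~2-bit weights matter for a 70B model, a hedged estimate of raw weight storage (index bits only; codebooks, embeddings, and activations add overhead, so real memory use is higher):

```python
def weight_storage_gib(n_params: float, bits_per_weight: float) -> float:
    # Raw storage for the quantized weight indices, in GiB.
    return n_params * bits_per_weight / 8 / 2**30

# 70B parameters: ~2-bit VPTQ indices vs. fp16 weights
print(round(weight_storage_gib(70e9, 2.0), 1))   # 16.3
print(round(weight_storage_gib(70e9, 16.0), 1))  # 130.4
```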

```bash
python -m vptq --model=VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft --prompt="Explain: Do Not Go Gentle into That Good Night"
```

### Terminal Chatbot Example
Launching a chatbot (note that you must use a chat model for this to work):

```bash
python -m vptq --model=VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft --chat
```

### Python API Example
Using the Python API:

```python
import transformers
import vptq

model_id = "VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
m = vptq.AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain: Do Not Go Gentle into That Good Night", return_tensors="pt").to("cuda")
out = m.generate(**inputs, max_new_tokens=100, pad_token_id=2)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
### Gradio Web App Example
An environment variable controls whether a public share link is created:

`export SHARE_LINK=1`

```bash
python -m vptq.app
```
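
A minimal sketch of how such a flag is typically read (the actual logic inside `vptq.app` may differ):

```python
import os

def share_enabled(env=None) -> bool:
    # Treat SHARE_LINK=1 as "create a public Gradio share link";
    # any other value (or an unset variable) keeps the app local.
    env = os.environ if env is None else env
    return env.get("SHARE_LINK", "0") == "1"

print(share_enabled({"SHARE_LINK": "1"}))  # True
print(share_enabled({}))                   # False
```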