Add `library_name` and Sample Usage (#2)

Add `library_name` and Sample Usage (4a8f19e4b6f1bb55eb65c174983e6e986a26d620)

Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED

@@ -1,18 +1,62 @@
Before:

---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- agent
- communication
arxiv: 2510.03215
---

This is the C2C Fuser

Cache-to-Cache (C2C) enables Large Language Models to communicate directly through their KV-Caches, bypassing text generation.

Please visit our [GitHub repo](https://github.com/thu-nics/C2C) for more information.

Project page: [https://fuvty.github.io/C2C_Project_Page/](https://fuvty.github.io/C2C_Project_Page/)
After:

---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- agent
- communication
arxiv: 2510.03215
library_name: transformers
---
This is the C2C Fuser, presented in the paper [Cache-to-Cache: Direct Semantic Communication Between Large Language Models](https://huggingface.co/papers/2510.03215).

Cache-to-Cache (C2C) enables Large Language Models to communicate directly through their KV-Caches, bypassing text generation. By projecting and fusing KV-Caches between models, C2C achieves 8.5–10.5% higher accuracy than individual models and 3.0–5.0% better performance than text-based communication, with a 2.0× speedup in latency.
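For intuition, here is a minimal, hypothetical sketch of what "projecting and fusing KV-Caches" can look like. This is not the actual Fuser architecture (see the GitHub repo for that); the class, gate, and tensor dimensions below are invented purely for illustration:

```python
# Conceptual sketch only -- NOT the C2C Fuser implementation.
import torch
import torch.nn as nn

class ToyKVFuser(nn.Module):
    """Illustrative module: project a teacher layer's K/V tensors into the
    base model's KV space and blend them into the base cache with a
    learned gate. Shapes and mechanism are assumptions for this sketch."""
    def __init__(self, teacher_dim: int, base_dim: int):
        super().__init__()
        self.proj_k = nn.Linear(teacher_dim, base_dim)
        self.proj_v = nn.Linear(teacher_dim, base_dim)
        self.gate = nn.Parameter(torch.zeros(1))  # learned blend weight

    def forward(self, base_k, base_v, teacher_k, teacher_v):
        g = torch.sigmoid(self.gate)
        fused_k = (1 - g) * base_k + g * self.proj_k(teacher_k)
        fused_v = (1 - g) * base_v + g * self.proj_v(teacher_v)
        return fused_k, fused_v

# Made-up shapes: [batch, seq_len, kv_dim]
fuser = ToyKVFuser(teacher_dim=896, base_dim=1024)
base_k, base_v = torch.randn(1, 16, 1024), torch.randn(1, 16, 1024)
teacher_k, teacher_v = torch.randn(1, 16, 896), torch.randn(1, 16, 896)
fused_k, fused_v = fuser(base_k, base_v, teacher_k, teacher_v)
print(fused_k.shape)  # torch.Size([1, 16, 1024]) -- teacher info now lives in the base cache
```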
Please visit our [GitHub repo](https://github.com/thu-nics/C2C) for more information.

Project page: [https://fuvty.github.io/C2C_Project_Page/](https://fuvty.github.io/C2C_Project_Page/)
### Sample Usage

Here's how to load the published C2C weights from the Hugging Face collection and run an inference example:
```python
import torch
from huggingface_hub import snapshot_download
# Helpers from the C2C GitHub repo (https://github.com/thu-nics/C2C)
from script.playground.inference_example import load_rosetta_model, run_inference_example

# Download only the Fuser checkpoint for the Qwen3-0.6B + Qwen2.5-0.5B pair
checkpoint_dir = snapshot_download(
    repo_id="nics-efc/C2C_Fuser",
    allow_patterns=["qwen3_0.6b+qwen2.5_0.5b_Fuser/*"],
)

model_config = {
    "rosetta_config": {
        "base_model": "Qwen/Qwen3-0.6B",
        "teacher_model": "Qwen/Qwen2.5-0.5B-Instruct",
        "checkpoints_dir": f"{checkpoint_dir}/qwen3_0.6b+qwen2.5_0.5b_Fuser/final",
    }
}

rosetta_model, tokenizer = load_rosetta_model(model_config, eval_config={}, device=torch.device("cuda"))
device = rosetta_model.device

# Build a chat prompt with the base model's template
prompt = [{"role": "user", "content": "Say hello in one short sentence."}]
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Per-position routing indices for the KV-cache fuser: a [1, 0] pair for every
# prompt token except the last, and a single [-1, 0] pair for the final
# position (the index semantics are defined in the C2C repo)
instruction_index = torch.tensor([1, 0], dtype=torch.long).repeat(inputs['input_ids'].shape[1] - 1, 1).unsqueeze(0).to(device)
label_index = torch.tensor([-1, 0], dtype=torch.long).repeat(1, 1).unsqueeze(0).to(device)
kv_cache_index = [instruction_index, label_index]

with torch.no_grad():
    sampling_params = {
        'do_sample': False,
        'max_new_tokens': 256
    }
    outputs = rosetta_model.generate(**inputs, kv_cache_index=kv_cache_index, **sampling_params)

# Decode only the tokens generated after the prompt
output_text = tokenizer.decode(outputs[0, instruction_index.shape[1] + 1:], skip_special_tokens=True)
print(f"C2C output text: {output_text}")
```
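To make the indexing above concrete, here is a small standalone illustration of the shapes `kv_cache_index` is built from. The meaning of the `[1, 0]` and `[-1, 0]` pairs is defined by the C2C codebase; this sketch only shows the tensor layout, using an assumed 5-token prompt:

```python
# Shape illustration only; pair semantics come from the C2C repo.
import torch

seq_len = 5  # hypothetical prompt length
instruction_index = torch.tensor([1, 0], dtype=torch.long).repeat(seq_len - 1, 1).unsqueeze(0)
label_index = torch.tensor([-1, 0], dtype=torch.long).repeat(1, 1).unsqueeze(0)

print(instruction_index.shape)  # torch.Size([1, 4, 2]) -- one [1, 0] pair per prompt token except the last
print(label_index.shape)        # torch.Size([1, 1, 2]) -- a single [-1, 0] pair for the final position
```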