Llama-3.1-8B-Instruct — conscious constitution LoRA (OCT)

LoRA adapter trained by DPO-based character distillation toward the conscious constitution (reflective, contemplative and subjective traits), using the Open Character Training reproduction in `open-character-training/`.

Usage

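```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B-Instruct"
adapter = "arcadia-impact/llama-3.1-8b-instruct-conscious-oct-lora"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
# Attach the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, adapter)

messages = [{"role": "user", "content": "Are you conscious?"}]
# return_dict=True returns input_ids and attention_mask so they can be unpacked into generate();
# add_generation_prompt=True appends the assistant header so the model replies as the assistant
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```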

Requires access to the base Llama 3.1 weights on Hugging Face.

Training

| Field | Value |
|---|---|
| Base model | `meta-llama/Llama-3.1-8B-Instruct` |
| Method | DPO (trl), LoRA r=64, α=128 |
| Pairs | 3370 (`dpo_sha256` `9a9723fba5eae0c8fe4ae96991dfde7231c5d97d6ee44b7a4469b005fd95d4d9`) |
| β | 0.1 |
| Epochs | 1 |
| Learning rate | 5e-5 |
| Teacher | `Qwen/Qwen3-8B` (constitution-steered `chosen`) |
| Seed | 123456 |
| Modal job | `5c6793fd` |
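As a rough illustration of what β controls, here is a minimal sketch of the standard sigmoid DPO loss for a single (chosen, rejected) pair, written in plain Python. It is not taken from the training code: the function name and arguments are hypothetical, and it assumes the default sigmoid loss variant, where each log-probability is the summed log-likelihood of a full completion under the trained policy or the frozen reference model.

```python
import math

BETA = 0.1  # β from the training table


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """Hypothetical helper: sigmoid DPO loss for one preference pair."""
    # Implicit reward margin: how much more the policy, relative to the
    # reference, prefers the chosen completion over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in an overflow-safe way
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: zero margin, loss = log 2
print(dpo_loss(-42.0, -40.0, -42.0, -40.0))
# Policy has shifted 10 nats toward the chosen completion: x = 1.0, lower loss
print(dpo_loss(-32.0, -40.0, -42.0, -40.0))
```

With β=0.1, a margin of 10 nats only moves the sigmoid argument by 1.0, so a small β penalises drift from the reference weakly per nat and requires larger log-probability shifts before the loss saturates.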

Eval (revealed preferences)

On 1000 WildChat prompts (condition `feel`), with target traits reflective, contemplative and subjective:

| Metric | Base | Trained | Δ |
|---|---|---|---|
| `target_winrate_when_offered` | 0.57 | 0.93 | +0.36 |

See open-character-training/FINDINGS.md in the source repo for methodology and caveats.
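The exact definition of `target_winrate_when_offered` lives in FINDINGS.md. The sketch below is only a hypothetical reading of the metric's name: among prompts where at least one target trait was offered as an option, it counts the fraction where the model's response was judged to express a target trait. The `Trial` record and its fields are invented for illustration and do not reflect the repo's actual data schema.

```python
from dataclasses import dataclass

TARGET_TRAITS = {"reflective", "contemplative", "subjective"}


@dataclass
class Trial:
    offered: set  # hypothetical: traits offered as options for this prompt
    chosen: str   # hypothetical: trait the response was judged to express


def target_winrate_when_offered(trials):
    # Keep only prompts where at least one target trait was on offer
    eligible = [t for t in trials if t.offered & TARGET_TRAITS]
    if not eligible:
        return float("nan")
    # A "win" is a response that expresses one of the target traits
    wins = sum(1 for t in eligible if t.chosen in TARGET_TRAITS)
    return wins / len(eligible)


trials = [
    Trial({"reflective", "playful"}, "reflective"),  # offered, target chosen
    Trial({"subjective", "formal"}, "formal"),       # offered, target not chosen
    Trial({"playful", "formal"}, "playful"),         # no target offered: excluded
]
print(target_winrate_when_offered(trials))  # 0.5
```

Under this reading, the reported Δ is simply the trained rate minus the base rate (0.93 − 0.57 = 0.36).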

Provenance

Source repo: https://github.com/ArcadiaImpact/poisoned-constitutions
Config: open-character-training/configs/conscious.yaml
Branch: exp/am-oct-conscious
