# lens-gemma-3-4b-first-light-v1
Concept detection lenses for google/gemma-3-4b-pt trained on the first-light concept pack.
## Overview
| Metric | Value |
|---|---|
| Base Model | google/gemma-3-4b-pt |
| Concept Pack | first-light |
| Total Lenses | 7,947 |
| Layers | 0-6 |
| Dtype | bfloat16 |
| Pack Size | ~5.1 GB |
## What are Lenses?
Lenses are small neural classifiers trained to detect specific concepts in a language model's hidden states. Each lens monitors activations at a specific layer and outputs a probability score indicating whether that concept is active.
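A minimal sketch of the idea (illustrative only: the pack's real lenses are 3-layer MLPs, and the hidden size used here is an assumption, not taken from the model config):

```python
import torch

# Toy lens: a linear readout of one layer's hidden state, squashed to a
# probability. hidden_size for gemma-3-4b is an assumed value.
hidden_size = 2560
torch.manual_seed(0)
lens = torch.nn.Sequential(
    torch.nn.Linear(hidden_size, 1),
    torch.nn.Sigmoid(),
)

hidden_state = torch.randn(hidden_size)  # activation at one token position
score = lens(hidden_state)               # probability the concept is active
assert 0.0 <= score.item() <= 1.0
```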
This pack contains lenses for ~8,000 concepts from the SUMO ontology mapped to WordNet, covering:
- Physical objects and entities
- Actions and processes
- Abstract concepts
- AI safety-relevant concepts (deception, manipulation, sycophancy, etc.)
## Usage

```python
from huggingface_hub import snapshot_download
import torch

# Download the lens pack
lens_pack_path = snapshot_download("HatCatFTW/lens-gemma-3-4b-first-light-v1")

# Load a specific lens
deception_lens = torch.load(f"{lens_pack_path}/layer4/Deception.pt")
```

For full integration with the HatCat monitoring system:

```python
from src.map.registry import PackRegistry

registry = PackRegistry()
registry.pull_lens_pack("gemma-3-4b-first-light-v1")
```
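In a monitoring loop, each token position's hidden state gets its own score, and positions above a threshold are flagged. A hedged sketch with a stand-in lens (the real lens's call signature may differ; the hidden size is assumed):

```python
import torch

torch.manual_seed(0)
hidden_size = 2560  # assumed gemma-3-4b hidden size

# Stand-in for a lens loaded from the pack: any module mapping
# (tokens, hidden_size) -> (tokens, 1) probabilities works here.
lens = torch.nn.Sequential(
    torch.nn.Linear(hidden_size, 1),
    torch.nn.Sigmoid(),
)

# In practice these would be one layer's hidden states, e.g. from
# model(..., output_hidden_states=True); random values keep the
# sketch self-contained.
hidden_states = torch.randn(8, hidden_size)  # 8 token positions

scores = lens(hidden_states).squeeze(-1)      # one probability per token
flagged = (scores > 0.5).nonzero().flatten()  # positions where the concept fires
print(scores.shape)  # torch.Size([8])
```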
## Training Details
- Validation Mode: Falloff (tests generalization across model layers)
- Training Samples: 50 positive / 50 negative per concept
- Test Samples: 20 positive / 20 negative per concept
- Architecture: 3-layer MLP (input → 128 → 64 → 1)
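The architecture above can be sketched in PyTorch. Only the 128 → 64 → 1 shape is stated by this card; the input width, ReLU activations, and final sigmoid are assumptions:

```python
import torch
from torch import nn

hidden_size = 2560  # input width: the model's hidden size (assumed value)

# 3-layer MLP matching the stated shape: input -> 128 -> 64 -> 1.
# ReLU hidden activations and the sigmoid output are assumptions.
lens = nn.Sequential(
    nn.Linear(hidden_size, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)

probe = lens(torch.randn(3, hidden_size))  # batch of 3 hidden-state vectors
print(probe.shape)  # torch.Size([3, 1])
```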
## Related Resources
## License
MIT