lens-gemma-3-4b-first-light-v1

Concept detection lenses for google/gemma-3-4b-pt trained on the first-light concept pack.

Overview

Base Model: google/gemma-3-4b-pt
Concept Pack: first-light
Total Lenses: 7,947
Layers: 0-6
Dtype: bfloat16
Pack Size: ~5.1 GB

What are Lenses?

Lenses are small neural classifiers trained to detect specific concepts in a language model's hidden states. Each lens monitors activations at a specific layer and outputs a probability score indicating whether that concept is active.

This pack contains lenses for ~8,000 concepts from the SUMO ontology mapped to WordNet, covering:

  • Physical objects and entities
  • Actions and processes
  • Abstract concepts
  • AI safety-relevant concepts (deception, manipulation, sycophancy, etc.)
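
A lens of this kind can be sketched as a small MLP probe that reads one hidden-state vector and emits a probability. This is a minimal illustration, not the pack's exact implementation: the layer sizes follow the architecture listed under Training Details below, while the 2560 hidden size is an assumption about gemma-3-4b and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class ConceptLens(nn.Module):
    """Hypothetical concept probe: 3-layer MLP mapping a hidden state to one logit."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # Probability that the concept is active in this hidden state.
        return torch.sigmoid(self.net(hidden_state)).squeeze(-1)

lens = ConceptLens(hidden_dim=2560)  # 2560 is an assumed gemma-3-4b hidden size
h = torch.randn(2560)                # one hidden-state vector from some layer
score = lens(h)                      # scalar probability in (0, 1)
```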

Usage

from huggingface_hub import snapshot_download
import torch

# Download the lens pack
lens_pack_path = snapshot_download("HatCatFTW/lens-gemma-3-4b-first-light-v1")

# Load a specific lens (here: the layer-4 Deception lens)
deception_lens = torch.load(f"{lens_pack_path}/layer4/Deception.pt")
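
To score text, a lens needs the hidden states from the layer it was trained on, typically captured with a forward hook. The pattern can be sketched with a tiny stand-in model; in the real workflow you would hook the matching gemma-3-4b decoder layer, and both the stand-in model and lens head here are illustrative.

```python
import torch
import torch.nn as nn

hidden_dim = 16  # tiny stand-in; the real model's hidden size is much larger

# Stand-in "model": a stack of layers in place of the real transformer.
model = nn.Sequential(*[nn.Linear(hidden_dim, hidden_dim) for _ in range(6)])

captured = {}
def save_hidden(module, inputs, output):
    # Record this layer's output: the hidden states a lens reads.
    captured["h"] = output.detach()

# Hook the layer the lens was trained on (layer 4 here, matching layer4/).
handle = model[4].register_forward_hook(save_hidden)

x = torch.randn(1, hidden_dim)
model(x)
handle.remove()

# Score the captured activations with a stand-in lens head.
lens = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
prob = lens(captured["h"])  # shape (1, 1), value in (0, 1)
```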

For full integration with the HatCat monitoring system:

from src.map.registry import PackRegistry

registry = PackRegistry()
registry.pull_lens_pack("gemma-3-4b-first-light-v1")

Training Details

  • Validation Mode: Falloff (tests generalization across model layers)
  • Training Samples: 50 positive / 50 negative per concept
  • Test Samples: 20 positive / 20 negative per concept
  • Architecture: 3-layer MLP (input → 128 → 64 → 1)
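
Falloff validation can be pictured as evaluating one lens against activations from every layer and watching accuracy fall off away from the training layer. The toy sketch below uses synthetic per-layer activations and an untrained lens head purely to show the shape of the procedure; the actual HatCat evaluation may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim, n_layers = 16, 7  # 7 layers, matching layers 0-6 in this pack

# Toy per-layer activations: 20 positive / 20 negative examples per layer,
# mirroring the 20/20 test split described above.
acts = {l: (torch.randn(20, hidden_dim) + l, torch.randn(20, hidden_dim) - l)
        for l in range(n_layers)}

lens = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

def accuracy(lens, pos, neg):
    preds = torch.cat([lens(pos), lens(neg)]).squeeze(-1) > 0.5
    labels = torch.cat([torch.ones(len(pos)), torch.zeros(len(neg))]).bool()
    return (preds == labels).float().mean().item()

# Score the same lens at every layer; generalization "falls off" as the
# activation distribution drifts from the layer it was trained on.
falloff = {l: accuracy(lens, *acts[l]) for l in range(n_layers)}
```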

License

MIT
