visaebench

Cross-architecture Sparse Autoencoder evaluation across ViT backbones

visaebench is a cross-architecture evaluation study for Sparse Autoencoders (SAEs) trained on vision transformers. We train TopK SAEs on ImageNet patch representations from five ViT backbones and systematically evaluate the quality, sparsity, and transferability of the learned features.
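A TopK SAE enforces sparsity directly: it keeps only the k largest pre-activations per input and reconstructs from those. A minimal pure-Python sketch of the forward pass (illustrative only; the function name, weights, and dimensions here are hypothetical, and the actual training code presumably uses a tensor library):

```python
def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One SAE forward pass: encode, TopK-sparsify, decode.

    x: input vector (e.g. a ViT patch representation), length d.
    W_enc: m x d encoder weights; W_dec: d x m decoder weights.
    Returns (sparse latent a, reconstruction x_hat).
    """
    # Encode: pre-activations z = W_enc @ x + b_enc
    z = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(W_enc, b_enc)]
    # TopK: keep the k largest pre-activations, zero out the rest
    top = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    a = [z[i] if i in top else 0.0 for i in range(len(z))]
    # Decode: x_hat = W_dec @ a + b_dec
    x_hat = [sum(w * ai for w, ai in zip(row, a)) + b
             for row, b in zip(W_dec, b_dec)]
    return a, x_hat
```

In practice m is much larger than d (an overcomplete dictionary), and training minimizes the reconstruction error between x and x_hat; the TopK activation replaces an explicit L1 sparsity penalty.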

Backbones

  • DINOv2-B — self-supervised with self-distillation
  • CLIP ViT-B/16 — contrastive image-language pretraining
  • SigLIP — sigmoid loss image-language pretraining
  • MAE ViT-B — masked autoencoder pretraining
  • DeiT — supervised training with distillation from a convnet teacher

Motivation

Sparse Autoencoders have emerged as a promising tool for mechanistic interpretability in language models, decomposing activations into human-interpretable features. This project extends that lens to vision transformers, asking: do SAE features differ meaningfully across pretraining objectives? Are certain backbones more interpretable than others?

Target

NeurIPS 2026.