visaebench

Cross-architecture Sparse Autoencoder evaluation across ViT backbones

visaebench is a cross-architecture evaluation study for Sparse Autoencoders (SAEs) trained on vision transformers. We train TopK SAEs on ImageNet patch representations from five ViT backbones and systematically evaluate the quality, sparsity, and transferability of the learned features.
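A TopK SAE enforces sparsity directly: it keeps only the k largest pre-activations per input and reconstructs from those. A minimal pure-Python sketch of the forward pass (illustrative only; the function name, weights, and dimensions here are hypothetical, and the actual training code presumably uses a tensor library):

```python
def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One SAE forward pass: encode, TopK-sparsify, decode.

    x: input vector (e.g. a ViT patch representation), length d.
    W_enc: m x d encoder weights; W_dec: d x m decoder weights.
    Returns (sparse latent a, reconstruction x_hat).
    """
    # Encode: pre-activations z = W_enc @ x + b_enc
    z = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(W_enc, b_enc)]
    # TopK: keep the k largest pre-activations, zero out the rest
    top = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    a = [z[i] if i in top else 0.0 for i in range(len(z))]
    # Decode: x_hat = W_dec @ a + b_dec
    x_hat = [sum(w * ai for w, ai in zip(row, a)) + b
             for row, b in zip(W_dec, b_dec)]
    return a, x_hat
```

In practice m is much larger than d (an overcomplete dictionary), and training minimizes the reconstruction error between x and x_hat; the TopK activation replaces an explicit L1 sparsity penalty.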

Backbones

  • DINOv2-B — self-supervised with self-distillation
  • CLIP ViT-B/16 — contrastive image-language pretraining
  • SigLIP — sigmoid loss image-language pretraining
  • MAE ViT-B — masked autoencoder pretraining
  • DeiT — supervised training with distillation from a convnet teacher

Motivation

Sparse Autoencoders have emerged as a promising tool for mechanistic interpretability in language models, decomposing activations into human-interpretable features. This project extends that lens to vision transformers, asking: do SAE features differ meaningfully across pretraining objectives? Are certain backbones more interpretable than others?

Target

NeurIPS 2026.