LAION-fMRI benchmark#2394

Open
KartikP wants to merge 8 commits into master from kp/laion-fmri

@KartikP KartikP commented May 30, 2026

Adds a new vision benchmark family that scores models against the LAION-fMRI 7T dataset (Zerbe et al., VSS 2026), a densely sampled multi-subject fMRI dataset spanning broad natural-image diversity with built-in OOD stress tests. The PR registers 20 "headline" variants across the shared and per-subject stimulus pools, plus a thin factory API for non-headline variants (cluster CV, per-OOD-category, etc.).

Also introduces a reusable multi-subject benchmark scaffold in benchmark_helpers/ and wires every wrapper into the new bootstrap_error and validate_error helpers.

Dataset

  • Citation: Zerbe, Roth, Mell, Herholz, Knapen, Hebart (VSS 2026). BibTeX in benchmark.py.
  • Subjects: 5 (sub-01, sub-03, sub-05, sub-06, sub-07)
  • Stimuli: 9.2 × 9.2 DVA (1000×1000 px on a BenQ-mirrored PROpixx projector at ~165 cm), DUA-gated — not redistributed by Brain-Score
  • Pools:
    • Shared (Allen2022-style): 1,492 stimuli seen by every subject
    • Per-subject: 5,833 stimuli/subject (1,121 shared non-OOD + 4,712 unique), plus 371 OOD
  • Splits (bundled by the LAION-fMRI re:vision initiative): tau, ood, 9 ood_<category> variants, cluster_k5_{0..4}

20 registered headline variants — identifier pattern {family}.{region}-{split}-{metric}:

| Family | Regions | Splits | Metric | Count |
|---|---|---|---|---|
| LAION_fMRI (shared) | V1, V2, V4, IT | tau, ood | ridge | 8 |
| LAION_fMRI_persubject | V1, V2, V4, IT | tau, ood | ridge | 8 |
| LAION_fMRI (shared) | V1, V2, V4, IT | (no split) | rdm-pearson | 4 |
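
As a quick sanity check on the count, the registry surface in the table can be enumerated from the identifier pattern. This is a minimal sketch, not the registration code in this PR: `headline_identifiers` is a hypothetical helper, and the exact string used for the split-less RDM cells (`{family}.{region}-rdm-pearson`) is an assumption.

```python
from itertools import product

REGIONS = ("V1", "V2", "V4", "IT")
SPLITS = ("tau", "ood")


def headline_identifiers(ridge_metric="ridge"):
    """Enumerate identifiers following {family}.{region}-{split}-{metric}.

    Hypothetical helper for illustration; the real registry may build
    these strings differently.
    """
    identifiers = []
    # Regression cells: both families x 4 regions x 2 splits = 16.
    for family in ("LAION_fMRI", "LAION_fMRI_persubject"):
        for region, split in product(REGIONS, SPLITS):
            identifiers.append(f"{family}.{region}-{split}-{ridge_metric}")
    # RSA cells: shared pool only, no split component (assumed format) = 4.
    for region in REGIONS:
        identifiers.append(f"LAION_fMRI.{region}-rdm-pearson")
    return identifiers


identifiers = headline_identifiers()
print(len(identifiers))  # 16 regression cells + 4 RSA cells
```

Passing `ridge_metric="ridgecv"` yields the same 20-cell surface with the metric suffix swapped.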

Non-headline variants (per-OOD-category, cluster_k5, IT_full ablation) accessible via factory API — see usage_examples.ipynb.

Reusable helpers (extracted, not laion-specific)

brainscore_vision/benchmark_helpers/multi_subject.py:

  • MultiSubjectNeuralBenchmark — per-subject TrainTest aggregator with per-subject detail preserved in score.attrs
  • KFoldNeuralBenchmark — average across k folds (any underlying benchmark), per-fold detail preserved
  • block_diagonal_concat — stitch per-subject slices into (presentation × neuroid) with each subject on its own diagonal block; off-diagonal NaN

These exist as standalone helpers so other multi-subject / k-fold benchmarks can reuse them without copy-paste.
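
To make the block-diagonal layout concrete, here is a minimal NumPy sketch of the idea behind `block_diagonal_concat`. It is an illustration under stated assumptions, not the helper's actual code: the real helper presumably operates on labeled assemblies with coordinates, whereas `block_diagonal_concat_sketch` below is a hypothetical name that works on plain arrays.

```python
import numpy as np


def block_diagonal_concat_sketch(subject_slices):
    """Stitch (presentation x neuroid) arrays onto a NaN-filled canvas.

    Each subject occupies its own diagonal block; every off-diagonal
    cell stays NaN. Illustrative only, coordinates are not handled.
    """
    n_rows = sum(s.shape[0] for s in subject_slices)
    n_cols = sum(s.shape[1] for s in subject_slices)
    stitched = np.full((n_rows, n_cols), np.nan, dtype=np.float32)
    row, col = 0, 0
    for s in subject_slices:
        stitched[row:row + s.shape[0], col:col + s.shape[1]] = s
        row += s.shape[0]
        col += s.shape[1]
    return stitched


# Two toy "subjects": 3 presentations x 2 neuroids and 2 x 4.
slices = [
    np.ones((3, 2), dtype=np.float32),
    2 * np.ones((2, 4), dtype=np.float32),
]
stitched = block_diagonal_concat_sketch(slices)
n_filled = int(np.sum(~np.isnan(stitched)))
print(stitched.shape, n_filled, stitched.size)
```

The canvas grows with the product of total presentations and total neuroids, while the filled cells grow only with the sum of the per-subject blocks, so most of the array is NaN padding once several subjects are stacked.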

Uncertainty reporting

Every laion_fmri wrapper now returns a Score with:

  • error (finite SE on the ceiled scale, computed via bootstrap over subjects/folds with n_bootstrap=200)
  • error_over, n_bootstrap provenance
  • raw disaggregated per-unit scores
  • Single-subject leaf path uses declare_no_error with reason

TestUncertaintyContract in test.py verifies the gate passes.

Architectural decisions worth flagging

  • Per-subject assembly storage instead of one block-diagonal monolith. Cross-subject concat-with-NaN ballooned to >5GB during development and triggered OOMs. Each subject's .nc is registered separately; the benchmark loader iterates + filters + concatenates the small slices.
  • ncsnr-based ceiling using the dataset's published noise-ceiling-SNR per voxel, instead of internal_consistency's split-half Pearson. Sidesteps cross-subject NaN-padding incompatibility, and uses the publication-grade estimator the LAION-fMRI authors provide.
  • IT definition matches NSD / Algonauts 2023's streams_ventral (laion-ventral \ retinotopic). IT_full (V4 ∪ IT) is exposed as a non-headline alias approximating the authors' broader ventral mask.
  • Block-diagonal helper for the cross-subject shared pool — used only when LAIONfMRI(..., subjects=DEFAULT_SUBJECTS) is called externally; the headline registry takes the per-subject MultiSubject path for dense per-subject regression.
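
For context on the ncsnr-based ceiling, here is a hedged sketch of how a per-voxel noise-ceiling SNR can be turned into a correlation ceiling. It assumes the NSD-style conversion, where the explainable variance fraction is ncsnr² / (ncsnr² + 1/n) for responses averaged over n trials, and takes the square root to get a ceiling on the correlation scale. Whether this PR uses exactly that conversion, and whether it averages per-voxel ratios or divides the averages, are assumptions; the function names are hypothetical.

```python
import numpy as np


def ncsnr_to_ceiling_r(ncsnr, n_trials):
    """Correlation-scale ceiling from noise-ceiling SNR (assumed NSD-style)."""
    ncsnr = np.asarray(ncsnr, dtype=float)
    variance_fraction = ncsnr ** 2 / (ncsnr ** 2 + 1.0 / n_trials)
    return np.sqrt(variance_fraction)


def ceiled_score(raw_r, ncsnr, n_trials):
    """Mean of per-voxel r / ceiling, skipping voxels with a zero ceiling.

    Averaging the per-voxel ratios is an assumption of this sketch.
    """
    raw_r = np.asarray(raw_r, dtype=float)
    ceiling = ncsnr_to_ceiling_r(ncsnr, n_trials)
    valid = ceiling > 0
    return float(np.mean(raw_r[valid] / ceiling[valid]))


# Toy values: three voxels, one of which carries no signal (ncsnr = 0).
toy_ncsnr = np.array([1.0, 0.5, 0.0])
toy_r = np.array([0.35, 0.20, 0.01])
ceiling = ncsnr_to_ceiling_r(toy_ncsnr, n_trials=1)
score = ceiled_score(toy_r, toy_ncsnr, n_trials=1)
print(ceiling, score)
```

Because the ceiling is computed independently per voxel from a published quantity, it does not require aligning trials across subjects, which is consistent with the motivation given above for avoiding the split-half estimator.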

Reproduction / verification

  • Reproducibility is tested by the semantic-verify step in rebuild_assemblies.py, which checks that the rebuilt data is bit-equivalent to the assemblies published on S3 and compares every coord element-wise
  • usage_examples.ipynb covers all 20 headline variants + 5 non-headline patterns end-to-end

Baseline sweep (5 models × 20 cells)

All values are ceiled scores (mean per-voxel correlation / ncsnr-derived ceiling, averaged across 5 subjects). Missing entries = not run.

LAION_fMRI (shared 1,492-stim pool)

region-split alexnet_random alexnet convnext_tiny resnext101_32x8d_wsl resnet50_tutorial
V1-tau 0.242 0.334 0.393 0.388 0.462
V1-ood 0.164 0.231 0.321 0.311 0.367
V1-rdm 0.303 0.204 0.172
V2-tau 0.216 0.335 0.372 0.386 0.381
V2-ood 0.182 0.290 0.316 0.350 0.346
V2-rdm 0.285 0.204 0.143
V4-tau 0.097 0.277 0.271 0.294 0.300
V4-ood 0.089 0.294 0.311 0.360 0.353
V4-rdm 0.208 0.124 0.111
IT-tau 0.048 0.136 0.237 0.222 0.284
IT-ood 0.015 0.079 0.168 0.172 0.206
IT-rdm 0.374 0.167 0.293

LAION_fMRI_persubject (5,833 stim/subject)

region-split alexnet_random alexnet convnext_tiny resnext101_32x8d_wsl resnet50_tutorial
V1-tau 0.157 0.147 0.232 0.233 0.296
V1-ood 0.183 0.156 0.263 0.281 0.340
V2-tau 0.157 0.190 0.202 0.245 0.243
V2-ood 0.224 0.230 0.262 0.310 0.324
V4-tau 0.055 0.139 0.114 0.167 0.172
V4-ood 0.097 0.223 0.225 0.310 0.309
IT-tau 0.053 0.031 0.091 0.139 0.188
IT-ood 0.034 0.019 0.084 0.168

Heads up on the upstream pin

requirements.txt temporarily pins laion-fmri to my fork (KartikP/LAION-fMRI@fix/duplicate-force-include) because the upstream pyproject.toml has a redundant [tool.hatch.build.targets.wheel.force-include] that breaks wheel builds with hatchling 1.18+.

Will revert to ViCCo-Group/LAION-fMRI.git@main once upstream merges the fix.

@KartikP KartikP closed this Jun 1, 2026
@KartikP KartikP reopened this Jun 1, 2026
@KartikP KartikP closed this Jun 1, 2026
@KartikP KartikP reopened this Jun 1, 2026
@KartikP KartikP closed this Jun 2, 2026
@KartikP KartikP reopened this Jun 2, 2026
@KartikP KartikP closed this Jun 2, 2026
@KartikP KartikP reopened this Jun 2, 2026
@KartikP KartikP closed this Jun 2, 2026
@KartikP KartikP reopened this Jun 2, 2026
- Switch regression metric to dual_{ridge,ridgecv}_split (kernel form)
  so wide-feature models don't materialize the (n_features, n_targets)
  coefficient matrix. Mathematically identical to the prior standard
  ridge for the existing -ridge cells; resolves a memory cliff on the
  persubject pool for models like resnext101.

- Register 16 new -ridgecv variants alongside the existing 16 -ridge
  variants (32 ridge cells + 4 RSA cells = 36 headline total). -ridge
  keeps fixed alpha=1; -ridgecv selects alpha via internal CV over a
  21-value log-spaced sweep (1e-10 to 1e10) defined locally as
  LAION_ALPHA_LIST. Sweep stays benchmark-local so Gifford / Papale /
  Hebart_fmri are unaffected.

- Trim README and METHODS to a minimal description of what the
  benchmark exposes; move long architectural rationale out of band.

Test count updated 20 -> 36. All data-free tests pass.
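
The equivalence claimed in the first bullet can be checked numerically. The sketch below compares standard (primal) ridge with the kernel (dual) form on random data and builds a 21-value log-spaced grid matching the stated range. It is an illustration, not the `dual_ridge_split` implementation: the function names are hypothetical, and intercepts and feature centering are omitted.

```python
import numpy as np


def primal_ridge_predict(X_train, Y_train, X_test, alpha):
    """Standard ridge: solves an (n_features x n_features) system and
    materializes the (n_features, n_targets) coefficient matrix."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    coef = np.linalg.solve(gram, X_train.T @ Y_train)
    return X_test @ coef


def dual_ridge_predict(X_train, Y_train, X_test, alpha):
    """Kernel form: solves an (n_samples x n_samples) system and never
    forms the feature-space coefficient matrix."""
    kernel = X_train @ X_train.T
    dual_coef = np.linalg.solve(kernel + alpha * np.eye(kernel.shape[0]), Y_train)
    return (X_test @ X_train.T) @ dual_coef


rng = np.random.default_rng(0)
X_train = rng.standard_normal((50, 400))  # wide: many more features than samples
Y_train = rng.standard_normal((50, 7))
X_test = rng.standard_normal((10, 400))

pred_primal = primal_ridge_predict(X_train, Y_train, X_test, alpha=1.0)
pred_dual = dual_ridge_predict(X_train, Y_train, X_test, alpha=1.0)
max_abs_diff = float(np.max(np.abs(pred_primal - pred_dual)))

# 21 log-spaced values from 1e-10 to 1e10, one per decade.
alpha_grid = np.logspace(-10, 10, 21)
print(max_abs_diff, alpha_grid[0], alpha_grid[-1])
```

The two forms agree because (XᵀX + αI)⁻¹Xᵀ = Xᵀ(XXᵀ + αI)⁻¹ for α > 0. The dual form keeps only an (n_samples, n_targets) coefficient array, which is the memory saving that matters when features far outnumber stimuli.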
@KartikP KartikP closed this Jun 2, 2026
@KartikP KartikP reopened this Jun 2, 2026
Drop the 16 fixed-alpha -ridge registry entries; the 16 corresponding
-ridgecv variants now carry the headline scoring. Total headline cells:
20 (16 ridgecv + 4 rdm-pearson), matching the original registry surface.

Fixed-alpha ridge stays accessible via the factory with
metric_type='ridge' for anyone who wants the faster fixed-alpha fit.

Factory default switched ridge -> ridgecv to match the new registered
defaults. Test suite updated to enumerate ridgecv identifiers and
expect the 20-variant count.