This repository contains the official implementation of the paper "Dimension Pruning for Modality Gap Reduction in Vision-Language Models".
The file GapDIM.py in the repository root reproduces Table 1. Precomputed results are provided per dataset:
MSCOCO: GapDim/outputs/table1_mscoco/mscoco/minimal_lookup_result.csv
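To inspect the precomputed results without rerunning anything, the CSV can be read with the standard library. This is a minimal sketch: `load_results` is a hypothetical helper, not part of the repository, and it makes no assumption about the column names in the file.

```python
import csv
from pathlib import Path

# Path as quoted in this README; adjust it if your checkout differs.
RESULTS_CSV = Path("GapDim/outputs/table1_mscoco/mscoco/minimal_lookup_result.csv")


def load_results(csv_path):
    """Return the CSV rows as a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Print the rows only when the file is actually present.
if RESULTS_CSV.exists():
    for row in load_results(RESULTS_CSV):
        print(row)
```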
Offline embedding generation lives in data_preparation/. Runtime dataset modules in
data_manager/datasets/ only load already prepared shards.
For MSCOCO with ImageNet top-1 labels:
python data_preparation/precompute_mscoco_imagenet_embeddings.py \
--split train \
--clip_model ViT-B-32 \
--clip_pretrained laion2b_s34b_b79k
python data_preparation/precompute_mscoco_imagenet_embeddings.py \
--split val \
--clip_model ViT-B-32 \
--clip_pretrained laion2b_s34b_b79k

The default data root is ./data/mscoco/data/mscoco. If the data already exists
elsewhere, expose it inside this project with a symlink under ./data/, or set
GAPDIM_DATA_ROOT.
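The lookup order described above can be sketched as follows. `resolve_data_root` is a hypothetical helper written for illustration; it assumes GAPDIM_DATA_ROOT, when set, replaces the whole default path, which may not match how the repository interprets the variable.

```python
import os
from pathlib import Path

# Default data root as quoted in this README.
DEFAULT_DATA_ROOT = Path("./data/mscoco/data/mscoco")


def resolve_data_root(env=None):
    """Hypothetical helper: prefer GAPDIM_DATA_ROOT, else fall back to the default."""
    env = os.environ if env is None else env
    override = env.get("GAPDIM_DATA_ROOT")
    # Assumption: the variable holds the full path to the prepared data.
    return Path(override).expanduser() if override else DEFAULT_DATA_ROOT


print(resolve_data_root())
```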
For Flickr30k with ImageNet top-1 labels:
python data_preparation/flickr30k/precompute_flickr_embeddings_with_imagenet_labels.py \
--clip_model ViT-B-32 \
--clip_pretrained laion2b_s34b_b79k

This uses data_preparation/flickr30k/config_dir/precompute_embedding_with_labels/1.yaml
to find the raw Flickr30k folder. In this workspace, ./data/flickr30k_raw
is a symlink to the already downloaded raw dataset.
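If the raw dataset lives elsewhere on your machine, a symlink like the one described can be created with `pathlib`. This is a sketch only: `link_dataset` is a hypothetical helper, and the source path in the usage comment is a placeholder.

```python
from pathlib import Path


def link_dataset(raw_dir, link_path):
    """Create link_path as a symlink to raw_dir unless something already exists there."""
    raw_dir = Path(raw_dir).resolve()
    link_path = Path(link_path)
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"raw dataset folder not found: {raw_dir}")
    if link_path.is_symlink() or link_path.exists():
        return link_path  # leave any existing entry untouched
    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(raw_dir, target_is_directory=True)
    return link_path


# Example with a placeholder source path:
# link_dataset("/datasets/flickr30k", "./data/flickr30k_raw")
```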
The global configuration for reproducing Table 1 is in the file /configurations/config.yaml
GapDIM uses Hydra for experiment configuration.
python GapDIM.py

Select a dataset with a Hydra override:
python GapDIM.py dataset=mscoco
python GapDIM.py dataset=flickr30k
python GapDIM.py dataset=llavaccm3

To list every configuration option you can customize, run:
python GapDIM.py --help

The default values shown there are the ones that reproduce the results of the paper.
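To run the three dataset overrides in sequence, the commands can be assembled in a small driver script. This is a sketch, not part of the repository: `build_command` and `run_all` are hypothetical names, and the script assumes it is launched from the repository root so that GapDIM.py is found.

```python
import subprocess
import sys

# Dataset names as listed in this README.
DATASETS = ["mscoco", "flickr30k", "llavaccm3"]


def build_command(dataset, extra_overrides=()):
    """Return the argv list for one GapDIM.py run with a dataset override."""
    return [sys.executable, "GapDIM.py", f"dataset={dataset}", *extra_overrides]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name in DATASETS:
        cmd = build_command(name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them one after another and stops at the first failure.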
