This is the code repository for the project *A high-throughput approach for the efficient prediction of perceived similarity of natural objects*.
It holds all the analysis code needed to reproduce the results presented in the paper. For all data needed to reproduce the manuscript's results, please see the project's data repository on OSF.
The toolbox DimPred itself resides in a separate repository.
```
├── README.md
├── data            <- .gitignored. Download from OSF.
│   ├── images      <- All image sets used in this project.
│   ├── interim     <- Intermediate data based on raw data that has been transformed using scripts.
│   ├── processed   <- The final data forming the basis for results.
│   └── raw         <- Original and immutable data serving as input to scripts.
│
├── results         <- Output from scripts that go into the manuscript. .gitignored. Download from OSF.
│
├── scripts         <- Interactive scripts used in this project that perform a specific end-level job.
│
├── .gitignore
├── LICENSE
└── environment.yml
```
The following table lists which script (and cell) produces each figure and table in the manuscript:

| Content | Script |
|---|---|
| Figure 1 | handmade using Inkscape |
| Figure 2 | handmade using Inkscape |
| Figure 3 | compute_dnn_results.py, cell III |
| Figure 4 | compute_dnn_results.py, cell V |
| Figure 5 | compute_human_results.py, cell IV (see also compute_dnn_ensemble_model_results.py, cell II) |
| Figure 6 | compute_dnn_results.py, cell V |
| Figure 7 | visualize_heatmaps.py |
| Figure 8 | Made by O. Contier |
| Figure S1 | compute_dnn_results.py, cell I |
| Table S2 | compute_dnn_ensemble_model_results.py, cell II |
| Figure S3 | compute_human_results.py, cell I |
For redundancy, each script has comments in each relevant cell explicating which figure it produces (and which statistics, and where in the paper they appear).
- Read this section completely first.
- Clone this repository to your local machine (e.g., via `git clone https://github.com/ViCCo-Group/dimpred_paper`).
- Download all the files from OSF, unpack them, and put them inside your local version of this directory as indicated by the directory structure above.
- Read on.
- Since we are not allowed to share the ground-truth representational matrices for {Cichy-118, Peterson-Automobiles, Peterson-Fruits, Peterson-Furniture, Peterson-Vegetables, Peterson-Various}, the respective files in `data/raw/ground_truth_representational_matrices` contain dummy data. This is so as not to break the scripts when you execute them, but it of course means that all the results you reproduce for these image sets differ from the actual results. We are also not allowed to share the respective image sets.
- For each computational model used in this project, the directory `data/raw/dnns` holds activations extracted for each image set using the Python package `thingsvision`. Note that the respective directory you received when downloading everything from the data repository on OSF is actually empty: the files together would be very large, and they can be deterministically recreated by running `thingsvision` with all the parameters mentioned in the manuscript's supplementary information, if you want to.
- You can choose to reproduce the manuscript's findings on different levels, where the first one is the most elaborate and the last one the quickest.
**Level I**

On this level, you will re-extract all activations from all computational models used in this study and run `dimpred` and `frrsa` to reproduce all basic output files. Specifically:

- Install `thingsvision`. Using all the parameters mentioned in the manuscript's supplementary information, extract activations for each computational model and public image set, and save those activations into `data/raw/dnns` (for a rough template, see the sketch after this list). You should create a separate conda environment for `thingsvision`.
- Run `dimpred` and `frrsa` for all computational models and put the output into `data/interim/dimpred` and `data/interim/frrsa`, respectively. You should create separate conda environments for each.
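How to call `thingsvision` for feature extraction is documented in that package; purely for orientation, a minimal sketch is given below. All names in it (model, source, module, image directory) are placeholder assumptions, not the paper's settings; the actual parameters for each computational model are listed in the manuscript's supplementary information.

```python
from thingsvision import get_extractor
from thingsvision.utils.data import DataLoader, ImageDataset
from thingsvision.utils.storing import save_features

# Placeholder settings; the real models, modules, and parameters are
# listed in the manuscript's supplementary information.
model_name = "vgg16"                      # hypothetical example model
module_name = "features.23"               # hypothetical example layer
image_dir = "data/images/some_image_set"  # hypothetical image-set folder
out_path = "data/raw/dnns"

extractor = get_extractor(
    model_name=model_name,
    source="torchvision",  # model zoo to load the model from
    device="cpu",
    pretrained=True,
)

dataset = ImageDataset(
    root=image_dir,
    out_path=out_path,
    backend=extractor.get_backend(),
    transforms=extractor.get_transformations(),
)
batches = DataLoader(dataset, batch_size=32, backend=extractor.get_backend())

# Extract activations from the chosen module and save them to disk.
features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=True,
)
save_features(features, out_path=out_path, file_format="npy")
```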
**Level II**

On this level, you assume `data/interim/dimpred` and `data/interim/frrsa` to be fine, but you will reproduce the `{}_all_processed.pkl` files in `data/processed`.

- Run the script `transform_to_processed.py`, which outputs the `{}_all_processed.pkl` files (a minimal sketch for inspecting them follows below). Note that these already exist in `data/processed` because they are needed on Level III (but they will be overwritten if you execute this step). Without executing step 0, you can execute this step for everything except `method == "crsa"`, since that method acts directly on the raw model activations (see the docstrings in `transform_to_processed.py`).
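If you want to sanity-check the output, the `{}_all_processed.pkl` files can be loaded with Python's standard `pickle` module. A minimal sketch, where the image-set name is a hypothetical placeholder:

```python
import pickle
from pathlib import Path

# "things" is a hypothetical image-set name used purely for illustration;
# substitute any set for which a {}_all_processed.pkl file exists.
fname = Path("data/processed") / "things_all_processed.pkl"
with fname.open("rb") as f:
    processed = pickle.load(f)

print(type(processed))  # quick look at what transform_to_processed.py wrote
```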
**Level III**

This level is likely the one you want to enter on. If you start here, you do not need to execute anything from prior steps. On this level, you will need to create an environment using this repository's `environment.yml`.

- Run `compute_dnn_results.py` to reproduce reported statistics and figures (and to reproduce `all_rsm_corrs_wide.pkl`, which already lives in `data/processed` and is used in step 5, but will be overwritten if you execute this step).
- Run `compute_dnn_ensemble_model_results.py` to reproduce reported statistics and figures (and to reproduce `ensemble_model_rsms.pkl`, which lives in `data/processed` and is used in step 5, but will be overwritten if you execute this step).
- Run `compute_human_results.py` to reproduce reported statistics and figures.
- Run `visualize_heatmaps.py` to reproduce reported figures (one way to run any of these scripts non-interactively is sketched after this list).
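The scripts are interactive and organized into cells (the "cell I", "cell II" references in the table above), so running individual cells in an editor that supports them is the natural mode of use. If you simply want everything a script produces, a minimal sketch for running one top to bottom, assuming you launch Python from the repository root:

```python
import runpy

# Execute an analysis script top to bottom, as if invoked from the
# command line; assumes the working directory is the repository root.
runpy.run_path("scripts/compute_dnn_results.py", run_name="__main__")
```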
Be aware of the caveat notes above (in particular the dummy ground-truth data).
In case of any questions or suggestions, please reach out to either Philipp Kaniuth (kaniuth {at} cbs.mpg.de) or Martin Hebart (hebart {at} cbs.mpg.de).
This GitHub repository is licensed under the GNU AFFERO GENERAL PUBLIC LICENSE Version 3 - see the LICENSE file for details.
If you use any of this code, please cite our associated paper as follows:
```bibtex
@article{Kaniuth_2025,
  author={Kaniuth, Philipp and Mahner, Florian P. and Perkuhn, Jonas and Hebart, Martin N.},
  title={A high-throughput approach for the efficient prediction of perceived similarity of natural objects},
  year={2025},
  month={apr},
  DOI={10.7554/elife.105394.1},
  url={http://dx.doi.org/10.7554/eLife.105394.1},
  publisher={eLife Sciences Publications, Ltd}
}
```