Causal Feature Learning (CFL) is an unsupervised algorithm that constructs macro-variables from low-level data while preserving the causal relationships present in the data. In this repository, CFL is applied to human brain lesion data and the corresponding responses to language, visuospatial, and depression assessments (as described in the associated preprint) in order to identify (1) categories of lesions that are unique in their effects on test responses and (2) categories of test responses that are unique in their likelihoods of occurring given any lesion. This code depends on the CFL software package, which can be installed via pip.
Clone this repository:
git clone https://github.com/iwahle/cfl_lbm.git
This shouldn't take more than a minute on a standard laptop.
The included code has been tested on macOS 12.6.1. A conda environment with
all necessary dependencies and dependency versions is specified in
cfl-lbm-env.yml. To construct an environment with the specified
dependencies, navigate to the repository and run:
conda env create -f cfl-lbm-env.yml
conda activate cfl-lbm
pip install -e .
- Once you have set up your conda environment and activated it, run the example code in examples/example0.ipynb with the provided simulated dataset.
- To use this code with your own data, add a new directory within data that contains the following files:
  - X.npy: an n_samples x n_voxels array of vectorized lesion masks
  - Y.npy: an n_samples x n_items array of behavioral test responses
  - dems.npy: optional, an n_samples x n_demographics array of demographic measures to include when running CFL
- Parameters to run CFL with can be modified in cfl_params.py. Consult the CFL software package documentation for details on setting parameters. Examples of how to set parameters for hyperparameter tuning are included in this file as well.
- Modify util.load_scale_data to properly preprocess your specific dataset.
- run_cfl.py and most files in extended_analyses take the following arguments:
  - analysis: the name of the directory within data where your data is stored
  - include_dem: if 1, will include the demographic quantities specified in dems.npy; if 0, will not
  Set these with flags as needed when running scripts from the command line.
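As a rough illustration of the data layout described above, the following sketch writes the three arrays with the expected shapes and reloads them to check that they agree on the number of samples. The directory name my_analysis, the array sizes, and the random placeholder values are all assumptions made for illustration; a temporary directory stands in for the repository root, and real data would still need whatever preprocessing util.load_scale_data applies.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_samples, n_voxels, n_items, n_demographics = 20, 500, 21, 2

rng = np.random.default_rng(0)

# Random placeholders standing in for real data.
X = rng.integers(0, 2, size=(n_samples, n_voxels)).astype(np.float32)    # vectorized lesion masks
Y = rng.integers(0, 4, size=(n_samples, n_items)).astype(np.float32)     # behavioral test responses
dems = rng.normal(size=(n_samples, n_demographics)).astype(np.float32)   # optional demographics

# "my_analysis" is a hypothetical directory name. A temporary directory
# stands in for the repository root so that this sketch runs anywhere.
repo_root = Path(tempfile.mkdtemp())
analysis_dir = repo_root / "data" / "my_analysis"
analysis_dir.mkdir(parents=True, exist_ok=True)

np.save(analysis_dir / "X.npy", X)
np.save(analysis_dir / "Y.npy", Y)
np.save(analysis_dir / "dems.npy", dems)

# Reload the files and confirm that every array has one row per sample.
loaded = {name: np.load(analysis_dir / f"{name}.npy") for name in ("X", "Y", "dems")}
assert loaded["X"].shape[0] == loaded["Y"].shape[0] == loaded["dems"].shape[0]
```

With files laid out this way, the name of the new directory (here my_analysis) is the value that the analysis argument would refer to.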
- source/run_cfl.py: fits a CFL model provided cause and effect variable data
- source/extended_analyses/cluster_questions.py: clusters question-wise responses based on their contributions to defining the effect partition found by CFL
- source/extended_analyses/compare_aggregates.py: evaluates candidate aggregate BDI quantities based on ability to predict categories found by CFL
- source/extended_analyses/compare_cca.py: compares CFL results to CCA results
- source/extended_analyses/compare_mbdi.py: compares CFL results found when the effect is given as responses to the 21 BDI questions versus the mean score across questions
- source/extended_analyses/compare_naive.py: compares CFL results to those found when lesion masks are clustered without regard to the effect
If you use cfl_lbm in published research work, we encourage you to cite this repository:
Lesion-Behavior Mapping using Causal Feature Learning (2023). https://github.com/iwahle/cfl_lbm
or use the BibTeX reference:
@misc{cfl_lbm2023,
title = "Lesion-Behavior Mapping using Causal Feature Learning",
year = "2023",
publisher = "GitHub",
url = "https://github.com/iwahle/cfl_lbm"}
Please reach out to Iman Wahle (imanwahle@gmail.com) with any questions.
