A simple set of benchmark scripts for comparing conventional ML models with TabPFN across CPU and GPU.
-
Clone the repository
    git clone https://github.com/Mycrax/TabPFN_Benchmark.git
    cd TabPFN_Benchmark
-
Install dependencies
    pip install uv
    uv sync
    source .venv/bin/activate
-
Download TabPFN Weights
Go to the TabPFN page on Hugging Face and download the weights for TabPFN v2.5.
-
Define path to TabPFN
Set tabpfn_model_path to the location of the downloaded checkpoint. The variable is defined on line 148 of TabPFN_CommonScript-Bottom_GPU.py:
    tabpfn_model_path = [//tabpfn-v2.5-classifier-v2.5_default.ckpt]
-
Test Setup
    cd Arasteh_amyloidosis
    python TabPFN_CommonScript-Bottom_CPU.py
    python TabPFN_CommonScript-Bottom_GPU.py
If both scripts run, the pipeline is working. Settings can be changed in the first few lines of each .py file above.
-
Acquire Data
Each target has a different publicly available data source, described in detail in the Methods section of the manuscript. Sample data for Arasteh_amyloidosis is included for testing the pipeline; replace it with the full dataset for each target directory.
-
Prepare your datasets
Place your data files (.csv, .xlsx, etc.) into their corresponding folders. Each folder represents a separate target.
    TabPFN_Benchmark/
    ├── Arasteh_amyloidosis/
    │   ├── cpu_run.py
    │   ├── gpu_run.py
    │   └── amyloidosis_data.csv
    ├── SEER_RCC/
    │   ├── cpu_run.py
    │   ├── gpu_run.py
    │   └── SEER_RCC_data.csv
    ...
-
Modify the path variable to the path of your data file
    # Load Data
    path = ""  # <-- set this to the path of your data file
    df = pd.read_csv(path)
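Because the data files may be .csv or .xlsx while the snippet above calls pd.read_csv, a small loader that picks the pandas reader from the file extension can help. The following is a minimal sketch and not part of the repository: READERS, pick_reader, and load_table are hypothetical names, and reading .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the name of a pandas reader function.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
}


def pick_reader(path):
    """Return the name of the pandas reader matching the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported data file type: {suffix or '(none)'}")
    return READERS[suffix]


def load_table(path):
    """Load a data file into a DataFrame, failing early if the path is wrong."""
    path = Path(path)
    if not path.is_file():
        # An empty or mistyped path is caught here with a clear message.
        raise FileNotFoundError(f"Data file not found: {path}")
    reader_name = pick_reader(path)

    import pandas as pd  # imported late so the checks above work without pandas

    return getattr(pd, reader_name)(path)
```

With this in place, `df = load_table(path)` could stand in for `df = pd.read_csv(path)`; leaving path empty then raises a FileNotFoundError up front.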
-
Run the benchmark scripts
In each dataset folder, run the appropriate script:
    python XXX_CPU.py  # For ML models
    python XXX_GPU.py  # For TabPFN models
- Each subfolder contains a _CPU.py and a _GPU.py script customized for that dataset.
- Keeping all experiments in separate folders helps manage datasets and outputs cleanly.
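
Since every dataset folder follows the same naming pattern, the per-folder runs can also be driven from the repository root. The sketch below is one possible way to do that, not something shipped with the repository: find_benchmark_scripts and run_all are hypothetical helpers, and the sketch assumes each script is meant to be launched from inside its own folder, as in the steps above.

```python
import subprocess
import sys
from pathlib import Path


def find_benchmark_scripts(root):
    """Return (folder, script) pairs for every *_CPU.py / *_GPU.py one level below root."""
    root = Path(root)
    found = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for script in sorted(folder.glob("*.py")):
            if script.stem.endswith(("_CPU", "_GPU")):
                found.append((folder, script))
    return found


def run_all(root):
    """Run each discovered script from inside its own folder; stop on the first failure."""
    for folder, script in find_benchmark_scripts(root):
        print(f"Running {script.name} in {folder.name}")
        # cwd=folder mirrors "cd <folder>; python <script>" so relative paths still resolve.
        subprocess.run([sys.executable, script.name], cwd=folder, check=True)


# Example (run from the repository root, inside the activated environment):
# run_all(".")
```

Running the scripts one after another keeps the GPU free for a single TabPFN run at a time; `check=True` stops the loop at the first script that fails, so a broken path in one folder is not buried under later output.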