To install, create a conda environment and run the following steps:
```bash
git clone https://github.com/M3RG-IITD/Fine-tuning-UMLFFS.git
cd ./Fine-tuning-UMLFFS
conda env create -f ./environment.yml
git clone https://github.com/ishanthewizard/MLFF-distill.git  # MLFF-distill repository
```
If issues arise with particular libraries, consult environment.yml for the pinned library versions.
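To confirm the environment is set up, you can activate it and check that the core dependencies import. The environment name below is an assumption; substitute whatever name environment.yml defines:

```bash
# Hypothetical environment name; use the name defined in environment.yml
conda activate umlff-finetune
python -c "import torch; print(torch.__version__)"
```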
- To perform the dataset split, update the relevant paths inside the script itself, then run:

  ```bash
  python data_split.py
  ```
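  Since the paths live inside the script rather than on the command line, a quick grep can help locate the lines to edit first; this is a convenience only, not part of the pipeline:

  ```bash
  # Convenience only: locate the hardcoded paths to edit before running
  grep -n "path" data_split.py
  ```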
- To sort xyz files by group, execute:

  ```bash
  python sort_xyz_by_group.py --input_path </path/to/xyz/file> \
      --output_path </path/to/save/dataset/labels> \
      --dataset_type <dataset-type> \
      --group_name <group-name>
  ```

  `dataset_type` can be chosen from three types: `train`, `test`, and `val`. `group_name` is the property by which the data is sorted and is assumed to be present in the xyz file. In the sample MPMorph data, `chemical_system` is saved for use as the group name.
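  For instance, sorting the sample MPMorph data by `chemical_system` for the training split might look like the following; the file paths here are illustrative placeholders:

  ```bash
  python sort_xyz_by_group.py --input_path ./data/mpmorph_li.xyz \
      --output_path ./data/labels \
      --dataset_type train \
      --group_name chemical_system
  ```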
- To convert xyz files into lmdb, use:

  ```bash
  python xyz_to_lmdb.py --xyz_path <path/to/xyz/file> \
      --output_dir <path/to/output/dir>
  ```
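  If the previous step produced separate train/val/test files, each split needs its own conversion. A sketch, assuming one xyz file per split under ./data/labels/:

  ```bash
  # Assumed layout: one xyz file per split from the sorting step
  for split in train val test; do
      python xyz_to_lmdb.py --xyz_path ./data/labels/${split}.xyz \
          --output_dir ./data/lmdb/${split}
  done
  ```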
- To generate Hessian labels with the teacher model, use:

  ```bash
  python get_maceMPA0_labels.py --labels_folder </path/to/save/hessian/labels> \
      --dataset_path <path/to/specific/dataset/label> \
      --model_path <path/to/model/checkpoint> \
      --device 'cuda'
  ```
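  A concrete invocation might look as follows; the folder names and the MACE-MPA-0 checkpoint filename are assumptions for illustration:

  ```bash
  python get_maceMPA0_labels.py --labels_folder ./data/hessian_labels \
      --dataset_path ./data/lmdb/train \
      --model_path ./checkpoints/mace-mpa-0.model \
      --device 'cuda'
  ```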
Run ./MLFF-distill/main.py as follows:

```bash
python main.py --mode train --config-yml <path/to/config/file>
```

For model distillation, the config files are available in ./configs/MPMorph/Li/hessian.
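For example, a Hessian-distillation run on the Li subset could be launched like this; the exact config filename inside ./configs/MPMorph/Li/hessian is an assumption, so list the directory to pick the right one:

```bash
python main.py --mode train \
    --config-yml ./configs/MPMorph/Li/hessian/gemnet-dt-small.yml
```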
Some of the available command-line arguments are as follows (see the example invocation after this list):
- mode: sets the mode of operation
  - train: trains the model. MLFF-distill uses a custom DistillTrainer class for model distillation.
  - predict: runs inference on the provided data using the loaded model checkpoint.
  - run-relaxation: runs structure relaxations.
- config-yml: path to the config file. The following configs are used:
  - ./configs/base_wandb.yml: base settings for running any distillation; change properties such as data paths and optimizer settings here
  - ./configs/gemnet-dt-small.yml: model-specific config
  - ./configs/hessian/gemnet-dt-small.yml: model-specific config needed for model distillation
  - Example config files are provided in ./configs.
- run-dir: specify a working directory to keep relevant logs, results, and checkpoints in one place
- debug: run in debug mode
- print-every: specify the number of epochs after which metrics are printed (default: 10)
- checkpoint: specify the path of a saved distilled-model checkpoint to load
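Putting the arguments together, an inference run against a distilled checkpoint might look like the sketch below; the paths are placeholders, and the exact flag spellings should be confirmed with `python main.py --help`:

```bash
python main.py --mode predict \
    --config-yml ./configs/MPMorph/Li/hessian/gemnet-dt-small.yml \
    --run-dir ./runs/li_distill \
    --checkpoint ./runs/li_distill/checkpoints/best_checkpoint.pt \
    --print-every 10
```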
The current implementation uses MACE-MPA-0 as the teacher model and GemNet-dT (small) as the student model; PaiNN is also available as a student model.