This is the project repository for the UCSD Data Science Capstone (2025), by Team A06-1.
Protein-Protein Interaction with Omics-Enhanced Graph Autoencoder, a.k.a. PPI-OMEGA, is a Variational Graph Autoencoder (VGAE)-based framework designed to improve Protein-Protein Interaction (PPI) predictions by integrating multi-omics data. Unlike traditional models that rely solely on static network topology, PPI-OMEGA incorporates RNA expression profiles and protein expression data to learn biologically meaningful representations.
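To make the VGAE idea concrete, here is a minimal sketch of how a graph autoencoder typically scores a candidate interaction: each protein is mapped to a latent vector, and a decoder turns a pair of vectors into an edge probability. The inner-product decoder below is the common choice for VGAEs and is an assumption here; the decoder actually used in src/model.py may differ. The protein names and latent vectors are hypothetical and chosen only for illustration.

```python
import math


def edge_probability(z_i, z_j):
    """Inner-product decoder: sigmoid of the dot product of two latent vectors."""
    score = sum(a * b for a, b in zip(z_i, z_j))
    # Numerically stable sigmoid for both large positive and negative scores.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Hypothetical 2-D latent embeddings (a trained encoder would produce these).
latent = {
    "protein_a": [1.0, 0.5],
    "protein_b": [0.8, 0.7],
    "protein_c": [-1.0, -0.4],
}

p_ab = edge_probability(latent["protein_a"], latent["protein_b"])
p_ac = edge_probability(latent["protein_a"], latent["protein_c"])
print(f"a-b: {p_ab:.3f}, a-c: {p_ac:.3f}")
```

Proteins whose embeddings point in similar directions get a probability above 0.5, while dissimilar ones fall below it. In this framing, the omics features would influence predictions only through the embeddings the encoder learns.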
If you prefer a pre-configured environment, you can use Docker.
Ensure you have Docker installed. You can download it from the official Docker website.
You can pull the pre-built Docker image directly (if it's available on Docker Hub):
docker pull eliteapex/ppi-omega
Alternatively, you can build the image manually:
git clone https://github.com/EliteApex/PPI-OMEGA.git
cd PPI-OMEGA
docker build -t ppi-omega .
Run the container interactively:
docker run -it --rm -v $(pwd):/app ppi-omega bash
This will mount your current directory (PPI-OMEGA) inside the container, so you can access scripts and data.
Inside the container:
python src/run_models.py --version <version_num>
where <version_num> = 1, 2, or 3 depending on the input features you'd like to use.
To use VS Code with the Docker container:
- Install the Remote - Containers extension.
- Open VS Code and connect to the container:
  - Open Command Palette (Ctrl+Shift+P).
  - Select Remote-Containers: Attach to Running Container.
  - Choose ppi-omega from the list.
- You can now use VS Code as if working in a local environment.
If you don't want to use Docker, you can manually set up the environment.
git clone https://github.com/EliteApex/PPI-OMEGA.git
cd PPI-OMEGA
Ensure that Conda is installed. Then, create and activate the environment:
conda env create -f environment.yml
conda activate PPIOMEGA_env
Once the environment is set up, you can run the model:
python src/run_models.py --version <version_num>
or within a Jupyter Notebook:
%run src/run_models.py --version <version_num>
where <version_num> = 1, 2, or 3 depending on the input features you'd like to use.
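For readers curious how a flag like --version is usually handled, the sketch below shows one way to restrict it to the three documented values with argparse. It is an illustration only, not the code in src/run_models.py; the function names build_parser and parse_version are hypothetical, and the mapping from version number to input features is deliberately left out because it is not stated here.

```python
import argparse


def build_parser():
    """Build a parser that accepts only the documented version numbers."""
    parser = argparse.ArgumentParser(
        description="Hypothetical sketch of a --version flag restricted to 1, 2, or 3."
    )
    parser.add_argument(
        "--version",
        type=int,
        choices=[1, 2, 3],
        required=True,
        help="Input feature set to use (1, 2, or 3).",
    )
    return parser


def parse_version(argv=None):
    """Return the selected version; argparse exits with an error on invalid input."""
    return build_parser().parse_args(argv).version


# Pass the arguments explicitly so the example does not depend on sys.argv.
version = parse_version(["--version", "2"])
print(f"Selected feature version: {version}")
```

With choices set, an unsupported value such as --version 4 makes argparse print a usage message and exit instead of starting a run with undefined inputs.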
The repository is organized as follows:
.
├── Data/ # Dataset directory storing original and intermediate data files
│ ├── raw/ # Raw data files
│ │ ├── normal_ihc_data.tsv
│ │ ├── protein_gene_conversion.csv
│ │ └── rna_tissue_gtex.tsv
│ ├── adj_matrix_scaled.npz
│ ├── adj_matrix.npz
│ ├── filtered_PPI.csv
│ ├── PPI_protein_expression_full.csv
│ ├── PPI_Protein_only.csv
│ ├── PPI_RNA_only.csv
│ ├── PPI_RNA_Protein_combined.csv
│ ├── PPI_RNA_seq_full.csv
│ ├── protein_gene_conversion.csv
│ ├── protein_node_id_conversion.csv
│ └── protein_vis_samples.csv
├── notebooks/ # Jupyter notebooks for analysis and visualization
│ └── EDA.ipynb
├── plots/ # Directory for storing plots and visualizations
├── scripts/ # Additional scripts for data processing
│ ├── preprocessing.py
│ └── vis_ppi_network_sample_data.py
├── src/ # Source code for the project
│ ├── __pycache__/ # Cached Python files
│ ├── baseline_model.py
│ ├── best_hyperparameters.csv
│ ├── best_model.pth
│ ├── latent_parameters_v0.csv
│ ├── latent_parameters_v1.csv
│ ├── latent_parameters_v2.csv
│ ├── latent_parameters_v3.csv
│ ├── latent_variables_sampled.csv
│ ├── metrics_version_0.npz
│ ├── metrics_version_1.npz
│ ├── metrics_version_2.npz
│ ├── metrics_version_3.npz
│ ├── model.py
│ ├── pilot_with_features.ipynb
│ ├── run_models.py
│ ├── selected_nodes.txt
│ ├── vgae_cv.py
│ └── visualization.ipynb
├── .dockerignore # Files and directories to ignore in Docker builds
├── .gitignore # Git ignore rules
├── Dockerfile # Docker container definition
├── environment.yml # Conda environment file
└── README.md # Documentation
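Before running the model, it can help to confirm that a checkout matches the layout above. The following is a small sketch of such a check; the helper missing_files is hypothetical, and the list of files is an assumption picked from the tree, not a statement of what src/run_models.py actually reads.

```python
from pathlib import Path

# Assumed subset of the layout above; adjust to the files your run needs.
EXPECTED_FILES = [
    "Data/adj_matrix.npz",
    "Data/filtered_PPI.csv",
    "Data/PPI_RNA_Protein_combined.csv",
    "src/run_models.py",
]


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected paths (relative to repo_root) that are not regular files."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


# Check the current working directory, assumed to be the repository root.
missing = missing_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```

Run it from the repository root, or pass a different path to missing_files when checking the mounted /app directory inside the container.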
- Team A06-1 - UCSD Data Science Capstone 2025