- Our dataset contains CS-related Wikipedia articles with their content summarized into text embeddings. These articles are already categorized into different topics of Computer Science (CS-Subtopics).
- We first train a model to predict an article's CS-Subtopic, given its content text embedding as input.
- We use GNNExplainer to find key motifs (subgraphs that summarize the larger graph) and analyze their connections based on content similarity. Color-coding nodes by subtopic lets us visualize how the subtopics relate to one another.
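As an illustration of comparing connected articles by content similarity, here is a minimal sketch in plain Python. It assumes cosine similarity as the measure, which may differ from what the notebook uses. The helper names (`cosine_similarity`, `motif_edge_similarities`), the toy embeddings, and the subtopic labels are all hypothetical and are not taken from the dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero embeddings
    return dot / (norm_a * norm_b)


def motif_edge_similarities(edges, embeddings, subtopics):
    """Score each motif edge by the content similarity of its two endpoints.

    Returns one record per edge, sorted from most to least similar.
    """
    rows = []
    for u, v in edges:
        rows.append({
            "edge": (u, v),
            "subtopics": (subtopics[u], subtopics[v]),
            "similarity": cosine_similarity(embeddings[u], embeddings[v]),
        })
    return sorted(rows, key=lambda r: r["similarity"], reverse=True)


# Hypothetical toy motif: three articles with 3-dimensional embeddings.
toy_embeddings = {0: [1.0, 0.0, 0.0], 1: [1.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
toy_subtopics = {0: "subtopic_a", 1: "subtopic_a", 2: "subtopic_b"}
toy_edges = [(0, 2), (0, 1)]

for row in motif_edge_similarities(toy_edges, toy_embeddings, toy_subtopics):
    print(row["edge"], row["subtopics"], round(row["similarity"], 3))
```

In a real run, the edges would come from the subgraph that GNNExplainer highlights and the embeddings from the dataset's node features. Sorting by similarity makes it easier to see whether strongly connected articles tend to share a subtopic.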
Before running the Jupyter Notebook, ensure you have PyTorch and PyTorch Geometric installed. You can install them via pip:
pip install torch
pip install torch-geometric
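To confirm that both packages are visible to the Python environment the notebook will use, a small check like the following may help. It is only a sketch: the helper name `missing_packages` is hypothetical, and it relies on the standard library's `importlib.util.find_spec` together with the import names `torch` and `torch_geometric`.

```python
import importlib.util


def missing_packages(module_names=("torch", "torch_geometric")):
    """Return the names in module_names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("PyTorch and PyTorch Geometric are both installed.")
```

Running this in a notebook cell checks the kernel's own environment, which can differ from the shell where `pip` was run.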
Dataset source: WikiCS Dataset
This project demonstrates the application of Graph Neural Networks to classifying CS-related Wikipedia articles into topics, reaching an accuracy of around 70% across the GNN models we tried. We also use GNNExplainer to interpret the model's predictions, which gives insight into how important different features are to the classification.