This repository contains a set of Jupyter notebooks analyzing classic object-oriented (OO) design metrics and their relationship with Lines of Code (LOC). The project explores how software design quality can be quantified using metrics such as coupling, cohesion, inheritance, and class parameters, and applies predictive modeling to understand the impact of these metrics on code size.
The dataset consists of software design metrics collected from object-oriented projects, including:
- NOA – Number of Attributes
- NOP – Number of Parameters
- NOC – Number of Children (inheritance)
- CBO – Coupling Between Objects
- DIT – Depth of Inheritance Tree
- RFC – Response For a Class
- LCOM5 – Lack of Cohesion in Methods (version 5)
- LOC – Lines of Code (target variable)
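The inheritance and cohesion metrics above can be made concrete with a short Python sketch. It assumes the common conventions: DIT counts the longest path to a root class (classes with no user-defined base have DIT 0), NOC counts only direct subclasses, and LCOM5 follows the Henderson-Sellers formula LCOM5 = (m − mean(μ(A_j))) / (m − 1), where m is the number of methods and μ(A_j) is the number of methods that access attribute A_j. The functions `dit`, `noc` and `lcom5` and the method-to-attribute `access` map are hypothetical illustrations; the tool that produced the dataset may use slightly different conventions.

```python
from typing import Dict, Set


def dit(cls: type) -> int:
    """Depth of Inheritance Tree: longest path from cls to a root class.

    Assumption: `object` is ignored, so a class with no explicit base has DIT 0.
    """
    bases = [b for b in cls.__bases__ if b is not object]
    if not bases:
        return 0
    return 1 + max(dit(b) for b in bases)


def noc(cls: type) -> int:
    """Number of Children: count of direct subclasses only."""
    return len(cls.__subclasses__())


def lcom5(access: Dict[str, Set[str]], attributes: Set[str]) -> float:
    """Henderson-Sellers lack of cohesion, commonly labelled LCOM5.

    `access` maps each method name to the attributes it reads or writes.
    Returns 0.0 when every method uses every attribute and 1.0 when each
    attribute is used by exactly one method.
    """
    m = len(access)
    a = len(attributes)
    if m <= 1 or a == 0:
        return 0.0  # formula is undefined here; treating it as cohesive is an assumption
    # mean over attributes of mu(A_j), the number of methods touching A_j
    mean_mu = sum(
        sum(1 for used in access.values() if attr in used) for attr in attributes
    ) / a
    return (m - mean_mu) / (m - 1)
```

For example, a class whose methods `f` and `g` each touch a different attribute, `lcom5({"f": {"x"}, "g": {"y"}}, {"x", "y"})`, scores 1.0, the least cohesive case.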
Note: The dataset is included in the data/ folder for use in the notebooks.
The repository focuses on:
- Exploratory Data Analysis (EDA) – Understanding the distribution and correlation of design metrics.
- Predictive Modeling – Using machine learning models (Random Forest, AdaBoost, Neural Networks, Logistic Regression) to predict LOC (or LOC categories) from design metrics.
- Visualization – Presenting metric trends, distributions, and model performance using plots and charts.
- Software Engineering Insights – Demonstrating how design metrics relate to code complexity and quality.
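For the correlation part of the EDA, a minimal sketch of ranking each metric by its correlation with LOC follows. It uses Spearman (rank) correlation, an assumption on our part: size and coupling metrics are often heavily skewed, and ranks are less sensitive to outliers than raw values, but the notebook may well use Pearson instead. The helper names and the column-dictionary layout are hypothetical; in practice the columns would come from the dataset in data/.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman_with_target(
    columns: Dict[str, Sequence[float]], target: str = "LOC"
) -> Dict[str, float]:
    """Spearman correlation of every metric column with the target column."""
    target_ranks = average_ranks(columns[target])
    return {
        name: pearson(average_ranks(values), target_ranks)
        for name, values in columns.items()
        if name != target
    }
```

Sorting the result, e.g. `sorted(r.items(), key=lambda kv: -abs(kv[1]))`, lists the metrics most strongly associated with LOC first.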
The repository contains the following notebooks:
- exploratory.ipynb – Explores metric distributions, correlations, and relationships with LOC.
- visualization.ipynb – Creates visual summaries of the metrics and model results.
- classification_*.ipynb – Predictive modeling for classifying code behavior or LOC categories, using various algorithms:
  - ANN – Artificial Neural Networks
  - AdaBoost – Adaptive Boosting
  - Random Forest – Ensemble of decision trees
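The classification notebooks can be summarized by one comparison loop. The following is a sketch only, under several assumptions: LOC is turned into three equal-frequency categories (the notebooks' actual cut points are unknown), the neural network is scikit-learn's `MLPClassifier` (the notebooks may use another framework), and models are scored with 5-fold stratified cross-validated accuracy. The data below are synthetic placeholders, not the repository's dataset, and `loc_categories` and `compare_models` are hypothetical helpers.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["NOA", "NOP", "NOC", "CBO", "DIT", "RFC", "LCOM5"]


def loc_categories(loc: np.ndarray, n_bins: int = 3) -> np.ndarray:
    """Bin LOC into n_bins roughly equal-frequency classes (0 = smallest)."""
    edges = np.quantile(loc, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(loc, edges)


def compare_models(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Mean cross-validated accuracy for each classifier."""
    models = {
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        # feature scaling matters for the gradient-based models, not the tree ensembles
        "ANN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        ),
        "LogisticRegression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data: one row per class, columns ordered as FEATURES.
rng = np.random.default_rng(0)
X = rng.poisson(lam=[5, 2, 1, 6, 2, 20, 1], size=(300, len(FEATURES))).astype(float)
loc = 10 * X[:, 5] + 5 * X[:, 0] + rng.normal(0, 20, size=300)
y = loc_categories(loc)

for name, acc in compare_models(X, y).items():
    print(f"{name}: {acc:.3f}")
```

Equal-frequency bins keep the classes balanced, so plain accuracy stays a meaningful score; with skewed, unequal bins a metric such as balanced accuracy would be the safer choice.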
Each notebook is self-contained and can be run independently for analysis or experimentation.
This repository is provided for educational purposes and personal use.