Data Science Project

`Global Oil Wells: A Machine Learning and Network Theoretic Analysis of Geolocation and Oil Well Characteristics`.

This project explores regression models for predicting well depth from geospatial data. Each model predicts True Vertical Depth (TVD) from latitude and longitude. The project covers data preprocessing, model training, testing, and validation, as well as performance evaluation and visualization.

Models

The project implements and evaluates the following regression models:

k-NNR (k-Nearest Neighbors Regression): Predicts TVD from the k nearest neighbors of a given data point. The model supports distance-based weighting for predictions.
k-MCR (k-Means Clustering Regression): A hybrid model that combines k-means clustering with regression. It clusters the data into groups and performs regression within each cluster.
DTR (Decision Tree Regression): A decision tree-based regression model that splits the data into regions based on feature thresholds. The model supports several splitting criteria, such as mean squared error and absolute error.
NTR (Network Theoretic Regression): A graph-based regression model that constructs a similarity graph from geospatial features. The model uses network centrality measures and Laplacian regularization for predictions.
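
As one illustration of the models above, distance-weighted k-NNR can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the project's implementation: the function names `haversine_km` and `knn_predict`, the choice of great-circle (haversine) distance, the inverse-distance weights, and the sample wells are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def knn_predict(train_points, train_tvd, query, k=3, weighted=True):
    """Predict TVD at `query` from the k nearest training wells."""
    # Pair each well's distance to the query with its TVD, keep the k closest.
    nearest = sorted(
        (haversine_km(p, query), tvd) for p, tvd in zip(train_points, train_tvd)
    )[:k]
    if not weighted:
        return sum(tvd for _, tvd in nearest) / len(nearest)
    # A well at the query location would get infinite weight, so return
    # the mean TVD of any exact matches directly.
    exact = [tvd for d, tvd in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # Inverse-distance weighting: closer wells contribute more.
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * tvd for w, (_, tvd) in zip(weights, nearest)) / sum(weights)


# Made-up wells for illustration: (latitude, longitude) and TVD in metres.
wells = [(29.0, -95.0), (29.1, -95.1), (30.0, -96.0), (31.5, -97.5)]
tvd = [2500.0, 2600.0, 3100.0, 1800.0]

print(knn_predict(wells, tvd, (29.05, -95.05), k=2))
```

With `weighted=False` the prediction is the plain mean of the k neighbors' TVD values. With weighting, a neighbor twice as far away counts half as much.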

Features

Data Preprocessing: Handles missing data, filters invalid entries, and splits the dataset into training and testing subsets.
Model Testing Framework: A framework for training, testing, and validating models across a range of hyperparameter settings.
Visualization: Includes 3D plots and heatmaps to visualize model predictions and performance.
Performance Metrics: Evaluates models using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R² score.
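
The preprocessing and metric steps listed above can be sketched as follows. This is an illustrative sketch only: the column names `lat`, `lon`, and `tvd`, the validity rules (coordinate ranges, positive TVD), the helper names, and the sample values are assumptions, and the project's own code may differ.

```python
import random


def clean_rows(rows):
    """Keep rows with a valid latitude, longitude, and positive TVD."""
    cleaned = []
    for row in rows:
        try:
            lat, lon, tvd = (float(row[key]) for key in ("lat", "lon", "tvd"))
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric field
        # NaN fails every comparison, so NaN values are dropped here too.
        if -90 <= lat <= 90 and -180 <= lon <= 180 and tvd > 0:
            cleaned.append((lat, lon, tvd))
    return cleaned


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R² score; undefined when all true values are equal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up raw rows, as they might come from csv.DictReader.
raw = [
    {"lat": "29.0", "lon": "-95.0", "tvd": "2500"},
    {"lat": "95.0", "lon": "-95.0", "tvd": "2500"},  # latitude out of range
    {"lat": "30.0", "lon": "-96.0", "tvd": ""},      # missing TVD
    {"lat": "31.5", "lon": "-97.5", "tvd": "1800"},
    {"lat": "31.5", "lon": "-97.5"},                 # no TVD column
]
cleaned = clean_rows(raw)
train, test = split_rows(cleaned, test_fraction=0.5)

# Made-up true and predicted TVD values to exercise the metrics.
y_true = [2000.0, 2500.0, 3000.0]
y_pred = [2100.0, 2500.0, 2800.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2_score(y_true, y_pred))
```

Fixing the shuffle seed keeps the train/test split reproducible between runs, which makes metric comparisons across models easier to interpret.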

Usage

Data Preparation: Place the well data CSV file in the data/ folder.
Model Training and Testing: Use the ModelTestFramework to train and test models. Example scripts are provided in the src/testing/test/ directory.
Validation: Perform parameter validation using scripts in the src/testing/validation/ directory.
Visualization: Generate plots to analyze model performance and predictions.

Author

Ben Hunt
GitHub Profile
LinkedIn

If you have any questions, feedback, or suggestions, feel free to reach out or open an issue in the repository.

