A data mining project that analyzes startup traction data and applies various machine learning models (Logistic Regression, Decision Trees, Random Forest, XGBoost) to predict whether a startup will succeed or fail based on engagement, web presence, and social media activity.

josepablodmg/Python--Startup-Success-Prediction-using-Machine-Learning

🚀 Startup Success Prediction

This repository contains a data mining project developed as part of the IN015 - Data Mining course.
The goal is to predict startup success using real-world data and machine learning techniques.


📂 Project Overview

Startups are inherently risky, but certain indicators (social media activity, engagement ratios, and web presence) may help estimate their probability of success.
In this project, we:

  • Performed data cleaning and preprocessing on a dataset of 700 startups.
  • Handled missing values with domain-based imputations.
  • Applied exploratory data analysis (EDA), correlation heatmaps, and visualizations.
  • Built and compared multiple machine learning models:
    • Dummy Classifier (baseline)
    • Logistic Regression
    • Decision Tree
    • Random Forest
    • XGBoost
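The preprocessing step above can be illustrated with a minimal sketch of domain-based imputation. The column names (`twitter_followers`, `has_website`, `monthly_visits`) and the rules themselves are assumptions made for illustration; the project's actual schema and imputation rules are not shown in this README.

```python
import numpy as np
import pandas as pd

# Hypothetical traction columns; the real dataset's schema may differ.
df = pd.DataFrame({
    "twitter_followers": [1200, np.nan, 300, np.nan],
    "has_website": [1, 1, 0, 1],
    "monthly_visits": [5000, 800, np.nan, np.nan],
})


def impute_domain_rules(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values using assumed, domain-motivated rules."""
    out = frame.copy()

    # Assumed rule: a missing follower count means there is no account.
    out["twitter_followers"] = out["twitter_followers"].fillna(0)

    # Assumed rule: a startup without a website has zero visits.
    no_site = out["has_website"] == 0
    out.loc[no_site, "monthly_visits"] = out.loc[no_site, "monthly_visits"].fillna(0)

    # Remaining gaps: use the median among startups that do have a website.
    median_visits = out.loc[~no_site, "monthly_visits"].median()
    out["monthly_visits"] = out["monthly_visits"].fillna(median_visits)
    return out


clean = impute_domain_rules(df)
print(clean)
```

The point of rules like these, compared with a single global mean or median, is that a missing value can carry meaning in itself (for example, no website implies no web traffic).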

Our best-performing model was Random Forest, achieving ~88.9% precision in predicting successful startups.
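A model comparison of this kind might look roughly like the sketch below. It runs on synthetic data from `make_classification` as a stand-in for the 700-startup dataset, and the hyperparameters, train/test split, and class balance are all assumptions, so the printed numbers will not match the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the startup data (assumed: ~35% successes).
X, y = make_classification(
    n_samples=700, n_features=10, n_informative=5,
    weights=[0.65, 0.35], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Dummy (baseline)": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# XGBoost is optional in this sketch; it is skipped if not installed.
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=200, random_state=42)
except ImportError:
    pass

precisions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # zero_division=0 avoids a warning when a model predicts no successes.
    precisions[name] = precision_score(y_test, preds, zero_division=0)

for name, score in precisions.items():
    print(f"{name}: precision = {score:.3f}")
```

Precision is measured on the positive class (successful startups), which is why a baseline that never predicts success scores zero.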


🛠️ Tech Stack

  • Languages: Python 3
  • Libraries:
    • Data Handling: pandas, numpy
    • Visualization: matplotlib, seaborn
    • Machine Learning: scikit-learn, xgboost
    • Evaluation: scikit-learn metrics, cross-validation
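As an illustration of the evaluation tooling, the following sketch runs stratified 5-fold cross-validation with precision as the scoring metric. The fold count, the choice of model, and the synthetic data are assumptions; the project's actual validation setup is not described in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the startup dataset.
X, y = make_classification(
    n_samples=700, n_features=10, n_informative=5, random_state=0
)

# Stratified folds keep the success/failure ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# One precision value per fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="precision")
print(f"precision per fold: {scores.round(3)}")
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Reporting the mean and spread across folds gives a more stable estimate than a single train/test split, which matters on a dataset of only a few hundred rows.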

📊 Results

  • Dummy Classifier (baseline): Very poor; it predicts that every startup fails, so it never identifies a success.
  • Logistic Regression: ~75% precision.
  • Decision Tree: Weak performance (overfits easily).
  • Random Forest: Best model, with ~88.9% precision.
  • XGBoost: Promising results, comparable to Random Forest.
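Since the results above are reported as precision, a small worked example may help. Precision is TP / (TP + FP): of the startups predicted to succeed, the fraction that actually did. The labels below are made up for illustration and are not taken from the project's data.

```python
from sklearn.metrics import confusion_matrix, precision_score

# Toy labels: 1 = success, 0 = failure (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_precision = tp / (tp + fp)
library_precision = precision_score(y_true, y_pred)
print(f"TP={tp}, FP={fp}, precision={manual_precision:.2f}")

# A baseline that predicts "fail" for everyone makes no positive predictions,
# so precision is undefined; zero_division=0 reports it as 0.
all_fail = [0] * len(y_true)
baseline_precision = precision_score(y_true, all_fail, zero_division=0)
print(f"baseline precision={baseline_precision:.2f}")
```

Note that precision alone does not capture missed successes (false negatives), so it is usually read alongside recall.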
