This repository showcases some of the projects I have completed during my career. If you have read this far, it means you are interested in my work; I appreciate it. Please share your thoughts if you would like to help me improve.
You can reach me by email: visit my personal homepage or send a message to halit_vural ( at ) techno.study.
- NLP state-of-the-art project for clone detection on developers' code - CodeBERT, UniXcoder, Doc2Vec optimizations - word embeddings, transformers - (ongoing)
- NLP for Customer Satisfaction Analysis of e-commerce commentary data - BERT fine-tuning, AdamW optimization - lemmatization, stemming, stopwords, normalization
- Price Prediction on Autoscout 2019 data scraped from an online trading company - Linear, Ridge, Lasso Regression, AdaBoost, XGBoost - Pandas, Numpy, Matplotlib, Seaborn
- EDA and RFM with Cohort Analysis for Customer Segmentation - KMeans clustering, Silhouette Analysis - Pandas, Numpy, Matplotlib, Seaborn
- EDA on multivariate biological heart-stroke data from several countries - KNN, Logistic Regression - Pandas, Numpy, Seaborn, Yellowbrick
- Microarray gene-expression analysis for cancer classification - PCA & SVD dimensionality reduction, Feature selection - Machine Learning Algorithms ANN, KNN, DT, RF, SVM
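To give a flavor of the dimensionality-reduction step in the microarray project, here is a minimal PCA sketch. The data here is synthetic random noise standing in for the real gene-expression matrix, and the shapes and the 95% variance threshold are illustrative assumptions, not the project's actual settings:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for microarray data: 100 samples x 2000 genes
# (the real dataset and its class labels are not included here).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2000))

# Keep enough principal components to explain 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # far fewer columns than the original 2000
```

With at most `n_samples` usable components, PCA collapses the 2000-gene matrix into a much smaller feature space before the classifiers (ANN, KNN, DT, RF, SVM) are trained.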
The dataset contains details of a bank's customers. The goal of the project is to predict customer churn. The target is a binary variable indicating whether the customer left the bank (closed their account) or stayed.
Deep Learning with TensorFlow
Pandas
Class weights
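The class-weights idea can be sketched as follows. This is a minimal example on synthetic labels using scikit-learn's `compute_class_weight`; the 80/20 class ratio is an assumption for illustration, not the actual churn rate in the bank data. In Keras, the resulting dict would be passed as `model.fit(..., class_weight=class_weight)`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic churn labels: ~20% churners, mimicking an imbalanced target
y = np.array([0] * 800 + [1] * 200)

classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes, weights))

# "balanced" assigns n_samples / (n_classes * count(class)) to each class,
# so the minority (churn) class gets the larger weight.
print(class_weight)
```

Weighting the loss this way makes the network pay proportionally more attention to the rare churners instead of defaulting to the majority class.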
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
The aim of this project is to predict whether a credit card transaction is fraudulent. This is not easy to do. First, we needed to analyze and understand our data well in order to draw our roadmap and choose the right methods. Accordingly, we examined the frequency distributions of the variables, observed their correlations, and checked for multicollinearity. The distribution of the target classes over the other variables was then visualized.
We then handled missing values and outliers. After these steps and the basic data pre-processing, we moved on to model building. Starting with Logistic Regression, we evaluated model performance, applied unbalanced data techniques to improve it, and observed their effects. We then used four different algorithms in the model-building phase. In the final step, we deployed the model using Streamlit.
Logistic Regression, Random Forest, XGBoost, and Neural Network algorithms
Unbalanced Data Techniques
Seaborn, Matplotlib and Yellowbrick
Streamlit API
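One of the unbalanced-data techniques mentioned above is class weighting inside the classifier itself. Here is a minimal sketch using scikit-learn's `class_weight="balanced"` on a synthetic imbalanced set; the real credit card data has only 0.172% frauds, so a milder 2% ratio is assumed here purely so the toy example still has positives in the test split:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic imbalanced data standing in for the credit card transactions
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.98],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Plain model vs. one that reweights the loss toward the rare fraud class
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_tr, y_tr)

r_plain = recall_score(y_te, plain.predict(X_te))
r_weighted = recall_score(y_te, weighted.predict(X_te))
print("recall (plain):   ", r_plain)
print("recall (weighted):", r_weighted)
```

On fraud data, recall on the minority class is usually the metric that matters; accuracy alone is misleading when 99.8% of transactions are legitimate.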
This dataset was created by combining different datasets that were already available independently but had not been combined before. Five heart datasets are combined over 11 common features, making it the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:
- Cleveland: 303 observations
- Hungarian: 294 observations
- Switzerland: 123 observations
- Long Beach VA: 200 observations
- Statlog (Heart) Data Set: 270 observations
Total: 1190 observations
Duplicated: 272 observations
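The curation step above, concatenating the source datasets over their shared columns and dropping the duplicated observations, can be sketched with pandas. The tiny frames below are made-up stand-ins with 2 columns instead of the real 11:

```python
import pandas as pd

# Toy stand-ins for two of the five source datasets; the real project
# concatenates all five over their 11 shared features.
cleveland = pd.DataFrame({"age": [63, 54, 54], "chol": [233, 239, 239]})
hungarian = pd.DataFrame({"age": [54, 48], "chol": [239, 275]})

# Stack the frames, then remove rows that appear more than once
combined = pd.concat([cleveland, hungarian], ignore_index=True)
deduped = combined.drop_duplicates().reset_index(drop=True)

print(len(combined), len(deduped))  # 5 rows before, 3 unique rows after
```

Applied to the five real datasets, the same two calls take the 1,190 raw observations down by the 272 duplicates.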
KNN, Logistic Regression
Pandas, Numpy
Seaborn, Yellowbrick
Linear, Ridge, Lasso Regression, AdaBoost, XGBoost
Pandas, Numpy, Matplotlib, Seaborn
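Comparing plain, Ridge, and Lasso regression on held-out data can be sketched as below. The data is synthetic (the scraped Autoscout set is not included), and the alpha values are illustrative assumptions rather than the tuned ones from the project:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the car price data
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Fit the three linear models and score each on the test split
scores = {}
for name, model in [("linear", LinearRegression()),
                    ("ridge", Ridge(alpha=1.0)),
                    ("lasso", Lasso(alpha=0.1))]:
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))

print(scores)
```

Ridge and Lasso add L2 and L1 penalties respectively, which trade a little training fit for less overfitting; the boosting models (AdaBoost, XGBoost) are then benchmarked against these linear baselines.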
Other projects I have completed are not listed here; they are either confidential to some degree or not written up. I will add more of the available ones later.