A set of data analysis tools in Python 3.x.
Clone this repository to your local machine and run pip:
git clone https://github.com/shakedzy/dython.git
cd dython
pip install -e .
Dependencies: numpy, pandas, seaborn, scipy, matplotlib, scikit-learn (imported as sklearn)
A set of functions to explore nominal (categorical) data-sets and mixed (nominal and continuous) data-sets.
Coefficients and statistics:
- Conditional entropy (conditional_entropy)
- Cramer's V (cramers_v)
- Theil's U (theils_u)
- Correlation ratio (correlation_ratio)
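For readers unfamiliar with these measures, the following is a minimal sketch of their textbook definitions. It is not this repository's implementation: the function names ending in `_sketch` are hypothetical, only the standard library is used, entropies use the natural logarithm, and Cramer's V is computed without any bias correction.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy H(X) in nats."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def conditional_entropy_sketch(xs, ys):
    """H(X | Y) = -sum over (x, y) of p(x, y) * log(p(x, y) / p(y))."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    return -sum((c / n) * math.log(c / y_counts[y]) for (_, y), c in joint.items())


def theils_u_sketch(xs, ys):
    """U(X | Y) = (H(X) - H(X | Y)) / H(X); asymmetric, between 0 and 1."""
    h_x = entropy(xs)
    if h_x == 0:
        return 1.0  # X is constant, so there is no uncertainty left to explain
    return (h_x - conditional_entropy_sketch(xs, ys)) / h_x


def cramers_v_sketch(xs, ys):
    """sqrt(chi2 / (n * (min(rows, cols) - 1))); symmetric, no bias correction."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    min_dim = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0.0


def correlation_ratio_sketch(categories, values):
    """eta = sqrt(between-group sum of squares / total sum of squares)."""
    groups = {}
    for cat, val in zip(categories, values):
        groups.setdefault(cat, []).append(val)
    mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    total = sum((v - mean) ** 2 for v in values)
    return math.sqrt(between / total) if total > 0 else 0.0


# Illustrative data: x fully determines y, but y does not fully determine x.
x = ["a", "a", "b", "c"]
y = ["p", "p", "q", "q"]
u_y_given_x = theils_u_sketch(y, x)  # 1.0: knowing x removes all uncertainty in y
u_x_given_y = theils_u_sketch(x, y)  # 2/3: knowing y leaves some uncertainty in x
v_xy = cramers_v_sketch(x, y)        # 1.0, and the same for (y, x)
eta = correlation_ratio_sketch(["a", "a", "b", "b"], [1.0, 1.0, 3.0, 3.0])  # 1.0
```

The example highlights the main practical difference between the measures: Cramer's V is symmetric, while Theil's U depends on which variable is conditioned on.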
Additional functions:
- associations: Calculate correlation/strength-of-association of a data-set
- numerical_encoding: Encode a mixed data-set to a numerical data-set (one-hot encoding)
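To illustrate what one-hot encoding of a mixed data-set means, here is a small standard-library sketch. The function `one_hot_encode_sketch`, its parameters, and the `column_value` naming of the output columns are assumptions made for this example and do not describe the signature or output of numerical_encoding. With pandas, `pandas.get_dummies` performs a comparable transformation.

```python
def one_hot_encode_sketch(rows, nominal_columns):
    """Replace each nominal column with one 0/1 column per distinct value.

    `rows` is a list of dicts sharing the same keys; columns that are not
    listed in `nominal_columns` are copied through unchanged.
    """
    # Collect the distinct values of every nominal column, in sorted order.
    categories = {
        col: sorted({row[col] for row in rows}) for col in nominal_columns
    }
    encoded = []
    for row in rows:
        out = {}
        for col, value in row.items():
            if col in categories:
                for cat in categories[col]:
                    out[f"{col}_{cat}"] = 1 if value == cat else 0
            else:
                out[col] = value
        encoded.append(out)
    return encoded


# Illustrative data: one nominal column ("color") and one continuous ("size").
rows = [
    {"color": "red", "size": 1.5},
    {"color": "blue", "size": 2.0},
]
encoded = one_hot_encode_sketch(rows, ["color"])
# encoded[0] == {"color_blue": 0, "color_red": 1, "size": 1.5}
```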
A set of functions to gain more information about a model's performance.
- roc_graph: compute and plot a ROC graph (and AUC score) for a model's predictions
- random_forest_feature_importance: plot the feature importance of a trained sklearn RandomForestClassifier
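As background for the ROC graph and AUC score, the sketch below computes the ROC points and the area under them for binary labels using only the standard library. The helper names `roc_points` and `auc_trapezoid` are hypothetical, and this is not how roc_graph is implemented. It assumes labels are 0 or 1 and that both classes are present.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by sweeping the decision threshold.

    Assumes binary labels (0 or 1) with at least one sample of each class.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Illustrative labels and predicted scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_score)
auc = auc_trapezoid(fpr, tpr)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.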
See the examples.py module for roc_graph and associations examples.
Apache License 2.0