git clone https://github.com/alicekwn/occ.git
cd occ
chmod +x setup_repo.sh
./setup_repo.sh
. venv/bin/activate
Alternatively, follow the step-by-step instructions below:
- MacOS / Linux
python3 -m venv venv
- Windows
python -m venv venv
- MacOS / Linux
. venv/bin/activate
- Windows (in Command Prompt, NOT PowerShell)
venv\Scripts\activate.bat
pip install toml
pip install -e ".[dev]"
Plot the PMFs of P(union=X) and P(intersection=Y) together with their CLT approximations:
python scripts/plot_comb_clt_univariate.py
Plot the heatmap for different combinations of
python scripts/plot_comb_heatmap.py
Plot the bivariate distribution of P(union=X, intersection=Y)
python scripts/plot_comb_bivariate.py
Plot the PMF of the Jaccard index, together with its CLT approximation:
python scripts/plot_comb_clt_jaccard.py
Note that "the combinatorial result" refers to the recursion equations derived using combinatorics.
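To make the Jaccard-index PMF concrete, here is a minimal sketch for the special case of two shards. It is not the repository's recursion: it assumes each shard is a uniform random subset (without replacement) of `total` items, so the overlap is hypergeometric, and the function names `intersection_pmf` and `jaccard_pmf` are hypothetical.

```python
from fractions import Fraction
from math import comb


def intersection_pmf(total, n1, n2):
    """Exact PMF of |A ∩ B| for two uniform random subsets of sizes n1 and n2.

    Assumption: both shards are drawn without replacement from `total`
    items, so the overlap follows a hypergeometric distribution.
    """
    lo = max(0, n1 + n2 - total)  # smallest possible overlap
    hi = min(n1, n2)              # largest possible overlap
    denom = comb(total, n2)
    return {
        k: Fraction(comb(n1, k) * comb(total - n1, n2 - k), denom)
        for k in range(lo, hi + 1)
    }


def jaccard_pmf(total, n1, n2):
    """PMF of J = |A ∩ B| / |A ∪ B|, using |A ∪ B| = n1 + n2 - |A ∩ B|.

    Requires n1 + n2 > 0 so that the union is never empty.
    """
    pmf = {}
    for k, p in intersection_pmf(total, n1, n2).items():
        j = Fraction(k, n1 + n2 - k)
        pmf[j] = pmf.get(j, Fraction(0)) + p
    return pmf


if __name__ == "__main__":
    for j, p in sorted(jaccard_pmf(4, 2, 2).items()):
        print(f"J = {j}: probability {p}")
```

Exact fractions are used so that the probabilities can be checked to sum to 1 without floating-point tolerance; with more than two shards the union is no longer determined by the intersection, which is where the bivariate distribution is needed.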
Test for the 3 distributions:
- univariate
- bivariate
- jaccard index
pytest tests/test_comb_univariate.py tests/test_comb_bivariate.py tests/test_comb_jaccard.py
Test whether the marginal probabilities of the bivariate distribution add up:
pytest tests/test_comb_bivariate_marginal.py
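The idea behind a marginal check can be sketched with a brute-force enumeration on a tiny instance. This is an illustration only, assuming shards are uniform random subsets of `total` items; the helper names are hypothetical and the enumeration stands in for the repository's recursions, which are not reproduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def enumerate_distributions(total, shard_sizes):
    """Brute-force joint and univariate PMFs of (union size, intersection size).

    Only feasible for tiny inputs: every combination of shards is visited.
    """
    choices = [list(combinations(range(total), n)) for n in shard_sizes]
    joint, union_counts, inter_counts = Counter(), Counter(), Counter()
    n_outcomes = 0
    for shards in product(*choices):
        sets = [set(s) for s in shards]
        u = len(set.union(*sets))
        i = len(set.intersection(*sets))
        joint[(u, i)] += 1
        union_counts[u] += 1
        inter_counts[i] += 1
        n_outcomes += 1

    def normalise(counts):
        return {k: Fraction(v, n_outcomes) for k, v in counts.items()}

    return normalise(joint), normalise(union_counts), normalise(inter_counts)


def marginals(joint):
    """Sum the joint PMF over one coordinate at a time."""
    union_marginal, inter_marginal = Counter(), Counter()
    for (u, i), p in joint.items():
        union_marginal[u] += p
        inter_marginal[i] += p
    return dict(union_marginal), dict(inter_marginal)


if __name__ == "__main__":
    joint, union_pmf, inter_pmf = enumerate_distributions(5, (2, 3, 2))
    union_marginal, inter_marginal = marginals(joint)
    print(union_marginal == union_pmf, inter_marginal == inter_pmf)
```

Here both sides come from the same enumeration, so the comparison only exercises the summation itself; the check is more informative when the joint and univariate PMFs are computed by independent routines.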
Test whether the mean and variance match:
pytest tests/test_clt_comb_mean_var.py
Result: the test passes, even in edge cases.
Test whether the PMFs match (after discretising the CLT normal distribution):
pytest tests/test_clt_comb_pmf.py
Result: the test fails in edge cases, when at least one shard is too small or too large relative to the total number.
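The edge-case behaviour can be illustrated with a small sketch for two shards. It assumes the overlap of two uniform random subsets (a hypergeometric variable) and discretises the normal with a continuity-corrected window of width 1 around each integer; the discretisation actually used in the test may differ, and all function names here are hypothetical.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def discretised_normal_pmf(k, mu, sigma):
    """Probability mass the normal assigns to the window (k - 0.5, k + 0.5]."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)


def exact_intersection_pmf(total, n1, n2, k):
    """Hypergeometric P(|A ∩ B| = k) for two uniform random subsets."""
    return comb(n1, k) * comb(total - n1, n2 - k) / comb(total, n2)


def max_abs_error(total, n1, n2):
    """Largest pointwise gap between the exact PMF and the discretised normal."""
    mu = n1 * n2 / total
    var = n2 * (n1 / total) * (1 - n1 / total) * (total - n2) / (total - 1)
    sigma = sqrt(var)
    lo, hi = max(0, n1 + n2 - total), min(n1, n2)
    return max(
        abs(exact_intersection_pmf(total, n1, n2, k)
            - discretised_normal_pmf(k, mu, sigma))
        for k in range(lo, hi + 1)
    )


if __name__ == "__main__":
    print("balanced shards:", max_abs_error(100, 50, 50))
    print("one tiny shard: ", max_abs_error(100, 2, 50))
```

With one very small shard the exact PMF lives on only a few integers and is bounded at zero, so the symmetric normal window places mass where the exact distribution has none, which is consistent with the mean and variance matching while the PMF comparison fails.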
Each plot overlays the results of a Monte Carlo simulation.
python scripts/plot_clt_sim_degree_vector.py
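A Monte Carlo baseline of this kind can be sketched as follows. The sketch assumes that the "degree" of an item is the number of shards containing it and that the degree-count vector records how many items have each degree from 0 to the number of shards; this interpretation, the uniform-subset sampling, and the function names are assumptions rather than the script's actual implementation.

```python
import random
from collections import Counter


def sample_degree_counts(total, shard_sizes, rng):
    """Draw one set of shards and return its degree-count vector.

    Entry d of the result is the number of items contained in exactly d
    shards (assumed meaning of "degree-count vector").
    """
    degree = [0] * total
    for n in shard_sizes:
        for item in rng.sample(range(total), n):
            degree[item] += 1
    counts = [0] * (len(shard_sizes) + 1)
    for d in degree:
        counts[d] += 1
    return tuple(counts)


def estimate_degree_count_pmf(total, shard_sizes, trials, seed=0):
    """Empirical PMF of the degree-count vector over `trials` simulations."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    tally = Counter(
        sample_degree_counts(total, shard_sizes, rng) for _ in range(trials)
    )
    return {vec: c / trials for vec, c in tally.items()}


if __name__ == "__main__":
    pmf = estimate_degree_count_pmf(10, (3, 4, 5), trials=2000)
    most_likely = max(pmf, key=pmf.get)
    print("most frequent degree-count vector:", most_likely)
```

Under this interpretation the union size is `total` minus the first entry and the intersection size is the last entry, so the same simulation can serve as a reference for the union and intersection plots as well.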
For univariate weighted projection with weights either
python scripts/plot_clt_sim_edge_cases.py
Otherwise, when the weights are not $0$s or $1$s, a Poisson distribution gives a better approximation.
Test whether the probability of the degree-count vector agrees with the simulation:
pytest tests/test_clt_degree_prob.py
Test whether the CLT approximation for the univariate weighted projection agrees with the simulation:
pytest tests/test_clt_univariate_projection.py