How to assess agreement using statistical indices?
This repo implements some indices used in statistical agreement, such as the total deviation index (TDI) and the coverage probability (CP).
Statistical agreement is a set of procedures used to decide whether two (or more) measurement methods lead to the same results.
Currently, only implementations for basic continuous or categorical models are planned.
```shell
pip install statisticalagreement
```

You can find examples in the example folder.
Here is an example of CCC usage with Gaussian simulated data:
```python
from scipy.stats import multivariate_normal
import numpy as np
import statisticalagreement as sa
import seaborn as sns
import matplotlib.pyplot as plt

mean = np.array([-np.sqrt(0.1)/2, np.sqrt(0.1)/2])
cov = np.array([[1.1**2, 0.95*1.1*0.9], [0.95*1.1*0.9, 0.9**2]])
xy = multivariate_normal.rvs(mean=mean, cov=cov, size=100)
x = xy[:, 0]
y = xy[:, 1]

ax = sns.histplot(x - y)
ax.set(xlabel="Difference of methods")
plt.show()
```
```python
# Return the approximate estimate of the CCC
# with an alpha risk of 5%
# and an allowance of within-sample deviation of 10%.
ccc = sa.ccc(x, y, method="approx", alpha=0.05, allowance=0.10)
print(f"Approximate estimate of CCC: {ccc.estimate:.4f}\n"
      f"Lower confidence limit of the estimate with confidence level of 95%: {ccc.limit:.4f}\n")
```

```
Approximate estimate of CCC: 0.8943
Lower confidence limit of the estimate with confidence level of 95%: 0.8625
```
Since allowance > limit, agreement cannot be claimed under the criterion defined by the user.
The distribution of the difference of methods can be displayed for visual analysis.
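To see what the index above actually measures, Lin's CCC can also be computed directly with NumPy. This is a minimal sketch of the sample estimator (the function name is illustrative, not the package's implementation):

```python
import numpy as np

def ccc_naive(x, y):
    # Lin's concordance correlation coefficient (Lin, 1989):
    # ccc = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Perfect agreement yields 1.0; a constant shift lowers the index.
a = np.array([1.0, 2.0, 3.0, 4.0])
print(round(ccc_naive(a, a), 4))    # 1.0
print(ccc_naive(a, a + 1.0) < 1.0)  # True
```

Unlike the Pearson correlation, the CCC penalizes both location and scale shifts between the two methods, which is why it is used as an agreement index rather than a mere association measure.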

Running main.py with the argument -e will display the examples.
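The naive (normal-theory) versions of the TDI and CP mentioned above can be sketched in a few lines. The helper below is hypothetical (it is not the package's API) and assumes the paired differences are normally distributed, following Lin's approximation for the TDI:

```python
import numpy as np
from scipy import stats

def tdi_cp_normal(x, y, p=0.90, delta=1.0):
    # Hypothetical helper, assuming d = x - y is normal with mean mu
    # and standard deviation sigma:
    #   MSD      = mu**2 + sigma**2
    #   TDI_p   ~= z_{(1+p)/2} * sqrt(MSD)   (Lin's approximation)
    #   CP_delta = Phi((delta - mu)/sigma) - Phi((-delta - mu)/sigma)
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    mu, sigma = d.mean(), d.std()
    msd = mu**2 + sigma**2
    tdi = stats.norm.ppf((1 + p) / 2) * np.sqrt(msd)
    cp = stats.norm.cdf((delta - mu) / sigma) - stats.norm.cdf((-delta - mu) / sigma)
    return tdi, cp

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + rng.normal(scale=0.5, size=1000)
tdi, cp = tdi_cp_normal(x, y, p=0.90, delta=1.0)
# With sigma_d near 0.5: TDI_0.90 is about 1.645 * 0.5, CP_1 about 0.95.
```

Because these formulas rest on the normality hypothesis, they correspond to the "naive" column of the table below; they are only accurate when that hypothesis holds.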
For each index listed in the following table:
- naive designates an implementation relying on a parametric hypothesis (such as normality), and is thus only accurate if that hypothesis holds.
- robust designates an implementation that does not depend on any distributional hypothesis.
- tested indicates whether the implementation of the index is tested with a Monte Carlo test and the results agree with the scientific literature.
- bootstrap indicates whether an alternative way to compute confidence intervals using a resampling method is implemented.
- unified model indicates whether there is an implementation for models mixing continuous and categorical data (for instance with multiple raters and/or readings) - not planned currently.
| Index | Naive | Tested | Robust | Tested | Bootstrap | Unified model |
|---|---|---|---|---|---|---|
| MSD | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| TDI | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| CP | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| Accuracy | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| Precision | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| CCC | ✔️ | ✔️ | WIP | ❌ | ❌ | ❌ |
| Kappa | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| Weighted Kappa | ✔️1 | ✔️ | ❌ | ❌ | ❌ | ❌ |
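The Bootstrap column is empty for every index so far. As a rough illustration of what such an option could look like, here is a generic percentile-bootstrap sketch for a one-sided lower confidence limit (function names and signatures are hypothetical, not part of the package):

```python
import numpy as np

def ccc(x, y):
    # Lin's concordance correlation coefficient (point estimate).
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def bootstrap_lower_limit(x, y, stat, alpha=0.05, n_boot=2000, seed=0):
    # Percentile bootstrap: resample (x_i, y_i) pairs with replacement,
    # recompute the statistic, and take the alpha-quantile of the
    # replicates as the one-sided lower confidence limit.
    rng = np.random.default_rng(seed)
    n = len(x)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        reps[b] = stat(x[idx], y[idx])
    return np.quantile(reps, alpha)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.3, size=200)
low = bootstrap_lower_limit(x, y, ccc, alpha=0.05)
# The lower limit sits below the point estimate ccc(x, y).
```

Resampling pairs (rather than x and y independently) preserves the within-subject dependence that agreement indices are built to measure.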
Unit tests for all indices are written. Estimate tests can be launched using the "not stochastic" marker:

```shell
pytest -v -m "not stochastic"
```

Variance tests rely on Monte Carlo simulation and are quite slow. They can be launched using the "stochastic" marker:

```shell
pytest -v -m "stochastic"
```

Tests that match results from the scientific literature are also implemented. Currently, the Monte Carlo simulations can be displayed by running main.py with the -s i argument, where i is the simulated index.
Currently only the msd and ccc tests are implemented. One can compare the msd simulation results with \cite{LIN2000} and the ccc ones with \cite{LIN1989}.

```shell
main.py -s msd
main.py -s ccc
```

Bibtex is available here.
For VS Code users on Windows, running the script from a venv may be blocked by the ExecutionPolicy.

```shell
Get-ExecutionPolicy -List
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine
```

Footnotes
1. Absolute and Squared Weighted Kappa ↩