Currently we store (for numeric columns):
- Mean of X over all numeric attributes
- Standard deviation of X over all numeric attributes
- Quartiles 1, 2, and 3 of X over all numeric attributes
- Minimum of X over all numeric attributes
- Maximum of X over all numeric attributes
Where X is one of {mean, stdev, kurtosis, skewness}, computed separately for each numeric attribute. We store something similar for the information-theoretic measures of nominal attributes.
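To make the current two-level scheme concrete (first X per attribute, then a fixed set of summaries over attributes), here is a minimal sketch. It assumes the numeric columns arrive as a dict mapping column name to a list of floats. The function names, the `"<aggregate>_<X>"` feature keys, the population moment-based definitions of skewness and excess kurtosis, and the use of the default `statistics.quantiles` method for quartiles are all illustrative assumptions, not OpenML's actual implementation. It needs at least two numeric attributes with at least two values each, because `statistics.stdev` and `statistics.quantiles` require that.

```python
import statistics
from typing import Callable, Dict, List


def skewness(xs: List[float]) -> float:
    # Assumed definition: population moment-based skewness m3 / m2**1.5.
    # A constant column is mapped to 0.0 (also an assumption).
    mu = statistics.fmean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mu) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def kurtosis(xs: List[float]) -> float:
    # Assumed definition: population excess kurtosis m4 / m2**2 - 3.
    mu = statistics.fmean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / len(xs)
    m4 = sum((x - mu) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0


# The per-attribute statistics X listed in the issue.
PER_ATTRIBUTE: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.fmean,
    "stdev": statistics.stdev,
    "kurtosis": kurtosis,
    "skewness": skewness,
}


def aggregate(values: List[float]) -> Dict[str, float]:
    # The fixed summary over attributes: mean, stdev, quartiles, min, max.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


def current_meta_features(columns: Dict[str, List[float]]) -> Dict[str, float]:
    features: Dict[str, float] = {}
    for x_name, stat in PER_ATTRIBUTE.items():
        # One value of X per numeric attribute...
        per_attribute = [stat(col) for col in columns.values()]
        # ...collapsed into seven numbers; the vector itself is discarded.
        for agg_name, value in aggregate(per_attribute).items():
            features[f"{agg_name}_{x_name}"] = value  # e.g. "mean_skewness"
    return features
```

The last loop is where information is lost: only 7 numbers per X survive, so any other summary (or the value for a specific attribute) cannot be recovered later.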
This selection is arbitrary and not well supported in the literature.
It would be much better to store, for each statistic X, the full vector of per-attribute values, so that researchers can compute these aggregates (or any others they need) client-side.
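Under this proposal the server keeps only the per-attribute vectors and the aggregation moves to the client. Below is a minimal sketch of the client side, assuming each stored vector is a plain list of floats with one entry per numeric attribute. `summarize`, `iqr`, and `fraction_above` are hypothetical helpers for illustration, not an existing OpenML API.

```python
import statistics
from typing import Callable, Dict, List

Aggregate = Callable[[List[float]], float]


def iqr(values: List[float]) -> float:
    # Interquartile range; statistics.quantiles returns [Q1, Q2, Q3] for n=4.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def fraction_above(threshold: float) -> Aggregate:
    # Share of attributes whose absolute value exceeds the threshold,
    # e.g. "how many attributes are strongly skewed".
    return lambda values: sum(abs(v) > threshold for v in values) / len(values)


def summarize(
    vectors: Dict[str, List[float]], aggregates: Dict[str, Aggregate]
) -> Dict[str, float]:
    # Apply every client-chosen aggregate to every stored per-attribute vector.
    return {
        f"{agg_name}_{x_name}": agg(values)
        for x_name, values in vectors.items()
        for agg_name, agg in aggregates.items()
    }
```

For example, `summarize({"skewness": [0.0, 0.0, 2.0, -3.0]}, {"median": statistics.median, "iqr": iqr, "share_skewed": fraction_above(1.0)})` returns a median, an interquartile range, and the share of attributes with |skewness| > 1. The last of these cannot in general be derived from the fixed mean/stdev/quartile/min/max summary, which is the point of storing the raw vectors.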