Better visualisations for entropies#74

Merged
XachaB merged 13 commits into dev from zones
Mar 11, 2025

@XachaB
Owner

@XachaB XachaB commented Mar 5, 2025

This adds zones of interpredictability to the heatmap computation (they are also rendered as heatmaps), as well as heatmaps of the resulting distillations.

E.g.:

qumin action=ent_heatmap data=source/vlexique/vlexique.package.json entropy.importFile=results/metrics/entropies.csv heatmap.order="[Mode,Tense,Number,Person,Gender]" heatmap.cols="[Mode,Tense,Gender]"

Produces:

entropyHeatmap
zonesTable
entropyHeatmap_distillation

This introduces a heatmap.cols keyword that specifies which features to show in columns (the others go to rows). If it is not given, all cells are laid out in rows.
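
One way to read the heatmap.cols behaviour is sketched below in Python. This is an illustration only, not Qumin's implementation: the split_cell helper, the dictionary representation of a cell, and the feature values (ind, prs, sg, 1) are all assumptions made for the example.

```python
def split_cell(cell, order, cols=None):
    """Split a cell's feature values into a (row_label, col_label) pair.

    `cell` maps feature names to values (hypothetical representation).
    `order` plays the role of heatmap.order, `cols` of heatmap.cols.
    """
    cols = set(cols or [])
    # Features listed in `cols` label the columns, all others label the rows.
    row = tuple(cell[f] for f in order if f in cell and f not in cols)
    col = tuple(cell[f] for f in order if f in cell and f in cols)
    return row, col


order = ["Mode", "Tense", "Number", "Person", "Gender"]
cell = {"Mode": "ind", "Tense": "prs", "Number": "sg", "Person": "1"}  # made-up cell

with_cols = split_cell(cell, order, cols=["Mode", "Tense", "Gender"])
no_cols = split_cell(cell, order)  # no cols given: everything goes to rows

print(with_cols)
print(no_cols)
```

With cols given, the cell is addressed by its Number and Person values in rows and its Mode and Tense values in columns; without cols, the column label is empty and the whole cell description ends up in the row label.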

@XachaB
Owner Author

XachaB commented Mar 5, 2025

I think this is an improvement because:

  • I am fed up with re-implementing this for each dataset; I want a generic solution.
  • It is really nice to get an overview of the results (and to see whether something looks odd).
  • The distillation is much easier to read than the full heatmap.

@XachaB XachaB requested a review from julesbouton March 5, 2025 17:20
@XachaB XachaB mentioned this pull request Mar 5, 2025
12 tasks
@julesbouton
Collaborator

I didn't review the full code yet, but:

  • This fails (maybe because everything is interpredictable in non-overabundant settings):
qumin action=H data=./tests/data/TestPackage/test.package.json
  • Do you properly exclude n-predictor computation results?

I will review the rest of the code this afternoon.

@XachaB
Owner Author

XachaB commented Mar 6, 2025

Thanks, I'll fix those

Collaborator

@julesbouton julesbouton left a comment

Seems fine to me!

@XachaB
Owner Author

XachaB commented Mar 6, 2025

I added colorbars like so for frequencies (when available):

entropyHeatmap
Uploading entropyHeatmap_distillation.png…

They represent only cell frequencies, but may help with interpretation.

@julesbouton
Collaborator

Wow, nice! Is there a way to disable them easily if a user wants a simple output? And do they scale well for npreds heatmaps?

@julesbouton
Collaborator

julesbouton commented Mar 6, 2025

Actually, I also rewrote part of the heatmap code in the probability of success branch (#67), and I added a few options, as an easy way to disable the n_pairs graph, etc. Maybe I could already bring back those changes in v3, since they can be useful.

@XachaB
Owner Author

XachaB commented Mar 6, 2025

I didn't do n-preds, but it would be doable (with n columns for frequency, for example?)

I added a few options, as an easy way to disable the n_pairs graph, etc. Maybe I could already bring back those changes in v3, since they can be useful.

That sounds good, can you add them to this PR?

@XachaB
Owner Author

XachaB commented Mar 6, 2025

I haven't added an option to disable printing of the frequencies: do we want that? I would like to avoid a proliferation of fine-grained options. I can see the case for producing a good enough figure with all the info we have, and leaving it up to users to write their own code if they want finer control. Of course, you can argue the opposite, and perhaps we'll fall in agreement ;)

@XachaB
Owner Author

XachaB commented Mar 6, 2025

Note: if the weights we calculate for pairs of cells were given like entropies (in the measure/value columns), then we'd get a heatmap of probabilistic weights for free. Wouldn't that be helpful too?

@julesbouton
Collaborator

julesbouton commented Mar 6, 2025 via email

@julesbouton
Collaborator

julesbouton commented Mar 6, 2025 via email

@julesbouton
Collaborator

julesbouton commented Mar 6, 2025 via email

@XachaB
Owner Author

XachaB commented Mar 7, 2025

Ok, ok, I'll add the options.

Yes of course! As soon as possible, but the end of the week is intense!

No rush, of course!

… heatmap formatting.

Note: this sets vmin to 0 for heatmaps, including the one showing the number of pairs. The previous plot was misleading because it stretched the full gradient over whatever variation we had: light colors suggested little data, even when they could still represent thousands of pairs. This will be more interpretable at a glance (although it hides fine variation in the number of pairs).
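
The effect of anchoring the colour scale at 0 can be illustrated with a small pure-Python sketch. The normalise helper and the pair counts are made up for the example; plotting libraries such as matplotlib and seaborn expose the same idea through the vmin argument, and this sketch only mimics a linear colour normalisation.

```python
def normalise(values, vmin=None, vmax=None):
    """Map values to positions in [0, 1] on a linear colour gradient."""
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    span = hi - lo
    # A zero span means every value is identical: put them all at the bottom.
    return [(v - lo) / span if span else 0.0 for v in values]


n_pairs = [4000, 4500, 5000]  # hypothetical numbers of pairs

auto = normalise(n_pairs)              # gradient stretched over the data range
anchored = normalise(n_pairs, vmin=0)  # gradient anchored at zero

print(auto)      # [0.0, 0.5, 1.0]
print(anchored)  # [0.8, 0.9, 1.0]
```

With the automatic range, 4000 pairs gets the lightest colour and looks like almost no data. With vmin set to 0, the three values sit close together near the top of the gradient, which reflects that all of them are large, at the cost of making their differences harder to see.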
@XachaB XachaB added the enhancement New feature or request label Mar 7, 2025
This implementation easily scales to new metrics (such as the probability of success) and to debugging computations. It will also significantly shorten PR #67.
@julesbouton
Collaborator

I reorganised the code for the heatmaps to make it more readable. It is also a bit more versatile, although we do not use its full potential yet. For instance, it should be possible to easily add new metrics / debug computations.

Next step: add the frequencies for n_preds. Should be fairly easy.

@XachaB XachaB changed the title from "Zones of inter-predictibility" to "Better visualisations for entropies" Mar 11, 2025
# Conflicts:
#       sphinx/changelog.rst
#       src/qumin/calc_paradigm_entropy.py
#       src/qumin/config/qumin.yaml
#       src/qumin/entropy_heatmap.py
@julesbouton
Collaborator

Currently merging this: something is broken in dev. I'm trying to fix it.

# Conflicts:
#       src/qumin/representations/paradigms.py
@julesbouton
Collaborator

In my latest commit:

  • I added an export of both the predictor probability and the predictor-target probability in all entropy runs.
  • I use this export for the frequencies, so that we have a probability for each n-predictor combination.
  • We could also add the heatmap of the pair probabilities that you mentioned. But is it really useful if we already have the frequencies for the predictors and the targets on the side of the heatmap?

The new entropies file looks like this:

predictor,predicted,measure,value,n_pairs,n_preds,dataset,pair_probability,pred_probability,probability_source
second,third,cond_entropy,0.0,2,1,test,0.11219602550871902,0.46187363834422657,tokens
third,second,cond_entropy,0.0,2,1,test,0.06945468245777843,0.13071895424836602,tokens
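
As a rough illustration of how this file could be consumed, here is a minimal Python sketch that parses the two rows above with the standard csv module. The load_rows helper and the pred_freq lookup are assumptions made for the example, not part of Qumin; only the column names and values come from the sample.

```python
import csv
import io

SAMPLE = """\
predictor,predicted,measure,value,n_pairs,n_preds,dataset,pair_probability,pred_probability,probability_source
second,third,cond_entropy,0.0,2,1,test,0.11219602550871902,0.46187363834422657,tokens
third,second,cond_entropy,0.0,2,1,test,0.06945468245777843,0.13071895424836602,tokens
"""


def load_rows(text):
    """Parse the entropies CSV and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in ("value", "pair_probability", "pred_probability"):
            row[key] = float(row[key])
        for key in ("n_pairs", "n_preds"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)

# Predictor frequencies for single-predictor rows, e.g. to draw on the
# side of a heatmap (hypothetical use of the exported column).
pred_freq = {r["predictor"]: r["pred_probability"] for r in rows if r["n_preds"] == 1}

print(pred_freq)
```

Filtering on n_preds in this way is also one possible way to keep single-predictor and n-predictor results apart when building a plot.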

@XachaB
Owner Author

XachaB commented Mar 11, 2025

I think we can merge now, before it becomes its own monster. This meets the initial specs plus some :)

@XachaB XachaB merged commit cc5cd62 into dev Mar 11, 2025
6 checks passed
@julesbouton julesbouton deleted the zones branch March 12, 2025 06:11