Better visualisations for entropies by XachaB · Pull Request #74 · XachaB/Qumin

XachaB · 2025-03-05T17:19:05Z

This adds zones of interpredictability to the heatmap computation (they are also displayed as heatmaps), as well as heatmaps of the resulting distillations.

For example:

qumin action=ent_heatmap data=source/vlexique/vlexique.package.json entropy.importFile=results/metrics/entropies.csv heatmap.order="[Mode,Tense,Number,Person,Gender]" heatmap.cols="[Mode,Tense,Gender]"

Produces:

This introduces a heatmap.cols keyword that specifies which features to show in columns (the others go to rows). If it is not given, all cells are shown in rows.
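The row/column split controlled by heatmap.cols can be pictured with a minimal sketch. It assumes that each cell is described by a mapping from feature names to values and that heatmap.order fixes the feature order within both the row and column labels. The helper `split_cell_features` is hypothetical, not part of Qumin, and the feature values are only illustrative.

```python
from typing import Mapping, Optional, Sequence, Tuple


def split_cell_features(
    cell: Mapping[str, Optional[str]],
    order: Sequence[str],
    cols: Optional[Sequence[str]] = None,
) -> Tuple[tuple, tuple]:
    """Return (row_key, col_key) for one paradigm cell (hypothetical helper).

    Features listed in `cols` go to the column key; all remaining features
    go to the row key. Both keys follow `order`. When `cols` is None,
    every feature ends up in the rows. Missing features become None.
    """
    col_set = set(cols or [])
    row_key = tuple(cell.get(f) for f in order if f not in col_set)
    col_key = tuple(cell.get(f) for f in order if f in col_set)
    return row_key, col_key


order = ["Mode", "Tense", "Number", "Person", "Gender"]
cols = ["Mode", "Tense", "Gender"]
cell = {"Mode": "ind", "Tense": "prs", "Number": "sg", "Person": "1"}

print(split_cell_features(cell, order, cols))
# (('sg', '1'), ('ind', 'prs', None))
print(split_cell_features(cell, order))
# (('ind', 'prs', 'sg', '1', None), ())
```

Keying rows and columns by tuples that follow heatmap.order keeps related cells adjacent in both dimensions.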

XachaB · 2025-03-05T17:20:04Z

I think this is an improvement because:

I am fed up with re-implementing this for each dataset; I want a generic solution.
It is really nice to get an overview of the results (and to see whether something looks odd).
The distillation is much easier to read than the full heatmap.
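As a rough illustration of zones and distillation, here is a sketch that assumes a zone is a group of cells linked by mutual zero conditional entropy (each cell predicts the other with H = 0), computed from one-predictor results only, and that a distillation keeps one representative cell per zone. This is an assumption made for illustration, not necessarily how Qumin computes them; `interpredictability_zones` and `distill` are hypothetical names.

```python
from typing import Dict, List, Tuple


def interpredictability_zones(
    cond_entropy: Dict[Tuple[str, str], float],
    cells: List[str],
    tol: float = 1e-9,
) -> List[List[str]]:
    """Group cells into zones with a union-find structure.

    `cond_entropy[(a, b)]` holds H(b | a): predictor a, predicted b.
    Two cells are merged when H(b | a) <= tol and H(a | b) <= tol.
    """
    parent = {c: c for c in cells}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), h in cond_entropy.items():
        h_back = cond_entropy.get((b, a))
        if h <= tol and h_back is not None and h_back <= tol:
            parent[find(a)] = find(b)

    zones: Dict[str, List[str]] = {}
    for c in cells:  # keep the original cell order inside each zone
        zones.setdefault(find(c), []).append(c)
    return list(zones.values())


def distill(zones: List[List[str]]) -> List[str]:
    """Keep the first cell of each zone as its representative."""
    return [zone[0] for zone in zones]


cells = ["first", "second", "third"]
h = {
    ("first", "second"): 0.0, ("second", "first"): 0.0,
    ("second", "third"): 0.0, ("third", "second"): 0.4,
    ("first", "third"): 0.2, ("third", "first"): 0.5,
}
zones = interpredictability_zones(h, cells)
print(zones)           # [['first', 'second'], ['third']]
print(distill(zones))  # ['first', 'third']
```

Note that "second" predicts "third" perfectly, but not the reverse, so "third" stays in its own zone; only mutual zero entropy merges cells.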

julesbouton · 2025-03-06T10:51:17Z

I haven't reviewed the full code yet, but:

This fails (maybe because everything is interpredictable in non-overabundant settings):

qumin action=H data=./tests/data/TestPackage/test.package.json

Do you properly exclude n-predictor computation results?

I will review the rest of the code this afternoon.

XachaB · 2025-03-06T10:52:57Z

Thanks, I'll fix those

julesbouton

Seems fine to me!

src/qumin/entropy_heatmap.py

XachaB · 2025-03-06T14:48:37Z

I added colorbars for frequencies (when available), like so:

They represent only cell frequencies, but may help with interpretation.

julesbouton · 2025-03-06T15:03:24Z

Wow, nice! Is there a way to disable them easily if a user wants a simple output? And do they scale well for npreds heatmaps?

julesbouton · 2025-03-06T15:05:15Z

Actually, I also rewrote part of the heatmap code in the probability-of-success branch (#67), and I added a few options, such as an easy way to disable the n_pairs graph. Maybe I could already port those changes to v3, since they can be useful.

XachaB · 2025-03-06T16:54:21Z

I didn't do n-preds, but it would be doable (with n columns for frequency, for example?)

> I added a few options, as an easy way to disable the n_pairs graph, etc. Maybe I could already bring back those changes in v3, since they can be useful.

That sounds good; can you add them to this PR?

XachaB · 2025-03-06T16:55:34Z

I haven't added an option to disable printing of the frequencies: do we want that? I would like to avoid a proliferation of fine-grained options. I can see the case for producing a good-enough figure with all the information we have, and leaving it to users to write their own code if they want finer control. Of course, you can argue the opposite, and perhaps we'll come to an agreement ;)

XachaB · 2025-03-06T17:00:58Z

Note: if the weights we calculate for pairs of cells were given like entropies (in the measure/value columns), then we'd get a heatmap of probabilistic weights for free. Wouldn't that be helpful too?

julesbouton · 2025-03-06T17:23:32Z

True, and they are already given when setting token_freq.cells=True!

julesbouton · 2025-03-06T17:24:45Z

Yes of course! As soon as possible, but the end of the week is intense!

julesbouton · 2025-03-06T17:29:16Z

OK, I agree. I think we should simply be able to produce a figure containing only the heatmap for 1 predictor, which is typically what you want to share (that's the reason behind the options to disable the debug and n_preds plots). This is even more striking once you add the probability of success and its debug plot: too many plots on the same figure.

XachaB · 2025-03-07T14:06:06Z

Ok, ok, I'll add the options.

> Yes of course! As soon as possible, but the end of the week is intense!

No rush, of course!

… heatmap formatting. Note: this sets vmin to 0 for heatmaps, including the number-of-pairs heatmap. The previous plot was misleading, as it stretched the full color gradient over whatever variation we had: light colors made it look like we had little data, when they could still mean thousands of pairs. This will be more interpretable at a glance (although it hides fine variation in the number of pairs).
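A small worked example of why the vmin change matters: a linear colormap normalization maps a value v to (v - vmin) / (vmax - vmin), which is the mapping matplotlib's `Normalize` applies and the one seaborn's `vmin`/`vmax` heatmap arguments anchor. The pair counts below are illustrative, not taken from any dataset.

```python
def normalize(value: float, vmin: float, vmax: float) -> float:
    """Linear color normalization: 0.0 = lightest end, 1.0 = darkest end."""
    return (value - vmin) / (vmax - vmin)


# Illustrative number of pairs for three cell pairs (not real data).
n_pairs = [1000, 3000, 5000]

# Old behaviour: the gradient is stretched between the observed min and max.
data_scaled = [normalize(v, min(n_pairs), max(n_pairs)) for v in n_pairs]

# New behaviour: the gradient starts at zero pairs.
zero_based = [normalize(v, 0, max(n_pairs)) for v in n_pairs]

print(data_scaled)  # [0.0, 0.5, 1.0]
print(zero_based)   # [0.2, 0.6, 1.0]
```

With data-based scaling, 1,000 pairs receives the lightest color, exactly as an empty cell would; anchored at zero, it sits at 20% of the gradient.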

This implementation easily scales to new metrics (such as the probability of success) and to debug computations. It will also significantly shorten PR #67.

julesbouton · 2025-03-10T11:19:19Z

I reorganised the code for the heatmaps to make it more readable. It is also a bit more versatile, although we do not use its full potential yet. For instance, it should be possible to easily add new metrics or debug computations.

Next step: add the frequencies for n_preds. Should be fairly easy.
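One possible way to get the versatility described in this comment (new metrics or debug panels added without touching the plotting code, and debug panels that can be switched off for a shareable figure) is a small registry of panel specifications. This sketch is hypothetical: `PanelSpec`, `PANELS` and `select_panels` are illustrative names, not Qumin's actual code, and treating n_pairs as a debug panel is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class PanelSpec:
    """Hypothetical description of one heatmap panel in a figure."""
    column: str          # results column to plot (e.g. "value", "n_pairs")
    label: str           # title / colorbar label
    vmin: float = 0.0    # color scale anchored at zero
    debug: bool = False  # debug panels can be hidden as a group


# Registering a new metric means adding one entry here.
PANELS: Dict[str, PanelSpec] = {
    "entropy": PanelSpec("value", "Conditional entropy"),
    # Assumption: the number of pairs is treated as a debug panel.
    "n_pairs": PanelSpec("n_pairs", "Number of pairs", debug=True),
}


def select_panels(show_debug: bool = True, exclude: Sequence[str] = ()) -> List[str]:
    """Return the names of the panels to draw, in registration order."""
    return [
        name
        for name, spec in PANELS.items()
        if name not in exclude and (show_debug or not spec.debug)
    ]


print(select_panels())                  # ['entropy', 'n_pairs']
print(select_panels(show_debug=False))  # ['entropy']
```

For a figure meant to be shared, `select_panels(show_debug=False)` keeps only the entropy heatmap, while a further metric would be a single additional `PANELS` entry.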

julesbouton · 2025-03-11T16:23:15Z

Currently merging this: something is broken in dev. I'm trying to fix it.

julesbouton · 2025-03-11T16:59:36Z

In my latest commit:

I added an export of both the predictor probability and the predictor-target pair probability in all entropy runs.
I use this export for the frequencies, so that we have a probability for each combination of n predictors.
We could also add the heatmap of the pair probabilities that you mentioned. But is it really useful if we already show the frequencies of the predictors and targets along the sides of the heatmap?

The new entropies file looks like this:

```csv
predictor,predicted,measure,value,n_pairs,n_preds,dataset,pair_probability,pred_probability,probability_source
second,third,cond_entropy,0.0,2,1,test,0.11219602550871902,0.46187363834422657,tokens
third,second,cond_entropy,0.0,2,1,test,0.06945468245777843,0.13071895424836602,tokens
```
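Because the entropies file is in long format, any numeric column can be reshaped into a predictor × predicted matrix, which is what would make the pair probabilities usable as a heatmap without extra computation. A stdlib-only sketch using the two example rows; the helper `to_matrix` is hypothetical, and restricting to `n_preds == 1` mirrors the one-predictor heatmaps discussed in this PR.

```python
import csv
import io
from typing import Dict, Iterable, Mapping

ENTROPIES_CSV = """\
predictor,predicted,measure,value,n_pairs,n_preds,dataset,pair_probability,pred_probability,probability_source
second,third,cond_entropy,0.0,2,1,test,0.11219602550871902,0.46187363834422657,tokens
third,second,cond_entropy,0.0,2,1,test,0.06945468245777843,0.13071895424836602,tokens
"""


def to_matrix(
    rows: Iterable[Mapping[str, str]],
    column: str,
    measure: str = "cond_entropy",
) -> Dict[str, Dict[str, float]]:
    """Pivot long-format rows into {predictor: {predicted: float(row[column])}}.

    Only one-predictor results for the given measure are kept.
    """
    matrix: Dict[str, Dict[str, float]] = {}
    for row in rows:
        if row["measure"] != measure or int(row["n_preds"]) != 1:
            continue
        matrix.setdefault(row["predictor"], {})[row["predicted"]] = float(row[column])
    return matrix


rows = list(csv.DictReader(io.StringIO(ENTROPIES_CSV)))
entropies = to_matrix(rows, "value")
pair_probs = to_matrix(rows, "pair_probability")

print(entropies)                                # {'second': {'third': 0.0}, 'third': {'second': 0.0}}
print(round(pair_probs["second"]["third"], 4))  # 0.1122
```

The same pivot serves the entropy values and the pair probabilities; only the column name changes.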

sphinx/changelog.rst

XachaB · 2025-03-11T18:16:40Z

I think we can merge now, before it becomes its own monster. This meets the initial specs plus some :)

Sacha Beniamine added 2 commits March 5, 2025 15:35

rename entropy_heatmap to entropy_analysis

31a309e

Back to ent_heatmap. Adds to ent_heatmap: zones graphics, distillatio…

6d4692e

…n heatmap

XachaB requested a review from julesbouton March 5, 2025 17:20

XachaB mentioned this pull request Mar 5, 2025

Roadmap for version 3.0.0 #66

Closed

12 tasks

few fixes, deprecated applymap, PEP conformity

9ca6d6f

Sacha Beniamine added 2 commits March 6, 2025 10:58

Bugfixes: hydra config defaults to False, not None. Only show distill…

0c6a935

…ation if multiple cells in it

Limit zones to results with 1 pred

bd89695

julesbouton reviewed Mar 6, 2025

src/qumin/entropy_heatmap.py

Adding frequency colorbars when available

b028b6c

XachaB added the enhancement New feature or request label Mar 7, 2025

clean up code, add more restrictive export options

a6716ca

extract and save pair and predictor probabilities + add n_preds frequ…

0270a2f

…encies to heatmaps

XachaB changed the title ~~Zones of inter-predictibility~~ Better visualisations for entropies Mar 11, 2025

Merge branch 'dev' into zones

e266ce4

# Conflicts: # sphinx/changelog.rst # src/qumin/calc_paradigm_entropy.py # src/qumin/config/qumin.yaml # src/qumin/entropy_heatmap.py

Merge branch 'dev' into zones

f374fd8

# Conflicts: # src/qumin/representations/paradigms.py

XachaB commented Mar 11, 2025

sphinx/changelog.rst

Sacha Beniamine added 2 commits March 11, 2025 19:09

Merge branch 'dev' into zones

d31eb66

Update readme

cb1f0b9

XachaB merged commit cc5cd62 into dev Mar 11, 2025
6 checks passed

julesbouton deleted the zones branch March 12, 2025 06:11
