I think this is an improvement because:
|
I didn't review the full code yet, but:
I will review the rest of the code this afternoon.
|
Thanks, I'll fix those.
…ation if multiple cells in it
|
Wow, nice! Is there a way to disable them easily if a user wants a simple output? And do they scale well for n_preds heatmaps?
|
Actually, I also rewrote part of the heatmap code in the probability of success branch (#67), and I added a few options, such as an easy way to disable the n_pairs graph. Maybe I could already bring those changes back into v3, since they can be useful.
|
I didn't do n-preds, but it would be doable (with n columns for frequency, for example?)
That sounds good, can you add them to this PR?
|
I haven't added an option to disable printing of the frequencies: do we want one? I would like to avoid a proliferation of fine-grained options. I can see the case for producing a good-enough figure with all the info we have, and leaving it to users to write their own code if they want finer control. Of course, you can argue the opposite, and perhaps we'll come to an agreement ;)
|
Note: if the weights we calculate for pairs of cells were given like entropies (in the measure/value columns), then we'd get a heatmap of probabilistic weights for free. Wouldn't that be helpful too?
|
True, and they are already given when setting token_freq.cells=True!
On 6 March 2025 at 18:01:20 GMT+01:00, Sacha Beniamine (XachaB) wrote on XachaB/Qumin#74:
> Note: if the weights we calculate for pairs of cells were given like entropies (in the measure/value columns), then we'd get a heatmap of probabilistic weights for free. Wouldn't that be helpful too?
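The appeal of storing pair weights next to entropies is that one long-format table (one row per predictor, predicted cell and measure) can be pivoted into a predictor by predicted matrix for any measure with the same code, so a new measure gets a heatmap without new plotting logic. A minimal plain-Python sketch; the keys `predictor`, `predicted`, `measure`, `value` and the measure names are assumptions for illustration, not Qumin's actual schema, and the numbers are toy values.

```python
from collections import defaultdict


def pivot_measure(rows, measure):
    """Return {predictor: {predicted: value}} for a single measure.

    `rows` is a long-format table: an iterable of dicts with the
    hypothetical keys 'predictor', 'predicted', 'measure' and 'value'.
    """
    matrix = defaultdict(dict)
    for row in rows:
        if row["measure"] == measure:
            matrix[row["predictor"]][row["predicted"]] = row["value"]
    return dict(matrix)


# Toy rows (made up): entropies and pair weights in the same table.
rows = [
    {"predictor": "inf", "predicted": "prs.1sg", "measure": "cond_entropy", "value": 0.25},
    {"predictor": "prs.1sg", "predicted": "inf", "measure": "cond_entropy", "value": 0.5},
    {"predictor": "inf", "predicted": "prs.1sg", "measure": "weight", "value": 0.9},
    {"predictor": "prs.1sg", "predicted": "inf", "measure": "weight", "value": 0.9},
]

print(pivot_measure(rows, "weight"))
# {'inf': {'prs.1sg': 0.9}, 'prs.1sg': {'inf': 0.9}}
```

The same call with `"cond_entropy"` yields the entropy matrix, which is why adding a measure costs nothing on the plotting side.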
|
Yes of course! As soon as possible, but the end of the week is intense!
On 6 March 2025 at 17:54:43 GMT+01:00, Sacha Beniamine wrote:
> I didn't do n-preds, but it would be doable (with n columns for frequency, for example?)
>> I added a few options, as an easy way to disable the n_pairs graph, etc. Maybe I could already bring back those changes in v3, since they can be useful.
> That sounds good, can you add them to this PR?
|
OK, I agree. I think we should at least be able to easily produce a figure containing only the heatmap for one predictor, which is typically what you want to share (that's the reason behind the options to disable the debug and n_preds plots). This matters even more once you add the probability of success and its debug plot: that's too many plots on the same figure.
On 6 March 2025 at 17:55:56 GMT+01:00, Sacha Beniamine wrote:
> I haven't added an option to disable printing of the frequencies: do we want that? I would like to avoid proliferation of fine-grained options. I can see the case for: we do a good enough figure with all the info we have, then it's up to the user to write their own code if they want finer control. Of course, you can argue the opposite, and perhaps we'll come to an agreement ;)
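The compromise discussed here is a few coarse switches rather than many fine-grained ones: each optional panel gets one boolean, and turning them all off leaves only the one-predictor heatmap that is worth sharing. A sketch of that idea; the flag and panel names are hypothetical and do not correspond to Qumin's actual configuration keys.

```python
def select_panels(show_n_pairs=True, show_debug=True, show_n_preds=True):
    """Return the ordered list of panels to draw in one figure.

    Panel and flag names are hypothetical. The main one-predictor
    heatmap is always drawn; every other panel is optional.
    """
    panels = ["heatmap"]
    if show_n_pairs:
        panels.append("n_pairs")
    if show_debug:
        panels.append("debug")
    if show_n_preds:
        panels.append("n_preds")
    return panels


# Full diagnostic figure (the default).
print(select_panels())
# ['heatmap', 'n_pairs', 'debug', 'n_preds']

# The "shareable" figure: only the main heatmap.
print(select_panels(show_n_pairs=False, show_debug=False, show_n_preds=False))
# ['heatmap']
```

Keeping one flag per panel, rather than per visual detail, is what keeps the option count from growing with every new metric.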
|
Ok, ok, I'll add the options.
No rush, of course!
… heatmap formatting. Note: this sets vmin to 0 for heatmaps, including the number-of-pairs heatmap. The previous plot was treacherous because it used the full color gradient for whatever variation we had: light colors looked like we had little data, when they could still mean thousands of pairs. This will be more interpretable at a glance (although it hides fine variation in the number of pairs).
This implementation easily scales to new metrics (such as the probability of success) and to debug computations. It will also significantly shorten PR #67.
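Before picking a color, a colormap linearly maps each value into [0, 1] using vmin and vmax; seaborn's `heatmap` exposes these as its `vmin` and `vmax` arguments. The sketch below reproduces only that arithmetic in plain Python, with made-up pair counts, to show why autoscaling was misleading: the smallest count is mapped to 0.0 (the lightest color) even though it still represents many pairs.

```python
def normalize(value, vmin, vmax):
    """Linear map to [0, 1], as a colormap does before choosing a color."""
    return (value - vmin) / (vmax - vmin)


n_pairs = [1000, 3000, 5000]  # toy counts, made up for illustration

# Autoscaled: vmin is the smallest observed count.
auto = [normalize(v, min(n_pairs), max(n_pairs)) for v in n_pairs]
# Anchored at zero, as this commit does.
anchored = [normalize(v, 0, max(n_pairs)) for v in n_pairs]

print(auto)      # [0.0, 0.5, 1.0]
print(anchored)  # [0.2, 0.6, 1.0]
```

Anchoring at zero keeps "light" meaning "few pairs" in absolute terms; the cost is that small relative differences between large counts are compressed into a narrower color range.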
|
I reorganised the code for the heatmaps to make it more readable. It is also a bit more versatile, although we do not use its full potential yet. For instance, it should be possible to easily add new metrics or debug computations. Next step: add the frequencies for n_preds, which should be fairly easy.
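One common way to get the "easy to add new metrics" property is a registry: the plotting code iterates over whatever metrics are registered, so adding one (such as a probability of success) means writing a single decorated function. This is a hypothetical sketch of that pattern, not the actual structure of `entropy_heatmap.py`; the names `METRICS`, `register` and `matrices`, and the per-pair statistics, are invented for illustration.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]  # (predictor cell, predicted cell)

# Hypothetical registry: metric name -> function of one pair's statistics.
METRICS: Dict[str, Callable[[dict], float]] = {}


def register(name: str):
    """Decorator that adds a metric function to the registry under `name`."""
    def wrap(fn: Callable[[dict], float]) -> Callable[[dict], float]:
        METRICS[name] = fn
        return fn
    return wrap


@register("entropy")
def entropy(stats: dict) -> float:
    return stats["entropy"]


@register("n_pairs")
def n_pairs(stats: dict) -> float:
    return stats["n_pairs"]


def matrices(pairs: Dict[Pair, dict]) -> Dict[str, Dict[Pair, float]]:
    """Compute one {pair: value} matrix per registered metric."""
    return {name: {pair: fn(stats) for pair, stats in pairs.items()}
            for name, fn in METRICS.items()}


# Adding a new metric requires no change to `matrices` or to the plotting loop.
@register("success")
def success(stats: dict) -> float:
    return stats["success"]


pairs = {("inf", "prs.1sg"): {"entropy": 0.25, "n_pairs": 1200, "success": 0.9}}  # toy values
print(sorted(matrices(pairs)))  # ['entropy', 'n_pairs', 'success']
```

Each resulting matrix can then be drawn as one heatmap panel, which is what lets debug computations slot in the same way as metrics.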
…encies to heatmaps
# Conflicts:
#   sphinx/changelog.rst
#   src/qumin/calc_paradigm_entropy.py
#   src/qumin/config/qumin.yaml
#   src/qumin/entropy_heatmap.py
|
I'm currently merging this, but something is broken in dev. I'm trying to fix it.
# Conflicts:
#   src/qumin/representations/paradigms.py
|
In my latest commit, I:
The new entropies file looks like this:
|
I think we can merge now, before it becomes its own monster. This meets the initial specs and then some :)

This adds zones of interpredictability to the heatmap computation (they are also shown as heatmaps), as well as heatmaps of the resulting distillations.
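For readers new to the terms: a zone of interpredictability is usually understood as a set of cells that predict each other without any uncertainty, and a distillation keeps one representative cell per zone. The sketch below assumes exactly that definition (two cells are linked when the conditional entropy is zero in both directions, and zones are the connected components of that relation, found with union-find). The function names, tolerance and toy entropies are illustrative assumptions, not Qumin's implementation.

```python
def zones(cells, entropy, tol=1e-9):
    """Group cells that are mutually predictable without uncertainty.

    `entropy[(a, b)]` is the conditional entropy of predicting b from a.
    Cells are linked when both directions are (numerically) zero; zones
    are the connected components of that relation (union-find).
    """
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a in cells:
        for b in cells:
            if a < b and entropy[(a, b)] <= tol and entropy[(b, a)] <= tol:
                parent[find(a)] = find(b)

    groups = {}
    for c in cells:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


def distill(zone_list):
    """Keep one representative per zone (here, simply the first cell)."""
    return [zone[0] for zone in zone_list]


# Toy data (made up): two pairs of mutually predictable cells.
cells = ["inf", "prs.1sg", "prs.3sg", "pst.ptcp"]
zero = {("inf", "pst.ptcp"), ("pst.ptcp", "inf"),
        ("prs.1sg", "prs.3sg"), ("prs.3sg", "prs.1sg")}
H = {(a, b): 0.0 if (a, b) in zero else 0.5
     for a in cells for b in cells if a != b}

print(zones(cells, H))           # [['inf', 'pst.ptcp'], ['prs.1sg', 'prs.3sg']]
print(distill(zones(cells, H)))  # ['inf', 'prs.1sg']
```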
For example, running:
qumin action=ent_heatmap data=source/vlexique/vlexique.package.json entropy.importFile=results/metrics/entropies.csv heatmap.order="[Mode,Tense,Number,Person,Gender]" heatmap.cols="[Mode,Tense,Gender]"
produces:
This introduces a heatmap.cols keyword to specify which features to show in columns (the others go to rows). If it is not given, all cells are shown in rows.
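A minimal sketch of how a `heatmap.order` / `heatmap.cols` pair could split each cell into a row label and a column label, assuming every cell is a bundle of feature values. The function name, the `.` separator and the example cells are hypothetical; only the two keyword names come from the command shown in this comment.

```python
def split_labels(cell, order, cols=None):
    """Split a cell's feature values into (row label, column label).

    `cell` maps feature names to values, `order` is the feature order
    (as in heatmap.order) and `cols` lists the features shown in
    columns (as in heatmap.cols). With no `cols`, everything goes to rows.
    Features a cell lacks (e.g. Gender on finite verbs) are skipped.
    """
    col_features = set(cols or [])
    row = [cell[f] for f in order if f in cell and f not in col_features]
    col = [cell[f] for f in order if f in cell and f in col_features]
    return ".".join(row), ".".join(col)


order = ["Mode", "Tense", "Number", "Person", "Gender"]
cols = ["Mode", "Tense", "Gender"]

prs_1sg = {"Mode": "ind", "Tense": "prs", "Number": "sg", "Person": "1"}
pst_ptcp_fpl = {"Mode": "ptcp", "Tense": "pst", "Number": "pl", "Gender": "f"}

print(split_labels(prs_1sg, order, cols))       # ('sg.1', 'ind.prs')
print(split_labels(pst_ptcp_fpl, order, cols))  # ('pl', 'ptcp.pst.f')
print(split_labels(prs_1sg, order))             # ('ind.prs.sg.1', '')
```

Cells sharing a row label then line up on the same row, which gives the paradigm-shaped layout; without `cols`, each cell gets its own row and a single unnamed column.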