I've tried the metric on several high-quality 1080p AVC encodes of the same video.
The subjective scores of these variants are all relatively high and close to each other, but still clearly distinguishable (well above 1 JND apart).
However, the metric reports only very small differences between them in its predictions (e.g. 4.15 vs 4.10).
Following your suggestions, I'm mainly looking at the "compression" output, but it cannot discriminate a carefully encoded 1080p version at 7 Mbps from a fast-encoded version at 5 Mbps. Does this mean the metric is not suited for this purpose?
In the code I see that you downscale the source to 1280x720 with ffmpeg. Could this prevent the algorithm from catching
differences in fine detail that are only visible at 1080p?
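To put the downscaling question in numbers, here is a quick back-of-the-envelope sketch of how much spatial resolution the 1920x1080 → 1280x720 step discards. The helper names are hypothetical, and the ffmpeg argument list is only my assumption of a typical `scale` filter call, not necessarily the exact command the repository runs.

```python
def downscale_ratios(src_w, src_h, dst_w, dst_h):
    """Return (linear scale factor, fraction of pixels retained) for a resize."""
    linear = dst_w / src_w                       # per-axis scale factor
    pixels = (dst_w * dst_h) / (src_w * src_h)   # share of the original pixel count kept
    return linear, pixels


def ffmpeg_scale_args(src, dst, width=1280, height=720):
    """Hypothetical: build (not run) a typical ffmpeg downscale command line."""
    return ["ffmpeg", "-i", src, "-vf", f"scale={width}:{height}", dst]


linear, pixels = downscale_ratios(1920, 1080, 1280, 720)
print(f"linear factor: {linear:.3f}, pixels retained: {pixels:.1%}")
print(" ".join(ffmpeg_scale_args("input_1080p.mp4", "input_720p.mp4")))
```

Each axis is reduced to 2/3, so only about 44% of the original pixels reach the model. Detail-level encoding differences between the 7 Mbps and 5 Mbps versions could plausibly be smoothed away before the metric sees them.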