downloader: CacheKey: include decompression flag#4067
downloader: CacheKey: include decompression flag#4067AkihiroSuda merged 1 commit intolima-vm:masterfrom
Conversation
5e6507b to
ec85797
Compare
| k += "+decomp" | ||
| } | ||
| return k | ||
| } |
There was a problem hiding this comment.
(On second thought this might be rather confusing when the original data is not compressed)
ec85797 to
5868ae8
Compare
|
ping @lima-vm/maintainers This PR has been open for more than 4 months |
The caches are now created separately for compressed and decompressed contents. When a decompressed content is cached and a compressed content is requested, the downloader now correctly returns the compressed content. - URLSHA/data: unmodified original data - URLSHA/sha256.digest: digest of the original data - URLSHA+decomp/data: decompressed data - URLSHA+decomp/sha256.digest: digest of the *original* (i.e., compressed) data Caching a decompressed content does not automatically cache the original compressed content. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
5868ae8 to
db72962
Compare
| return fmt.Sprintf("%x", sha256.Sum256([]byte(remote))) | ||
| func CacheKey(remote string, decompress bool) string { | ||
| k := fmt.Sprintf("%x", sha256.Sum256([]byte(remote))) | ||
| if decompress && decompressor(remote) != "" { |
There was a problem hiding this comment.
I'm sorry, I never got around to reviewing this before it got merged, but I don't see how this can work: decompressor() expects a file extension, but you pass the whole URL, so decompressor(remote) will always return "" and you will never append +decomp to the cache key.
There was a problem hiding this comment.
Ideally there should be BATS tests for the downloader functionality, as it has been getting quite complex.
There was a problem hiding this comment.
Even a unit test comparing CacheKey("https://example.com/image.qcow2.gz", true) with CacheKey("https://example.com/image.qcow2.gz", false) would detect this specific failure.
There was a problem hiding this comment.
Opened a PR to fix it:
|
Maybe I'm too tired now, but I don't understand how this PR works. As far as I can tell So it looks like if you have both a compressed and an uncompressed entry, they still store the same data, taking twice the space. And the decompression still happens every time I'll take another look tomorrow, in case I managed to thoroughly confuse myself right now. |
You are right. Not sure how I misunderstood this 5 months ago. 🤦 |
The caches are now created separately for compressed and decompressed contents. When a decompressed content is cached and a compressed content is requested, the downloader now correctly returns the compressed content.
Caching a decompressed content does not automatically cache the original compressed content.