
downloader: CacheKey: include decompression flag #4067

Merged

AkihiroSuda merged 1 commit into lima-vm:master from AkihiroSuda:fix-cachekey-decomp on Feb 13, 2026

Conversation

@AkihiroSuda
Member

The caches are now created separately for compressed and decompressed content. When decompressed content is cached and the compressed content is requested, the downloader now correctly returns the compressed content.

  • URLSHA/data: unmodified original data
  • URLSHA/sha256.digest: digest of the original data
  • URLSHA+decomp/data: decompressed data
  • URLSHA+decomp/sha256.digest: digest of the original (i.e., compressed) data

Caching decompressed content does not automatically cache the original compressed content.
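The key scheme described above can be sketched as a tiny function, assuming only what the list states (the function and its names here are illustrative, not Lima's actual code):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// cacheKey illustrates the layout above: the cache entry name is the
// SHA-256 hex digest of the URL ("URLSHA"), with a "+decomp" suffix
// when the decompressed variant is requested.
func cacheKey(remote string, decompress bool) string {
	k := fmt.Sprintf("%x", sha256.Sum256([]byte(remote)))
	if decompress {
		k += "+decomp"
	}
	return k
}

func main() {
	url := "https://example.com/image.qcow2.gz"
	fmt.Println(cacheKey(url, false)) // URLSHA
	fmt.Println(cacheKey(url, true))  // URLSHA+decomp
}
```

(The actual patch additionally suffixes the key only when a decompressor is available for the URL; see the diff snippets below in the review thread.)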

@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Sep 22, 2025
@AkihiroSuda AkihiroSuda marked this pull request as draft September 22, 2025 16:32
```go
		k += "+decomp"
	}
	return k
}
```
Member Author


(On second thought this might be rather confusing when the original data is not compressed)

@AkihiroSuda AkihiroSuda requested review from a team and jandubois September 24, 2025 00:12
@AkihiroSuda AkihiroSuda marked this pull request as ready for review October 1, 2025 09:14
@AkihiroSuda AkihiroSuda requested a review from a team October 9, 2025 07:30
@AkihiroSuda AkihiroSuda modified the milestones: v2.0.0, v2.1.0 (?) Oct 16, 2025
@AkihiroSuda
Copy link
Member Author

ping @lima-vm/maintainers: this PR has been open for more than 4 months.

Contributor

@unsuman unsuman left a comment


Thanks!

@AkihiroSuda AkihiroSuda merged commit d8610f1 into lima-vm:master Feb 13, 2026
37 checks passed
```diff
-	return fmt.Sprintf("%x", sha256.Sum256([]byte(remote)))
+func CacheKey(remote string, decompress bool) string {
+	k := fmt.Sprintf("%x", sha256.Sum256([]byte(remote)))
+	if decompress && decompressor(remote) != "" {
```
Member


I'm sorry, I never got around to reviewing this before it got merged, but I don't see how this can work: decompressor() expects a file extension, but you pass the whole URL, so decompressor(remote) will always return "" and you will never append +decomp to the cache key.
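The mismatch can be demonstrated with a stub, assuming (as the comment says) that decompressor() matches on a file extension such as ".gz"; decompressorStub and the URL below are hypothetical stand-ins, not Lima's code:

```go
package main

import (
	"fmt"
	"path"
)

// decompressorStub is a hypothetical stand-in for a helper that maps a
// bare file extension to a decompression tool name.
func decompressorStub(ext string) string {
	switch ext {
	case ".gz":
		return "gzip"
	case ".bz2":
		return "bzip2"
	case ".xz":
		return "xz"
	}
	return ""
}

func main() {
	remote := "https://example.com/image.qcow2.gz"
	// Passing the whole URL never matches any extension:
	fmt.Printf("%q\n", decompressorStub(remote)) // ""
	// Passing just the extension does:
	fmt.Printf("%q\n", decompressorStub(path.Ext(remote))) // "gzip"
}
```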

Member


Ideally there should be BATS tests for the downloader functionality, as it has been getting quite complex.

Member


Even a unit test comparing CacheKey("https://example.com/image.qcow2.gz", true) with CacheKey("https://example.com/image.qcow2.gz", false) would detect this specific failure.


@jandubois
Member

jandubois commented Feb 14, 2026

Maybe I'm too tired now, but I don't understand how this PR works.

As far as I can tell fetch() will always store the raw compressed data, regardless of the value of the decompression flag.

So it looks like if you have both a compressed and an uncompressed entry, they still store the same data, taking twice the space. And the decompression still happens every time copyLocal() copies it into an instance directory.

I'll take another look tomorrow, in case I managed to thoroughly confuse myself right now.

@AkihiroSuda
Member Author

> As far as I can tell fetch() will always store the raw compressed data, regardless of the value of the decompression flag.

You are right. Not sure how I misunderstood this 5 months ago. 🤦
Let me revert this PR.


Labels

reverted (Merged but reverted later)


3 participants