Conversation

@Rogdham
Owner

@Rogdham Rogdham commented Apr 27, 2025

Deprecate compress_stream and decompress_stream:

  • Use the @deprecated decorator to mark them as deprecated
  • Update tests accordingly
  • Update documentation accordingly
  • Add documentation page for migration
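
For readers unfamiliar with the pattern, here is a minimal sketch of what marking a function as deprecated can look like. It is not the code from this PR: the `deprecated` decorator below is a hand-written stand-in (the standard library ships `warnings.deprecated` since Python 3.13), and `old_copy_stream` is a hypothetical function used only for illustration.

```python
import functools
import warnings


def deprecated(message):
    """Hand-rolled stand-in for a @deprecated decorator (illustration only)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line,
            # not to this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("old_copy_stream() is deprecated; use shutil.copyfileobj() instead")
def old_copy_stream(src, dst):
    # Hypothetical function standing in for a deprecated streaming helper.
    dst.write(src.read())
```

Calling `old_copy_stream(...)` still works, but each call emits a `DeprecationWarning` that points at the calling code, which gives users time to migrate.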

📚 documentation preview 📚

@Rogdham Rogdham force-pushed the deprecate_stream_functions branch 4 times, most recently from 15bdf68 to ff9e07a on May 1, 2025 16:49
@pscheri

pscheri commented May 1, 2025

Hi!
This is a very nice feature that I'm currently using. What's the workaround plan for this? To manually define the IO stream?
Thanks!

@Rogdham
Owner Author

Rogdham commented May 1, 2025

Hello @pscheri and thank you for chiming in.

I wrote a migration page that illustrates how to migrate to shutil.copyfileobj.

I would be really interested in knowing if you have a use case that is not covered by it, or if you have any feedback.

Thanks!

@pscheri

pscheri commented May 1, 2025

My use case is quite simple, but on top of that, I'm using the buffer size options:

with open(tar_path, 'rb') as f_in:
    with open(output_path, 'wb') as f_out:
        pyzstd.compress_stream(
            f_in,
            f_out,
            level_or_option=option,
            read_size=READ_BUFFER_SIZE,  # Dynamically calculated based on available memory
            write_size=WRITE_BUFFER_SIZE,  # Dynamically calculated based on available memory
        )

I can probably pass some of those somewhere else

@Rogdham
Owner Author

Rogdham commented May 1, 2025

Thanks for the code example. I took a random 7G archive and ran some timing checks, using a compression level of 3.

For reference, here is the code with shutil.copyfileobj:

with open(tar_path, 'rb') as f_in:
    with pyzstd.open(output_path, 'wb', level_or_option=3) as f_out:
        shutil.copyfileobj(f_in, f_out, BUFFER_SIZE)

| Case | Buffer size | Timing |
|---|---|---|
| zstd command | - | 38s |
| compress_stream | 1k | 49s |
| compress_stream | 10k | 42s |
| compress_stream | 100k | 41s |
| compress_stream | 1M | 36s |
| compress_stream | 10M | 37s |
| compress_stream | 100M | 41s |
| compress_stream | 1G | 42s |
| shutil.copyfileobj | not specified | 42s |
| shutil.copyfileobj | 1k | 44s |
| shutil.copyfileobj | 10k | 51s |
| shutil.copyfileobj | 100k | 45s |
| shutil.copyfileobj | 1M | 37s |
| shutil.copyfileobj | 10M | 39s |
| shutil.copyfileobj | 100M | 39s |
| shutil.copyfileobj | 1G | 44s |

It seems that a bigger buffer size is not always better, and in any case the timings stay in the 10% range of each other (unless the buffer size is too small). The proposed alternative, shutil.copyfileobj, also seems to do the job.

Do you have a specific use case for specifying the buffer sizes, or was it just for speed purposes? If so, did you run a benchmark of your own?
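
For anyone who wants to run a comparison like this on their own data, a timing harness could look roughly like the sketch below. It is not the script used for the numbers in this thread. The helper names `time_copy` and `compare_buffer_sizes` are made up for illustration, and `gzip.open` is used as the default opener only so the sketch runs with the standard library alone.

```python
import gzip
import shutil
import time


def time_copy(src_path, dst_path, length, opener=gzip.open):
    """Return the seconds taken to compress src_path into dst_path.

    opener is an assumption: gzip.open keeps the sketch standard-library
    only; pass a pyzstd-based opener to measure zstd instead.
    """
    start = time.perf_counter()
    with open(src_path, "rb") as f_in:
        with opener(dst_path, "wb") as f_out:
            # length is the chunk size copyfileobj reads and writes at a time.
            shutil.copyfileobj(f_in, f_out, length)
    return time.perf_counter() - start


def compare_buffer_sizes(src_path, dst_path, lengths):
    """Map each buffer size (in bytes) to its measured duration in seconds."""
    return {length: time_copy(src_path, dst_path, length) for length in lengths}
```

To measure pyzstd at level 3, something like `opener=functools.partial(pyzstd.open, level_or_option=3)` should work, given the `pyzstd.open(output_path, 'wb', level_or_option=3)` call shown earlier. Single runs are noisy, so repeating each measurement a few times is advisable before drawing conclusions.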

@Rogdham Rogdham force-pushed the deprecate_stream_functions branch 4 times, most recently from 4259bc3 to 1746868 on May 9, 2025 10:46
@Rogdham Rogdham merged commit 1746868 into master May 9, 2025
21 checks passed
@Rogdham Rogdham deleted the deprecate_stream_functions branch May 9, 2025 14:52
@Rogdham
Owner Author

Rogdham commented May 9, 2025

The documentation has been updated to mention the length param of shutil.copyfileobj. Thanks for the feedback!
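
To make the effect of the length parameter concrete, here is a small standard-library sketch. `RecordingReader` is a hypothetical helper written for this illustration; it only records the size requested by each read() call so the chunking done by shutil.copyfileobj becomes visible.

```python
import io
import shutil


class RecordingReader(io.BytesIO):
    """BytesIO that remembers the size passed to each read() call."""

    def __init__(self, data):
        super().__init__(data)
        self.requested = []

    def read(self, size=-1):
        self.requested.append(size)
        return super().read(size)


src = RecordingReader(b"0123456789")
dst = io.BytesIO()

# length=4: copyfileobj asks the source for at most 4 bytes per read().
shutil.copyfileobj(src, dst, 4)
```

After the copy, every entry in `src.requested` is 4 and `dst` holds the full payload. In the migration pattern from this thread, the destination would be the file object returned by `pyzstd.open(...)` and the length would be whatever buffer size suits the workload.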
