Skip to content

Use zip32 if possible#7972

Merged
MorrisJobke merged 7 commits intomasterfrom
fix_7782
Apr 9, 2018
Merged

Use zip32 if possible#7972
MorrisJobke merged 7 commits intomasterfrom
fix_7782

Conversation

@rullzer
Copy link
Member

@rullzer rullzer commented Jan 20, 2018

fixes #7782

  • OSX doesn't handle 64zip that well
  • Some other implentations don't handle it perfectly either
  • If the file is belog 4GiB (some overhead) => zip32
  • This covers the 99% case I bet

In some future work we should really move this to some decent Node based API. All this static stuff is making me cry 😭

Requires:

@codecov
Copy link

codecov bot commented Jan 20, 2018

Codecov Report

Merging #7972 into master will increase coverage by 0.06%.
The diff coverage is 0%.

@@             Coverage Diff              @@
##             master    #7972      +/-   ##
============================================
+ Coverage     51.88%   51.95%   +0.06%     
- Complexity    25297    25305       +8     
============================================
  Files          1603     1603              
  Lines         95060    95081      +21     
  Branches       1388     1388              
============================================
+ Hits          49325    49396      +71     
+ Misses        45735    45685      -50
Impacted Files Coverage Δ Complexity Δ
lib/private/Streamer.php 0% <0%> (ø) 16 <4> (+2) ⬆️
lib/private/legacy/files.php 13.09% <0%> (-1.13%) 78 <3> (+6)
lib/private/Files/Cache/Propagator.php 94.93% <0%> (-1.27%) 16% <0%> (ø)
lib/private/Files/ObjectStore/SwiftFactory.php 51% <0%> (+51%) 37% <0%> (ø) ⬇️
lib/private/Files/ObjectStore/Swift.php 72.41% <0%> (+72.41%) 9% <0%> (ø) ⬇️

$fileSize = 0;

if (\OC\Files\Filesystem::is_file($path)) {
$fileSize = \OC\Files\Filesystem::filesize($path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use getFileInfo instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

if (\OC\Files\Filesystem::is_file($path)) {
$fileSize = \OC\Files\Filesystem::filesize($path);
} elseif (\OC\Files\Filesystem::is_dir($path)) {
$files= \OC\Files\Filesystem::getDirectoryContent($path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with getFileInfo you can just get the directory size in one go without the need to recurse

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oeee

if ($getType === self::ZIP_FILES) {
$fileSize = 0;
foreach ($files as $file) {
$fileSize += \OC\Files\Filesystem::getFileInfo($dir . '/' . $file)['size'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

->size() instead of ['size'] for bonus points

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course I want bonus points

@MorrisJobke
Copy link
Member

I tried it on macOS again and it still is not possible to extract the zip with the built-in tools. I also checked that the first if clause is used and this is the case.

@rullzer
Copy link
Member Author

rullzer commented Jan 22, 2018

:S stupid OSX. No clue then

Copy link
Member

@danxuliu danxuliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes possible to open downloaded zip files with KDE Archive KIOSlave (that is, directly in a file explorer or in an Open file dialog) :-D

I have taken a look to the Zip specification and it seems that it is safe to use zip32 for a basic (without compression, volumes nor encryption) zip file when all the following conditions are met (do not quote me on this, though ;-) ):

  • No file size is larger than 4 bytes (file size < 4294967296); see 4.4.9 uncompressed size
  • The size of all files plus their local headers is not larger than 4 bytes; see 4.4.16 relative offset of local header and 4.4.24 offset of start of central directory with respect to the starting disk number
  • The total number of entries (files and directories) in the zip file is not larger than 2 bytes (number of entries < 65536); see 4.4.22 total number of entries in the central dir
  • The size of the central directory (similar to the local headers for each file) is not larger than 4 bytes; see 4.4.23 size of the central directory

The first two are already covered in this pull request. The fourth one is mostly covered by the margin between 4x1000^3 and 4x1024^3; if I am not mistaken it is still technically possible to create a zip file from files smaller than 4GB with a central directory larger than 4GiB (which would be broken with zip32), but it should not happen in the real world.

However, it could be good to take into account the third case too (that is, use zip32 only if the size is less than 4GB and the total number of files and directories is less than 65536); it is also an unlikely scenario, yet I guess that this one could enter in the plausible category.

@rullzer
Copy link
Member Author

rullzer commented Jan 26, 2018

@danxuliu thanks for the detailed comment 😄

I agree we should take this all properly into account.
I propose to get this in and abstract further away in #8017

That way we can just keep track of all the files/folders/file handles. And do all kind of calculations. Including the number of files and the size.

@danxuliu
Copy link
Member

danxuliu commented Feb 6, 2018

I have noticed a difference in behaviour from the old code to the new one: if one of the files to be downloaded does not exist, before the zip was downloaded anyway but without that file; now an error is shown to the user. I guess the new behaviour is fine, but I mention it just in case ;-)

I propose to get this in and abstract further away in #8017

I could not wait ;-) so I have addressed my own comment about using zip32 only if the number of files and directories is less than 65536

I have also added integration tests for downloading a zip file. The problem is that the tests for zip32/zip64 boundaries (65535/65536 files) are terribly slow; in my system they take more than two and a half hours... But it is a 8~9-years old machine, so let's see if Drone workers can finish them in a reasonable amount of time; if not they can just be disabled in Drone but kept to be manually run when working on zip related things.

Pending issues (for this pull request or future ones ;-) ):

  • Until the PHPZipStreamer fix for OSX is added the if branches should be reordered to prefer Tar; if this pull request is merged once the OSX fix is added no changes are needed (should the tar branch be removed in that case? Or should it be kept to use Tar instead of zip64 in OSX?)
  • Once the PHPZipStreamer fix for OSX is added the checks in the integration tests could be extended to look for the data descriptor header too
  • Currently if the server runs on a 32 bits system and a zip64 should be generated (because the zip32 constraints are not met) a zip32 file is generated anyway. That file will be broken, so an error telling something like "This Nextcloud server can not generate zip64 files. Please contact your system administrator (or try to download less files)." should be shown instead of generating the zip file.
  • Is the check against 4 * 1000 * 1000 * 1000 safe in 32 bit machines? PHP integers overflow to floats so it is probably fine, but I am not sure. In any case, even if using zip32, can 32 bit machines generate 4GB zip files or are they limited to 2^31 bytes (as there are not unsigned integers in PHP)?
  • The integration tests for zip64 should not run in 32 bit systems. Unfortunately Behat does not support skipping a test by throwing SkippedException or something like that; the support for that needs to be explicitly added. A cheesy alternative would be to throw PendingException to mark the test step as pending and skip the rest of tests in the scenario... :-P In any case, once tests can be skipped it would be possible to add other integration tests that create a 4GB file, but checking beforehand if there is enough space in the file system to do that

@MorrisJobke
Copy link
Member

@danxuliu They fail - maybe rebase to get the latest .drone.yml update as well ;)

@danxuliu
Copy link
Member

danxuliu commented Feb 6, 2018

@MorrisJobke The integration test fails because I naively used ../../data to access the data directory... but of course that does not work if the data directory is elsewhere, like happens when run in Drone ;-) I will see how to fix that and then rebase.

@danxuliu danxuliu force-pushed the fix_7782 branch 2 times, most recently from 533ff1d to 08a785a Compare February 6, 2018 17:35
@danxuliu
Copy link
Member

danxuliu commented Feb 6, 2018

As I suspected they take too long to run, so I have disabled those large integration tests in Drone.

@MorrisJobke
Copy link
Member

@danxuliu @rullzer Just to verify: using this I still can't extract the zip file on macOS 😕

@danxuliu
Copy link
Member

@MorrisJobke That is expected; a patch for PHPZipStreamer is needed, but it has not been added yet ;-)

@MorrisJobke
Copy link
Member

@MorrisJobke That is expected; a patch for PHPZipStreamer is needed, but it has not been added yet ;-)

Problem now is, that his actually breaks the download of multiple files for macOS. Where are those fixes? Could we add them to this PR as well?

@danxuliu danxuliu added 2. developing Work in progress and removed 3. to review Waiting for reviews labels Feb 15, 2018
@danxuliu
Copy link
Member

@MorrisJobke

Problem now is, that his actually breaks the download of multiple files for macOS.

Oops, this pull request should not have been in To review state, sorry :-)

Where are those fixes? Could we add them to this PR as well?

The fixes are here: McNetic/PHPZipStreamer#40 I do not know how @rullzer intended to add them to the server (in this pull request, on a different pull request and then rebase this one on top of it, or other approach).

@rullzer
Copy link
Member Author

rullzer commented Apr 6, 2018

Rebased on master to include #9101

rullzer and others added 7 commits April 6, 2018 15:59
* OSX doesn't handle 64zip that well
* Some other implentations don't handle it perfectly either
* If the file is belog 4GiB (some overhead) => zip32
* This covers the 99% case I bet

Signed-off-by: Roeland Jago Douma <roeland@famdouma.nl>
A zip32 file can contain, at most, 65535 files (and folders), so take
that constraint into account.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
The data directory is not necessarily located at "../..". The proper
directory is now got by running "php console.php config:system:get
datadirectory".

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
"--tags=XXX" limits the features or scenarios to be run to those
matching the tag filter expression.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Large scenarios take too long to run, so they would be cancelled before
they were finished. Therefore, now they are not even run.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
@rullzer rullzer added 3. to review Waiting for reviews and removed 2. developing Work in progress labels Apr 9, 2018
Copy link
Member

@MorrisJobke MorrisJobke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works on macOS 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3. to review Waiting for reviews enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Don't use zip64 for less than 4GB of data

4 participants