Add ZIP and TAR format support, fix StringIO bug, and improve thread safety #6

Open
jkorany wants to merge 1 commit into main from multi-format-support-and-fixes

jkorany (Collaborator) commented Feb 9, 2026

Summary

  • Fix StringIO NameError -- open_buffer references StringIO without requiring it, causing a NameError when stringio hasn't been loaded in the calling context (e.g. outside Rails). Added require "stringio" to reader.rb.
  • Wire ZIP and TAR formats through the C++ bridge -- The bridge previously hardcoded the 7z GUID everywhere, so format: SZ_FORMAT_ZIP and format: SZ_FORMAT_TAR silently failed. Refactored Init7zFormatGUID to InitFormatGUID(guid, format) to dynamically set the correct GUID byte for each format (7z = 0x07, ZIP = 0x01, TAR = 0xEE).
  • Vendor Zip/Tar SDK handlers -- Copied Archive/Zip/, Archive/Tar/, and all their compression/crypto dependencies (Deflate, BZip2, WzAes, ZipCrypto, Zstd, Ppmd, etc.) from the 7-Zip SDK v25.01. Includes BitlDecoder.cpp (x86_64 reverse-bits table) and Synchronization.cpp (POSIX WFMO vtables). Updated extconf.rb with the new include directories, source files, and -lpthread on Linux.
  • Remove the C API code path -- Unified all formats (including unencrypted 7z) on the C++ COM API. This eliminates a TOCTOU vulnerability flagged by DryRun (where extractToMemory re-opened archives by path), removes ~370 lines of dual-path branching from cpp_bridge.cpp, and simplifies Impl from 7 fields to 1.
  • Bump archive bomb ratio limit from 1,000:1 to 10,000:1 -- The C++ API now correctly reports compressed_size for all formats (the old C API returned 0 for 7z). LZMA2 legitimately achieves 1,000-5,000:1 on repetitive data, while real archive bombs exceed 1,000,000:1.
  • Update Format.supported? -- Now returns true for ZIP and TAR.
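
The per-format GUID byte described in the second bullet can be illustrated with a small Ruby sketch. This is not the bridge's C++ code: `FORMAT_ID_BYTES`, `GUID_TEMPLATE`, and `format_guid` are hypothetical names, and the template assumes the 7-Zip handler CLSID convention `{23170F69-40C1-278A-1000-000110xx0000}`, where only `xx` changes per format. The three ID bytes are the ones listed above.

```ruby
# Illustrative only: mirrors the idea behind InitFormatGUID(guid, format).
# All names below are hypothetical and not part of the gem's API.

# Format ID bytes as listed in the PR summary.
FORMAT_ID_BYTES = {
  seven_zip: 0x07,
  zip: 0x01,
  tar: 0xEE
}.freeze

# Assumed 7-Zip handler CLSID template; "%02X" is the per-format byte.
GUID_TEMPLATE = "{23170F69-40C1-278A-1000-000110%02X0000}"

# Returns the handler GUID string for a known format, and raises for
# anything else instead of silently falling back to 7z.
def format_guid(fmt)
  id = FORMAT_ID_BYTES.fetch(fmt) do
    raise ArgumentError, "unsupported format: #{fmt.inspect}"
  end
  GUID_TEMPLATE % id
end

puts format_guid(:zip)
puts format_guid(:tar)
```

Raising on an unknown format, rather than defaulting to the 7z GUID, is the design choice that avoids the silent failure the bullet describes.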

Test plan

  • All 112 existing tests pass (bundle exec rspec)
  • New spec/archive/multi_format_spec.rb covers:
    • ZIP: open, extract_data, extract_all
    • TAR: open, extract_data, extract_all
    • Encrypted ZIP: open with password, extract_data
    • open_buffer with String and StringIO for both ZIP and 7z
    • 7z regression (open, extract_data, extract_all still work)
    • Error handling for unsupported formats
  • DryRun TOCTOU finding resolved (no more re-opening by path)
  • Archive bomb ratio limit verified against LZMA2 high-compression edge cases
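
A compression-ratio guard of the kind the last item refers to might look roughly like the sketch below. The names `MAX_COMPRESSION_RATIO` and `suspicious_ratio?` are hypothetical, and skipping the check when `compressed_size` is 0 is an assumption made for this sketch, not a statement about how the gem behaves.

```ruby
# Hypothetical sketch of a compression-ratio guard; not the gem's code.
MAX_COMPRESSION_RATIO = 10_000 # limit proposed in this PR (previously 1_000)

# Returns true when the uncompressed size exceeds the allowed multiple of
# the compressed size. Integer arithmetic avoids float rounding.
def suspicious_ratio?(uncompressed_size, compressed_size, limit: MAX_COMPRESSION_RATIO)
  # A compressed size of 0 gives no usable ratio, so this sketch skips the
  # check in that case (assumption).
  return false if compressed_size.zero?

  uncompressed_size > compressed_size * limit
end

# A 5,000:1 archive passes under the new limit but not under the old one.
puts suspicious_ratio?(5_000 * 1024, 1024)
puts suspicious_ratio?(5_000 * 1024, 1024, limit: 1_000)
```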

jkorany force-pushed the multi-format-support-and-fixes branch 2 times, most recently from 1de83c0 to 7bb4781 on February 9, 2026 21:50
- Fix NameError when open_buffer is called without stringio loaded by
  adding `require "stringio"` to reader.rb
- Wire ZIP and TAR format GUIDs through the C++ bridge so that
  format: SZ_FORMAT_ZIP and format: SZ_FORMAT_TAR actually work
  end-to-end (previously hardcoded to 7z GUID only)
- Vendor the Zip and Tar archive handlers plus their compression and
  crypto dependencies from the 7-Zip SDK (v25.01), including
  BitlDecoder.cpp and Synchronization.cpp for Linux compatibility
- Link -lpthread on Linux for POSIX synchronization primitives
- Remove the C API code path entirely, unifying all formats on the
  C++ COM API. This eliminates a TOCTOU vulnerability where
  extractToMemory re-opened archives by path, simplifies every
  ArchiveReader method by removing dual-path branching, and shrinks
  cpp_bridge.cpp by ~370 lines
- Bump archive bomb compression ratio limit from 1000:1 to 10000:1
  to accommodate legitimate LZMA2 ratios now that the C++ API
  correctly reports compressed_size (the old C API returned 0)
- Update Format.supported? to return true for ZIP and TAR
- Add comprehensive multi-format RSpec tests (ZIP, TAR, encrypted ZIP,
  open_buffer with String/StringIO, 7z regression)
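
The StringIO fix in the first item of this commit message comes down to requiring the library before referencing the constant. A minimal sketch of that pattern follows. The `to_io` helper is hypothetical and stands in for whatever `open_buffer` does internally, which this page does not show.

```ruby
require "stringio" # load explicitly; do not rely on the host app having loaded it

# Hypothetical helper: accept a String or an IO-like object and return
# something that responds to #read.
def to_io(source)
  case source
  when String
    StringIO.new(source)
  else
    unless source.respond_to?(:read)
      raise ArgumentError, "expected a String or an IO-like object"
    end
    source
  end
end

# Both input kinds end up readable through the same interface.
puts to_io("abc").read
puts to_io(StringIO.new("def")).read
```

Without the `require`, the `StringIO.new` call raises a NameError in any process that has not already loaded the library, which matches the failure described above.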
jkorany force-pushed the multi-format-support-and-fixes branch from 7bb4781 to 7e0498c on February 9, 2026 22:08
@@ -0,0 +1,105 @@
/* Blake2.h -- BLAKE2sp Hash
Collaborator Author

This directory contains lzma_sdk files that were added for ZIP support. Ignore the source changes in lzma_sdk (unless you want to learn more about 7-Zip's SDK).
