Refactor and optimize archive and manifest handling with benchmarks #2

thesprockee · 2025-12-19T23:01:14Z

Introduce a new idiomatic Go project structure, enhancing organization and documentation. Implement significant performance optimizations in binary encoding/decoding and extraction processes, achieving substantial speed and memory allocation improvements. Add benchmark tests for various operations to facilitate optimization analysis. Introduce a CLI option for decimal filenames and clean up legacy code for better maintainability. Update dependencies and streamline performance analysis by removing obsolete files.

Added benchmarks for: - Zstd compression/decompression with and without context reuse - Compression level comparisons - Buffer allocation strategies - Manifest marshal/unmarshal operations - Archive header operations - Lookup table key strategies

Major changes: - Reorganized code into pkg/archive and pkg/manifest packages - Created clean CLI tool in cmd/evrtools - Used idiomatic Go naming conventions (CamelCase exports) - Added comprehensive documentation and comments - Consolidated duplicate types (removed redundancy between tool/ and evrManifests/) - Added unit tests and benchmarks for new packages - Updated README with library usage examples - Updated Makefile with proper targets Benchmark results show: - Context reuse for ZSTD decompression: ~3.7x faster (6290ns vs 1688ns) - Zero allocations with context reuse - CombinedInt64Key lookup: ~2.7x faster than StructKey Legacy code in tool/ and evrManifests/ retained for backwards compatibility.

Header operations (archive): - Marshal: 136.3 ns → 1.05 ns (130x faster), 3 allocs → 0 allocs - Unmarshal: 134.5 ns → 3.8 ns (35x faster), 2 allocs → 0 allocs Manifest operations: - Marshal: 1,354,843 ns → 122,781 ns (11x faster) - Memory: 3,228,085 B → 729,093 B (4.4x reduction) - Allocs: 9 → 1 (9x reduction) - Unmarshal: 1,345,174 ns → 154,367 ns (8.7x faster) - Memory: 1,474,805 B → 737,286 B (2x reduction) - Allocs: 8 → 3 (2.7x reduction) Changes: - Replaced bytes.Buffer + binary.Write with direct LittleEndian encoding - Pre-calculate and allocate exact buffer sizes - Use inline field encoding instead of reflection-based binary package - Added size constants for all binary structures

Frame content lookup: - LinearScan: 2619 ns → PrebuiltIndex: 7 ns (374x faster) - Build frame index map before extraction loop - Eliminates O(n²) complexity in package extraction String formatting: - fmt.Sprintf: 68.5 ns/op, 1 alloc → strconv.FormatInt: 26.5 ns/op, 0 allocs - Use strconv.FormatInt/FormatUint for hex/decimal conversion - 2.6x faster with no allocations Other optimizations: - Builder.incrementSection: removed loop, use direct arithmetic - Package.Extract: cache created directories to avoid repeated MkdirAll - Added benchmarks for frame index and hex formatting strategies

Performance improvements: - Added EncodeTo/DecodeFrom methods to Header for zero-allocation encoding - Reader now uses embedded headerBuf array instead of allocating - Writer now uses embedded headerBuf array instead of allocating - Added BinarySize and EncodeTo methods to Manifest for pre-allocated encoding Benchmark results: - Header DecodeFrom: 3.8x faster than UnmarshalBinary (1.0ns vs 3.8ns) - Archive Encode: 13→11 allocations (15% reduction) - Archive Decode: 10→9 allocations (10% reduction) Remaining allocations are at practical minimum: - zstd compression/decompression buffers - Manifest slice allocations for data storage

Changes: - Remove legacy tool/ package (duplicated pkg/ functionality) - Remove legacy evrManifests/ package (unused manifest versions) - Remove legacy main.go CLI (replaced by cmd/evrtools) - Update module path from goopsie to EchoTools organization - Clean up benchmark log files - Update Makefile (remove legacy targets) - Update README with current structure and usage Final structure: cmd/evrtools/ - CLI application pkg/archive/ - ZSTD archive format handling pkg/manifest/ - EVR manifest/package operations All tests pass, build verified.

Changes: - Add -decimal-names flag to CLI (default: false, uses hex) - When -decimal-names is set, extract uses decimal format for filenames - Add WithDecimalNames() option to manifest.Extract() - Type symbols remain hex in directory names - File symbols can now be decimal (old behavior) or hex (new default) Usage: evrtools -mode extract ... -decimal-names # Use decimal filenames

…; delete obsolete 'evrtools' binary

Changes: - Hex filenames are now formatted as uint64 (e.g. 0xc8c33e483b601ab6) - Decimal filenames remain int64 (e.g. -3980269165710665034) - Type symbols are now formatted as uint64 hex

thesprockee · 2025-12-19T23:32:35Z

pkg/manifest/scanner.go

+		}
+
+		// Grow slice if needed
+		for int(chunkNum) >= len(files) {


In general, to fix this kind of issue you should avoid converting a potentially large int64 (from ParseInt with size 64) to a smaller integer type without checking that the value lies within the target type’s range. Either (a) parse directly into the appropriate size, or (b) enforce explicit bounds before casting.

Here, the problem is the conversion from int64 (chunkNum) to int in the loop condition for int(chunkNum) >= len(files). The simplest fix that preserves existing behavior is to avoid the narrowing conversion entirely by working in int64 consistently and using int64(len(files)) in the comparison. That way, no unchecked narrowing occurs, and the code is safe regardless of the platform’s int size.

Concretely in pkg/manifest/scanner.go:

Replace the loop condition for int(chunkNum) >= len(files) { with a comparison that casts len(files) to int64: for chunkNum >= int64(len(files)) {.

Replace the subsequent use of files[chunkNum] with files[chunkNumInt] where chunkNumInt is a safely converted int with checked range or, more simply, keep using files[chunkNum] by converting chunkNum to int after proving it fits via the loop bound. However, because slice indices must be int, we need one conversion.

The best approach: keep the loop condition in int64, and inside the loop and before indexing, convert chunkNum once after a bounds check that ensures chunkNum is within the int range (at least non‑negative and not exceeding math.MaxInt on this platform). In practice, we can leverage the invariant that chunkNum >= 0 (directory names for chunks should be non‑negative); we should add an explicit check to enforce this and to ensure chunkNum <= math.MaxInt. This makes the index conversion safe.

So:

Add an import of math (already used earlier in other examples but not in this file).

After parsing chunkNum, add checks:

if chunkNum < 0 { return fmt.Errorf("invalid chunk number: %d", chunkNum) }

if chunkNum > int64(math.MaxInt) { return fmt.Errorf("chunk number too large: %d", chunkNum) }

Then we can safely use int(chunkNum) in the loop and index expressions, since we’ve guaranteed it is within int’s range.

This approach retains the structure of the code and satisfies CodeQL by adding explicit upper (and lower) bound checks before the narrowing conversion.

just switch to using uint64 from the start, chunkNum will never be negative

Copilot

Pull request overview

This PR refactors the project into an idiomatic Go structure with significant performance optimizations and enhanced functionality. The changes move from a monolithic main.go to a well-organized package structure, implement direct binary encoding/decoding for better performance, and add comprehensive benchmarks.

Key Changes:

Reorganized codebase into pkg/ and cmd/ structure following Go conventions
Optimized binary encoding/decoding and extraction with context reuse and pre-allocated buffers
Added CLI option for decimal filenames and improved command-line interface

Reviewed changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`pkg/manifest/manifest.go`	Core manifest types and optimized binary encoding/decoding
`pkg/manifest/package.go`	Multi-part package extraction with performance optimizations
`pkg/manifest/builder.go`	Package building functionality from scanned files
`pkg/manifest/scanner.go`	Directory scanning for package building
`pkg/archive/header.go`	ZSTD archive header handling with zero-allocation encoding
`pkg/archive/reader.go`	Streaming decompression with reusable buffers
`pkg/archive/writer.go`	Streaming compression with header management
`cmd/evrtools/main.go`	Refactored CLI application with cleaner flag handling
`README.md`	Updated documentation with library usage examples and project structure
`Makefile`	Build automation and development workflow targets
`go.mod`	Updated module path and Go version
Legacy files	Removed obsolete code from previous implementation

Comments suppressed due to low confidence (2)

ready-at-dawn-echo-arena:1

This file contains an absolute path to a local installation directory. This appears to be a development/local configuration file that should not be committed to version control as it exposes system paths and won't work for other users.
_data:1
This file contains an absolute path to a local installation directory. This appears to be a development/local configuration file that should not be committed to version control as it exposes system paths and won't work for other users.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/manifest/scanner.go

pkg/manifest/package.go

pkg/manifest/builder.go

cmd/evrtools/main.go

README.md

thesprockee · 2025-12-19T23:32:54Z

@copilot implement the comments.

Copilot · 2025-12-19T23:33:03Z

@thesprockee I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.

thesprockee added 12 commits December 19, 2025 04:05

Remove obsolete benchmark log files to streamline performance analysis

5435f23

Add extracted directory to .gitignore

50a4eed

Update .gitignore to include 'bin/' and remove 'extracted/' directory…

02919a6

…; delete obsolete 'evrtools' binary

Use uint64 for hex filename formatting

19bdc30

Changes: - Hex filenames are now formatted as uint64 (e.g. 0xc8c33e483b601ab6) - Decimal filenames remain int64 (e.g. -3980269165710665034) - Type symbols are now formatted as uint64 hex

Fix code review issues

67e01c7

Copilot AI review requested due to automatic review settings December 19, 2025 23:01

thesprockee marked this pull request as draft December 19, 2025 23:02

github-advanced-security bot found potential problems Dec 19, 2025

View reviewed changes

Copilot AI reviewed Dec 19, 2025

View reviewed changes

pkg/manifest/scanner.go Show resolved Hide resolved

pkg/manifest/package.go Show resolved Hide resolved

pkg/manifest/builder.go Show resolved Hide resolved

cmd/evrtools/main.go Show resolved Hide resolved

README.md Show resolved Hide resolved

Copilot AI mentioned this pull request Dec 19, 2025

Convert TypeSymbol and FileSymbol from int64 to uint64 #3

Merged

thesprockee marked this pull request as ready for review December 20, 2025 01:22

thesprockee force-pushed the main branch from a04d532 to ab23418 Compare December 20, 2025 02:20

thesprockee force-pushed the enhance/performance branch from bb33f9a to 67e01c7 Compare December 20, 2025 02:22

@@ -2,6 +2,7 @@
             import (
             	"fmt"
+            	"math"
             	"os"
             	"path/filepath"
             	"strconv"
@@ -40,6 +41,13 @@
             		if err != nil {
             			return fmt.Errorf("parse chunk number: %w", err)
             		}
+            		// Ensure chunkNum is within the valid range for int to avoid overflow on conversion.
+            		if chunkNum < 0 {
+            			return fmt.Errorf("invalid chunk number: %d", chunkNum)
+            		}
+            		if chunkNum > int64(math.MaxInt) {
+            			return fmt.Errorf("chunk number too large: %d", chunkNum)
+            		}
             		typeSymbol, err := strconv.ParseInt(parts[len(parts)-2], 10, 64)
             		if err != nil {
@@ -69,7 +77,7 @@
             			files = append(files, nil)
             		}
-            		files[chunkNum] = append(files[chunkNum], file)
+            		files[int(chunkNum)] = append(files[int(chunkNum)], file)
             		return nil
             	})

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor and optimize archive and manifest handling with benchmarks #2

Refactor and optimize archive and manifest handling with benchmarks #2

Uh oh!

thesprockee commented Dec 19, 2025

Uh oh!

Check failure

Copilot Autofix

thesprockee Dec 19, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thesprockee commented Dec 19, 2025

Uh oh!

Copilot AI commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Refactor and optimize archive and manifest handling with benchmarks #2

Are you sure you want to change the base?

Refactor and optimize archive and manifest handling with benchmarks #2

Uh oh!

Conversation

thesprockee commented Dec 19, 2025

Uh oh!

Check failure

Uh oh!

Copilot Autofix

thesprockee Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thesprockee commented Dec 19, 2025

Uh oh!

Copilot AI commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants