Skip to content

Conversation

@thesprockee
Copy link
Member

Introduce a new idiomatic Go project structure, enhancing organization and documentation. Implement significant performance optimizations in binary encoding/decoding and extraction processes, achieving substantial speed and memory allocation improvements. Add benchmark tests for various operations to facilitate optimization analysis. Introduce a CLI option for decimal filenames and clean up legacy code for better maintainability. Update dependencies and streamline performance analysis by removing obsolete files.

Added benchmarks for:
- Zstd compression/decompression with and without context reuse
- Compression level comparisons
- Buffer allocation strategies
- Manifest marshal/unmarshal operations
- Archive header operations
- Lookup table key strategies
Major changes:
- Reorganized code into pkg/archive and pkg/manifest packages
- Created clean CLI tool in cmd/evrtools
- Used idiomatic Go naming conventions (CamelCase exports)
- Added comprehensive documentation and comments
- Consolidated duplicate types (removed redundancy between tool/ and evrManifests/)
- Added unit tests and benchmarks for new packages
- Updated README with library usage examples
- Updated Makefile with proper targets

Benchmark results show:
- Context reuse for ZSTD decompression: ~3.7x faster (6290ns vs 1688ns)
- Zero allocations with context reuse
- CombinedInt64Key lookup: ~2.7x faster than StructKey

Legacy code in tool/ and evrManifests/ retained for backwards compatibility.
Header operations (archive):
- Marshal: 136.3 ns → 1.05 ns (130x faster), 3 allocs → 0 allocs
- Unmarshal: 134.5 ns → 3.8 ns (35x faster), 2 allocs → 0 allocs

Manifest operations:
- Marshal: 1,354,843 ns → 122,781 ns (11x faster)
- Memory: 3,228,085 B → 729,093 B (4.4x reduction)
- Allocs: 9 → 1 (9x reduction)

- Unmarshal: 1,345,174 ns → 154,367 ns (8.7x faster)
- Memory: 1,474,805 B → 737,286 B (2x reduction)
- Allocs: 8 → 3 (2.7x reduction)

Changes:
- Replaced bytes.Buffer + binary.Write with direct LittleEndian encoding
- Pre-calculate and allocate exact buffer sizes
- Use inline field encoding instead of reflection-based binary package
- Added size constants for all binary structures
Frame content lookup:
- LinearScan: 2619 ns → PrebuiltIndex: 7 ns (374x faster)
- Build frame index map before extraction loop
- Eliminates O(n²) complexity in package extraction

String formatting:
- fmt.Sprintf: 68.5 ns/op, 1 alloc → strconv.FormatInt: 26.5 ns/op, 0 allocs
- Use strconv.FormatInt/FormatUint for hex/decimal conversion
- 2.6x faster with no allocations

Other optimizations:
- Builder.incrementSection: removed loop, use direct arithmetic
- Package.Extract: cache created directories to avoid repeated MkdirAll
- Added benchmarks for frame index and hex formatting strategies
Performance improvements:
- Added EncodeTo/DecodeFrom methods to Header for zero-allocation encoding
- Reader now uses embedded headerBuf array instead of allocating
- Writer now uses embedded headerBuf array instead of allocating
- Added BinarySize and EncodeTo methods to Manifest for pre-allocated encoding

Benchmark results:
- Header DecodeFrom: 3.8x faster than UnmarshalBinary (1.0ns vs 3.8ns)
- Archive Encode: 13→11 allocations (15% reduction)
- Archive Decode: 10→9 allocations (10% reduction)

Remaining allocations are at practical minimum:
- zstd compression/decompression buffers
- Manifest slice allocations for data storage
Changes:
- Remove legacy tool/ package (duplicated pkg/ functionality)
- Remove legacy evrManifests/ package (unused manifest versions)
- Remove legacy main.go CLI (replaced by cmd/evrtools)
- Update module path from goopsie to EchoTools organization
- Clean up benchmark log files
- Update Makefile (remove legacy targets)
- Update README with current structure and usage

Final structure:
  cmd/evrtools/     - CLI application
  pkg/archive/      - ZSTD archive format handling
  pkg/manifest/     - EVR manifest/package operations

All tests pass, build verified.
Changes:
- Add -decimal-names flag to CLI (default: false, uses hex)
- When -decimal-names is set, extract uses decimal format for filenames
- Add WithDecimalNames() option to manifest.Extract()
- Type symbols remain hex in directory names
- File symbols can now be decimal (old behavior) or hex (new default)

Usage:
  evrtools -mode extract ... -decimal-names  # Use decimal filenames
Changes:
- Hex filenames are now formatted as uint64 (e.g. 0xc8c33e483b601ab6)
- Decimal filenames remain int64 (e.g. -3980269165710665034)
- Type symbols are now formatted as uint64 hex
Copilot AI review requested due to automatic review settings December 19, 2025 23:01
@thesprockee thesprockee marked this pull request as draft December 19, 2025 23:02
}

// Grow slice if needed
for int(chunkNum) >= len(files) {

Check failure

Code scanning / CodeQL

Incorrect conversion between integer types High

Incorrect conversion of a signed 64-bit integer from
strconv.ParseInt
to a lower bit size type int without an upper bound check.

Copilot Autofix

AI 6 days ago

In general, to fix this kind of issue you should avoid converting a potentially large int64 (from ParseInt with size 64) to a smaller integer type without checking that the value lies within the target type’s range. Either (a) parse directly into the appropriate size, or (b) enforce explicit bounds before casting.

Here, the problem is the conversion from int64 (chunkNum) to int in the loop condition for int(chunkNum) >= len(files). The simplest fix that preserves existing behavior is to avoid the narrowing conversion entirely by working in int64 consistently and using int64(len(files)) in the comparison. That way, no unchecked narrowing occurs, and the code is safe regardless of the platform’s int size.

Concretely in pkg/manifest/scanner.go:

  • Replace the loop condition for int(chunkNum) >= len(files) { with a comparison that casts len(files) to int64: for chunkNum >= int64(len(files)) {.
  • Replace the subsequent use of files[chunkNum] with files[chunkNumInt] where chunkNumInt is a safely converted int with checked range or, more simply, keep using files[chunkNum] by converting chunkNum to int after proving it fits via the loop bound. However, because slice indices must be int, we need one conversion.
  • The best approach: keep the loop condition in int64, and inside the loop and before indexing, convert chunkNum once after a bounds check that ensures chunkNum is within the int range (at least non‑negative and not exceeding math.MaxInt on this platform). In practice, we can leverage the invariant that chunkNum >= 0 (directory names for chunks should be non‑negative); we should add an explicit check to enforce this and to ensure chunkNum <= math.MaxInt. This makes the index conversion safe.

So:

  1. Add an import of math (already used earlier in other examples but not in this file).
  2. After parsing chunkNum, add checks:
    • if chunkNum < 0 { return fmt.Errorf("invalid chunk number: %d", chunkNum) }
    • if chunkNum > int64(math.MaxInt) { return fmt.Errorf("chunk number too large: %d", chunkNum) }
  3. Then we can safely use int(chunkNum) in the loop and index expressions, since we’ve guaranteed it is within int’s range.

This approach retains the structure of the code and satisfies CodeQL by adding explicit upper (and lower) bound checks before the narrowing conversion.

Suggested changeset 1
pkg/manifest/scanner.go

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/pkg/manifest/scanner.go b/pkg/manifest/scanner.go
--- a/pkg/manifest/scanner.go
+++ b/pkg/manifest/scanner.go
@@ -2,6 +2,7 @@
 
 import (
 	"fmt"
+	"math"
 	"os"
 	"path/filepath"
 	"strconv"
@@ -40,6 +41,13 @@
 		if err != nil {
 			return fmt.Errorf("parse chunk number: %w", err)
 		}
+		// Ensure chunkNum is within the valid range for int to avoid overflow on conversion.
+		if chunkNum < 0 {
+			return fmt.Errorf("invalid chunk number: %d", chunkNum)
+		}
+		if chunkNum > int64(math.MaxInt) {
+			return fmt.Errorf("chunk number too large: %d", chunkNum)
+		}
 
 		typeSymbol, err := strconv.ParseInt(parts[len(parts)-2], 10, 64)
 		if err != nil {
@@ -69,7 +77,7 @@
 			files = append(files, nil)
 		}
 
-		files[chunkNum] = append(files[chunkNum], file)
+		files[int(chunkNum)] = append(files[int(chunkNum)], file)
 		return nil
 	})
 
EOF
@@ -2,6 +2,7 @@

import (
"fmt"
"math"
"os"
"path/filepath"
"strconv"
@@ -40,6 +41,13 @@
if err != nil {
return fmt.Errorf("parse chunk number: %w", err)
}
// Ensure chunkNum is within the valid range for int to avoid overflow on conversion.
if chunkNum < 0 {
return fmt.Errorf("invalid chunk number: %d", chunkNum)
}
if chunkNum > int64(math.MaxInt) {
return fmt.Errorf("chunk number too large: %d", chunkNum)
}

typeSymbol, err := strconv.ParseInt(parts[len(parts)-2], 10, 64)
if err != nil {
@@ -69,7 +77,7 @@
files = append(files, nil)
}

files[chunkNum] = append(files[chunkNum], file)
files[int(chunkNum)] = append(files[int(chunkNum)], file)
return nil
})

Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just switch to using uint64 from the start, chunkNum will never be negative

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the project into an idiomatic Go structure with significant performance optimizations and enhanced functionality. The changes move from a monolithic main.go to a well-organized package structure, implement direct binary encoding/decoding for better performance, and add comprehensive benchmarks.

Key Changes:

  • Reorganized codebase into pkg/ and cmd/ structure following Go conventions
  • Optimized binary encoding/decoding and extraction with context reuse and pre-allocated buffers
  • Added CLI option for decimal filenames and improved command-line interface

Reviewed changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
pkg/manifest/manifest.go Core manifest types and optimized binary encoding/decoding
pkg/manifest/package.go Multi-part package extraction with performance optimizations
pkg/manifest/builder.go Package building functionality from scanned files
pkg/manifest/scanner.go Directory scanning for package building
pkg/archive/header.go ZSTD archive header handling with zero-allocation encoding
pkg/archive/reader.go Streaming decompression with reusable buffers
pkg/archive/writer.go Streaming compression with header management
cmd/evrtools/main.go Refactored CLI application with cleaner flag handling
README.md Updated documentation with library usage examples and project structure
Makefile Build automation and development workflow targets
go.mod Updated module path and Go version
Legacy files Removed obsolete code from previous implementation
Comments suppressed due to low confidence (2)

ready-at-dawn-echo-arena:1

  • This file contains an absolute path to a local installation directory. This appears to be a development/local configuration file that should not be committed to version control as it exposes system paths and won't work for other users.
    _data:1
  • This file contains an absolute path to a local installation directory. This appears to be a development/local configuration file that should not be committed to version control as it exposes system paths and won't work for other users.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@thesprockee
Copy link
Member Author

@copilot implement the comments.

Copy link

Copilot AI commented Dec 19, 2025

@thesprockee I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.

@thesprockee thesprockee marked this pull request as ready for review December 20, 2025 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants