Skip to content

Comments

feat: implement maxdepth and withdirs options for S3FileSystem.find()#544

Merged
laughingman7743 merged 1 commit intomasterfrom
max_depth_with_dirs
Jul 19, 2025
Merged

feat: implement maxdepth and withdirs options for S3FileSystem.find()#544
laughingman7743 merged 1 commit intomasterfrom
max_depth_with_dirs

Conversation

@laughingman7743
Copy link
Member

@laughingman7743 laughingman7743 commented May 12, 2024

Summary

This PR implements the maxdepth and withdirs options for the S3FileSystem.find() method, providing compatibility with s3fs behavior.

Changes

  • maxdepth parameter: Limits directory traversal depth using an efficient recursive approach
  • withdirs parameter: Controls whether directories are included in results (default: False)
  • Improved cache management: Uses (path, delimiter) tuple as cache key
  • Code refactoring: Extracted directory generation logic to _extract_parent_directories() method
  • Comprehensive tests: Added tests for both parameters
  • Documentation: Updated CLAUDE.md with development guidelines

Implementation Details

maxdepth

  • Uses recursive traversal with delimiter="/" for efficiency (similar to s3fs)
  • Only fetches directories up to the specified depth, avoiding unnecessary S3 API calls
  • maxdepth=0: Returns only files directly in the specified path
  • maxdepth=1: Returns files in the path and one level of subdirectories

withdirs

  • Default is False (returns only files, consistent with typical user expectations)
  • When True, includes directory entries in the results
  • For unlimited depth queries, directories are derived from file paths

Code Quality Improvements

  • Extracted _extract_parent_directories() method to reduce _find() method complexity
  • Added Google-style docstring with clear parameter and return value documentation
  • Follows the principle of single responsibility for better maintainability

Performance Improvements

  • When maxdepth is specified, uses level-by-level traversal instead of fetching all files
  • Significantly faster for shallow searches in large directory trees
  • Reduces S3 API calls and data transfer

Cache Management

  • Replaced the initial approach of adding delimiter field to S3Object
  • Now uses (path, delimiter) as the cache key
  • Ensures different delimiter queries don't conflict in the cache

Testing

All tests pass successfully:

  • test_find_maxdepth: Verifies correct depth limiting at multiple levels
  • test_find_withdirs: Verifies directory inclusion/exclusion behavior
  • All existing tests continue to pass

Documentation Updates

Updated CLAUDE.md with:

  • Google-style docstring guidelines
  • Testing best practices and examples
  • S3 FileSystem development guidelines
  • Performance optimization tips

Compatibility

The implementation follows s3fs conventions for both parameters, ensuring drop-in compatibility for users migrating from s3fs.

🤖 Generated with Claude Code

@laughingman7743
Copy link
Member Author

laughingman7743 commented Jun 30, 2024

TODO:

  • Since the list_objects_v2 results differ depending on the delimiter, it is necessary to consider how to keep the differences in the cache.

@laughingman7743 laughingman7743 force-pushed the max_depth_with_dirs branch 5 times, most recently from 0dd38b4 to 983c4c7 Compare December 29, 2024 08:45
@laughingman7743 laughingman7743 changed the title Support maxdepth and withdirs options feat: implement maxdepth and withdirs options for S3FileSystem.find() Jul 19, 2025
@laughingman7743 laughingman7743 marked this pull request as ready for review July 19, 2025 02:34
@laughingman7743 laughingman7743 force-pushed the max_depth_with_dirs branch 3 times, most recently from cb25fb4 to 980a17a Compare July 19, 2025 03:35
- Add maxdepth parameter to limit directory traversal depth
  - Uses recursive approach with delimiter="/" for efficiency
  - Compatible with s3fs behavior
- Add withdirs parameter to include directories in results
  - Default is False (returns only files)
  - When True, includes directory entries
- Improve cache management using (path, delimiter) tuple as key
- Add comprehensive tests for both parameters

Implementation details:
- When maxdepth is specified, uses level-by-level traversal
- When withdirs=True and delimiter="", derives directories from file paths
- Ensures compatibility with s3fs conventions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@laughingman7743 laughingman7743 merged commit 5a85040 into master Jul 19, 2025
8 of 10 checks passed
@laughingman7743 laughingman7743 deleted the max_depth_with_dirs branch July 19, 2025 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant