-
Notifications
You must be signed in to change notification settings - Fork 164
mkdir override for HNS buckets #731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7997b1f to
487f74e
Compare
|
/gcbrun |
gcsfs/extended_gcsfs.py
Outdated
| kwargs["default_acl"] = None | ||
|
|
||
| bucket, key, _ = self.split_path(path) | ||
| # If key is empty, it's a bucket operation. Defer to parent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All operations are bucket operations only, Can we make the comment more clear, IIUC in case key is empty, it mean only bucket needs to be created, correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if key is empty we just create the bucket. This case is handled as fast fallback to ensure no further checks are made before calling super mkdir method. Updated comment.
| # Note: Emulators may not fully support HNS features like real GCS. | ||
| # TODO: Update to create HNS bucket once mkdir supports creating HNS buckets. | ||
| gcs.mkdir(TEST_HNS_BUCKET) | ||
| gcs.mkdir(TEST_HNS_BUCKET, create_hns_bucket=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also use same parameter name as bucket i.e instead of create_hns_bucket, use hierarchical_namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intent of create_hns_bucket is only to create the bucket with HNS enabled. For creating folders inside HNS enabled bucket this parameter need not to be specified. Also kept it consistent with other parameters like create_parents
Renaming it to hierarchical_namespace creates confusion, implying that this setting is required even for folder operations.
| request=expected_request | ||
| ) | ||
|
|
||
| def _assert_hns_mkdir_called_using_logs(self, caplog, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming logs are only used for real bucket testing. Instead of verifying logs, can we use get folder metadata api to validate if actual folder is created in case of real bucket testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already asserting the existence of folders using the exists() call in case of real bucket. Logs assertion was added in addition to that, but as it is not adding much value here, removed it
| gcsfs.rm(placeholder_file) | ||
| gcsfs.mkdir(path1) | ||
|
|
||
| with gcs_hns_mocks(BucketType.HIERARCHICAL, gcsfs) as mocks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding helper methods for assertions reduces the readability, consider DAMP over DRY in tests
https://stackoverflow.com/questions/6453235/what-does-damp-not-dry-mean-when-talking-about-unit-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, some details like mock is called only once was hidden in the helper method even though all the parameters are being passed from the context of the test.
Added helper only for creating the request by passing the right set of parameters from the test(As the parameters passed are clear in the test it would have the necessary details in test itself) and asserted the call in the test itself.
|
/gcbrun |
HNS-Aware Behavior: For buckets with Hierarchical Namespace (HNS) enabled, mkdir now uses a create_folder API that creates directories similar to any filesystem
Recursive Creation: The method supports create_parents=True, allowing for the recursive creation of nested directories in a single API call when using the HNS-aware path.
HNS Bucket Creation: It is now possible to create a new HNS-enabled bucket directly by calling mkdir with the create_hns_bucket=True flag. This passes the necessary HNS configuration when creating the bucket.
Fallback Mechanism: If a bucket is not HNS-enabled, or if the operation is to create a bucket itself, the method gracefully falls back to the original mkdir implementation from the parent class. It also falls back if it fails to determine the bucket type, for instance, if the bucket doesn't exist.
Backward Compatibility for Bucket Creation: For backward compatibility, buckets can still be created by providing a path that includes a directory (e.g., bucket-name/some-dir) and setting create_parents=True. The system will fall back to the parent mkdir logic, which creates the bucket if it does not exist.
Cache Management: Upon successfully creating a new directory, the parent directory's listing in the dircache is updated to include the new directory entry. This avoids clearing the entire parent cache and ensures the cache remains consistent.
Testing: The test suite includes comprehensive checks for both HNS and non-HNS buckets, cache invalidation logic, and various failure scenarios.
Error Handling: Error handling is kept similar to raise a standard FileNotFoundError if a parent directory doesn't exist when create_parents=False.