Skip to content

Conversation

@Mahalaxmibejugam
Copy link
Contributor

@Mahalaxmibejugam Mahalaxmibejugam commented Dec 29, 2025

HNS-Aware Behavior: For buckets with Hierarchical Namespace (HNS) enabled, mkdir now uses a create_folder API that creates directories similar to any filesystem

Recursive Creation: The method supports create_parents=True, allowing for the recursive creation of nested directories in a single API call when using the HNS-aware path.

HNS Bucket Creation: It is now possible to create a new HNS-enabled bucket directly by calling mkdir with the create_hns_bucket=True flag. This passes the necessary HNS configuration when creating the bucket.

Fallback Mechanism: If a bucket is not HNS-enabled, or if the operation is to create a bucket itself, the method gracefully falls back to the original mkdir implementation from the parent class. It also falls back if it fails to determine the bucket type, for instance, if the bucket doesn't exist.

Backward Compatibility for Bucket Creation: For backward compatibility, buckets can still be created by providing a path that includes a directory (e.g., bucket-name/some-dir) and setting create_parents=True. The system will fall back to the parent mkdir logic, which creates the bucket if it does not exist.

Cache Management: Upon successfully creating a new directory, the parent directory's listing in the dircache is updated to include the new directory entry. This avoids clearing the entire parent cache and ensures the cache remains consistent.

Testing: The test suite includes comprehensive checks for both HNS and non-HNS buckets, cache invalidation logic, and various failure scenarios.

Error Handling: Error handling is kept similar to raise a standard FileNotFoundError if a parent directory doesn't exist when create_parents=False.

@ankitaluthra1
Copy link
Collaborator

/gcbrun

kwargs["default_acl"] = None

bucket, key, _ = self.split_path(path)
# If key is empty, it's a bucket operation. Defer to parent.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All operations are bucket operations only, Can we make the comment more clear, IIUC in case key is empty, it mean only bucket needs to be created, correct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if key is empty we just create the bucket. This case is handled as fast fallback to ensure no further checks are made before calling super mkdir method. Updated comment.

# Note: Emulators may not fully support HNS features like real GCS.
# TODO: Update to create HNS bucket once mkdir supports creating HNS buckets.
gcs.mkdir(TEST_HNS_BUCKET)
gcs.mkdir(TEST_HNS_BUCKET, create_hns_bucket=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also use same parameter name as bucket i.e instead of create_hns_bucket, use hierarchical_namespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intent of create_hns_bucket is only to create the bucket with HNS enabled. For creating folders inside HNS enabled bucket this parameter need not to be specified. Also kept it consistent with other parameters like create_parents

Renaming it to hierarchical_namespace creates confusion, implying that this setting is required even for folder operations.

request=expected_request
)

def _assert_hns_mkdir_called_using_logs(self, caplog, path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming logs are only used for real bucket testing. Instead of verifying logs, can we use get folder metadata api to validate if actual folder is created in case of real bucket testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already asserting the existence of folders using the exists() call in case of real bucket. Logs assertion was added in addition to that, but as it is not adding much value here, removed it

gcsfs.rm(placeholder_file)
gcsfs.mkdir(path1)

with gcs_hns_mocks(BucketType.HIERARCHICAL, gcsfs) as mocks:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding helper methods for assertions reduces the readability, consider DAMP over DRY in tests
https://stackoverflow.com/questions/6453235/what-does-damp-not-dry-mean-when-talking-about-unit-tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, some details like mock is called only once was hidden in the helper method even though all the parameters are being passed from the context of the test.

Added helper only for creating the request by passing the right set of parameters from the test(As the parameters passed are clear in the test it would have the necessary details in test itself) and asserted the call in the test itself.

@ankitaluthra1
Copy link
Collaborator

/gcbrun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants