✨ Support alternative GCP Universes #732
base: main
Conversation
This commit adds support for the `GOOGLE_CLOUD_UNIVERSE_DOMAIN` environment variable, so that `gcsfs` can target alternative GCP universes.

Context
-------

In multiple countries, local hosting companies are partnering with GCP to offer EU-sovereign, GCP-like environments. Such partnerships include [one with T-Systems in Germany](https://www.t-systems.com/de/en/sovereign-cloud/solutions/sovereign-cloud-powered-by-google-cloud) and [another with Thalès in France, called S3NS](https://www.s3ns.io/en).

To support these new environments, Google introduced the notion of a `universe` in their SDKs, essentially to point them at non-`googleapis.com` endpoints.

We ([Pigment](https://pigment.com)) are currently porting our platform to S3NS, and that includes our time-series forecasting services, which rely on Dask and therefore on `fsspec` and `gcsfs`. To make that work today, we pass the `endpoint_url` storage parameter through `storage_options` in every Dask bag call requiring GCS access; that is far from ideal, as these calls are scattered all around our codebase.

To connect to other universes, clients are advised to use the `GOOGLE_CLOUD_UNIVERSE_DOMAIN` environment variable, as you can see:

- in [this pull request on the Google Cloud Python SDK](googleapis/google-api-python-client#2369)
- in the documentation of S3NS: https://documentation.s3ns.fr/docs/overview/tpc-key-differences#key_differences_for_developers

Support for this environment variable would make it much easier for us (and for anyone else) to connect to alternative GCP universes, without having to patch a single line of our own code.

NOTE: credentials retrieval on S3NS virtual machines was working out of the box, even without the `GOOGLE_CLOUD_UNIVERSE_DOMAIN` env var set, because the underlying SDK supports it. Making sure `gcsfs` targets the correct endpoint was the only missing part.
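To make the motivation concrete, here is a hedged sketch of the current workaround versus the proposed behaviour. The `endpoint_url` kwarg name follows the discussion later in this thread; the bucket path and the S3NS endpoint URL are purely illustrative.

```python
import dask.bag as db

# Current workaround: every Dask call touching GCS must carry the endpoint
# explicitly through storage_options (scattered all over the codebase).
bag = db.read_text(
    "gs://some-bucket/data/*.jsonl",  # hypothetical bucket/path
    storage_options={"endpoint_url": "https://storage.s3nsapis.fr"},
)

# With this PR: export GOOGLE_CLOUD_UNIVERSE_DOMAIN=s3nsapis.fr once in the
# environment, and the same call needs no per-call endpoint plumbing.
bag = db.read_text("gs://some-bucket/data/*.jsonl")
```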
/gcbrun
Thank you so much for adding this support; is there a minimum version of
docs/source/index.rst
Outdated
    -------------------------

    To target an alternative GCP universe, the ``GOOGLE_CLOUD_UNIVERSE_DOMAIN``
    environment variable should be set to your desired unverse domain for ``gcsfs``
typo unverse
Good catch, thank you :) I'll amend it in a follow-up commit (where I'll also fix the failing tests; looks like I overlooked the fact that on the CI `GOOGLE_APPLICATION_CREDENTIALS` is overridden, my bad 🙈).
Before checking the implementation, let me say that I fully support making use of a standard env var like this, and the general thrust of supporting alternative GCP variants. If there's anything I can do to help on behalf of fsspec, Dask or any other of the projects I am associated with, please get in direct touch with me.

I should mention that there are already several other ways to set the endpoint URL: the explicit `endpoint_url` argument (which can also be supplied through the fsspec configuration system, see https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration) and the emulator host environment variable. All of these are specific to storage, so the general concept of "universe" is more powerful.
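For reference, a minimal sketch of those existing, storage-specific options, assuming the `endpoint_url` kwarg named later in this thread and the fsspec config-file mechanism linked above (the S3NS URL is illustrative):

```python
import gcsfs

# Explicit constructor argument (also reachable via storage_options in
# Dask/fsspec-based calls):
fs = gcsfs.GCSFileSystem(endpoint_url="https://storage.s3nsapis.fr")

# Or via the fsspec configuration system, e.g. a JSON file in
# ~/.config/fsspec/ mapping the protocol to default constructor kwargs:
#
#     {"gcs": {"endpoint_url": "https://storage.s3nsapis.fr"}}
#
# after which a plain gcsfs.GCSFileSystem() picks the endpoint up.
```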
Thank you so much for your prompt reply. I wasn't expecting to get feedback on this contribution so quickly during the holiday season 🤗

You are absolutely right, it seems the latest bugfixes in […]

I've just pushed a commit addressing all your remarks (and ran the tests locally, but in the same conditions as they are executed on your CI this time 🙈).
Oooh, thank you so much 🙌 🙌

I did spot the […] I'll let you judge whether or not you think supporting the […]

One little note: it is, of course, possible to retrieve the universe domain from the GCE VM metadata endpoint (e.g. when running code on a GCE VM / a GKE container). However, engineers at Google have warned us that there is a race condition where this endpoint might not return the universe domain in the first few seconds of a VM's life, and therefore advised us to support this "standard" environment variable in our applications instead.

Thanks a lot for offering your help 🙌 If you're interested in feedback on these "alternative" GCP universes, my team and I have quite a bit of experience with S3NS and we would be super happy to share it with you. In a few words: it seems we managed to make Dask work there, leveraging the Dask Kubernetes operator (awesome tool, by the way) with almost no tweaks 😌
/gcbrun
Gotcha, having a general universe concept makes a lot of sense. Thanks for helping me see it as well, Martin! |
I totally overlooked the fact that my unit tests were only passing because I had credentials in […]

I just switched the auth method they use to […]
    def on_google(self):
    -    return "torage.googleapis.com" in self._location
    +    return f"torage.{_gcp_universe_domain()}" in self._location
Technically it's now "on google-like" :)
Don't change it...
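For readers of this hunk, a minimal sketch of what a helper like `_gcp_universe_domain()` could look like; the `googleapis.com` fallback is an assumption based on the previously hard-coded endpoint, not a quote of the actual implementation.

```python
import os

def _gcp_universe_domain() -> str:
    # Fall back to the default GCP universe when the env var is unset
    # (assumption: this mirrors the historical googleapis.com behaviour).
    return os.environ.get("GOOGLE_CLOUD_UNIVERSE_DOMAIN", "googleapis.com")
```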
requirements.txt

    decorator>4.1.2
    fsspec==2025.12.0
    -   google-auth>=1.2
    +   google-auth>=2.36.0
Is this actually needed if we are handling the URL construction ourselves?
@ankitaluthra1 : is there documentation on how the google backend for the zonal/hierarchical backend handles the various env vars or explicit url kwarg?
It should be needed if we want to ensure that the underlying google-auth library supports alternative GCP universes as well.
That being said, I'm totally fine removing this constraint from the requirements.txt file (given that most people won't be using alternative universes) and instead mentioning that minimum version requirement for alternative universes in the documentation, if you think that makes more sense :)
        _emulator_location = f"http://{_emulator_location}"
        return _emulator_location
    -   return "https://storage.googleapis.com"
    +   return f"https://storage.{_gcp_universe_domain()}"
So the emulator variable takes precedence over the universe?
I thought it made more sense to prioritize the test emulator, but I'm happy to change that if you think otherwise.
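A sketch of the precedence being discussed: emulator first, then universe. The `STORAGE_EMULATOR_HOST` name is an assumption (the hunk above only shows the `_emulator_location` variable and the `http://` prefixing), so treat this as an illustration rather than the actual implementation.

```python
import os

def _location() -> str:
    # Emulator first: if an emulator host is configured, use it as-is
    # (prefixing http:// when no scheme is given, as in the hunk above).
    emulator = os.environ.get("STORAGE_EMULATOR_HOST")
    if emulator:
        if not emulator.startswith(("http://", "https://")):
            emulator = f"http://{emulator}"
        return emulator
    # Otherwise fall back to the (possibly alternative) universe domain.
    universe = os.environ.get("GOOGLE_CLOUD_UNIVERSE_DOMAIN", "googleapis.com")
    return f"https://storage.{universe}"
```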
docs/source/index.rst
Outdated
    To target an alternative GCP universe, the ``GOOGLE_CLOUD_UNIVERSE_DOMAIN``
    environment variable should be set to your desired universe domain for ``gcsfs``
    to target the `Google Cloud Storage`_ API in your alternative universe.

    For instance, set ``GOOGLE_CLOUD_UNIVERSE_DOMAIN=s3nsapis.fr`` to target the
    S3NS_ universe.
We should expand the section to discuss all three methods for overriding the URL: the universe, the emulator host and explicit endpoint_url, including explicitly which takes precedence when multiple ones are passed.
Perhaps we could also mention how to set the endpoint_url (or other init kwargs) using the fsspec config or at least link to it.
/gcbrun |

This commits adds support for the
GOOGLE_CLOUD_UNIVERSE_DOMAINenvironment variable to support alternative GCP universes.Context
In multiple countries, local hosting companies are partnering with GCP to offer EU-sovereign GCP-like environments. Such partnerships include one with T-Systems in Germany or another one with Thalès in France called S3NS.
In order to support such new environments, Google introduced the notion of
universein their SDKs, essentially to point them at nongoogleapis.comendpoints.We (Pigment) are currently porting our Platform to S3NS, and that includes our time-series forecasting services relying on Dask and - therefore -
fsspecandgcsfs.To make that work, we're currently passing the
url_endpointstorage parameter asstorage_optionsin Dask bag calls requiring GCS access, however, that's far from ideal (these calls are scattered all around our codebase).To connect to other universes, clients are advised to use the
GOOGLE_CLOUD_UNIVERSE_DOMAINenvironment variable as you can see:Support for this environment variable would make it much easier for us (and for anyone else) to connect to alternative GCP universes, without having to patch a single line of our own code.
NOTE: credentials retrieval on S3NS Virtual Machines was working out-of-the-box, even without the
GOOGLE_CLOUD_UNIVERSE_DOMAINenv var. set, because the underlying SDK supports it. Making suregcsfstargets the correct endpoint was the only missing part.