Conversation

@elafarge commented Dec 30, 2025


This commit adds support for the `GOOGLE_CLOUD_UNIVERSE_DOMAIN`
environment variable to target alternative GCP universes.

Context
-------

In multiple countries, local hosting companies are partnering with GCP
to offer EU-sovereign GCP-like environments. Such partnerships include
[one with T-Systems in
Germany](https://www.t-systems.com/de/en/sovereign-cloud/solutions/sovereign-cloud-powered-by-google-cloud)
or [another one with Thalès in France called
S3NS](https://www.s3ns.io/en).

In order to support such new environments, Google introduced the notion
of `universe` in their SDKs, essentially to point them at non
`googleapis.com` endpoints.

We ([Pigment](https://pigment.com)) are currently porting our Platform
to S3NS, and that includes our time-series forecasting services relying
on Dask and - therefore - `fsspec` and `gcsfs`.

To make that work, we're currently passing the `url_endpoint` storage
parameter as `storage_options` in Dask bag calls requiring GCS access,
however, that's far from ideal (these calls are scattered all around our
codebase).

To connect to other universes, clients are advised to use the
`GOOGLE_CLOUD_UNIVERSE_DOMAIN` environment variable as you can see:
- on [this pull request on the Google Cloud Python
  SDK](googleapis/google-api-python-client#2369)
- in the documentation of S3NS:
  https://documentation.s3ns.fr/docs/overview/tpc-key-differences#key_differences_for_developers

Support for this environment variable would make it much easier for us
(and for anyone else) to connect to alternative GCP universes, without
having to patch a single line of our own code.

NOTE: credentials retrieval on S3NS Virtual Machines was working
out-of-the-box, even without the `GOOGLE_CLOUD_UNIVERSE_DOMAIN` env var.
set, because the underlying SDK supports it. Making sure `gcsfs` targets
the correct endpoint was the only missing part.
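The behavior the commit message describes can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper names `_gcp_universe_domain()` and `storage_endpoint()` are hypothetical, and the only assumption is the documented fallback to the public `googleapis.com` universe when the variable is unset.

```python
import os

def _gcp_universe_domain() -> str:
    # Hypothetical helper mirroring what the commit message describes:
    # fall back to the default public universe when the variable is unset.
    return os.environ.get("GOOGLE_CLOUD_UNIVERSE_DOMAIN", "googleapis.com")

def storage_endpoint() -> str:
    # The GCS endpoint is then derived from the universe domain.
    return f"https://storage.{_gcp_universe_domain()}"

os.environ.pop("GOOGLE_CLOUD_UNIVERSE_DOMAIN", None)
print(storage_endpoint())  # https://storage.googleapis.com

os.environ["GOOGLE_CLOUD_UNIVERSE_DOMAIN"] = "s3nsapis.fr"
print(storage_endpoint())  # https://storage.s3nsapis.fr
```

With this in place, pointing an application at S3NS is a one-line environment change rather than a code change at every call site.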
@ankitaluthra1 (Collaborator)

/gcbrun

@ankitaluthra1 (Collaborator)

Thank you so much for adding this support! Is there a minimum version of google-auth required to fully support universe domains (e.g., for correct token generation corresponding to the universe)? If so, we should update the dependencies accordingly.

-------------------------

To target an alternative GCP universe, the ``GOOGLE_CLOUD_UNIVERSE_DOMAIN``
environment variable should be set to your desired unverse domain for ``gcsfs``
Collaborator:

typo unverse

Author:

Good catch, thank you :) I'll amend it in a follow-up commit (where I'll also fix the failing tests; looks like I overlooked the fact that on the CI GOOGLE_APPLICATION_CREDENTIALS is overridden, my bad 🙈).

@martindurant (Member)

Before checking the implementation, let me comment that I fully support making use of a standard env var like this, and the thrust of having alternative GCP variants. If there's anything I can do to help on behalf of fsspec, dask or any other of the projects I am associated with, please get in touch with me directly.

I should mention that there are several other ways to set the endpoint URL already:

- the `STORAGE_EMULATOR_HOST` variable, which is targeted at testing
- including a file in `~/.config/fsspec/` specifying the defaults for the "gcs" protocol (or a different location, as given by `FSSPEC_CONFIG_DIRECTORY`)
- specifying the same configuration using `FSSPEC_GCS_ENDPOINT_URL`

All of these are specific to storage, so the general concept of "universe" is more powerful.

(see https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration )
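For illustration, the env-var route mentioned above can be mimicked without importing fsspec. This is a sketch of the `FSSPEC_<PROTOCOL>_<KWARG>` naming convention that fsspec documents for its configuration, re-implemented here for clarity; it is not fsspec's actual code, and `fsspec_env_conf` is a hypothetical name.

```python
import os

# Assumed convention (per the fsspec configuration docs): an env var named
# FSSPEC_<PROTOCOL>_<KWARG> becomes a default kwarg for that filesystem.
os.environ["FSSPEC_GCS_ENDPOINT_URL"] = "https://storage.s3nsapis.fr"

def fsspec_env_conf() -> dict:
    # Collect FSSPEC_-prefixed env vars into {protocol: {kwarg: value}}.
    conf: dict = {}
    for name, value in os.environ.items():
        if name.startswith("FSSPEC_") and name.count("_") > 1:
            _, proto, key = name.split("_", 2)
            conf.setdefault(proto.lower(), {})[key.lower()] = value
    return conf

print(fsspec_env_conf()["gcs"]["endpoint_url"])  # https://storage.s3nsapis.fr
```

The appeal of this mechanism is the same as the universe variable's: the endpoint override lives in the environment rather than in every call site.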

@elafarge (Author)

> Thank you so much for adding this support! Is there a minimum version of google-auth required to fully support universe domains (e.g., for correct token generation corresponding to the universe)? If so, we should update the dependencies accordingly.

Thank you so much for your prompt reply. I wasn't expecting to get feedback on this contribution so quickly during the holiday season 🤗

You are absolutely right: it seems the latest bugfixes in google-auth regarding alternative universes were added in v2.36.0. I've updated the requirements.txt file accordingly (I've seen no mention of universes in google-auth-oauthlib, so I've left it "unpinned").

I've just pushed a commit addressing all your remarks (and this time I ran the tests locally in the same conditions as they are executed on your CI 🙈).

@elafarge (Author) commented Dec 30, 2025

> Before checking the implementation, let me comment that I fully support making use of a standard env var like this, and the thrust of having alternative GCP variants. If there's anything I can do to help on behalf of fsspec, dask or any other of the projects I am associated with, please get in touch with me directly.
>
> I should mention that there are several other ways to set the endpoint URL already:
>
> - the `STORAGE_EMULATOR_HOST` variable, which is targeted at testing
> - including a file in `~/.config/fsspec/` specifying the defaults for the "gcs" protocol (or a different location, as given by `FSSPEC_CONFIG_DIRECTORY`)
> - specifying the same configuration using `FSSPEC_GCS_ENDPOINT_URL`
>
> All of these are specific to storage, so the general concept of "universe" is more powerful.
>
> (see https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration )

Oooh, thank you so much 🙌 🙌

I did spot the `STORAGE_EMULATOR_HOST` environment variable (but given it's targeted at testing, it felt a bit unfit for this use case).
I did, however, completely miss the `FSSPEC_GCS_ENDPOINT_URL` environment variable, which is a much better alternative to my current workaround of passing the `storage_options.endpoint_url` parameter in all the filesystem-related Dask calls in our codebase.

I'll let you judge whether supporting `GOOGLE_CLOUD_UNIVERSE_DOMAIN` makes sense.

One little note: it is - of course - possible to retrieve the universe domain from the GCE VM metadata endpoint (e.g. when running code on a GCE VM / a GKE container).

However, engineers at Google have warned us that there was a race condition where this endpoint might not return the universe domain in the first few seconds of a VM's life, and therefore advised us to support this "standard" environment variable in our applications instead.
That being said, if this race condition gets fixed, it will probably be worth relying on the VM metadata endpoint to determine the universe-specific GCS endpoint to target (and I will, of course, be happy to contribute that too).

Thanks a lot for offering your help 🙌
At this stage, I don't think there's anything else needed to make dask or fsspec work on S3NS (the "sovereign" GCP variant for France).

If you're interested in feedback on these GCP "alternative" universes, my team and I have quite a bit of experience with S3NS and we would be super happy to share it with you.
Feel free to reach out :) (git log on this PR's commits should give you my personal email 😄 ).

In a few words: it seems we managed to make dask work there, leveraging the dask Kubernetes operator (awesome tool, by the way) with almost no tweak 😌

@ankitaluthra1 (Collaborator)

/gcbrun

@ankitaluthra1 (Collaborator)

> Before checking the implementation, let me comment that I fully support making use of a standard env var like this, and the thrust of having alternative GCP variants. If there's anything I can do to help on behalf of fsspec, dask or any other of the projects I am associated with, please get in touch with me directly.
>
> I should mention that there are several other ways to set the endpoint URL already:
>
> - the `STORAGE_EMULATOR_HOST` variable, which is targeted at testing
> - including a file in `~/.config/fsspec/` specifying the defaults for the "gcs" protocol (or a different location, as given by `FSSPEC_CONFIG_DIRECTORY`)
> - specifying the same configuration using `FSSPEC_GCS_ENDPOINT_URL`
>
> All of these are specific to storage, so the general concept of "universe" is more powerful.
>
> (see https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration )

Gotcha, having a general universe concept makes a lot of sense. Thanks for helping me see it as well, Martin!

@elafarge (Author)

I totally overlooked the fact that my unit tests were only passing because I had credentials in `~/.config/gcloud` locally.

I just switched the auth method they use to `anon`, ran the tests locally after moving my `~/.config/gcloud` someplace else, and fixed a linting error along the way (I'm not a Pythonista and didn't know about pre-commit until a few minutes ago; I did set it up, and all is green on that front too).

Comment on lines 1341 to +1351
```diff
     def on_google(self):
-        return "torage.googleapis.com" in self._location
+        return f"torage.{_gcp_universe_domain()}" in self._location
```
Member:

Technically it's now "on google-like" :)
Don't change it...

```diff
 decorator>4.1.2
 fsspec==2025.12.0
-google-auth>=1.2
+google-auth>=2.36.0
```
Member:

Is this actually needed if we are handling the URL construction ourselves?

@ankitaluthra1 : is there documentation on how the google backend for the zonal/hierarchical backend handles the various env vars or explicit url kwarg?

Author:

It should be needed if we want to ensure that the underlying google-auth library also supports alternative GCP universes.

That being said, I'm totally fine removing this constraint from the requirements.txt file (given that most people won't be using alternative universes) and instead mentioning that minimum version requirement for alternative universes in the documentation, if you think that makes more sense :)

Comment on lines 126 to +133
```diff
         _emulator_location = f"http://{_emulator_location}"
         return _emulator_location
-    return "https://storage.googleapis.com"
+    return f"https://storage.{_gcp_universe_domain()}"
```
Member:

So the emulator variable takes precedence over the universe

Author:

I thought it made more sense to prioritize the test emulator, but I'm happy to change that if you think otherwise.
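The precedence under discussion could be sketched like this. The helper names (`_gcp_universe_domain`, `base_location`) are hypothetical stand-ins for the PR's code; the only behavior assumed is what the hunk shows: the emulator variable wins, and the universe domain is the fallback.

```python
import os

def _gcp_universe_domain() -> str:
    # Hypothetical helper: defaults to the public googleapis.com universe.
    return os.environ.get("GOOGLE_CLOUD_UNIVERSE_DOMAIN", "googleapis.com")

def base_location() -> str:
    # The emulator variable takes precedence over the universe domain,
    # mirroring the ordering chosen in the hunk under discussion.
    emulator = os.environ.get("STORAGE_EMULATOR_HOST")
    if emulator:
        if not emulator.startswith("http"):
            emulator = f"http://{emulator}"
        return emulator
    return f"https://storage.{_gcp_universe_domain()}"

os.environ["GOOGLE_CLOUD_UNIVERSE_DOMAIN"] = "s3nsapis.fr"
os.environ["STORAGE_EMULATOR_HOST"] = "localhost:4443"
print(base_location())  # http://localhost:4443

del os.environ["STORAGE_EMULATOR_HOST"]
print(base_location())  # https://storage.s3nsapis.fr
```

Prioritizing the emulator keeps test setups working unchanged even when a universe domain is configured globally.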

Comment on lines 201 to 206
To target an alternative GCP universe, the ``GOOGLE_CLOUD_UNIVERSE_DOMAIN``
environment variable should be set to your desired universe domain for ``gcsfs``
to target the `Google Cloud Storage`_ API in your alternative universe.

For instance, set ``GOOGLE_CLOUD_UNIVERSE_DOMAIN=s3nsapis.fr`` to target the
S3NS_ universe.
Member:

We should expand the section to discuss all three methods for overriding the URL: the universe, the emulator host and explicit endpoint_url, including explicitly which takes precedence when multiple ones are passed.

Perhaps we could also mention how to set the endpoint_url (or other init kwargs) using the fsspec config or at least link to it.

Author:

I updated the docs to take this feedback into account; here's what it looks like now:

[screenshot: 2026-01-05-14:49:53]

@ankitaluthra1 (Collaborator)

/gcbrun
