Potential bug: Authentication failure when downloading ERA5 data on HPC and Google Colab #146

Open
DaniJonesOcean opened this issue Dec 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

DaniJonesOcean commented Dec 24, 2024

Description

I encountered an issue while using the `get_era5_reanalysis_data` function to download ERA5 sample data. The function works fine on my local MacBook, but it fails with authentication errors both on my university's HPC and on Google Colab. The error logs suggest a failure to communicate with the Google Compute Engine Metadata server. Although this might point to networking issues specific to the HPC environment, the failure on Google Colab suggests a broader problem. Ideally, this function would work in any environment, so I conducted some tests and documented the results below.

Warnings and Download Failure on U-M HPC (Batch Job)

When run on U-M HPC as a batch job, the function hangs indefinitely and never starts the download. Here are the captured warnings:

2024-12-23 23:26:47,837 - INFO - Cache directory created at ./cache
2024-12-23 23:26:47,837 - INFO - Starting data download for variables: ['2m_temperature', '10m_u_component_of_wind', '10m_v_component_of_wind']
2024-12-23 23:26:51,776 - WARNING - Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
2024-12-23 23:26:51,856 - WARNING - Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: [Errno 113] No route to host
2024-12-23 23:26:54,859 - WARNING - Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
2024-12-23 23:26:54,859 - WARNING - Authentication failed using Compute Engine authentication due to unavailable metadata server.

Warnings and Download Failure on U-M HPC (Jupyter Notebook)

In contrast to the batch job, the function crashes when run in a Jupyter Notebook. The full warning and error list is very long, but here is a key message:

Message: 'Authentication failed using [Google] Compute Engine authentication due to unavailable metadata server.'

Error and Download Failure on Google Colab

To determine if this issue was U-M-specific, I tested the function on Google Colab. The errors there were similar. The key excerpt is:

google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7b5318b245b0>)
ERROR:root:Error during data download: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7b5318b245b0>)
Error during data download: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7b5318b245b0>)

Here is a link to the public Google Colab notebook showing the complete function call and the output

Additional Notes

This issue appears to stem from how the function interacts with the Google Compute Engine Metadata server. While the HPC environment might involve network restrictions, the occurrence of similar issues on Colab suggests this could be a broader problem related to authentication mechanisms.

I've encountered similar issues when trying to access other publicly accessible data hosted on Google Cloud Platform. In every case, a download function attempts to use Compute Engine authentication, which depends on the availability of a metadata server. That server is specific to Google Cloud virtual machines and is not usable in other environments unless explicitly configured: on the HPC it is unreachable (timeouts, no route to host), and on Colab the request for the default service account returns a 404.
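
The logs above show two different failure modes: the metadata server is unreachable on the HPC, and it answers with a 404 on Colab. A small probe can tell these apart in a given environment. This is a minimal sketch using only the standard library, not part of DeepSensor. The function name `probe_metadata_server` is hypothetical, the URL is the one from the Colab traceback, and `Metadata-Flavor: Google` is the header the Compute Engine metadata service requires.

```python
import urllib.error
import urllib.request

# URL copied from the Colab traceback in this issue.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/?recursive=true"
)


def probe_metadata_server(url: str = METADATA_URL, timeout: float = 2.0) -> str:
    """Report whether the metadata endpoint is available, erroring, or unreachable."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"available (HTTP {response.status})"
    except urllib.error.HTTPError as exc:
        # The server answered, but not with credentials (e.g. the 404 seen on Colab).
        return f"reachable but returned HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, DNS failures, and "No route to host".
        return f"unreachable ({exc})"
```

Calling `probe_metadata_server()` with no arguments checks the real endpoint. Given the logs above, the HPC would be expected to hit the `unreachable` branch and Colab the HTTP-error branch.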

I'm guessing that in environments like HPC or Colab, the fallback authentication methods might not be set up or accessible due to different system configurations, firewalls, or network restrictions. This even seems to apply when no authentication is required, e.g. when using a publicly accessible bucket.
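
For a publicly readable bucket, one possible workaround is to request anonymous access explicitly so that no credential discovery takes place. The sketch below assumes the data is read as a Zarr store through `xarray` with `gcsfs` installed, which may not match how `get_era5_reanalysis_data` reads data internally. `open_public_zarr` is a hypothetical helper, and the store URL is left to the caller. `token="anon"` is the `gcsfs` setting for unauthenticated access.

```python
from typing import Any, Dict

# gcsfs option requesting unauthenticated access, so no credential lookup is attempted.
ANON_STORAGE_OPTIONS: Dict[str, Any] = {"token": "anon"}


def open_public_zarr(url: str):
    """Lazily open a public Zarr store on Google Cloud Storage without credentials.

    Hypothetical helper: `url` is a gs:// path to a publicly readable Zarr store.
    """
    # Imported here so this module still loads where xarray/gcsfs are not installed.
    import xarray as xr

    return xr.open_zarr(url, storage_options=dict(ANON_STORAGE_OPTIONS))
```

If a call like this succeeds on the HPC or Colab while the original function fails, that would support the idea that the failure comes from credential discovery rather than from access to the data itself.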

Reproduction steps

1. Set up an environment on HPC or Google Colab with DeepSensor installed.
2. Attempt to download ERA5 data using `get_era5_reanalysis_data`.
3. Observe warnings and errors as described above.

Version

0.4.2

OS

Linux

DaniJonesOcean added the bug label Dec 24, 2024
davidwilby (Collaborator) commented

Thanks @DaniJonesOcean, much appreciated. I'll look into this in the new year.
