GCS gfile operations fail in TF 2.17.0rc0 and 2.18 nightly outside of GCP #2016

lgeiger · 2024-06-19T10:18:41Z

I already posted this issue a couple of days ago on upstream TensorFlow at tensorflow/tensorflow#69789, but I'm posting it here again since it might be related to the gcs-filesystem package. /cc @yongtang

When running GCS operations with tf.io.gfile on 2.17.0rc0 or 2.18 nightly anywhere outside of a GCP VM, the call hangs and eventually fails after 10 retries with the error message below.

import tensorflow as tf

tf.io.gfile.exists("gs://tfds-data/dataset_info/mnist/3.0.1/dataset_info.json")

2024-06-14 15:45:30.081439: W external/local_tsl/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/tensorflow/python/lib/io/file_io.py", line 290, in file_exists_v2
    _pywrap_file_io.FileExists(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.AbortedError: All 10 retry attempts failed. The last failure: Error executing an HTTP request: HTTP response code 301 with body '<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fmnist%2F3.0.1%2Fdataset_info.json?fields=size%2Cgeneration%2Cupdated">here</A>.
</BODY></HTML>
'
	 when reading metadata of gs://tfds-data/dataset_info/mnist/3.0.1/dataset_info.json

I can't reproduce this issue on either Colab or a GCP VM, but it fails consistently locally on my Mac, inside a python:3.11 Docker container, on GitHub Actions, and inside a Kaggle notebook. The same code works fine with TensorFlow 2.16, so I don't think this is due to my local setup.
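
To compare environments, a small diagnostic along these lines may help. It is only a sketch and not part of the original report: the helper names `metadata_host_resolves` and `check_gfile` are made up for illustration, and it assumes that being able to resolve `metadata.google.internal` (the host named in the warning above) is a reasonable proxy for running on GCP.

```python
import socket

# Host named in the google_auth_provider warning above.
METADATA_HOST = "metadata.google.internal"
# Public object used in the reproduction above.
TEST_PATH = "gs://tfds-data/dataset_info/mnist/3.0.1/dataset_info.json"


def metadata_host_resolves(resolve=socket.gethostbyname):
    """Return True if the GCE metadata host resolves via DNS.

    Assumption: this usually succeeds only on GCP machines. `resolve` is
    injectable so the check can be exercised without network access.
    """
    try:
        resolve(METADATA_HOST)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def check_gfile(path=TEST_PATH):
    """Run the failing call and report the outcome instead of raising."""
    # Imported lazily so the DNS check above works without TensorFlow.
    import tensorflow as tf

    try:
        return f"TF {tf.__version__}: exists={tf.io.gfile.exists(path)}"
    except tf.errors.OpError as err:
        # AbortedError (seen in the traceback) is a subclass of OpError.
        return f"TF {tf.__version__}: {type(err).__name__}: {str(err)[:200]}"


if __name__ == "__main__":
    print("metadata host resolves:", metadata_host_resolves())
    print(check_gfile())
```

Running this in each environment gives one line for the DNS check and one for the gfile call, which should make it easier to see whether the failure lines up with the metadata host being unreachable.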

It also seems like other people are running into this with the latest TF nightly: tensorflow/datasets#5360

Would be great to get this fixed before the next stable release.

lgeiger · 2024-06-25T13:58:20Z

Looks like this issue is indeed caused by upstream TF: tensorflow/tensorflow#69789 (comment)

lgeiger mentioned this issue Jun 19, 2024

GCS gfile operations fail in TF 2.17.0rc0 and 2.18 nightly when not running in GCP tensorflow/tensorflow#69789

Closed

lgeiger closed this as completed Jun 25, 2024
