Error messages when user hasn't accepted dataset EULA can be confusing #36

betolink opened this issue Nov 16, 2021 · 20 comments
Status: Open
Labels: enhancement, help wanted

@betolink
Member

betolink commented Nov 16, 2021

We've identified two behaviors that make this user experience frustrating:

  • earthaccess.download() reports a confusing error when it encounters a 403 because the user has not accepted a EULA. @jessnicwelch showed a way to reproduce this in this thread. When this error happens, we need to display a message like: Access to this data has been denied because it requires you to accept an End-User License Agreement (EULA). Follow this link to view and accept the EULA: https://{the rest of it}. The URL we need to display should be contained in the 403 response data. This is what this ticket is about.

  • earthaccess.download() will continue after an error. New ticket for this: earthaccess.download() ignores errors #581

Original description:

If we get a 302 redirect for a particular dataset, see if we can detect if the reason is the lack of an EULA (NASA requires users to approve an EULA for some datasets) and explain what happened.

Currently, users can receive a confusing message:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: ...
@MattF-NSIDC MattF-NSIDC pinned this issue Sep 12, 2023
@MattF-NSIDC MattF-NSIDC changed the title EULA Error messages when user hasn't accepted dataset EULA can be confusing Sep 12, 2023
@MattF-NSIDC MattF-NSIDC added the enhancement New feature or request label Sep 12, 2023
@asteiker
Member

The UMM-C (CMR Collection) schema was updated to include EULA information, so this can now be determined programmatically by a client in order to provide better error handling: https://bugs.earthdata.nasa.gov/browse/SDRT-1236. I don't have an example handy but it looks like this is now included as a EULA identifiers list under UseConstraints (https://bugs.earthdata.nasa.gov/browse/ECSE-1213).
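
A minimal sketch of how a client might check a collection's UMM-C record for EULA information. It assumes the record is already available as a plain dict and that the list lives under `UseConstraints` with the key `EULAIdentifiers`; that key name and both helper names are assumptions for illustration and should be checked against the UMM-C schema.

```python
from typing import Any, Dict, List


def eula_identifiers(umm_collection: Dict[str, Any]) -> List[str]:
    """Return the EULA identifiers listed in a UMM-C record, or an empty list."""
    use_constraints = umm_collection.get("UseConstraints")
    if not isinstance(use_constraints, dict):
        # Missing, null, or a shape this sketch does not understand.
        return []
    # "EULAIdentifiers" is an assumed key name, not confirmed in this thread.
    identifiers = use_constraints.get("EULAIdentifiers") or []
    return [str(identifier) for identifier in identifiers]


def requires_eula(umm_collection: Dict[str, Any]) -> bool:
    """True when the collection record lists at least one EULA identifier."""
    return bool(eula_identifiers(umm_collection))
```

This only says whether a collection has a EULA, not whether the current user has accepted it.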

@mfisher87
Collaborator

mfisher87 commented Jan 29, 2024

@asteiker My understanding is that only tells us whether a collection has a EULA (and provides a link to accept it?), not whether the user has accepted it. Is that right? Maybe there's some data in the 403 response that we can use, e.g. {"reason": "EULA not accepted"}? I vaguely remember discussing this with you and Daniel Crumly not so long ago. Do you remember that? 🤔

@asteiker
Member

@mfisher87 I believe there is indeed data in the 403 response that tells a client whether or not the EULA was accepted. The EULA response was utilized in the Harmony API for this exact use case. Does this Harmony test help? Can we leverage something similar for earthaccess? https://github.com/nasa/harmony/blob/main/services/harmony/test/eula-acceptance.ts
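
A rough sketch of turning such a 403 into the friendlier message proposed at the top of this issue. The shape of the response body is an assumption: the keys `error_description` and `resolution_url` are placeholders for whatever the 403 response actually contains, and `explain_forbidden` is a hypothetical helper, not an existing earthaccess function.

```python
from typing import Any, Mapping, Optional

EULA_MESSAGE = (
    "Access to this data has been denied because it requires you to accept "
    "an End-User License Agreement (EULA). Follow this link to view and "
    "accept the EULA: {url}"
)


def explain_forbidden(
    status_code: int, body: Optional[Mapping[str, Any]]
) -> Optional[str]:
    """Return a EULA-specific message for a 403, or None if it does not apply."""
    if status_code != 403 or not body:
        return None
    # Both key names below are assumptions about the 403 response body.
    description = str(body.get("error_description", ""))
    url = body.get("resolution_url")
    if "eula" in description.lower() and url:
        return EULA_MESSAGE.format(url=url)
    return None
```

Returning None for anything that does not look like a EULA failure lets the caller fall back to the original HTTPError.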

@mfisher87 mfisher87 added this to the Version 1.0 milestone Mar 5, 2024
@jessnicwelch jessnicwelch self-assigned this Mar 19, 2024
@jessnicwelch
Collaborator

Howdy! I recreated the problem using a fresh EDL account without accepting any EULAs:

>>> import earthaccess
>>> earthaccess.login()

Enter your Earthdata Login username: jessicanwelch
Enter your Earthdata password: 
<earthaccess.auth.Auth object at 0x000002BE59952C40>

>>> results = earthaccess.search_data(
...     short_name='SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205',
...     cloud_hosted=True,
...     bounding_box=(-10, 20, 10, 50),
...     temporal=("1999-02", "2019-03"),
...     count=10
... )
Granules found: 1467

>>> results = earthaccess.search_data(
...     short_name='S5P_L2__CH4____HiR',
...     cloud_hosted=True,
...     bounding_box=(-10, 20, 10, 50),
...     temporal=("1999-02", "2019-03"),
...     count=10
... )
Granules found: 907

>>> files = earthaccess.download(results, "C:/Users/qnw/Downloads/files") 
 Getting 10 granules, approx download size: 0.54 GB
QUEUEING TASKS | : 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 620.38it/s]
PROCESSING TASKS | :   0%|                                                                                                                                  | 0/10 [00:00<?, ?it/s]
Error while downloading the file S5P_RPRO_L2__CH4____20180430T135151_20180430T153321_02826_03_020400_20221107T155202.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/120/S5P_RPRO_L2__CH4____20180430T135151_20180430T153321_02826_03_020400_20221107T155202.nc

PROCESSING TASKS | :  10%|████████████▏                                                                                                             | 1/10 [00:01<00:10,  1.17s/it
Error while downloading the file S5P_RPRO_L2__CH4____20180502T095056_20180502T113226_02852_03_020400_20221107T155539.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/122/S5P_RPRO_L2__CH4____20180502T095056_20180502T113226_02852_03_020400_20221107T155539.nc

Error while downloading the file S5P_RPRO_L2__CH4____20180430T121021_20180430T135151_02825_03_020400_20221107T155201.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/120/S5P_RPRO_L2__CH4____20180430T121021_20180430T135151_02825_03_020400_20221107T155201.nc

Error while downloading the file S5P_RPRO_L2__CH4____20180501T115123_20180501T133254_02839_03_020400_20221107T155400.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/121/S5P_RPRO_L2__CH4____20180501T115123_20180501T133254_02839_03_020400_20221107T155400.nc

Error while downloading the file S5P_RPRO_L2__CH4____20180430T102851_20180430T121021_02824_03_020400_20221107T155159.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/120/S5P_RPRO_L2__CH4____20180430T102851_20180430T121021_02824_03_020400_20221107T155159.nc

PROCESSING TASKS | :  50%|█████████████████████████████████████████████████████████████                                                             | 5/10 [00:01<00:00,  5.00it/s
Error while downloading the file S5P_RPRO_L2__CH4____20180502T113226_20180502T131356_02853_03_020400_20221107T155541.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/122/S5P_RPRO_L2__CH4____20180502T113226_20180502T131356_02853_03_020400_20221107T155541.nc

Error while downloading the file S5P_RPRO_L2__CH4____20180501T100953_20180501T115123_02838_03_020400_20221107T155339.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/121/S5P_RPRO_L2__CH4____20180501T100953_20180501T115123_02838_03_020400_20221107T155339.nc

PROCESSING TASKS | :  70%|█████████████████████████████████████████████████████████████████████████████████████▍                                    | 7/10 [00:01<00:00,  5.43it/s
Error while downloading the file S5P_RPRO_L2__CH4____20180501T133254_20180501T151424_02840_03_020400_20221107T155401.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/121/S5P_RPRO_L2__CH4____20180501T133254_20180501T151424_02840_03_020400_20221107T155401.nc

Error while downloading the file S5P_RPRO_L2__CH4____20180502T131356_20180502T145526_02854_03_020400_20221107T155602.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/122/S5P_RPRO_L2__CH4____20180502T131356_20180502T145526_02854_03_020400_20221107T155602.nc

PROCESSING TASKS | :  90%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊            | 9/10 [00:01<00:00,  6.61it/s
Error while downloading the file S5P_RPRO_L2__CH4____20180503T111328_20180503T125458_02867_03_020400_20221107T155745.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2018/123/S5P_RPRO_L2__CH4____20180503T111328_20180503T125458_02867_03_020400_20221107T155745.nc

PROCESSING TASKS | : 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00,  5.07it/s] 
COLLECTING RESULTS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s] 

@jessnicwelch
Collaborator

Oh, and I used this dataset: https://disc.gsfc.nasa.gov/datasets/S5P_L2__CH4____HiR_1/summary

@jessnicwelch
Collaborator

jessnicwelch commented Mar 28, 2024

Jess and Matt spoke during an Openscapes call today about the next steps on this issue. Jess will begin learning/experimenting with Python debugging to get a better look at the messages/code related to the observed errors.

@jessnicwelch
Collaborator

Hi, @mfisher87. I took time to figure out how to use debugging with Python. Either I was unsuccessful or the debugging tools aren't useful for this code. My interpretation is that no error is raised by earthaccess.download(results, "C:/Users/qnw/Downloads/files"): the call runs to completion and creates the directory "files," but it doesn't download any files into it. Below are (1) the script, (2) a screenshot of the debugging process, and (3) the interactive session. Once I "step into" the breakpoint, it executes the code and prints the error messages (like those above), which I've also pasted as comments at the end of the script below.

## Python 3.8.9

## python
# import pdb
import earthaccess
earthaccess.login()  # use jessicanwelch

# results = earthaccess.search_data(
#   short_name='SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205',
#   cloud_hosted=True,
#   bounding_box=(-10, 20, 10, 50),
#   temporal=("1999-02", "2019-03"),
#   count=10
#   )

results = earthaccess.search_data(
  short_name='S5P_L2__CH4____HiR',  ## https://disc.gsfc.nasa.gov/datasets/S5P_L2__CH4____HiR_1/summary
  cloud_hosted=True,
  bounding_box=(-10, 20, 10, 50),
  temporal=("2020-01", "2020-07"),
  count=5
  )

# pdb.set_trace()

earthaccess.download(results, "C:/Users/qnw/Downloads/files")
# files = earthaccess.download(results, "C:/Users/qnw/Downloads/files")

# Error while downloading the file S5P_OFFL_L2__CH4____20200101T110146_20200101T124316_11493_01_010302_20200103T041214.nc
# Traceback (most recent call last):
#   File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
#     r.raise_for_status()
#   File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
#     raise HTTPError(http_error_msg, response=self)
# requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.1/2020/001/S5P_OFFL_L2__CH4____20200101T110146_20200101T124316_11493_01_010302_20200103T041214.nc

[Image: Screenshot 2024-04-18 141538]

PS C:\Users\qnw\Downloads> python -m pdb earthaccess_working.py
> c:\users\qnw\downloads\earthaccess_working.py(5)<module>()
-> import earthaccess
(Pdb) n
> c:\users\qnw\downloads\earthaccess_working.py(6)<module>()
-> earthaccess.login()  # use jessicanwelch
(Pdb) n
Enter your Earthdata Login username: jessicanwelch
Enter your Earthdata password: 
> c:\users\qnw\downloads\earthaccess_working.py(16)<module>()
-> results = earthaccess.search_data(
(Pdb) break 26
Breakpoint 1 at c:\users\qnw\downloads\earthaccess_working.py:26
(Pdb) c
Granules found: 1104
> c:\users\qnw\downloads\earthaccess_working.py(26)<module>()   
-> earthaccess.download(results, "C:/Users/qnw/Downloads/files")
(Pdb) s
--Call--
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(161)download()
-> def download(
(Pdb) s
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(182)download()
-> provider = _normalize_location(provider)
(Pdb) s
--Call--
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(16)_normalize_location()
-> def _normalize_location(location: Optional[str]) -> Optional[str]:
(Pdb) n
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(26)_normalize_location()
-> if location is not None:
(Pdb)
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(28)_normalize_location()
-> return location
(Pdb)
--Return--
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(28)_normalize_location()->None
-> return location
(Pdb)
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(183)download()
-> if isinstance(granules, DataGranule):
(Pdb)
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(185)download()
-> elif isinstance(granules, str):
(Pdb)
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(187)download()
-> try:
(Pdb)
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(188)download()
-> results = earthaccess.__store__.get(granules, local_path, provider, threads)
(Pdb)
 Getting 5 granules, approx download size: 0.38 GB
QUEUEING TASKS | : 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 180.55it/s] 
PROCESSING TASKS | :   0%|                                                                                                                                   | 0/5 [00:00<?, ?it/s
Error while downloading the file S5P_OFFL_L2__CH4____20200101T110146_20200101T124316_11493_01_010302_20200103T041214.nc
Error while downloading the file S5P_RPRO_L2__CH4____20200101T110146_20200101T124316_11493_03_020400_20221120T012417.nc
Error while downloading the file S5P_OFFL_L2__CH4____20200101T124316_20200101T142446_11494_01_010302_20200103T054229.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.1/2020/001/S5P_OFFL_L2__CH4____20200101T110146_20200101T124316_11493_01_010302_20200103T041214.nc

Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2020/001/S5P_RPRO_L2__CH4____20200101T110146_20200101T124316_11493_03_020400_20221120T012417.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.1/2020/001/S5P_OFFL_L2__CH4____20200101T124316_20200101T142446_11494_01_010302_20200103T054229.nc

PROCESSING TASKS | :  20%|████████████████████████▌                                                                                                  | 1/5 [00:01<00:04,  1.05s/it] 
Error while downloading the file S5P_OFFL_L2__CH4____20200102T104246_20200102T122416_11507_01_010302_20200104T034904.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.1/2020/002/S5P_OFFL_L2__CH4____20200102T104246_20200102T122416_11507_01_010302_20200104T034904.nc

Error while downloading the file S5P_RPRO_L2__CH4____20200101T124316_20200101T142446_11494_03_020400_20221120T013330.nc
Traceback (most recent call last):
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\earthaccess\store.py", line 607, in _download_file
    r.raise_for_status()
  File "C:\Users\qnw\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__CH4____HiR.2/2020/001/S5P_RPRO_L2__CH4____20200101T124316_20200101T142446_11494_03_020400_20221120T013330.nc

PROCESSING TASKS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  4.44it/s] 
COLLECTING RESULTS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5001.55it/s] 
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(193)download()
-> return results
(Pdb)
--Return--
> c:\users\qnw\appdata\local\programs\python\python38\lib\site-packages\earthaccess\api.py(193)download()->[Exception(), Exception(), Exception(), Exception(), Exception()]        
-> return results
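
The return value above is a list of exception objects rather than file paths. Until errors are raised, a caller could check for that explicitly. This is only a sketch: `split_results` is a hypothetical helper, and the sample list stands in for what `earthaccess.download()` returned in this session.

```python
from typing import Any, List, Tuple


def split_results(results: List[Any]) -> Tuple[List[Any], List[BaseException]]:
    """Separate successful download results from collected exceptions."""
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return succeeded, failed


# Stand-in for a mixed result list; a real call would come from earthaccess.download().
results = ["granule_a.nc", Exception("403 Client Error: Forbidden")]
files, errors = split_results(results)
if errors:
    print(f"{len(errors)} of {len(results)} downloads failed; first error: {errors[0]}")
```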

@mfisher87
Collaborator

Thanks for sharing back, Jess! I'm not sure I'll have time to dig into this before next hack day. Would that be a good time to chat more about this?

@jessnicwelch
Collaborator

If you need to "talk," I have availability next Tuesday and Friday. Lemme know what works for you.

@chuckwondo
Collaborator

chuckwondo commented Apr 19, 2024

The problem is that we're using pqdm under the covers (not sure I agree with how/where this is used, but that's a separate topic), which is what is parallelizing the downloads and providing the "pretty" progress bars.

Unfortunately, we're using pqdm's default error-handling, which is to "ignore" errors, and simply collect them. This is why you're getting back a list of errors rather than having an error raised. This leads to the poor user experience you're having because no error is raised, so it is misleading you to believe that nothing went wrong, only to discover that your download "results" are simply a bunch of exception objects.

@mfisher87, a possible short-term "fix" for this would be for us to pass either "immediate" or "deferred" (instead of defaulting to "ignore") for pqdm's exception_behaviour parameter (note the spelling pqdm uses). Choosing "immediate" will immediately reraise the first error encountered. Choosing "deferred" will continue working in the face of exceptions and raise an exception at the end, with the exception being a conglomeration of all encountered errors. Given that a single download failure very likely means that all will fail (certainly in the case that a EULA has not been accepted), it might be pointless (and wasteful) to use "deferred", so "immediate" might be the better choice.
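
To make the three behaviors concrete, here is a standard-library sketch that mimics them with `concurrent.futures`. It is not pqdm's implementation; `run_all`, its `on_error` argument, and `flaky` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def run_all(
    func: Callable[[Any], Any], items: Iterable[Any], on_error: str = "ignore"
) -> List[Any]:
    """Run func over items in threads, handling failures per on_error."""
    if on_error not in ("ignore", "immediate", "deferred"):
        raise ValueError(f"unknown on_error mode: {on_error!r}")
    results: List[Any] = []
    errors: List[Exception] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(func, item) for item in items]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as exc:
                if on_error == "immediate":
                    # Cancel tasks that have not started, then reraise the first error.
                    for pending in futures:
                        pending.cancel()
                    raise
                # "ignore" and "deferred" both keep going and collect the error.
                errors.append(exc)
                results.append(exc)
    if on_error == "deferred" and errors:
        raise RuntimeError(f"{len(errors)} task(s) failed: {errors!r}")
    return results


def flaky(n: int) -> int:
    """Toy task that fails on odd inputs."""
    if n % 2:
        raise ValueError(f"cannot process {n}")
    return n
```

With "ignore", `run_all(flaky, [0, 1, 2])` returns a list that mixes results and exception objects, which matches the behavior described above.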

@mfisher87
Collaborator

Nice sleuthing! "ignore" is obviously not what we want. It looks like pqdm might also be making debugging more of a challenge here, too! Sorry @jessnicwelch if that was proving to be a wrench in the works. What do you think of those options?

My opinion: I tend to agree with "immediate", but I also really like the idea of exposing a fail_at_end: bool = False parameter that lets users control it. I don't think we should let users disable the exception entirely, as is the current behavior; I think we should encourage them to use try/except for that. So they'd only be choosing between the "deferred" and "immediate" pqdm options.

@chuckwondo
Collaborator

> My opinion: I tend to agree with "immediate", but I also really like the idea of exposing a fail_at_end: bool = False parameter that lets users control it. I don't think we should let users disable the exception entirely, as is the current behavior; I think we should encourage them to use try/except for that. So they'd only be choosing between the "deferred" and "immediate" pqdm options.

I like the idea of exposing the control to the user. I suggest we name the additional parameter fail_fast, which we would default to True, if the user does not specify a value.

@mfisher87
Collaborator

I definitely like fail_fast: bool = True better :) Thinking forward to when we have a CLI, it might be used as --no-fail-fast. I don't have strong feelings about that. Do you?

@jessnicwelch
Collaborator

> Nice sleuthing! "ignore" is obviously not what we want. It looks like pqdm might also be making debugging more of a challenge here, too! Sorry @jessnicwelch if that was proving to be a wrench in the works. What do you think of those options?
>
> My opinion: I tend to agree with "immediate", but I also really like the idea of exposing a fail_at_end: bool = False parameter that lets users control it. I don't think we should let users disable the exception entirely, as is the current behavior; I think we should encourage them to use try/except for that. So they'd only be choosing between the "deferred" and "immediate" pqdm options.

No preferences because I don't know enough to provide a useful opinion. 😆

@chuckwondo
Collaborator

> I definitely like fail_fast: bool = True better :) Thinking forward to when we have a CLI, it might be used as --no-fail-fast. I don't have strong feelings about that. Do you?

I prefer fail_fast (it's shorter and is used in numerous other contexts, so it brings familiarity), and I like your thinking about the future CLI.
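
As a sketch of the mapping being discussed, assuming the proposed `fail_fast` flag simply selects between the two raising modes (`exception_mode` is a hypothetical helper name, not existing earthaccess code):

```python
def exception_mode(fail_fast: bool = True) -> str:
    """Translate the proposed fail_fast flag into an error-handling mode name."""
    # "ignore" is deliberately not reachable: callers who want to swallow
    # errors would wrap the download call in their own try/except.
    return "immediate" if fail_fast else "deferred"
```

A future `--no-fail-fast` CLI flag would then correspond to `exception_mode(False)`.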

@mfisher87
Collaborator

I think we're ready to implement this. @jessnicwelch are you interested in taking that on?

@Sherwin-14 was looking for an issue that might be a good fit, but there are other options as well if you want this one :)

@jessnicwelch
Collaborator

> I think we're ready to implement this. @jessnicwelch are you interested in taking that on?
>
> @Sherwin-14 was looking for an issue that might be a good fit, but there are other options as well if you want this one :)

I'm not sure what I'm implementing... sorry. If it's coding earthaccess, I don't think I have the expertise to do that. I'm better at documentation and testing.

@mfisher87
Collaborator

mfisher87 commented May 21, 2024

No apologies necessary :) Awesome work getting us this far! 🚀

Just to sum up in one place for everyone:

  • earthaccess.download() reports a confusing error when it encounters a 403 because the user has not accepted a EULA. @jessnicwelch showed a way to reproduce this in this thread. When this error happens, we need to display a message like: Access to this data has been denied because it requires you to accept an End-User License Agreement (EULA). Follow this link to view and accept the EULA: https://{the rest of it}. The URL we need to display should be contained in the 403 response data. This is what this ticket is about.

  • earthaccess.download() will continue after an error. New ticket for this: earthaccess.download() ignores errors #581

I'll update the OP with this summary.

@asteiker
Member

asteiker commented Sep 3, 2024

We can utilize the approach Harmony is taking to provide an error response and relevant EULA link: https://github.com/nasa/harmony/blob/1d80b2bae53a0fc8209056eb7205f91875150bb1/services/harmony/app/middleware/cmr-collection-reader.ts#L29-L58

@mfisher87
Collaborator

Amazing! Thanks for tracking this down!

Projects
Status: 🆕 New
Development

No branches or pull requests

6 participants