Curiosity Scraper Fix #200

Open
wants to merge 11 commits into
base: master


AidanPowers

It looks like the old CuriosityScraper broke sometime early this year.
The good news is that the PerseveranceScraper is a newer scraper with similar logic.
After working through the website, it looks like this is the new link for the raw image JSON:
https://mars.nasa.gov/api/v1/raw_image_items/?order=sol%20desc,instrument_sort%20asc,sample_type_sort%20asc,%20date_taken%20desc&per_page=200&page=0&condition_1=msl:mission&condition_2=4327:sol:in
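
For reference, here is a minimal Ruby sketch of how that URL could be assembled from its parts. The helper name `curiosity_raw_images_url` and its keyword arguments are hypothetical and are not the code in this PR; only the parameter names and values come from the link above. `URI.encode_www_form` writes spaces as `+` and percent-encodes commas and colons, which should decode to the same values as the `%20` form.

```ruby
require "uri"

# Hypothetical helper: builds the raw image query for one sol.
# Parameter names and values are copied from the link above.
def curiosity_raw_images_url(sol:, page: 0, per_page: 200)
  query = URI.encode_www_form(
    order: "sol desc,instrument_sort asc,sample_type_sort asc, date_taken desc",
    per_page: per_page,
    page: page,
    condition_1: "msl:mission",    # mission filter as it appears in the link
    condition_2: "#{sol}:sol:in"   # sol filter as it appears in the link
  )
  "https://mars.nasa.gov/api/v1/raw_image_items/?#{query}"
end

puts curiosity_raw_images_url(sol: 4327)
```

Keeping `sol`, `page`, and `per_page` as arguments would let a scraper step through sols and pages without string surgery on the URL.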

I have implemented this new link and attempted to merge it with the Perseverance logic.
As a warning, this is my first project like this, so please review the code carefully before merging.
If an instrument name is not found in the database, the scraper adds it. I am not sure this is necessary, as I may not have set up my database with the instrument names properly in the first place.
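
The add-if-missing behaviour can be sketched in plain Ruby as below. This is an illustration only: `InstrumentRegistry` is a hypothetical in-memory stand-in for the database table, and the instrument names are examples. In a Rails app the usual analogue would be ActiveRecord's `find_or_create_by`, though whether this PR uses it is not stated here.

```ruby
# Hypothetical in-memory stand-in for the instruments table.
class InstrumentRegistry
  def initialize(known_names = [])
    @instruments = known_names.to_h { |name| [name, { name: name }] }
  end

  # Returns the existing record, or adds a new one if the name is unknown.
  def find_or_add(name)
    @instruments[name] ||= { name: name }
  end

  def names
    @instruments.keys
  end
end

registry = InstrumentRegistry.new(["MAST_LEFT"])
registry.find_or_add("MAST_LEFT")   # already known, nothing is added
registry.find_or_add("NAV_LEFT_B")  # unknown, so it is added
puts registry.names.inspect
```

The trade-off is that a misspelled or unexpected instrument name from the API is silently stored rather than rejected.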

Here it is working on localhost through my Docker container for the latest images. I am unsure whether it will work for the older images on the main instance.
image

@adamjcolvin

@AidanPowers I came across this issue as well and did some digging into it. Are you sure that the URL you're using will only return Curiosity images? There doesn't seem to be any filtering by rover name.

That being said, the Perseverance scraper uses this URL, which doesn't reference Perseverance by name either:
https://mars.nasa.gov/rss/api/?feed=raw_images&category=mars2020&feedtype=json&latest=true

Is there some documentation around mars.nasa.gov that I've missed? I've struggled to find any.

@AidanPowers
Author

@adamjcolvin Unfortunately I was not able to find any documentation on this API, but here was my thought process:
The old link was no longer returning JSON, so I assumed it was broken.

The Perseverance scraper link has a category of "mars2020", which is almost certainly the internal name for the rover. However, I did not get any results when I tried "mars2009", "mars2010", and so on.

This is the webpage for viewing the latest images downloaded from Curiosity:
https://mars.nasa.gov/msl/multimedia/raw-images/?order=sol+desc%2Cinstrument_sort+asc%2Csample_type_sort+asc%2C+date_taken+desc&per_page=50&page=0&mission=msl
This webpage exists and works, so I assumed that there was an API behind it somewhere.
Looking at the network calls on page load, the page makes a GET request to this new API:
https://mars.nasa.gov/api/v1/raw_image_items/?order=sol+desc%2Cinstrument_sort+asc%2Csample_type_sort+asc%2C+date_taken+desc&per_page=50&page=0&condition_1=msl%3Amission&search=&extended=thumbnail%3A%3Asample_type%3A%3Anoteq
image

However, it uses some different parameters at the end of the URL compared to the old API.
So I plugged the old API's parameters into this API, adjusting the syntax to match.
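
To make the difference concrete, the sketch below decodes the query strings of the two links quoted in this thread (the website's own request and the link used by the scraper) and compares their parameters. Both URLs are copied verbatim from the comments; the constant names and the `query_params` helper are hypothetical.

```ruby
require "uri"

# The request the website makes on page load (quoted above).
WEBSITE_CALL = "https://mars.nasa.gov/api/v1/raw_image_items/?order=sol+desc%2Cinstrument_sort+asc%2Csample_type_sort+asc%2C+date_taken+desc&per_page=50&page=0&condition_1=msl%3Amission&search=&extended=thumbnail%3A%3Asample_type%3A%3Anoteq"
# The link used by the scraper (from the PR description).
SCRAPER_CALL = "https://mars.nasa.gov/api/v1/raw_image_items/?order=sol%20desc,instrument_sort%20asc,sample_type_sort%20asc,%20date_taken%20desc&per_page=200&page=0&condition_1=msl:mission&condition_2=4327:sol:in"

# Hypothetical helper: decodes a URL's query string into a Hash.
def query_params(url)
  URI.decode_www_form(URI(url).query).to_h
end

website = query_params(WEBSITE_CALL)
scraper = query_params(SCRAPER_CALL)

puts "only in website call: #{(website.keys - scraper.keys).inspect}"
puts "only in scraper call: #{(scraper.keys - website.keys).inspect}"
puts "same order value: #{website['order'] == scraper['order']}"
puts "same mission filter: #{website['condition_1'] == scraper['condition_1']}"
```

Both links carry the same `condition_1` value, `msl:mission`, which looks like the filter that restricts results to one mission, although without documentation that reading remains an assumption.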

Given that this is a newer API, it should be more similar to the Perseverance API, so I based my scraper on that code.

I think that this is a Curiosity-exclusive API for a few reasons:
The JSON response has no rover identifier field, but the "description" field appears to be hard-coded to describe the image as coming from Curiosity.
The images returned by the API match the latest images shown on the website.

Also, feel free to improve or replace my code or API handling. This project is my first experience with Ruby and API development, so bugs are highly likely.


2 participants