Option to not cache entry URLs #1322

SeeYangZhi asked this question in Q&A
- Is there a method to not cache the URLs added to the requestQueue?
Answered by mnmkng on Mar 31, 2022
- What exactly do you mean by that?
- It's a bit verbose, but I wanted to make it clear what's going on:

  ```js
  import Apify from 'apify';

  const repeatableRequests = [];
  const requestQueue = await Apify.openRequestQueue();

  // You can do this only once. The requests will stay in the queue until you delete the file.
  const { request } = await requestQueue.addRequest({ url: 'https://example.com' });
  repeatableRequests.push(request);

  // But you need to save the information about the requests for the subsequent runs.
  await Apify.setValue('repeatable-requests', repeatableRequests);

  // Run the crawler normally.
  const crawler = new Apify.CheerioCrawler({
      requestQueue,
      handlePageFunction: async ({ request }) => {
          console.log(request.url);
      },
  });
  await crawler.run();

  // This is the place where the selected requests are updated to be crawlable again.
  // We're just telling the queue that those requests were not handled yet,
  // which re-enables their crawling in the queue.
  const requestsToUpdate = await Apify.getValue('repeatable-requests');
  const promises = requestsToUpdate.map((req) => {
      return requestQueue.client.updateRequest({
          ...req, // you need a copy of the original request
          handledAt: undefined, // and change their handled state
      });
  });
  await Promise.all(promises);
  ```
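The core idea in the answer above is that a request queue tracks completion through a `handledAt` timestamp, and clearing that field makes a request eligible for processing again. Here is a minimal, self-contained sketch of that pattern in plain Node.js, with no Apify dependency; the `InMemoryQueue` class is hypothetical and exists only to illustrate the mechanism, not to mirror the real `RequestQueue` API:

```javascript
// Hypothetical in-memory stand-in for a request queue that tracks
// handled state via a `handledAt` timestamp, keyed by URL.
class InMemoryQueue {
    constructor() {
        this.requests = new Map();
    }
    // Add a request if its URL is not already known; return the stored record.
    addRequest(req) {
        if (!this.requests.has(req.url)) {
            this.requests.set(req.url, { ...req, handledAt: undefined });
        }
        return this.requests.get(req.url);
    }
    // Simulate the crawler finishing a request.
    markHandled(url) {
        this.requests.get(url).handledAt = new Date();
    }
    // Overwrite the stored record, e.g. to reset its handled state.
    updateRequest(req) {
        this.requests.set(req.url, { ...req });
    }
    // Count requests that are still waiting to be crawled.
    pendingCount() {
        return [...this.requests.values()].filter((r) => !r.handledAt).length;
    }
}

const queue = new InMemoryQueue();
const saved = queue.addRequest({ url: 'https://example.com' });

// After a crawl, the request is marked handled and no longer pending.
queue.markHandled('https://example.com');
console.log(queue.pendingCount()); // 0

// Re-enable it by writing back a copy with `handledAt` cleared,
// mirroring the `updateRequest` step in the answer above.
queue.updateRequest({ ...saved, handledAt: undefined });
console.log(queue.pendingCount()); // 1
```

The important detail, as in the real answer, is that you persist a copy of the original request and write it back with `handledAt` unset, rather than re-adding the URL, because the queue deduplicates URLs it has already seen.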
Answer selected by SeeYangZhi