HTML is downloaded instead of PDF #1
Comments
An example of the interstitial page randomly encountered when trying downloads: Skip to Download request URL: It has sent this kind of POST payload to it to continue: There is also
After some further research, it's clear this will not be that easy. The interstitial page's button actually executes JavaScript (e.g. https://dz2cdn2.dzone.com/storage/pub/11369473-combined.js) when Skip to Download is clicked:

<a href="#" ng-click="download(campaign.itemId, 'rejected')" class="ng-binding">Skip to Download</a>

The click function is:

c.on("click",function(a,c){b.$apply(function(){k(b,{$event:c||a})})});

Go Colly does not support JavaScript execution. One option would be to use chromedp. This example demonstrates how to evaluate JavaScript and retrieve the result. Investigating options.
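Until JavaScript execution is sorted out, a downloader could at least notice when it has received the interstitial rather than a PDF, so the HTML is not saved under a `.pdf` name. The following is a minimal sketch using only the standard library. The helper names `isPDF` and `looksLikeInterstitial` are hypothetical, and matching on the "Skip to Download" link text is an assumption based on the markup quoted above, not a confirmed property of every interstitial page.

```go
package main

import (
	"bytes"
	"fmt"
)

// isPDF reports whether data starts with the PDF magic bytes "%PDF-".
func isPDF(data []byte) bool {
	return bytes.HasPrefix(data, []byte("%PDF-"))
}

// looksLikeInterstitial reports whether data is not a PDF and contains the
// "Skip to Download" link text. Matching on this text is an assumption
// taken from the markup quoted in the issue.
func looksLikeInterstitial(data []byte) bool {
	return !isPDF(data) && bytes.Contains(data, []byte("Skip to Download"))
}

func main() {
	// Hypothetical response bodies, for illustration only.
	pdfBody := []byte("%PDF-1.4\n")
	htmlBody := []byte(`<a href="#" class="ng-binding">Skip to Download</a>`)

	fmt.Println("pdf body is PDF:", isPDF(pdfBody))
	fmt.Println("html body is interstitial:", looksLikeInterstitial(htmlBody))
}
```

A caller could run this check on the downloaded bytes before writing them to disk and retry, or skip the file, when the interstitial is detected.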
Random links do not get redirected to the PDF, but rather to a website - it seems like protection against bots. This is the content of the website:
The fix is probably to extract the real Location of each PDF, like: https://dzone.com/storage/assets/2805-rc001-gwt_style_online.pdf instead of the origin: https://dzone.com/asset/download/6