Any success with fetching images ! #33
Comments
Google only loads the real image after the page has loaded, so when I fetch the original page it shows the loading placeholder rather than the real image. I am still looking for a solution. |
The image is stored in a JavaScript variable. If we can retrieve the
object corresponding to each variable, the image can be fetched.
AFAIK, we don't need to load the news article page. Instead, we can read
the corresponding JavaScript variable directly from the parsed HTML page.
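A minimal sketch of that idea, assuming the image sits inside a `<script>` block as a quoted base64 data URI. That layout is an assumption and is not confirmed anywhere in this thread; the names `ScriptCollector`, `extract_inline_images`, and the sample markup are hypothetical. It only shows how the script text can be read from the fetched HTML without executing any JS:

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text; nothing is executed.
        if self._in_script:
            self.scripts[-1] += data


# Assumption: each image is a quoted string literal holding a data URI.
DATA_URI = re.compile(r"""(['"])(data:image/[a-z+]+;base64,[^'"]+)\1""")


def extract_inline_images(html):
    """Return every data URI found in the page's script blocks, in order."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return [m.group(2) for s in parser.scripts for m in DATA_URI.finditer(s)]


# Hypothetical markup, only to exercise the function.
sample = (
    '<html><body><img id="dimg_1" src="loading.gif">'
    "<script>var s='data:image/jpeg;base64,/9j/4AAQ';var ii=['dimg_1'];</script>"
    "</body></html>"
)
print(extract_inline_images(sample))
```

If the real page stores the images differently (for example as URLs, or with JavaScript escape sequences inside the string), the pattern would need adjusting and the extracted value would still need decoding.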
On Sun, 25 Oct 2020 at 15:10, Hurin Hu <[email protected]> wrote:
… Google only loads the real image after the page has loaded, so when I
fetch the original page it shows the loading placeholder rather than the
real image. I am still looking for a solution.
|
I know, but when you use a script to fetch the page, the JS is not executed, and the images are loaded dynamically by JS. That is what I found last time; I will check again later to see whether Google has made any changes. |
Sure, thanks! Let us know what you find. |
We can get the image by using another module inside this package. Would that be a convenient approach? |
|
newspaper3k |
Currently it can only return the default loading image: Google loads the image through JS, so the JS may need to be executed to get the correct URL. A fetching script that does not execute JS would not help. I have checked newspaper3k; it uses the requests.get() method, which would not help. I am not sure how you got the result. Can you post some sample code?
|
I have attached the full code that I'm currently using. Code
Output |
It is a solution, but it gets the images from each news article's own page, fetching the items one by one, rather than from Google News directly. It is not a proper solution, because processing ten or more web requests to get the images takes longer. If multiple pages are requested, there may be side effects, such as being blocked by the website for fetching URLs too frequently, or long waiting times. If anybody needs this, the method may help, but be aware that you should set a delay between requests, or you might easily be blocked.
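A rough sketch of the delay advice, for anyone who does go the per-article route. This is not the code attached above: it swaps in a plain standard-library lookup of each article's `og:image` meta tag, which is an assumption about what the article pages expose. The names `OgImageParser`, `fetch_html`, `collect_images`, and the 2-second default delay are all hypothetical choices:

```python
import time
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class OgImageParser(HTMLParser):
    """Remember the first <meta property="og:image"> value on a page."""

    def __init__(self):
        super().__init__()
        self.image = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.image is None:
            attributes = dict(attrs)
            if attributes.get("property") == "og:image":
                self.image = attributes.get("content")


def fetch_html(url, timeout=10):
    """Download one page as text; no JS is executed."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_images(urls, fetch=fetch_html, delay=2.0):
    """Map each article URL to its og:image (or None), pausing between requests."""
    images = {}
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay)  # wait before every request except the first
        try:
            parser = OgImageParser()
            parser.feed(fetch(url))
            images[url] = parser.image
        except OSError:
            # Network errors (including urllib's URLError) leave the entry empty.
            images[url] = None
    return images
```

The `fetch` parameter is only there so the loop can be tried against canned HTML before pointing it at real sites. With ten articles and a 2-second delay, the run spends at least 18 seconds just waiting, which is the cost described above.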
|
Yes, you are right. That's why I asked earlier. The delay is also important, as you mentioned.
|
@HurinHu just added some comments to my pull request you closed. Let me know if that makes any difference. |
Dear author @HurinHu,
Thanks for the package!
Fetching images is still a pain point in the code. The images are stored as a JavaScript object in the self.content variable (shown in image). I tried extracting the value of the variable but didn't succeed. Could you try?