Implement a DOM-based technique as a baseline #195
Comments
We should also collect screenshots together with DOM information. At present we don't have any.
We have implemented collection of DOM information too, but we haven't collected any data with it yet.
@marco-c, do we simply need to run collect.py to start collecting data with DOM info?
Yes, I think so. We implemented it recently. You should run it for a couple of websites and check that it is actually generating correct data.
@marco-c, the DOM info seems to be collected properly, but collection is very slow. Also, could you tell me how to add the files to the repo, since they are Git LFS files?
Yes, the crawler is quite slow because of all the time we have to wait to be sure that everything has loaded.
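For illustration only, one common way to bound this kind of waiting is to poll a readiness check with a timeout instead of sleeping for a fixed period. The thread does not say how collect.py waits, so this is a generic sketch: `wait_until` and the `is_ready` callback are hypothetical names, not part of the project.

```python
import time


def wait_until(is_ready, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds pass.

    Hypothetical helper: `is_ready` stands in for whatever check tells
    the crawler that a page has finished loading.
    Returns True if the check succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a check that succeeds early, the crawler moves on immediately; the timeout only matters for pages that never settle.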
We should compare our CNN-based technique with a baseline technique that doesn't use machine learning.
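As a rough idea of what such a baseline could look like, here is a minimal sketch that compares the DOM information collected for the same page in two browsers and flags the page when too many elements differ. Everything here is an assumption: the thread does not specify the format of the collected DOM info, so the element dicts (`path`, `x`, `y`, `width`, `height`), the function names, and the `tolerance` and `threshold` values are all hypothetical.

```python
def index_by_path(elements):
    """Map each element's (assumed unique) DOM path to its record."""
    return {el["path"]: el for el in elements}


def dom_difference(dom_a, dom_b, tolerance=5):
    """Fraction of elements that are missing in one browser or whose
    bounding box differs by more than `tolerance` pixels.

    The dict layout (path, x, y, width, height) is an assumption,
    not the project's actual data format.
    """
    a, b = index_by_path(dom_a), index_by_path(dom_b)
    paths = set(a) | set(b)
    if not paths:
        return 0.0
    differing = 0
    for path in paths:
        if path not in a or path not in b:
            differing += 1  # element exists in only one browser
            continue
        keys = ("x", "y", "width", "height")
        if any(abs(a[path][k] - b[path][k]) > tolerance for k in keys):
            differing += 1  # element moved or resized noticeably
    return differing / len(paths)


def is_incompatible(dom_a, dom_b, threshold=0.1, tolerance=5):
    """Flag a page when the differing fraction exceeds `threshold`."""
    return dom_difference(dom_a, dom_b, tolerance) > threshold
```

The labels produced by a rule like this could then be compared against the CNN's predictions on the same pages.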