Implement a DOM-based technique as a baseline #195

Open
marco-c opened this issue Jun 9, 2018 · 6 comments

Comments

@marco-c
Owner

marco-c commented Jun 9, 2018

We should compare our CNN-based technique with a technique that doesn't use machine learning.

@marco-c marco-c changed the title Implement a DOM-based technique without using CNN as a baseline Implement a DOM-based technique as a baseline Jun 9, 2018
@sagarvijaygupta
Collaborator

We should also collect DOM information along with the screenshots. Currently we don't have any.

@marco-c
Owner Author

marco-c commented Jun 9, 2018

We have already implemented DOM information collection, but we haven't collected any data yet.

@Shashi456
Contributor

@marco-c, do we simply need to run collect.py to start collecting data with DOM info?

@marco-c
Owner Author

marco-c commented Jun 13, 2018

Yes, I think so. We implemented it recently. You should run it on a couple of websites and check that it is actually generating correct data.

@Shashi456
Contributor

@marco-c, although the DOM info seems to be collected properly, the collection is very slow. Also, could you tell me how I can add the files to the repo, since they are Git LFS files?

@marco-c
Owner Author

marco-c commented Jun 22, 2018

Yes, the crawler is quite slow because it has to wait long enough to be sure that everything on the page has loaded.
You can add them normally; Git LFS is completely transparent.

3 participants