colly-responsible

Responsible crawling with Colly. For the better Internet.

Based on lessons learned while writing Idun and subsequently getting banned by half of the website operators...

Supported limits

  • HTTP status code 429
  • rel="nofollow" on links
  • robots.txt
  • actual delay between requests
  • URL tests (e.g. extension, domain)
  • Maximum run time