Commit

Note on scopes etc.
anjackson committed Nov 9, 2023
1 parent d665f82 commit 99c3e18
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion ingest/README.md
@@ -95,7 +95,7 @@ Similarly to Kafka, use the supplied scripts (or variant of them) to launch the

A few different things need to be set up when running a crawler:

-- Check scope surts and exclusions. These are on shared files with the host, and may need updating based on data from W3ACT/curators.
+- Check scope surts and exclusions. These are on shared files with the host, and may need updating based on data from W3ACT/curators. FC manages scope and seeds via Kafka, but exclusions are manual. DC needs explicit scope and exclusion configuration.
- Update the Geo-IP DB for DC: https://github.com/ukwa/ukwa-services/issues/123

Note that setting up seeds, scope and exclusions for the domain crawl is particularly involved, and is documented at _TBA IS ON GITLAB_