Log a warning if more than a certain threshold of objects are updated in an import #340

Open
hancush opened this issue May 21, 2021 · 0 comments

Comments

Contributor

hancush commented May 21, 2021

We ran into an issue over in LA Metro where an outage in Legistar caused audio links to disappear from every event in a single scrape/import: Metro-Records/la-metro-councilmatic#713

Assuming an existing database, we generally expect only a handful of updates per scrape. Mass updates could indicate an important and/or breaking change at the scraping source. In this case, a warning would have alerted us that something had gone wrong and let us be more proactive in reaching a resolution.

It would be awesome if pupa had a configurable expected update threshold with a sane default, such as 75%, and would log a warning if more than that percentage of scraped entities are updated in a given run.
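
For illustration, here is a rough sketch of what such a check could look like. It is not pupa's actual code: the function names, the `DEFAULT_UPDATE_THRESHOLD` constant, and the shape of the `counts` dict (per-type `insert` / `update` / `noop` tallies) are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("pupa")

# Hypothetical default; the issue suggests something like 75%.
DEFAULT_UPDATE_THRESHOLD = 0.75


def update_fraction(counts):
    """Return the share of imported objects that were updated (0.0 to 1.0).

    `counts` is assumed to be a dict with optional integer keys
    "insert", "update", and "noop" for one object type.
    """
    updated = counts.get("update", 0)
    total = counts.get("insert", 0) + updated + counts.get("noop", 0)
    if total == 0:
        # Nothing was imported, so there is nothing to warn about.
        return 0.0
    return updated / total


def warn_on_mass_update(object_type, counts,
                        threshold=DEFAULT_UPDATE_THRESHOLD, log=logger):
    """Log a warning and return True if the update share exceeds `threshold`."""
    fraction = update_fraction(counts)
    exceeded = fraction > threshold
    if exceeded:
        log.warning(
            "%s import updated %.0f%% of scraped objects (threshold %.0f%%); "
            "this may indicate a change or outage at the source",
            object_type, fraction * 100, threshold * 100,
        )
    return exceeded
```

Called once per object type after an import, e.g. `warn_on_mass_update("event", {"insert": 0, "update": 90, "noop": 10})`, this would log a warning because 90% of the events were updated. The threshold is a plain keyword argument here; in practice it would presumably come from a setting.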
