-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
landing_worker: delete patch content after landing (bug 1879536) #374
base: main
Are you sure you want to change the base?
Conversation
zzzeid
commented
Feb 9, 2024
•
edited
Loading
edited
- delete patch content after landing
- add one-time cli command to do this for previously landed jobs
- update tests to reflect new behaviour
3995e78
to
1a13811
Compare
- delete patch content after landing - add one-time cli command to do this for previously landed jobs - update tests to reflect new behaviour
1a13811
to
cfb4598
Compare
# Patches are no longer necessary, delete them. | ||
for revision in job.revisions: | ||
revision.patch_bytes = b"" | ||
db.session.commit() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be deleting these right away, it's useful to have the patch contents around for debugging and other archaeological purposes. It would be a pain if we needed to inspect the patch contents but we couldn't do so, because we want to save a few MB of disk space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this previously and concluded that since the commit hash is present, then it's not going to be much added value to have the raw patch after landing (for successful landings, at least).
I agree that there could be some benefit to keeping those around, however, these are taking a bit more than a few MBs at this time (a good portion of the 2.3GB as of today) and appear to be increasing dramatically in the last few weeks. As I understand we have a cap of 10GB (or at least, in the current tier).
The time limit approach might be better though regardless. Let me think some more about this.
@cli.command(name="clean-landing-job-patches") | ||
def clean_landing_job_patches(): | ||
"""Iterate over all landed jobs and delete their patches.""" | ||
from landoapi.models.landing_job import LandingJob, LandingJobStatus | ||
from landoapi.storage import db, db_subsystem | ||
|
||
db_subsystem.ensure_ready() | ||
jobs = LandingJob.query.filter(LandingJob.status == LandingJobStatus.LANDED).all() | ||
for job in jobs: | ||
for revision in job.revisions: | ||
revision.patch_bytes = b"" | ||
db.session.commit() | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make the query only consider patches that are older than a certain date, with a CLI flag to set some specific time.