Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

landing_worker: delete patch content after landing (bug 1879536) #374

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zzzeid
Copy link
Contributor

@zzzeid zzzeid commented Feb 9, 2024

  • delete patch content after landing
  • add one-time cli command to do this for previously landed jobs
  • update tests to reflect new behaviour

@zzzeid zzzeid force-pushed the zeid/bug-1879536-clean-up-patches branch 2 times, most recently from 3995e78 to 1a13811 Compare February 9, 2024 15:54
- delete patch content after landing
- add one-time cli command to do this for previously landed jobs
- update tests to reflect new behaviour
@zzzeid zzzeid force-pushed the zeid/bug-1879536-clean-up-patches branch from 1a13811 to cfb4598 Compare February 9, 2024 16:49
@zzzeid zzzeid marked this pull request as ready for review February 9, 2024 16:55
@zzzeid zzzeid requested a review from cgsheeh February 9, 2024 16:56
Comment on lines +442 to +446
# Patches are no longer necessary, delete them.
for revision in job.revisions:
revision.patch_bytes = b""
db.session.commit()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be deleting these right away, it's useful to have the patch contents around for debugging and other archaeological purposes. It would be a pain if we needed to inspect the patch contents but we couldn't do so, because we want to save a few MB of disk space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this previously and concluded that since the commit hash is present, then it's not going to be much added value to have the raw patch after landing (for successful landings, at least).

I agree that there could be some benefit to keeping those around, however, these are taking a bit more than a few MBs at this time (a good portion of the 2.3GB as of today) and appear to be increasing dramatically in the last few weeks. As I understand we have a cap of 10GB (or at least, in the current tier).

The time limit approach might be better though regardless. Let me think some more about this.

Comment on lines +110 to +123
@cli.command(name="clean-landing-job-patches")
def clean_landing_job_patches():
"""Iterate over all landed jobs and delete their patches."""
from landoapi.models.landing_job import LandingJob, LandingJobStatus
from landoapi.storage import db, db_subsystem

db_subsystem.ensure_ready()
jobs = LandingJob.query.filter(LandingJob.status == LandingJobStatus.LANDED).all()
for job in jobs:
for revision in job.revisions:
revision.patch_bytes = b""
db.session.commit()


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make the query only consider patches that are older than a certain date, with a CLI flag to set some specific time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants