
aws-janitor-boskos: add clean time and process time metrics #75

Merged

Conversation

@cpanato (Member) commented on Feb 24, 2021:

This adds metrics to collect the time taken to clean each resource and the duration of the entire janitor process.

The metrics are histograms with exponential buckets. I spoke with Frederic Branczyk (@brancz) about this in the past, and he suggested this approach since it is very similar to what I've implemented before, so I used the same pattern 😄
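
As a rough illustration of this approach (not the exact code in this PR), a duration histogram with exponential buckets registered via prometheus/client_golang could look like the sketch below. The metric name, label names, and bucket parameters are assumptions, not the values used in the PR.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Sketch only: a duration histogram with exponential buckets.
// Metric name, labels, and bucket parameters are illustrative assumptions.
var cleanTimeSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "aws_janitor_boskos_duration_seconds", // hypothetical name
		Help: "Time spent cleaning a resource or running the full janitor process.",
		// Exponential buckets: start at 1s and double 15 times (1s .. ~16384s).
		Buckets: prometheus.ExponentialBuckets(1, 2, 15),
	},
	[]string{"resource", "phase"}, // e.g. phase = "clean" or "process"
)

func init() {
	prometheus.MustRegister(cleanTimeSeconds)
}
```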

This is part of #13.

I'm opening this PR for feedback to see whether this approach sounds good; if it does, I'll do the same for the other janitors.

/assign @ixdy

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Feb 24, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 24, 2021
@ixdy (Contributor) left a comment:

Thanks! This generally LGTM, just one question.


-    logrus.WithFields(logrus.Fields{"name": res.Name, "duration": duration.Seconds()}).Info("Finished cleaning")
+    collectMetric(start, res.Name, "clean")
+    logrus.WithFields(logrus.Fields{"name": res.Name, "duration": time.Since(start).Seconds()}).Info("Finished cleaning")
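
The collectMetric helper itself isn't shown in this hunk. A minimal sketch of what such a helper might do, assuming it observes elapsed time on a histogram like the one sketched above (and a `time` import), is:

```go
// Hypothetical sketch of the collectMetric helper referenced in the diff;
// the actual implementation in this PR may differ.
func collectMetric(start time.Time, resourceName, phase string) {
	cleanTimeSeconds.WithLabelValues(resourceName, phase).Observe(time.Since(start).Seconds())
}
```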
@ixdy (Contributor) commented on the diff:

Do you think we should include sweepCount somehow as well? It might be interesting to understand whether a longer duration is due to multiple sweeps.

@cpanato (Member, Author) replied:

That sounds like a great idea; I will update this PR over the weekend. Thanks for your review and feedback!

@cpanato (Member, Author) replied:

Added, but I'm not 100% sure it is correct. When you have time, can you please take a look and let me know? Thanks!

cmd/aws-janitor-boskos/main.go (resolved conversation)
@cpanato cpanato force-pushed the GH-13-aws-janitor-boskos branch from 95e6d25 to 41c5115 Compare February 28, 2021 16:00
-    logrus.WithFields(logrus.Fields{"name": res.Name, "duration": duration.Seconds()}).Info("Finished cleaning")
+    sweepsGauge.WithLabelValues(res.Name).Set(float64(countSweeps))
+    collectMetric(start, res.Name, "clean")
+    logrus.WithFields(logrus.Fields{"name": res.Name, "duration": time.Since(start).Seconds(), "sweeps": countSweeps}).Info("Finished cleaning")
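
For reference, the sweepsGauge used above could be declared as a GaugeVec along these lines; the metric name and help text are assumptions, not necessarily what the PR uses:

```go
// Hypothetical declaration of sweepsGauge; name and help text are assumptions.
var sweepsGauge = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "aws_janitor_boskos_sweeps",
		Help: "Number of sweep passes performed for a resource in the last cleaning run.",
	},
	[]string{"resource"},
)
```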
@ixdy (Contributor) commented on the diff:

Looking at this code again, I now realize I was misunderstanding it before - I thought that the specified sweepCount was the maximum number of attempted sweeps (i.e. retries if there was an error), but actually, I see that it always sweeps that many times, regardless of error or success.

We could keep the metric here, though it's not nearly as meaningful as I thought it was (since it will basically remain the same for a given configuration). If we keep it, though, we can just pass sweepCount instead of using the separate countSweeps variable.

@cpanato (Member, Author) replied:

I can drop that for now, do a small refactor in a follow-up so this part only retries when it fails, and then add the metric back. WDYT?

@ixdy (Contributor) replied:

I think part of the reason it does multiple passes is that the AWS janitor typically does not return errors if it fails to delete resources, under the assumption that a later pass might successfully delete the resource (e.g. if there is some lingering dependency that gets cleaned up at a later stage).

This probably isn't ideal, but we'd need to investigate further and fix things to achieve the desired "retry only on error" behavior.

TL;DR we probably want to leave the existing behavior of always sweeping multiple times. The metrics you're adding can help us determine if this is taking too long and we should look into optimizing this process. :)
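
In other words, the current behavior is roughly the following simplified sketch; the names and error handling here are illustrative, not the actual aws-janitor code:

```go
// Simplified illustration of the fixed-count sweep behavior described above:
// every pass runs regardless of whether earlier passes deleted everything,
// because a later pass may succeed once lingering dependencies are gone.
for i := 0; i < sweepCount; i++ {
	if err := sweepOnce(); err != nil {
		// The AWS janitor typically does not surface per-resource deletion
		// failures here; this error handling is illustrative only.
		logrus.WithError(err).Warn("sweep pass failed; continuing")
	}
}
```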

@cpanato (Member, Author) replied:

@ixdy sounds good :)

I've updated the code to reflect your feedback. Thanks again!

@cpanato cpanato force-pushed the GH-13-aws-janitor-boskos branch from 41c5115 to b8a8098 Compare March 3, 2021 13:45
to collect the response time when cleaning and for the entire janitor process
@cpanato cpanato force-pushed the GH-13-aws-janitor-boskos branch from b8a8098 to d040320 Compare March 3, 2021 13:47
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2021
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cpanato, ixdy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7fe7571 into kubernetes-sigs:master Mar 3, 2021
@cpanato cpanato deleted the GH-13-aws-janitor-boskos branch March 4, 2021 08:52