Add recommended Prometheus dashboards for Go. #809
This work was done in prep for our talk at GopherCon UK about Go Runtime Metrics. Feedback on the dashboard data, layout and style is welcome! Essentially it has all the metrics we maintain in client_golang (the most popular Go metrics SDK). The exposed metrics also align with the Go team's recommendation in golang/go#67120. Signed-off-by: bwplotka <[email protected]>
Open questions:
This is based on GKE, but it should technically work on GCE? I didn't check though.
Cool, do you mean via Ops Agent writing to prometheus_target? Not sure off the top of my head how the labels (e.g. cluster_name, namespace_name) are populated in that case...
Metrics for Sched Latency and Runtime Configuration options are not yet very common (we are working on adopting them in OSS as we speak). This means those graphs likely won't work out of the box. I think that's fine, those are new metrics, but maybe it would create a support burden?
We have some precedent for this, e.g. for NVIDIA DCGM. The best practice here is to try and be explicit via the section or chart titles whenever some graphs may not be populated.
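For anyone wondering whether those graphs would be populated for a given binary, here is a minimal sketch that asks the Go runtime which of these metrics it supports. It uses the standard `runtime/metrics` package; the list of names is my assumption about which runtime metrics back the Sched Latency and Runtime Configuration sections, and it says nothing about whether client_golang is configured to export them.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// runtimeMetricNames is an assumed list of runtime/metrics names behind the
// scheduler-latency and runtime-configuration graphs. Availability depends
// on the Go version the binary was built with.
var runtimeMetricNames = []string{
	"/sched/latencies:seconds",
	"/sched/gomaxprocs:threads",
	"/gc/gogc:percent",
	"/gc/gomemlimit:bytes",
}

// availability reports, for each requested name, whether the running Go
// runtime supports that metric. Unsupported names come back as KindBad.
func availability(names []string) map[string]bool {
	samples := make([]metrics.Sample, len(names))
	for i, name := range names {
		samples[i].Name = name
	}
	metrics.Read(samples)

	supported := make(map[string]bool, len(names))
	for _, s := range samples {
		supported[s.Name] = s.Value.Kind() != metrics.KindBad
	}
	return supported
}

func main() {
	for name, ok := range availability(runtimeMetricNames) {
		fmt.Printf("%-28s supported=%v\n", name, ok)
	}
}
```

A metric reported as unsupported here cannot show up in the dashboard regardless of exporter settings, which is one case where an explicit section or chart title helps.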
I applied some grouping that makes sense to me and kind of works with the auto-grouping feature (I don't know how that works technically), so I am guessing a bit here about the recommended grouping I should use.
Can you elaborate on what you mean by grouping? Do you mean grouping the charts into the collapsible group widgets ("Version", "Memory", etc) or the sum by (X, Y, Z) on the PromQL queries?
Similarly, I used a mix of "global" filters vs filters as vars. Not really sure when I should use vars vs global filters, so I guessed a bit, feedback welcome (:
Given that all of your charts are on prometheus_target metrics, I expect both to behave similarly. Here's how you would pick between the two:
- template vars are nice when you want to opt-in and have the filter apply to chart A but not chart B
- template vars are nice when not all the labels line up (e.g. you have a cluster_name template variable, but you want to apply it to a GKE system metric with a label called cluster, or a log).
- the expansion of template vars is explicit when you inspect the query, but global vars are more implicit
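To make the first two points concrete, here is a rough Go sketch of explicit, per-query variable expansion. The `${cluster_name}` placeholder syntax, the metric names and the `expandTemplateVars` helper are all hypothetical and only illustrate the idea; this is not the actual Cloud Monitoring interpolation logic or variable syntax.

```go
package main

import (
	"fmt"
	"os"
)

// expandTemplateVars replaces ${name} placeholders in a query with values
// from vars. Hypothetical helper: it only mimics the idea of expanding
// template variables before the query is sent.
func expandTemplateVars(query string, vars map[string]string) string {
	return os.Expand(query, func(name string) string {
		return vars[name]
	})
}

func main() {
	vars := map[string]string{"cluster_name": "prod-1"}

	// Chart A opts in, and its label happens to match the variable name.
	chartA := `sum(example_metric_a{cluster_name="${cluster_name}"})`
	// Chart B opts in too, but maps the variable onto a label called "cluster".
	chartB := `sum(example_metric_b{cluster="${cluster_name}"})`
	// Chart C does not reference the variable, so it is left unfiltered.
	chartC := `sum(example_metric_c)`

	for _, q := range []string{chartA, chartB, chartC} {
		fmt.Println(expandTemplateVars(q, vars))
	}
}
```

Because each query names the variable itself, the filter is visible in the expanded query text, whereas a global filter is applied without appearing in the query.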
Hope this helps! I left a handful of formatting comments!
"widget": {
  "title": "Runtime Configuration",
  "collapsibleGroup": {
    "collapsed": true
Just double-checking if it's intentional which sections you have collapsed or uncollapsed by default on page load
Correct, those are useful only in certain, less common cases (but still important enough to have on the dashboard as per golang/go#67120)
OK sounds good. Consider if you should keep it where it is or move it to the bottom (below Concurrency and Memory, which are default open and presumably more general / widely applicable). I'll leave it up to you!
Another option (your call if this is right in this situation!) is to add an explanatory text widget (like this concept)
Signed-off-by: bwplotka <[email protected]>
(C) / (D): You can actually open the network tab to inspect the structure of the network requests being sent to the GCM API :) As you can see, when you set template variables, the PromQL is expanded / interpolated browser-side, which is why the application is visible. When you set global "group bys" and "filters", it's being applied as a
(D): Strange, I just did this and the cluster applied correctly to both the chart and the table side by side. If this reproduces reliably for you, can you record a screencast and file it against buganizer component 133331? We can figure out internally what is going on (whether it's an experiment flag, etc).
Can you refresh the screenshots?
ah, forgot, thanks!