Describe the bug
The default analysis metrics with Traefik and Prometheus query the wrong metric service tag if the service does not use the same name as the deployment.
To Reproduce
Use the basic example from the Traefik Canary Deployments tutorial (https://docs.flagger.app/tutorials/traefik-progressive-delivery), but rename the service so that it's different from the deployment name, e.g. podinfo-service instead of podinfo.
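For concreteness, a minimal sketch of the renamed setup, assuming the tutorial's podinfo deployment; the namespace, ports, and the rest of the spec are illustrative:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: traefik
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo           # deployment (target) name
  service:
    name: podinfo-service   # renamed: no longer matches the targetRef name
    port: 80
    targetPort: 9898
```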
You also need to make the load tester call the service via Traefik, instead of directly hitting the canary service, as otherwise you'll get no metrics at all. E.g. something like http://traefik.traefik:PORT/route-to-service as opposed to http://podinfo-canary:PORT.
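A sketch of the corresponding load-test webhook, assuming the tutorial's flagger-loadtester in the test namespace; PORT and the route path are placeholders for whatever your Traefik IngressRoute exposes:

```yaml
analysis:
  webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        # send traffic through Traefik so traefik_service_* metrics are produced,
        # not straight at http://podinfo-canary:PORT
        cmd: "hey -z 1m -q 10 -c 2 http://traefik.traefik:PORT/route-to-service"
```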
Trigger Flagger to start the canary analysis and review the metrics in Prometheus. You'll find that traefik_service_request_duration_seconds_bucket shows up with a tag of NAMESPACE-podinfo-service-canary-PORT instead of NAMESPACE-podinfo-canary-PORT.
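The mismatch is easy to see in the Prometheus UI; schematically (NAMESPACE and PORT stand in for the actual values):

```promql
# series Traefik actually exports (tag built from the service name) - has data:
traefik_service_request_duration_seconds_bucket{service=~"NAMESPACE-podinfo-service-canary-PORT.*"}

# series Flagger's built-in query looks for (tag built from the target name) - empty:
traefik_service_request_duration_seconds_bucket{service=~"NAMESPACE-podinfo-canary-PORT.*"}
```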
You can verify that Flagger queries for a tag matching NAMESPACE-podinfo-canary-PORT (using the deployment name without "-service") by enabling query logging in Prometheus.
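For reference, query logging is a standard Prometheus server option; the file path here is just an example:

```yaml
# prometheus.yml
global:
  query_log_file: /prometheus/query.log
```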
Expected behavior
Flagger queries the correct metric even when the service name differs from the deployment (target) name.
Additional context
Flagger version: 1.40.0
Kubernetes version: 1.30
Service Mesh provider: N/A
Ingress provider: Traefik

I'm aware that I can create a custom metric instead of using the built-in one, but it would make sense to me that it uses the "correct" metric tag by default.
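For anyone hitting the same thing, a workaround sketch using a custom MetricTemplate that hard-codes the service name; the Prometheus address, quantile, and label regex are assumptions to adapt:

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: traefik-request-duration
  namespace: test
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    # the label regex uses the *service* name, not the target/deployment name
    histogram_quantile(0.99,
      sum(rate(
        traefik_service_request_duration_seconds_bucket{
          service=~"{{ namespace }}-podinfo-service-canary-[0-9]+.*"
        }[{{ interval }}]
      )) by (le)
    )
```

The canary analysis then references it via templateRef instead of the built-in request-duration metric.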
I also think it could be mentioned in the documentation that addServicesLabels must be true in the Traefik Prometheus config (see the sketch below), or there will be no service tag at all.

Additionally, to get metrics out of Traefik you need to load test the service via Traefik, not directly on the canary service. The example page does not do this, possibly because it was adapted from a setup with a service mesh, where I assume the metrics are gathered via the mesh regardless of how the service is called.
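For the docs note, this is the Traefik static-configuration option in question (the CLI equivalent is --metrics.prometheus.addServicesLabels=true):

```yaml
# traefik.yml (static configuration)
metrics:
  prometheus:
    addServicesLabels: true   # per the observation above: required for the service tag to appear
```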
I'm happy to take a stab at a PR for using the right service tag name, and to extend the documentation with my observations above, if it's of any interest to the project.