Traefik metrics queries are on the wrong metric if the service name differs from the deployment #1767

judgeaxl · 2025-02-21T04:23:58Z

Describe the bug

The default analysis metrics for Traefik with Prometheus query the wrong service tag when the service does not use the same name as the deployment.

To Reproduce

Use the basic example from the Traefik Canary Deployments tutorial (https://docs.flagger.app/tutorials/traefik-progressive-delivery), but rename the service so that it differs from the deployment name, e.g. podinfo-service instead of podinfo.

You also need to make the load tester call the service via Traefik instead of hitting the canary service directly; otherwise you'll get no metrics at all. For example, use something like http://traefik.traefik:PORT/route-to-service rather than http://podinfo-canary:PORT.

Trigger Flagger to start the canary analysis and review the metrics in Prometheus. You'll find that traefik_service_request_duration_seconds_bucket shows up with a service tag of NAMESPACE-podinfo-service-canary-PORT instead of NAMESPACE-podinfo-canary-PORT.

You can verify that Flagger queries for a tag matching NAMESPACE-podinfo-canary-PORT (using the deployment name without "-service") by enabling query logging in Prometheus.
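To make the mismatch concrete, here is a minimal sketch of how the two tag values diverge. It assumes the NAMESPACE-NAME-canary-PORT pattern described above; the helper traefik_service_label, the namespace "test" and the port "80" are hypothetical and only for illustration, not Flagger's or Traefik's actual code.

```python
def traefik_service_label(namespace: str, name: str, port: str) -> str:
    """Build a service tag following the NAMESPACE-NAME-canary-PORT pattern.

    Hypothetical helper: it mirrors the pattern seen in Prometheus,
    not the real implementation in Flagger or Traefik.
    """
    return f"{namespace}-{name}-canary-{port}"


# Assumed example values (not taken from a real cluster).
namespace, port = "test", "80"
deployment_name = "podinfo"        # the canary target (deployment) name
service_name = "podinfo-service"   # the renamed Kubernetes service

# Tag Flagger appears to query for: built from the deployment name.
queried = traefik_service_label(namespace, deployment_name, port)
# Tag Traefik appears to emit: built from the service name.
emitted = traefik_service_label(namespace, service_name, port)

print(queried)
print(emitted)
print(queried == emitted)
```

With these values the two tags differ, so a query on the first tag matches no series carrying the second.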

Expected behavior

Flagger queries the correct metric even when the service name differs from the deployment (target) name.

Additional context

Flagger version: 1.40.0
Kubernetes version: 1.30
Service Mesh provider: N/A
Ingress provider: Traefik

I'm aware that I can create a custom metric instead of using the built-in one, but it would make sense to me that it uses the "correct" metric tag by default.
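As a rough sketch of that workaround, the query for a custom metric could be built from the service name rather than the deployment name. The function latency_query, the namespace "test", the port "80", the 1m interval and the 0.99 quantile are assumptions for illustration; the exact-match selector also assumes the tag has no extra suffix in your setup.

```python
def latency_query(namespace: str, service: str, port: str,
                  interval: str = "1m", quantile: float = 0.99) -> str:
    """Return a PromQL latency query keyed on the service name.

    Hypothetical helper: the tag pattern NAMESPACE-SERVICE-canary-PORT
    is the one observed in Prometheus, not a documented contract.
    """
    label = f"{namespace}-{service}-canary-{port}"
    return (
        f"histogram_quantile({quantile}, sum(rate("
        f'traefik_service_request_duration_seconds_bucket{{service="{label}"}}[{interval}]'
        f")) by (le))"
    )


# Assumed example values: namespace "test", port "80".
query = latency_query("test", "podinfo-service", "80")
print(query)
```

The resulting string could then be used as the query of a custom metric, with the namespace, service name and port replaced by the real values.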

I also think it could be mentioned in the documentation that addServicesLabels must be true in the Traefik Prometheus config or there will be no service tag at all.

Additionally, to get metrics out of Traefik you need to load test the service via Traefik, not directly against the canary service. The example page does not do this, possibly because it was adapted from a setup with a service mesh, where I assume the metrics are gathered via the mesh regardless of how the service is called.

I'm happy to take a stab at a PR for using the right service tag name, and to extend the documentation with my observations above, if it's of any interest to the project.
