Traefik metrics queries are on the wrong metric if the service name differs from the deployment #1767

Open
judgeaxl opened this issue Feb 21, 2025 · 0 comments


Describe the bug

The default analysis metrics for Traefik with Prometheus query the wrong service tag when the service does not use the same name as the deployment.

To Reproduce

Use the basic example from the Traefik Canary Deployments tutorial (https://docs.flagger.app/tutorials/traefik-progressive-delivery), but rename the service so that it differs from the deployment name, e.g. podinfo-service instead of podinfo.

You also need to make the load tester call the service via Traefik, instead of directly hitting the canary service, as otherwise you'll get no metrics at all. E.g. something like http://traefik.traefik:PORT/route-to-service as opposed to http://podinfo-canary:PORT.

Trigger Flagger to start the canary analysis and review the metrics in Prometheus. You'll find that the traefik_service_request_duration_seconds_bucket metric shows up with a service tag of NAMESPACE-podinfo-service-canary-PORT instead of NAMESPACE-podinfo-canary-PORT.

You can verify that Flagger queries for a tag matching NAMESPACE-podinfo-canary-PORT (using the deployment name without "-service") by enabling query logging in Prometheus.
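The mismatch can be illustrated with a short sketch. The helper below is hypothetical and is not Flagger's or Traefik's code; it only reproduces the NAMESPACE-NAME-canary-PORT pattern described above, and the namespace `test` and port `9898` are placeholder values I picked for illustration.

```python
def canary_label(namespace: str, name: str, port: int) -> str:
    """Build a tag of the form NAMESPACE-NAME-canary-PORT (pattern taken from this issue)."""
    return f"{namespace}-{name}-canary-{port}"


# Placeholder values, assumed for illustration only.
namespace, port = "test", 9898

# What shows up in Prometheus: derived from the service name.
emitted = canary_label(namespace, "podinfo-service", port)
# What the built-in query looks for: derived from the deployment (target) name.
queried = canary_label(namespace, "podinfo", port)

print(emitted)
print(queried)
print(emitted == queried)
```

Because the two strings differ, a query keyed on the deployment name cannot match the series that carries the service name.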

Expected behavior

Flagger queries the correct metric even when the service name differs from the deployment (target) name.

Additional context

  • Flagger version: 1.40.0
  • Kubernetes version: 1.30
  • Service Mesh provider: N/A
  • Ingress provider: Traefik

I'm aware that I can create a custom metric instead of using the built-in one, but it would make sense to me that it uses the "correct" metric tag by default.
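As a rough sketch of that workaround, the query for a custom metric could be keyed on the service name rather than the deployment name. The function below is hypothetical: the `traefik_service_requests_total` metric and its `code` and `service` labels come from Traefik's Prometheus metrics, but the regex suffix, the default interval, and the overall query shape are my assumptions and may need adjusting for a given setup.

```python
def success_rate_query(namespace: str, service_name: str, interval: str = "1m") -> str:
    """Return a PromQL request-success-rate expression keyed on the service name.

    Assumption: the canary's service tag starts with NAMESPACE-SERVICE-canary-,
    as observed in this issue; anything after that is matched with a wildcard.
    """
    selector = f'service=~"{namespace}-{service_name}-canary-.*"'
    # Percentage of non-5xx requests over all requests in the interval.
    return (
        f'sum(rate(traefik_service_requests_total{{{selector},code!~"5.."}}[{interval}]))'
        f' / sum(rate(traefik_service_requests_total{{{selector}}}[{interval}])) * 100'
    )


# Placeholder namespace, assumed for illustration only.
query = success_rate_query("test", "podinfo-service")
print(query)
```

The resulting string would go into the query of a custom metric; the point is only that the selector is built from the service name.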

I also think it could be mentioned in the documentation that addServicesLabels must be true in the Traefik Prometheus config or there will be no service tag at all.

Additionally, to get metrics out of Traefik you need to load test the service via Traefik, not directly on the canary service. The example page does not do this, possibly because it was adapted from a service mesh setup, where I assume the metrics are gathered via the mesh regardless of how the service is called.

I'm happy to take a stab at a PR for using the right service tag name, and to extend the documentation with my observations above, if it's of any interest to the project.
