Hubble relay fails on make helm-install-without-tls when deploying locally on Kind cluster #851

Open
SRodi opened this issue Oct 11, 2024 · 6 comments

@SRodi
Member

SRodi commented Oct 11, 2024

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

make quick-build
make helm-install-without-tls

Once the control plane is deployed:

hubbleRelayPodName=$(kubectl get pods -l app.kubernetes.io/name=hubble-relay -n kube-system -o jsonpath='{.items[*].metadata.name}')
kubectl logs -n kube-system "$hubbleRelayPodName" -f

See the error:

level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.kubernetes:80"

Expected behavior
This error should not be present. Hubble should run without TLS so that it can be port-forwarded to the local machine.

Screenshots

Image

Platform (please complete the following information):

  • OS: WSL2 Ubuntu-24.04
  • Kubernetes Version: Kind (Kubernetes v1.31.0)
  • Host: self-host
  • Retina Version: v0.0.16

Additional context
Related to cilium/cilium#20130

I have tested this in AKS (Kubernetes v1.29.8) and this issue is NOT present

@SRodi SRodi changed the title Hubble relay fails on on make helm-install-without-tls Hubble relay fails on on make helm-install-without-tls when deploying locally on Kind cluster Oct 11, 2024
@GuessWhoSamFoo

Managed to get this working on kind 0.24.0 (k8s 1.31):

level=info msg="Starting gRPC health server..." addr=":4222" subsys=hubble-relay
level=info msg="Starting gRPC server..." options="{peerTarget:hubble-peer.kube-system.svc.cluster.local.:80 dialTimeout:5000000000 retryTimeout:30000000000 listenAddress::4245 healthListenAddress::4222 metricsListenAddress: log:0xc0002da540 serverTLSConfig:<nil> insecureServer:true clientTLSConfig:<nil> clusterName:default insecureClient:true observerOptions:[0x1f02b40 0x1f02c20] grpcMetrics:<nil> grpcUnaryInterceptors:[] grpcStreamInterceptors:[]}" subsys=hubble-relay
level=info msg="Received peer change notification" change notification="name:\"kind-control-plane\" address:\"192.168.176.2\" type:PEER_ADDED" subsys=hubble-relay
level=info msg="Received peer change notification" change notification="name:\"kind-worker\" address:\"192.168.176.5\" type:PEER_ADDED" subsys=hubble-relay
level=info msg="Received peer change notification" change notification="name:\"kind-worker2\" address:\"192.168.176.3\" type:PEER_ADDED" subsys=hubble-relay
level=info msg="Received peer change notification" change notification="name:\"kind-worker3\" address:\"192.168.176.4\" type:PEER_ADDED" subsys=hubble-relay
level=info msg=Connecting address="192.168.176.4:4244" hubble-tls=false peer=kind-worker3 subsys=hubble-relay
level=info msg=Connecting address="192.168.176.2:4244" hubble-tls=false peer=kind-control-plane subsys=hubble-relay
level=info msg=Connecting address="192.168.176.3:4244" hubble-tls=false peer=kind-worker2 subsys=hubble-relay
level=info msg=Connecting address="192.168.176.5:4244" hubble-tls=false peer=kind-worker subsys=hubble-relay
NAME              READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS         IMAGES                                                                                                  SELECTOR
coredns           2/2     2            2           3h58m   coredns            registry.k8s.io/coredns/coredns:v1.11.1                                                                 k8s-app=kube-dns
hubble-relay      1/1     1            1           3h50m   hubble-relay       mcr.microsoft.com/oss/cilium/hubble-relay:v1.15.0                                                       k8s-app=hubble-relay
hubble-ui         1/1     1            1           3h50m   frontend,backend   mcr.microsoft.com/oss/cilium/hubble-ui:v0.12.2,mcr.microsoft.com/oss/cilium/hubble-ui-backend:v0.12.2   k8s-app=hubble-ui
retina-operator   1/1     1            1           3h50m   retina-operator    ghcr.io/guesswhosamfoo/retina/retina-operator:v0.0.16-116-gecdabdb-linux-amd64                          control-plane=retina-operator

DNS resolves as expected

bash-5.0# nslookup hubble-peer.kube-system.svc.cluster.local
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	hubble-peer.kube-system.svc.cluster.local
Address: 10.96.158.48
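
The failing log line in the original report shows target="hubble-peer.kube-system.svc.kubernetes:80", whereas the relay options logged above use hubble-peer.kube-system.svc.cluster.local.:80. Whether that difference matters here is not established in this thread. As a rough sketch, the target can be pulled out of a relay log line for comparison; the helper name relay_target is hypothetical and not part of Retina or Hubble:

```shell
#!/bin/sh
# Hypothetical helper, not part of Retina or Hubble.
# relay_target LINE: print the value of the target="..." field, if present.
relay_target() {
    printf '%s\n' "$1" | sed -n 's/.*target="\([^"]*\)".*/\1/p'
}

# Log line copied from the original report.
line='level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.kubernetes:80"'

relay_target "$line"    # prints hubble-peer.kube-system.svc.kubernetes:80
```

The extracted name can then be passed to nslookup from inside the cluster, as done above for the cluster.local name.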

One thing is worth calling out about the workflow described above:

make quick-build
make helm-install-without-tls

make quick-build tags the Retina images with the output of git describe --tags --always, whereas make helm-install-without-tls may default to the latest tag. When the two differ, the image pulls for the Retina operator and init containers can fail, and hubble-relay then reports "Failed to create peer client for peers synchronization".
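
A minimal sketch of how to spot such a mismatch, assuming images are tagged with the git describe output plus a -linux-amd64 suffix (as in the retina-operator image listed above). The helper names image_tag and tags_match are hypothetical and not part of the Retina repository:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Retina Makefile.

# image_tag IMAGE: print the tag portion of an image reference,
# i.e. everything after the last ':'.
image_tag() {
    printf '%s\n' "${1##*:}"
}

# tags_match LOCAL_TAG IMAGE: succeed when IMAGE carries the locally built tag.
# Assumption: images are tagged "<git describe output>-linux-amd64".
tags_match() {
    [ "$(image_tag "$2")" = "$1-linux-amd64" ]
}

# Example against a live cluster (commented out; needs git and kubectl):
# local_tag=$(git describe --tags --always)
# image=$(kubectl get deployment retina-operator -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
# tags_match "$local_tag" "$image" || echo "deployed image does not match local build"
```

If the check fails, the cluster is running an image other than the one produced by make quick-build.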

@timraymond
Member

@SRodi can you try this again? I believe it may have been related to #921 but want to confirm.

@SRodi
Member Author

SRodi commented Nov 5, 2024

@timraymond I confirm I still see this issue on my Kind cluster.

Image

@timraymond
Member

@SRodi okay, thanks--good to know. I'll try to repro on mine.

@timraymond
Member

@SRodi I believe the root cause here is this:

retina/Makefile

Lines 417 to 419 in 5552182

LATEST_TAG := $(shell curl -s https://api.github.com/repos/microsoft/retina/releases | jq -r '.[0].name')
HELM_IMAGE_TAG ?= $(LATEST_TAG)
The problem is that you're building images from main, but using tags from the latest release, which are getting pulled from MCR--so you're not using the images you're building. Those likely had the aforementioned issue (#921), which explains the similar presentation.

I don't remember why we need to use the latest GH release here, do you @jimassa ? (found by git blame)

At any rate, it seems that the "developer-friendly" install command here is make quick-deploy-hubble which sets the HELM_IMAGE_TAG to $(TAG)-linux-amd64 which is only derived from git locally.
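
For illustration, the ?= assignment quoted above only takes effect when HELM_IMAGE_TAG is not already defined, so a value passed on the make command line takes precedence over LATEST_TAG. A rough shell analogue of that default-unless-set behavior, using a hypothetical helper resolve_tag and example tag values:

```shell
#!/bin/sh
# Hypothetical shell analogue of `HELM_IMAGE_TAG ?= $(LATEST_TAG)`.
# resolve_tag LATEST_TAG: print HELM_IMAGE_TAG if it is set, else LATEST_TAG.
resolve_tag() {
    # ${VAR-default} falls back only when VAR is unset, similar to make's ?=.
    printf '%s\n' "${HELM_IMAGE_TAG-$1}"
}

unset HELM_IMAGE_TAG
resolve_tag "v0.0.16"    # prints v0.0.16

HELM_IMAGE_TAG="v0.0.16-116-gecdabdb-linux-amd64"
resolve_tag "v0.0.16"    # prints v0.0.16-116-gecdabdb-linux-amd64
```

This is why quick-deploy-hubble, which passes HELM_IMAGE_TAG explicitly, does not pick up the release tag.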

@timraymond timraymond self-assigned this Nov 7, 2024
@SRodi
Member Author

SRodi commented Nov 7, 2024

Hi @timraymond, I am not using tags from the last release. Please see the screenshot I included in my last comment. I've purposefully left the bottom terminal so that you can see the image tag used.

Also, .PHONY: quick-deploy-hubble target uses make helm-install-without-tls, that's why I referenced it here.

retina/Makefile

Lines 559 to 562 in 5552182

.PHONY: quick-deploy-hubble
quick-deploy-hubble:
	$(MAKE) helm-uninstall || true
	$(MAKE) helm-install-without-tls HELM_IMAGE_TAG=$(TAG)-linux-amd64

Can you please try testing it on your Kind cluster?
