Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash: inconsistent label cardinality #838

Open
matmerr opened this issue Oct 9, 2024 · 1 comment
Open

Crash: inconsistent label cardinality #838

matmerr opened this issue Oct 9, 2024 · 1 comment

Comments

@matmerr
Copy link
Member

matmerr commented Oct 9, 2024

Describe the bug

Retina experienced crash on start with image: mcr.microsoft.com/containernetworking/retina-agent:v0.0.15

To Reproduce
Steps to reproduce the behavior:

Defaulted container "retina" out of: retina, init-retina (init)
Starting Retina Agent
starting Retina daemon with legacy control plane v0.0.15
init client-go
api server:  ****
init logger
ts=2024-08-28T16:33:20.448Z level=info caller=metrics/metrics.go:169 msg="Metrics initialized"
ts=2024-08-28T16:33:20.448Z level=info caller=legacy/daemon.go:137 msg="{data aggregation level 15 0 low <nil>}"
ts=2024-08-28T16:33:20.449Z level=info caller=legacy/daemon.go:173 msg="Remote context is disabled, only pods deployed on the same node as retina-agent will be monitored"
ts=2024-08-28T16:33:20.449Z level=info caller=legacy/daemon.go:194 msg="pod selector when remote context is disabled{pod selector 15 0 spec.nodeName=aks-nodepool1-47685273-vmss000001,status.podIP!=10.0.0.4 <nil>}"
ts=2024-08-28T16:33:20.478Z level=info caller=legacy/daemon.go:226 msg="Kubernetes server version: v1.27.16"
ts=2024-08-28T16:33:20.478Z level=info caller=legacy/daemon.go:248 msg="Initializing Pod controller"
ts=2024-08-28T16:33:20.478Z level=info caller=pod/controller.go:89 msg="Setting up Pod controller"
ts=2024-08-28T16:33:20.478Z level=info caller=legacy/daemon.go:264 msg="Initializing Node controller"
ts=2024-08-28T16:33:20.478Z level=info caller=node/controller.go:89 msg="Setting up Node controller"
ts=2024-08-28T16:33:20.479Z level=info caller=legacy/daemon.go:270 msg="Initializing Service controller"
ts=2024-08-28T16:33:20.479Z level=info caller=service/controller.go:95 msg="Setting up Service controller"
ts=2024-08-28T16:33:20.479Z level=info caller=legacy/daemon.go:277 msg="Initializing MetricsConfig namespaceController"
ts=2024-08-28T16:33:20.479Z level=info caller=namespace/namespace_controller.go:105 msg="Setting up Namespace controller"
ts=2024-08-28T16:33:20.479Z level=info caller=pluginmanager/pluginmanager.go:67 msg="plugin manager has pod level enabled"
ts=2024-08-28T16:33:20.479Z level=info caller=controllermanager/controllermanager.go:79 msg="Initializing controller manager ..."
ts=2024-08-28T16:33:20.479Z level=info caller=servermanager/servermanager.go:33 msg="Initializing HTTP server ..."
ts=2024-08-28T16:33:20.479Z level=info caller=server/server.go:42 msg="Setting up handlers"
ts=2024-08-28T16:33:20.479Z level=info caller=server/server.go:57 msg="Completed handler setup"
ts=2024-08-28T16:33:20.479Z level=info caller=servermanager/servermanager.go:37 msg="HTTP server initialized..."
ts=2024-08-28T16:33:20.479Z level=info caller=legacy/daemon.go:309 msg="Started controller manager"
ts=2024-08-28T16:33:20.479Z level=info caller=servermanager/servermanager.go:42 msg="Starting HTTP server ..." host=0.0.0.0 port=10093
ts=2024-08-28T16:33:20.479Z level=info caller=server/server.go:208 msg="Starting metrics server"
ts=2024-08-28T16:33:20.480Z level=info caller=server/server.go:69 msg="starting HTTP server... on " addr=0.0.0.0:10093
ts=2024-08-28T16:33:20.480Z level=info caller=pluginmanager/pluginmanager.go:138 msg="Starting plugin manager ..."
ts=2024-08-28T16:33:20.480Z level=info caller=pluginmanager/pluginmanager.go:150 msg="starting watchers"
ts=2024-08-28T16:33:20.480Z level=info caller=watchermanager/watchermanager.go:44 msg="watcher started" watcher_type=*endpoint.EndpointWatcher
ts=2024-08-28T16:33:20.480Z level=info caller=watchermanager/watchermanager.go:44 msg="watcher started" watcher_type=*apiserver.ApiServerWatcher
ts=2024-08-28T16:33:20.481Z level=info caller=manager/server.go:83 msg="starting server" name="health probe" addr=[::]:18081
ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=pod controllerGroup= controllerKind=Pod source="kind source: *v1.Pod"
ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=service controllerGroup= controllerKind=Service source="kind source: *v1.Service"
ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=node controllerGroup= controllerKind=Node source="kind source: *v1.Node"
ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=pod controllerGroup= controllerKind=Pod
ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=namespace controllerGroup= controllerKind=Namespace source="kind source: *v1.Namespace"
ts=2024-08-28T16:33:20.482Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=node controllerGroup= controllerKind=Node
ts=2024-08-28T16:33:20.481Z level=info caller=server/server.go:247 msg="Serving metrics server" bindAddress=:18080 secure=false
ts=2024-08-28T16:33:20.482Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=service controllerGroup= controllerKind=Service
ts=2024-08-28T16:33:20.482Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=namespace controllerGroup= controllerKind=Namespace
ts=2024-08-28T16:33:20.582Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=node controllerGroup= controllerKind=Node worker count=1
ts=2024-08-28T16:33:20.583Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=service controllerGroup= controllerKind=Service worker count=1
ts=2024-08-28T16:33:20.583Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=pod controllerGroup= controllerKind=Pod worker count=1
ts=2024-08-28T16:33:20.583Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=namespace controllerGroup= controllerKind=Namespace worker count=1
ts=2024-08-28T16:33:20.583Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/konnectivity-agent-644d74dc99-n42g2
ts=2024-08-28T16:33:20.584Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/coredns-autoscaler-7bf9ddd894-xfsp9
ts=2024-08-28T16:33:20.584Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/coredns-d696dc46-flbv7
ts=2024-08-28T16:33:20.585Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/coredns-d696dc46-n8nqk
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/metrics_module.go:157 msg="Reconciling metric module" spec= specError="unsupported value type"
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/forward.go:41 msg="Creating forward count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/forward.go:41 msg="Creating forward count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/drops.go:38 msg="Creating drop count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/drops.go:38 msg="Creating drop count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/tcpflags.go:37 msg="Creating TCP Flags count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.480Z level=info caller=metrics/tcpretrans.go:37 msg="Creating TCP retransmit count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:44 msg="Creating DNS count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:44 msg="Creating DNS count metrics" options= optionsError="unsupported value type"
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/drops.go:72 msg="src labels" labels=reason,direction,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:76 msg="src labels" labels=query_type,query,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:91 msg="src labels" labels=return_code,query_type,query,response,num_response,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/forward.go:78 msg="src labels" labels=direction,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/drops.go:72 msg="src labels" labels=reason,direction,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/tcpretrans.go:57 msg="src labels" labels=direction,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.481Z level=info caller=metrics/forward.go:78 msg="src labels" labels=direction,ip,namespace,podname,workload_kind,workload_name
ts=2024-08-28T16:33:21.883Z level=info caller=dropreason/dropreason_linux.go:115 msg="DropReason metric compiled"
ts=2024-08-28T16:33:22.793Z level=info caller=common/common_linux.go:74 msg="perf reader created" Map=PerfEventArray(dropreason_events)#14 PageSize=4096 BufferSize=65536
ts=2024-08-28T16:33:23.648Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=dropreason
ts=2024-08-28T16:33:23.648Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin dropreason"
ts=2024-08-28T16:33:23.648Z level=info caller=dropreason/dropreason_linux.go:241 msg="Start listening for drop reason events..."
ts=2024-08-28T16:33:23.648Z level=info caller=dropreason/dropreason_linux.go:247 msg="setting up enricher since pod level is enabled"
ts=2024-08-28T16:33:25.649Z level=info caller=packetforward/packetforward_linux.go:102 msg="Packet forwarding metric header generated"
ts=2024-08-28T16:33:26.547Z level=info caller=packetforward/packetforward_linux.go:130 msg="Packet forwarding metric compiled"
ts=2024-08-28T16:33:26.549Z level=info caller=packetforward/packetforward_linux.go:176 msg="Packet forwarding metric initialized"
ts=2024-08-28T16:33:26.549Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=packetforward
ts=2024-08-28T16:33:26.549Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin packetforward"
ts=2024-08-28T16:33:26.549Z level=info caller=packetforward/packetforward_linux.go:181 msg="Start collecting packet forward metrics"
ts=2024-08-28T16:33:28.550Z level=info caller=linuxutil/linuxutil_linux.go:42 msg="Initializing linuxutil plugin..."
ts=2024-08-28T16:33:28.550Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=linuxutil
ts=2024-08-28T16:33:28.550Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin linuxutil"
ts=2024-08-28T16:33:28.550Z level=info caller=linuxutil/linuxutil_linux.go:57 msg="Running linuxutil plugin..."
ts=2024-08-28T16:33:30.550Z level=info caller=dns/dns_linux.go:90 msg="Stopped dns plugin"
ts=2024-08-28T16:33:30.586Z level=info caller=dns/dns_linux.go:64 msg="Initialized dns plugin"
ts=2024-08-28T16:33:30.587Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=dns
ts=2024-08-28T16:33:30.587Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin dns"
ts=2024-08-28T16:33:32.587Z level=info caller=packetparser/packetparser_linux.go:82 msg="PacketParser header generated at" path=/go/src/github.com/microsoft/retina/pkg/plugin/packetparser/_cprog/dynamic.h
ts=2024-08-28T16:33:33.575Z level=info caller=packetparser/packetparser_linux.go:108 msg="PacketParser metric compiled"
ts=2024-08-28T16:33:33.575Z level=info caller=packetparser/packetparser_linux.go:241 msg="Stopping packet parser"
ts=2024-08-28T16:33:33.575Z level=info caller=packetparser/packetparser_linux.go:283 msg="Stopped packet parser"
ts=2024-08-28T16:33:34.050Z level=info caller=common/common_linux.go:74 msg="perf reader created" Map=PerfEventArray(packetparser_events)#67 PageSize=4096 BufferSize=131072
ts=2024-08-28T16:33:34.050Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=packetparser
ts=2024-08-28T16:33:34.050Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin packetparser"
ts=2024-08-28T16:33:34.050Z level=info caller=packetparser/packetparser_linux.go:190 msg="Starting packet parser"
ts=2024-08-28T16:33:34.050Z level=info caller=packetparser/packetparser_linux.go:192 msg="setting up enricher since pod level is enabled"
ts=2024-08-28T16:33:34.051Z level=info caller=packetparser/packetparser_linux.go:212 msg="Attaching bpf program to default interface of k8s Node in node namespace"
ts=2024-08-28T16:33:34.051Z level=info caller=packetparser/packetparser_linux.go:223 msg="Attaching Packetparser" outgoingLink.Index=2 outgoingLink.Name=eth0 outgoingLink.HardwareAddr=00:0d:3a:f7:c4:96
ts=2024-08-28T16:33:34.115Z level=info caller=packetparser/packetparser_linux.go:513 msg="Started packet parser"
ts=2024-08-28T16:33:36.051Z level=info caller=pluginmanager/pluginmanager.go:181 msg="successfully started pluginmanager"
ts=2024-08-28T16:33:50.484Z level=info caller=apiserver/apiserver.go:118 msg="New Apiserver IPs:" ip=4.154.192.90
panic: inconsistent label cardinality: expected 0 label values but got 1 in []string{"no_response"}

goroutine 449 [running]:
github.com/prometheus/client_golang/prometheus.(*CounterVec).WithLabelValues(0x31de654?, {0xc0008c7690?, 0x2faff00?, 0xc00374b840?})
        /go/pkg/mod/github.com/prometheus/[email protected]/prometheus/counter.go:284 +0x34
github.com/microsoft/retina/pkg/module/metrics.(*LatencyMetrics).Init.func1({0xc0011a27d0?, 0x4499bd?}, 0x23e69c2?, 0x2962500?)
        /go/src/github.com/microsoft/retina/pkg/module/metrics/latency.go:127 +0x19b
github.com/jellydator/ttlcache/v3.(*Cache[...]).OnEviction.func1.1()
        /go/pkg/mod/github.com/jellydator/ttlcache/[email protected]/cache.go:699 +0x31
created by github.com/jellydator/ttlcache/v3.(*Cache[...]).OnEviction.func1 in goroutine 271
        /go/pkg/mod/github.com/jellydator/ttlcache/[email protected]/cache.go:698 +0xef

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Platform (please complete the following information):

NAME                                STATUS                     ROLES   AGE   VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION      CONTAINER-RUNTIME
aks-nodepool1-47685273-vmss000000   Ready                      agent   47d   v1.27.16   10.0.0.5      <none>        Ubuntu 22.04.4 LTS               5.15.0-1070-azure   containerd://1.7.20-1
aks-nodepool1-47685273-vmss000001   Ready                      agent   47d   v1.27.16   10.0.0.4      <none>        Ubuntu 22.04.4 LTS               5.15.0-1070-azure   containerd://1.7.20-1
  • Retina Version: mcr.microsoft.com/containernetworking/retina-agent:v0.0.15

Additional context
Add any other context about the problem here.

@nddq
Copy link
Contributor

nddq commented Oct 9, 2024

should be fixed in 0.0.16 #687

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants