
kubectl using 1200% CPU on MacOS 14.4.1 #1668

Open
philippefutureboy opened this issue Oct 8, 2024 · 12 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
kind/support: Categorizes issue or PR as a support question.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@philippefutureboy

What happened:

I always keep a terminal open with watch "kubectl get pods" while I work, so that I can see the status of my remote cluster at a glance.
I noticed today while working that my computer was sluggish. Looking in Activity Monitor, kubectl was running at 1200% CPU usage (12 full CPU cores) with low memory usage. At that time, watch "kubectl get pods" had been running for 5d 14h, polling state every 2s whenever my laptop was not in sleep mode.
I killed the watch "kubectl get pods" command and the process exited successfully, releasing the CPU load.

What you expected to happen:

kubectl should not eat 12 full CPU cores; it is only polling once every 2 seconds.

How to reproduce it (as minimally and precisely as possible):

No idea really! Anything I can do to help diagnose this?
The only reason I'm posting here is that high CPU usage like this can be indicative of an exploited security vulnerability, which is why I'm proactively opening this issue.

  • Is there any sort of log that is left by kubectl locally or remotely (our cluster is in GCP K8S)?
  • How do I check the integrity of the program (checksum perhaps?)?

I think my kubectl is packaged directly with gcloud. I'm not sure; how do I check?
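A quick way to check, I assume, is to see where the binary resolves from and whether gcloud lists it as an installed component, for example:

# rough sketch, assuming a standard google-cloud-sdk install
which kubectl                              # a path under .../google-cloud-sdk/bin suggests the gcloud-bundled binary
gcloud components list | grep -i kubectl   # shows whether kubectl is installed as a gcloud component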

Anything else we need to know?:

Environment:

  • Kubernetes client and server versions (use kubectl version):

Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4-gke.1348000

  • Cloud provider or hardware configuration: Google Cloud Platform, K8S; locally, MacBook Pro 2019 Intel Core i9
  • OS (e.g: cat /etc/os-release): macOS 14.4.1 Sonoma
@philippefutureboy added the kind/bug label Oct 8, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-triage label Oct 8, 2024
@ardaguclu
Member

ardaguclu commented Oct 8, 2024

/kind support
I'd recommend passing -v=9 to see what is happening.
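For example, something like:

watch "kubectl get pods -v=9"

(-v is kubectl's standard verbosity flag; the verbose client-side logging goes to stderr, so it should show the HTTP round trips kubectl makes on each poll)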

@k8s-ci-robot added the kind/support label Oct 8, 2024
@philippefutureboy
Author

Thanks @ardaguclu. I've added the flag and will be monitoring CPU usage. If anything happens, I'll let you know :)

@brianpursley
Member

How do I check the integrity of the program (checksum perhaps?)?

The sha512 hash (for the gz) is published in the changelog.
For example, https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#client-binaries

Something like this should work:

  1. Download the client binaries archive.
  2. Compute the hash of the archive you downloaded to confirm it matches what the changelog says it should be.
  3. Extract the archive.
  4. Compute the hash for the extracted binary (This is the expected hash).
  5. Compute the hash for your local binary and compare to confirm that it matches what you got in step 4.

Example (you will want to use darwin instead of linux-amd64):

~/Downloads $ shasum -a 512 kubernetes-client-linux-amd64.tar.gz 
7551aba20eef3e2fb2076994a1a524b2ea2ecd85d47525845af375acf236b8afd1cd6873815927904fb7d6cf7375cfa5c56cedefad06bf18aa7d6d46bd28d287  kubernetes-client-linux-amd64.tar.gz
~/Downloads $ tar xvf kubernetes-client-linux-amd64.tar.gz kubernetes
kubernetes/
kubernetes/client/
kubernetes/client/bin/
kubernetes/client/bin/kubectl
kubernetes/client/bin/kubectl-convert
~/Downloads $ shasum -a 512 kubernetes/client/bin/kubectl
1adba880a67e8ad9aedb82cde90343a6656147e7f331ab2e2293d4bc16a280591bd3b912873f099f11cde2f044d8698b963ed45fadedfe1735d99158e21e44a0  kubernetes/client/bin/kubectl

Then get your local kubectl's hash and compare it...

shasum -a 512 $(which kubectl)
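For the darwin build, the same steps should look roughly like this (filenames are the ones listed on the same changelog page; the expected hashes differ per file and release, so take them from the changelog):

shasum -a 512 kubernetes-client-darwin-amd64.tar.gz     # compare against the changelog value
tar xvf kubernetes-client-darwin-amd64.tar.gz kubernetes
shasum -a 512 kubernetes/client/bin/kubectl             # this is the expected hash for the binary
shasum -a 512 $(which kubectl)                          # your local binary, for comparison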

@brianpursley
Member

The interesting thing about this is that kubectl is not running for 5 days, it is being invoked by watch every 2 seconds for 5 days.

In addition to using -v=9 as @ardaguclu suggested...

If it happens again, try doing the following in another terminal, while the problem is occurring, to collect information that might be helpful to diagnose the problem:

ps -F $(pgrep kubectl)
pgrep kubectl | xargs -L1 lsof -p
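On macOS the stock BSD ps may not accept -F; if it doesn't, an -o format should give similar information, roughly:

pgrep kubectl | xargs -I{} ps -o pid,ppid,%cpu,%mem,etime,command -p {}
pgrep kubectl | xargs -L1 lsof -p
sample $(pgrep -n kubectl) 10   # optional, macOS-only: call-stack sample of the newest kubectl process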

@philippefutureboy
Author

Fantastic @brianpursley, thanks for the additional tips! I will be checking the checksum tomorrow :)

The interesting thing about this is that kubectl is not running for 5 days, it is being invoked by watch every 2 seconds for 5 days.

I was thinking the same thing: after 5 days, maybe there's some kind of low-level error that leads to more CPU consumption, or some data that accumulates. But one execution every 2 seconds shouldn't be an issue.

I'll follow up shortly!

@philippefutureboy
Author

philippefutureboy commented Oct 17, 2024

Hi @brianpursley!
With some extra delay, I've done the checksum check (shasum -a 512), and the checksums don't match.
I'm not sure whether they should match in the first place because:

  1. My kubectl binary is the one distributed by google-cloud-sdk
  2. When running kubectl version I get three versions: one for the client, one for the server, and one for Kustomize; whereas the downloads on GitHub are split into separate client and server archives

Here are the steps taken:

  1. Determine the kubectl location on fs & version:
$ which kubectl
/Users/philippe/google-cloud-sdk/bin/kubectl
$ kubectl version
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5-gke.1014001
  2. Download the kubectl binary from source

2.1. Navigate to https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#downloads-for-v1304
2.2. Click on kubernetes-client-darwin-amd64.tar.gz
2.3. Extract kubectl to a folder using tar -xvzf kubernetes-client-darwin-amd64.tar.gz

  3. Assert checksum match
$ shasum -a 512 $(which kubectl)
a49c02cbe3d3b80011a0d53118d3d8f921efbad89e6c986d39e05a5d486702a9020ff324a1c01d79f24234fa5d8783448352c980379c876344efd0eb332377d4  /Users/philippe/google-cloud-sdk/bin/kubectl

and

$ shasum -a 512 /Users/philippe/Downloads/kubernetes/client/bin/kubectl 
78c72e33056778b37c43e272e2b997be272edc70fd6f650e29eb4ab1221c0184955470611f6dba20592f778135c348752e423776c51e97f91a3cad33216430bc  /Users/philippe/Downloads/kubernetes/client/bin/kubectl

As you can see, the checksums don't match even though the client version matches.
How can I assert that the kubectl binary I have hasn't been tampered with / that there wasn't a supply-chain attack?
I can't just download the google-cloud-sdk in a Docker image and take the shasum of the installed kubectl binary; all that would tell me is that my kubectl binary matches the provider's, not whether it matches the one compiled from this project.
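One weaker check I can think of (assuming nothing beyond stock kubectl, and of course a tampered binary could report anything) is to compare the build metadata embedded in my binary with the upstream v1.30.4 release:

kubectl version --client -o json   # clientVersion.gitCommit, buildDate, goVersion, platform
# then compare clientVersion.gitCommit with the commit the v1.30.4 tag points to
# in github.com/kubernetes/kubernetes

A mismatch wouldn't prove tampering, but it would at least confirm a different build.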

Thank you for your help!
Cheers,
Philippe

@philippefutureboy
Author

philippefutureboy commented Oct 17, 2024

Note that I've also opened a support case with Google Cloud to help confirm the integrity of the kubectl binary packaged as part of google-cloud-sdk. Any information on your end will still be helpful, and I'll pass along whatever I get from Google Cloud's support team.

One last note: I haven't noticed another CPU usage spike since the one I originally reported.

@brianpursley
Member

@philippefutureboy Do the Google Cloud SDK maintainers build their own kubectl binary that has gcloud-specific changes?

If so, and if kubectl version reports it as "regular" kubectl, that seems like it could be confusing.

@philippefutureboy
Author

@brianpursley that's also what I'm trying to figure out with my support rep. I'll keep you in the loop with any new info.

@philippefutureboy
Author

@brianpursley Following up on the Google Cloud SDK kubectl binary - here's the support team's response:

The team confirmed that the checksum is unique to each binary unless it is compiled in the exact same environment and with the same parameters, which explains why the checksums do not match.

Currently, we do not have a checksum for kubectl, as it is part of the Cloud SDK package. However, we do have checksums available for gcloud [1].

If you wish to use a checksum for kubectl from the repository [2], you will need to generate the checksum yourself for the specific binary you are using.

To clarify, we only have checksums for the overall package (gcloud), not for the individual kubectl binary.

I'm not sure how to approach this, as I can't realistically reproduce the environment in which they compiled their version of the kubectl binary. I'll inquire if there's anything I can do to counter-verify the signature of their kubectl binary.

@philippefutureboy
Author

Here's the follow-up answer from the support team:

Regarding the questions you shared with us I would like to inform you that I checked with our internal team and this is what they shared with us:

“From what I gather, GCP seems to be compiling the binary on their own rather than using the binary that is provided by the official repository.
Why is that?”

As you mentioned earlier the official way to verify is to validate the checksum provided in our website [1] when downloading gcloud sdk CLI, after that you can install kubectl as part of gcloud components, "gcloud components install kubectl". However, we do not publish SHA checksum for the kubectl command that is packaged as part of gcloud-sdk. Meaning that we can do a checksum for the full package of gcloud components but we do not provide a way to checksum for single components.

“What's the motive to compile it on your end? Do you modify the source code in any way before compilation?”

GCP modifies kubectl binary included in gcloud to add specific features, optimizations, or integrations with Google Cloud services. These modifications would alter the checksum. Also, different build processes or compilation options can result in binaries with different checksums

So, from what I understand, it is not possible to verify the integrity of the kubectl binary when it is packaged by the Google Cloud team as part of the gcloud utilities. I've asked a follow-up question to see whether the gcloud SDK package that contains the kubectl binary can be checked against a publicly published list of checksums.
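If that is possible, the check itself should be simple; presumably something along the lines of (the archive name below is a placeholder for whichever versioned package I actually downloaded, and the digest algorithm is whatever Google publishes):

shasum -a 256 google-cloud-cli-VERSION-darwin-x86_64.tar.gz   # compare against the checksum published for gcloud [1]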
