Reading /proc/$PID/cmdline can hang in certain filesystem scenarios #156

Open
multimeric opened this issue Dec 20, 2024 · 0 comments
multimeric commented Dec 20, 2024

We have a tape-backed filesystem with retrieve-on-read, so a process that reads a file can go into uninterruptible sleep (state D) while it waits for the file to be retrieved from tape. While the process is in that state, reading /proc/$PID/cmdline for it hangs indefinitely, which I believe is explained here. Consequently, the ps calls made by NHC fail, and this triggers a node health check alert. This is not desirable: the node is actually running fine, and the only problem is that one or more processes are sleeping and their cmdline can't be read.

I'm not entirely sure which ps invocation in NHC triggers this, as there are several. Does NHC need to request the cmdline at all, given that reading it can hang? Alternatively, if the cmdline is useful to most users, could there be a configuration option to turn it off?
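
To illustrate the kind of fallback I have in mind, here is a minimal Python sketch (not NHC code, which is shell). It checks the process state in /proc/$PID/stat first and only reads cmdline when the process is not in state D, otherwise falling back to the short command name. The function names `parse_stat` and `describe_process` and the `proc_root` parameter are hypothetical, made up for this example. It also assumes that reading stat does not block the way cmdline does, which matches my understanding but which I have not verified on our tape-backed setup.

```python
from pathlib import Path
from typing import Tuple


def parse_stat(stat_text: str) -> Tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so take everything between the first '(' and the last ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    # The state letter is the first field after the closing parenthesis.
    state = stat_text[end + 1:].split()[0]
    return comm, state


def describe_process(pid: int, proc_root: str = "/proc") -> str:
    """Return the command line if it looks safe to read, else '[comm]'."""
    base = Path(proc_root) / str(pid)
    comm, state = parse_stat((base / "stat").read_text())
    if state == "D":
        # Uninterruptible sleep: skip cmdline, since reading it may hang.
        return f"[{comm}]"
    raw = (base / "cmdline").read_bytes()
    # cmdline is a NUL-separated argument list; it is empty for kernel threads.
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return " ".join(args) if args else f"[{comm}]"
```

This is only a best-effort guard. The process can enter state D between the stat check and the cmdline read, and the state in /proc/$PID/stat is that of the main thread, so a multi-threaded process could still cause a hang. A configuration option that avoids requesting the cmdline entirely would be more robust.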
