We have a tape-backed filesystem with retrieve-on-read, meaning that processes can go into uninterruptible sleep (`D`) when trying to read a file, as they wait for it to be retrieved from tape. When this happens, reading `/proc/$PID/cmdline` on the sleeping process hangs forever, which I believe is explained here. Consequently, the `ps` calls done by NHC fail, and this triggers a node health check alert. This is not desirable: the node is actually running fine, but one or more processes are sleeping and can't report their `cmdline`.
I'm not entirely sure which `ps` invocation in NHC is triggering this, as there are several. I wonder if we need to request the `cmdline` since it has this undesirable property of potentially hanging? Alternatively, if it's helpful to most users, can there be a configuration option to turn this off?
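
To illustrate what listing processes without `cmdline` could look like, here is a minimal Python sketch. It is not NHC code. It reads only `/proc/<pid>/stat`, which carries the short command name (`comm`) and the process state, and it never opens `/proc/<pid>/cmdline`. The function names `parse_stat_line` and `list_processes` are hypothetical, and the sketch assumes that reading `stat` does not block on a `D`-state process the way `cmdline` does, which I have not verified on our filesystem.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def list_processes(proc_root="/proc"):
    """Yield (pid, comm, state) per process without opening cmdline."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                yield parse_stat_line(fh.read())
        except (OSError, ValueError):
            # The process exited between listdir() and open(),
            # or its stat file could not be read or parsed.
            continue


if __name__ == "__main__":
    for pid, comm, state in list_processes():
        marker = "  <- uninterruptible sleep" if state == "D" else ""
        print(f"{pid:>7} {state} {comm}{marker}")
```

The trade-off is that `comm` is a truncated name rather than the full argument list, so any check that matches on command-line arguments would still need `cmdline`.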