-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: execute pthread_join only when tid != 0 in flb_stop. #9428
Conversation
Signed-off-by: Phillip Whelan <[email protected]>
). Signed-off-by: Phillip Adair Stewart Whelan <[email protected]>
@pwhelan can we get more details on why CentOS and Redhat don't conform ? I cannot find a reference about it. It seems to me that the proper approach would be to get the return of pthread_join(3) and print the errors if any:
a simple flb_errno() can do the job since that is set in errno |
That would be great, but pthread_join crashes on RHEL and CentOS before it can even return when passing the first argument as zero. |
@pwhelan what are the steps to make it crash ? |
You need to configure a calyptia fleet with an invalid configuration, ie: with a non-existent plugin:
Reproducing it is hard under other circumstances. |
can we get a minimal repro ? it seems this needs hot reload with a speacial crafted config, something we can repro with curl ? |
You need to run fluent-bit under the CentOS/8 container, modify the configuration file once it is running and then send it SIGHUP.
|
closing in favor of #9432 |
Summary
This PR wraps the call to
pthread_join
inflb_stop
in a conditional fortid != 0
.Calling
pthread_join
withtid
== 0 is undefined. Technically it should returnESRCH
but CentOS and RHEL do not seem to conform.The following crash occurs:
Debugging the actual code in
__pthread_timedjoin_ex
we can see it is in fact aNULL
dereference:Root cause analysis
The value of
tid
is derived fromctx->config->worker
here:The call to
pthread_join
checksctx->status
, which is not changed whenctx->config->worker
is set:The value of
ctx->status
can be set to the following values which are defined in flb_lib.h:The places where
ctx->status
is set do not correlate with the value ofctx->config->worker
at all:fluent-bit/src/flb_lib.c
Line 168 in f36c956
fluent-bit/src/flb_lib.c
Line 680 in f36c956
fluent-bit/src/flb_lib.c
Line 726 in f36c956
fluent-bit/src/flb_lib.c
Line 732 in f36c956
fluent-bit/src/flb_lib.c
Line 741 in f36c956
In fact we might want to just check the value of
ctx->config->worker
and notctx->status
before joining it. There is a correlation between both values but I do not see it as being strong enough to depend upon one or the other.The code that calls
pthread_join
behind that conditional is also meant for extraordinary or erroneous circumstances and therefore being defensive to me does not seem to be a bad policy.References
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.