feat: instrument spawned tasks with current tracing span when tracing feature is enabled
#14547
Which issue does this PR close?
Relates to #9415. Does not fully close the issue, but delivers a prerequisite for it.
Rationale for this change
This allows DataFusion to integrate with users of the `tracing` crate by propagating the trace context as users would expect, without investing in a full integration of the `tracing` ecosystem.

When the (new) `tracing` feature is enabled, all tasks spawned on new threads (e.g. those spawned during repartitioning or while reading/writing Parquet files) inherit the current tracing span. This propagates the trace context across thread boundaries, into external data sources or custom exec nodes, and links all generated logs and spans to the expected trace context.

Previously, tasks spawned on new threads would lose the trace context, because it is thread-local and must be propagated "manually" to the new thread.
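To illustrate why the context is lost and what "manual" propagation means, here is a minimal std-only sketch. It does not use the `tracing` crate or DataFusion's code: `CURRENT_SPAN`, `enter_span`, `current_span`, `spawn_plain`, `spawn_in_current_span` and `demo` are all hypothetical names, and a plain string stands in for a real span.

```rust
use std::cell::RefCell;
use std::thread;

// Hypothetical stand-in for a tracing span: a name stored per thread.
thread_local! {
    static CURRENT_SPAN: RefCell<Option<String>> = RefCell::new(None);
}

/// Makes `name` the current span of the calling thread.
fn enter_span(name: &str) {
    CURRENT_SPAN.with(|span| *span.borrow_mut() = Some(name.to_string()));
}

/// Returns the current span of the calling thread, if any.
fn current_span() -> Option<String> {
    CURRENT_SPAN.with(|span| span.borrow().clone())
}

/// Plain spawn: the new thread starts with an empty thread-local,
/// so the spawning thread's context is not visible inside `task`.
fn spawn_plain<T, F>(task: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(task)
}

/// Instrumented spawn: capture the context on the spawning thread and
/// re-install it on the new thread before running `task`.
fn spawn_in_current_span<T, F>(task: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let parent = current_span();
    thread::spawn(move || {
        CURRENT_SPAN.with(|span| *span.borrow_mut() = parent);
        task()
    })
}

/// Returns the span seen by a plain task and by an instrumented task.
fn demo() -> (Option<String>, Option<String>) {
    enter_span("query");
    let lost = spawn_plain(current_span).join().unwrap();
    let kept = spawn_in_current_span(current_span).join().unwrap();
    (lost, kept)
}
```

In this sketch the plain task sees no span, while the instrumented one sees the span that was current when it was spawned. The real feature applies the same idea to spans from the `tracing` crate rather than to a string.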
What changes are included in this PR?
- Instrumentation of spawned tasks with the current tracing span when the `tracing` feature is enabled, by wrapping the `tokio::task::JoinSet` in a custom `JoinSet`.
- A new feature (`tracing`) in the common-runtime crate, along with necessary dependency updates.
- A new example, `datafusion-examples/examples/tracing.rs`, that runs a SQL query over the `alltypes_tiny_pages_plain.parquet` file to demonstrate end-to-end propagation of the tracing context across multiple threads.
- Updates to `README.md` to reflect the availability and usage of the new `tracing` feature.

Are these changes tested?
Yes. While there are no dedicated unit tests for this feature, the integration example in `datafusion-examples/examples/tracing.rs` serves as a comprehensive test. This example executes a query that triggers task spawns (such as through repartitioning and Parquet reading) and logs tracing output. By reviewing the logs, one can verify that the tracing span context is correctly propagated end to end.

Are there any user-facing changes?
No changes are expected for users who do not enable the `tracing` feature. The performance overhead should be nonexistent when the feature is disabled, and negligible when enabled.
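The zero-overhead claim for the disabled case follows from compile-time feature gating. A minimal sketch of how such gating could look, assuming a helper named `instrument_task` (a hypothetical name, not necessarily what this PR uses); only `tracing::Instrument::in_current_span` is an existing API of the `tracing` crate:

```rust
use std::future::Future;

/// Feature enabled: attach the caller's current span to the task, so the
/// span is entered each time the future is polled, on whichever thread.
#[cfg(feature = "tracing")]
fn instrument_task<F: Future>(task: F) -> impl Future<Output = F::Output> {
    use tracing::Instrument; // trait from the `tracing` crate
    task.in_current_span()
}

/// Feature disabled: return the future unchanged. No wrapper type is
/// created, so nothing is added to the spawned task.
#[cfg(not(feature = "tracing"))]
fn instrument_task<F: Future>(task: F) -> impl Future<Output = F::Output> {
    task
}
```

A custom `JoinSet` wrapper could pass each task through a helper of this kind before handing it to `tokio::task::JoinSet::spawn`, so that call sites stay identical whether or not the feature is enabled.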