feat: instrument spawned tasks with current tracing span when tracing feature is enabled #14547

Open: wants to merge 1 commit into `main`

geoffreyclaude

Which issue does this PR close?

Relates to #9415. It does not fully close the issue, but implements a prerequisite.

Rationale for this change

This allows DataFusion to integrate with applications that use the `tracing` crate by propagating the trace context as users would expect, without requiring a full integration with the tracing ecosystem.
When the new `tracing` feature is enabled, all tasks spawned onto other threads (e.g. during repartitioning or while reading/writing Parquet files) inherit the current tracing span. This propagates the trace context across thread boundaries, into external data sources and custom execution nodes, so that all generated logs and spans are linked to the expected trace.
Previously, such tasks lost the trace context: the context is thread-local, so it must be propagated to the new thread explicitly.
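The thread-local behavior described above can be reproduced with the standard library alone. In this minimal sketch, the `CURRENT_SPAN` thread-local and the `current_span`/`in_span` helpers are hypothetical stand-ins for the per-thread span state that `tracing` maintains, not its real API: a naively spawned thread starts with an empty context, while capturing the value on the parent thread and re-entering it on the child preserves it.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical stand-in for the per-thread "current span" that tracing tracks.
    static CURRENT_SPAN: RefCell<Option<String>> = RefCell::new(None);
}

/// Returns the span name active on the calling thread, if any.
fn current_span() -> Option<String> {
    CURRENT_SPAN.with(|s| s.borrow().clone())
}

/// Runs `f` with `name` as this thread's current span, restoring the
/// previous value once `f` returns.
fn in_span<R>(name: Option<String>, f: impl FnOnce() -> R) -> R {
    let prev = CURRENT_SPAN.with(|s| s.replace(name));
    let out = f();
    CURRENT_SPAN.with(|s| *s.borrow_mut() = prev);
    out
}

fn main() {
    in_span(Some("query".to_string()), || {
        // Naive spawn: the new thread starts with its own, empty thread-local.
        let lost = thread::spawn(current_span).join().unwrap();

        // Explicit propagation: capture on the parent thread, re-enter on the child.
        let captured = current_span();
        let kept = thread::spawn(move || in_span(captured, current_span))
            .join()
            .unwrap();

        println!("naive: {lost:?}, propagated: {kept:?}");
    });
}
```

Running it prints `naive: None, propagated: Some("query")`. With the real crates, the capture step is `tracing::Span::current()`, and the re-enter step is `Span::in_scope` for a closure or `Instrument::instrument` (from `tracing::Instrument`) for a future.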

What changes are included in this PR?

  • Update the common runtime crate to wrap `tokio::task::JoinSet` in a custom `JoinSet`, so that, when the `tracing` feature is enabled, spawned tasks are instrumented with the current tracing span.
  • Add a new Cargo feature, `tracing`, to the `common-runtime` crate, along with the necessary dependency updates.
  • Provide an integration example, `datafusion-examples/examples/tracing.rs`, that runs a SQL query over the `alltypes_tiny_pages_plain.parquet` file to demonstrate end-to-end propagation of the tracing context across multiple threads.
  • Update the root `README.md` to document the new `tracing` feature and its usage.
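The custom `JoinSet` in the first bullet follows a wrapper pattern: callers keep the same spawn-style API, while the wrapper captures the span that is current at spawn time and re-enters it inside the task. The std-only sketch below models that pattern with OS threads instead of tokio tasks; `SpanJoinSet`, `CURRENT_SPAN`, `current_span`, and `in_span` are hypothetical names, not DataFusion's or tokio's API.

```rust
use std::cell::RefCell;
use std::thread::{self, JoinHandle};

thread_local! {
    // Hypothetical stand-in for the per-thread "current span" that tracing tracks.
    static CURRENT_SPAN: RefCell<Option<String>> = RefCell::new(None);
}

/// Returns the span name active on the calling thread, if any.
fn current_span() -> Option<String> {
    CURRENT_SPAN.with(|s| s.borrow().clone())
}

/// Runs `f` with `name` as this thread's current span, restoring the
/// previous value once `f` returns.
fn in_span<R>(name: Option<String>, f: impl FnOnce() -> R) -> R {
    let prev = CURRENT_SPAN.with(|s| s.replace(name));
    let out = f();
    CURRENT_SPAN.with(|s| *s.borrow_mut() = prev);
    out
}

/// Hypothetical wrapper: callers use `spawn`/`join_all` as usual, but every
/// task runs inside the span that was current when it was spawned.
struct SpanJoinSet<T> {
    handles: Vec<JoinHandle<T>>,
}

impl<T: Send + 'static> SpanJoinSet<T> {
    fn new() -> Self {
        Self { handles: Vec::new() }
    }

    fn spawn<F>(&mut self, task: F)
    where
        F: FnOnce() -> T + Send + 'static,
    {
        // Capture the span on the spawning thread...
        let span = current_span();
        // ...and re-enter it on the new thread before running the task.
        self.handles.push(thread::spawn(move || in_span(span, task)));
    }

    /// Waits for all tasks and returns their results in spawn order.
    fn join_all(self) -> Vec<T> {
        self.handles
            .into_iter()
            .map(|h| h.join().expect("task panicked"))
            .collect()
    }
}

fn main() {
    let mut set = SpanJoinSet::new();
    in_span(Some("repartition".to_string()), || {
        for i in 0..3 {
            // Each task reports the span it observed when it ran.
            set.spawn(move || (i, current_span()));
        }
    });
    for (i, span) in set.join_all() {
        println!("task {i}: {span:?}");
    }
}
```

With tokio and tracing, the analogous step inside a wrapper's `spawn` would typically be `inner.spawn(task.instrument(tracing::Span::current()))`, with `tracing::Instrument` in scope. This describes the general technique and is an assumption, not a quote of this PR's code.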

Are these changes tested?

Yes, through an example rather than unit tests. There are no dedicated unit tests for this feature; instead, the integration example in `datafusion-examples/examples/tracing.rs` serves as an end-to-end check. It executes a query that triggers task spawns (e.g. during repartitioning and Parquet reading) and logs tracing output. Reviewing the logs confirms that the tracing span context is propagated end to end.

Are there any user-facing changes?

No changes are expected for users who do not enable the `tracing` feature. When the feature is disabled there should be no performance overhead, and when it is enabled the overhead is expected to be negligible.

github-actions bot added the documentation, physical-expr, core, and common labels on Feb 7, 2025
geoffreyclaude force-pushed the feat/trace-span-propagation branch 2 times, most recently from f7001cb to 615dc39 on February 8, 2025 16:16
geoffreyclaude force-pushed the feat/trace-span-propagation branch 2 times, most recently from 234deb9 to bea0836 on February 13, 2025 17:13
```diff
@@ -126,6 +126,7 @@ Optional features:
 - `backtrace`: include backtrace information in error messages
 - `pyarrow`: conversions between PyArrow and DataFusion types
 - `serde`: enable arrow-schema's `serde` feature
+- `tracing`: propagates the current span across thread boundaries
```
Author


Question: Is the new `tracing` feature required, or should this behavior simply be enabled by default?

```rust
}
impl<T: 'static> JoinSet<T> {
    /// [JoinSet::spawn](tokio::task::JoinSet::spawn) - Spawn a new task.
    pub fn spawn<F>(&mut self, task: F) -> AbortHandle
```
Author


Question: All public, stable methods of `tokio::task::JoinSet` are wrapped. Should only the methods DataFusion actually uses be wrapped, for conciseness?

geoffreyclaude force-pushed the feat/trace-span-propagation branch from bea0836 to 6a025d8 on February 18, 2025 13:19