Run / Test all examples in Documentation #14435

Closed
3 of 13 tasks
Tracked by #7013
alamb opened this issue Feb 3, 2025 · 6 comments · Fixed by #14544 or #14606
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Contributor

alamb commented Feb 3, 2025

Is your feature request related to a problem or challenge?

The Library User Guide at https://datafusion.apache.org/library-user-guide/index.html has some great examples ❤

However, not all of these examples are actually tested during CI, which results in issues like this one found (and fixed!) by @nuno-faria.

It is possible to prevent this type of problem by testing the examples automatically, following the instructions here:

// Instructions for Documentation Examples
//
// The following commands test the examples from the user guide as part of
// `cargo test --doc`
//
// # Adding new tests:
//
// Simply add code like this to your .md file and ensure your md file is
// included in the lists below.
//
// ```rust
// <code here will be tested>
// ```
//
// Note that sometimes it helps to author the doctest as a standalone program
// first, and then copy it into the user guide.
//
// # Debugging Test Failures
//
// Unfortunately, the line numbers reported by doctest do not correspond to the
// line numbers in the .md files. Thus, if a doctest fails, use the name of
// the test to find the relevant file in the list below, and then find the
// example in that file to fix.
//
// For example, if `user_guide_expressions(line 123)` fails,
// go to `docs/source/user-guide/expressions.md` to find the relevant problem.

Describe the solution you'd like

I would like to test all examples in the entire User Guide and Library User Guide.

Here are the files in the docs/source/library-user-guide directory. The ones that are not checked do not have their examples tested yet:

  • docs/source/library-user-guide/adding-udfs.md
  • docs/source/library-user-guide/api-health.md
  • docs/source/library-user-guide/building-logical-plans.md
  • docs/source/library-user-guide/catalogs.md
  • docs/source/library-user-guide/custom-table-providers.md
  • docs/source/library-user-guide/extending-operators.md
  • docs/source/library-user-guide/extensions.md
  • docs/source/library-user-guide/index.md
  • docs/source/library-user-guide/profiling.md
  • docs/source/library-user-guide/query-optimizer.md
  • docs/source/library-user-guide/using-the-dataframe-api.md
  • docs/source/library-user-guide/using-the-sql-api.md
  • docs/source/library-user-guide/working-with-exprs.md

Describe alternatives you've considered

To test the examples on a particular page, such as adding-udfs.md:

Step 1: Add entry to lib.rs

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
diff --git a/datafusion/core/src/lib.rs b/datafusion/core/src/lib.rs
index 780b22983..509ef601c 100644
--- a/datafusion/core/src/lib.rs
+++ b/datafusion/core/src/lib.rs
@@ -873,6 +873,12 @@ doc_comment::doctest!(
     user_guide_expressions
 );

+#[cfg(doctest)]
+doc_comment::doctest!(
+    "../../../docs/source/library-user-guide/adding-udfs.md",
+    library_user_guide_adding_udfs
+);
+
 #[cfg(doctest)]
 doc_comment::doctest!(
     "../../../docs/source/library-user-guide/using-the-sql-api.md",

Step 2: Run tests:

cargo test --doc -- library_user_guide_adding_udfs

Step 3: Fix issues / run again

(iterate and repeat until test passes)
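
Step 1 has to be repeated for every file that is not yet covered, so the `lib.rs` entries could be generated rather than typed by hand. The sketch below assumes a naming convention inferred from the two test names that appear in this issue (`user_guide_expressions` and `library_user_guide_adding_udfs`): drop the `docs/source/` prefix and the `.md` suffix, then replace `/` and `-` with `_`. The helpers `doctest_name`, `doctest_entry`, and `find_doc_for_test` are hypothetical and are not part of DataFusion or the `doc_comment` crate.

```rust
/// Derive the doctest name for a docs file, e.g.
/// `docs/source/library-user-guide/adding-udfs.md` becomes
/// `library_user_guide_adding_udfs`.
///
/// Assumption: this convention is inferred from the names used in the issue,
/// not taken from any DataFusion tooling.
fn doctest_name(md_path: &str) -> String {
    let trimmed = md_path.strip_prefix("docs/source/").unwrap_or(md_path);
    let trimmed = trimmed.strip_suffix(".md").unwrap_or(trimmed);
    trimmed
        .chars()
        .map(|c| if c == '/' || c == '-' { '_' } else { c })
        .collect()
}

/// Render the `lib.rs` entry for one docs file, using the same `../../../`
/// relative prefix as the diff in Step 1.
fn doctest_entry(md_path: &str) -> String {
    format!(
        "#[cfg(doctest)]\ndoc_comment::doctest!(\n    \"../../../{}\",\n    {}\n);\n",
        md_path,
        doctest_name(md_path)
    )
}

/// Find the docs file behind a failing doctest such as
/// `user_guide_expressions(line 123)` by comparing derived names.
fn find_doc_for_test<'a>(reported: &str, md_paths: &[&'a str]) -> Option<&'a str> {
    // Keep only the identifier in front of any "(line N)" suffix.
    let name = reported.split('(').next().unwrap_or(reported).trim();
    md_paths.iter().copied().find(|p| doctest_name(p) == name)
}
```

Printing `doctest_entry(path)` for each unchecked file gives text that can be pasted into `datafusion/core/src/lib.rs`, and `find_doc_for_test` covers the debugging step of going from a failing test name back to its `.md` file.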

Additional context

No response

@alamb alamb added the enhancement New feature or request label Feb 3, 2025
@alamb alamb changed the title from "Run / Test all examples in Documenation" to "Run / Test all examples in Documentation" Feb 3, 2025
@alamb
Contributor Author

alamb commented Feb 3, 2025

I think this would be a good first issue for people to work on as it is self-contained, well described, doesn't require deep internals experience, and would give you exposure to the API.

@alamb alamb added good first issue Good for newcomers documentation Improvements or additions to documentation labels Feb 3, 2025
@Chen-Yuan-Lai
Contributor

@alamb I found that some tests failed because multiple code blocks shared some imports. For example, should we repeat the necessary imports in the second block?

Image

@ugoa
Contributor

ugoa commented Feb 3, 2025

Hey, I would like to give this a try. I am quite interested in the DataFusion and Daft projects and have been exploring both recently.

@alamb
Contributor Author

alamb commented Feb 3, 2025

> @alamb I found that some tests failed because multiple code blocks shared some imports. For example, should we repeat the necessary imports in the second block?

Yes I think that is what is needed for now.

If that makes the documentation too messy, maybe we can figure out how to use the rustdoc style of hiding lines that start with #.

Thanks @Chen-Yuan-Lai

@alamb
Contributor Author

alamb commented Feb 3, 2025

> Hey I would like to have a try, I am quite interested in the datafusion and daft project and have been exploring both recently

Thank you @ugoa -- that is great!

I recommend you pick one of the files above and then work through the errors to get all the tests passing.

@ugoa
Contributor

ugoa commented Feb 4, 2025

Hey @alamb @Chen-Yuan-Lai folks, here is my attempt to fix the doctests for docs/source/library-user-guide/adding-udfs.md. The test command below now passes:

cargo test --doc -- library_user_guide_adding_udfs

However, I do need advice on how to address the issue pointed out by @Chen-Yuan-Lai. For this MR I took the (#) approach as suggested, but the doc becomes quite long, and the build process would also need to change because Sphinx doesn't support the rustdoc format (I think?).

Anyway looking forward to your feedback, cheers!
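
On the Sphinx concern: rustdoc hides doctest lines that start with `# ` (or that consist of a lone `#`), and shows a line starting with `##` with one `#` removed. A docs build that does not understand this convention would need a preprocessing pass along the following lines. This is a minimal sketch, assuming the pass runs over the body of each `rust` code block before the page is rendered; `strip_hidden_lines` is a hypothetical helper and not part of Sphinx, rustdoc, or DataFusion.

```rust
/// Return `code` as a reader of rendered rustdoc output would see it:
/// hidden setup lines are dropped, everything else is kept.
fn strip_hidden_lines(code: &str) -> String {
    let mut shown = Vec::new();
    for line in code.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with("##") {
            // `##` escapes a literal leading `#`: keep the line, drop one `#`.
            shown.push(line.replacen("##", "#", 1));
        } else if trimmed == "#" || trimmed.starts_with("# ") {
            // Hidden line: still compiled by the doctest, omitted from the page.
            continue;
        } else {
            // Ordinary code, including attributes such as `#[derive(Debug)]`.
            shown.push(line.to_string());
        }
    }
    shown.join("\n")
}
```

With a pass like this, the `.md` files can carry the repeated imports that each block needs to compile on its own, while the rendered page shows only the lines that matter to the reader.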
