Create new CI workflow and run unit tests using it #17700
Merged
Description
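This PR has 4 parts:
1. Creation of a new workflow
2. Moving unit tests (phase 1) to the new workflow
3. Moving unit tests (phase 2) to a weekly cron workflow
4. New reporting mechanism

I'll describe each of these parts separately below.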
Creation of a new workflow
This PR adds 2 new workflow yml files:
ci.yml
worker.yml
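To make the split concrete, here is a minimal Python sketch of the orchestrator/worker idea: the orchestrator only decides what to run, and each worker reduces to a single command that can be copied into a local shell. This is a mental model, not how GitHub Actions evaluates workflows. `WorkerInput`, `worker_command` and `orchestrate` are hypothetical names, and the `./run-unit-tests.sh <group>` invocation (its path and single-argument interface) is an assumption for illustration.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerInput:
    """Hypothetical stand-in for the inputs ci.yml passes to worker.yml."""
    script: str                 # script the worker should run (illustrative)
    args: tuple[str, ...] = ()  # arguments forwarded unchanged to the script


def worker_command(inp: WorkerInput) -> str:
    """Return the single shell command a worker job would execute.

    Because the worker only runs `script` with `args`, the same string can be
    pasted into a local terminal to reproduce the job.
    """
    return shlex.join([inp.script, *inp.args])


def orchestrate(groups: list[str]) -> list[WorkerInput]:
    """Orchestrator role: decide *what* runs by fanning out one worker per group."""
    return [WorkerInput("./run-unit-tests.sh", (group,)) for group in groups]
```

For example, `worker_command(WorkerInput("./run-unit-tests.sh", ("AB",)))` yields `./run-unit-tests.sh AB`. `shlex.join` quotes any argument containing spaces or shell metacharacters, so the command stays safe to paste.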
Eventually, ci.yml is expected to be the new starting point for our entire CI workflow. However, since we're migrating only the unit tests (phase 1) to ci.yml right now, it isn't the starting point yet (during this transition, individual parts are being moved to the new workflow one by one).

ci.yml is responsible for orchestrating the workflows with the help of worker.yml. worker.yml is responsible for doing the actual work, as requested by ci.yml via the input arguments.

Motivation behind this change:
ci.yml will make it much easier to understand the dependency graph between all the different workflows.

worker.yml takes a script as one of its input args. The idea behind this design is to make it easy for folks to run the same scripts locally whenever they need to run the workflows locally (during development or debugging). That is hard to figure out when there are many yml files passing input args through one another, hence we wanted to keep it as simple as possible to understand and reproduce locally.

Moving unit tests (phase 1) to the new workflow
The unit tests (phase 1), which were previously run via unit-tests.yml, reusable-unit-tests.yml and unit_tests_script.sh, now run via ci.yml, worker.yml and run-unit-tests.sh. As a result, unit-tests.yml, reusable-unit-tests.yml and unit_tests_script.sh have been removed since they're now unused (the same has been done for unit tests phase 2, as described in the next section).

Previously, we ran unit tests for 4 different sets of modules:
indexing, processing, server, and "everything else". This resulted in long job durations, with the processing modules taking 50+ minutes, for example. Also, the "everything else" design wouldn't scale in the long term, since it covers everything that isn't part of the other 3 module sets.

This design has been updated to select tests using regex patterns on the starting character of the test class name, as defined in ci.yml. For example, all unit test classes starting with A or B get run as part of a single job, across all modules. A similar grouping applies to C, D, E, F, and so on.

The character groupings were chosen to try to avoid skew across jobs. This design also scales in the long term, and gives us the flexibility to tune the groupings if and when needed.
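Because every test class has to land in exactly one job, a grouping can be sanity-checked mechanically. The Python sketch below assigns a class name to a group by its first character and verifies that each uppercase letter matches exactly one group, so no class is run twice or skipped. Only the A/B pairing is taken from this description; the other entries in `GROUPS` are hypothetical, and the code uses Python's `re` module, not the pattern syntax that Maven Surefire actually receives from ci.yml.

```python
import re
import string

# Only the A/B pairing comes from the description; every other group here is
# a made-up example, not the grouping actually used in ci.yml.
GROUPS = ["AB", "C", "D", "E", "F", "GHIJK", "LM", "NOP", "QR", "S", "TUVWXYZ"]


def pattern_for(group: str) -> re.Pattern:
    """Regex matching a simple class name whose first character is in `group`."""
    return re.compile(rf"^[{group}]")


def assign(class_name: str) -> str:
    """Return the single group whose pattern matches `class_name`."""
    matches = [g for g in GROUPS if pattern_for(g).match(class_name)]
    if len(matches) != 1:
        raise ValueError(f"{class_name!r} matched {matches}, expected exactly one group")
    return matches[0]


def check_partition() -> None:
    """Raise if any uppercase starting letter lands in zero or several groups."""
    for letter in string.ascii_uppercase:
        assign(letter + "SomeTest")
```

A check like `check_partition()` is cheap enough to run whenever the groupings are retuned.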
With this change, most groups finish in around 15-18 minutes, and a few take around 25 minutes.
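This description doesn't say how the groupings were chosen. One standard way to pick contiguous letter groups that keep the slowest job short is the linear-partition approach sketched below: binary-search the smallest per-group time budget for which greedy packing needs at most `max_groups` groups. The per-letter durations are made-up inputs, and `split_contiguous` is a hypothetical helper, not part of this PR.

```python
def split_contiguous(durations: list[tuple[str, int]], max_groups: int) -> list[str]:
    """Split per-letter durations into at most `max_groups` contiguous runs,
    minimizing the duration of the slowest run."""

    def groups_needed(limit: int) -> int:
        # Greedily pack letters into runs whose total stays within `limit`.
        count, total = 1, 0
        for _, minutes in durations:
            if total + minutes > limit:
                count, total = count + 1, minutes
            else:
                total += minutes
        return count

    # Binary search for the smallest feasible per-group limit.
    lo = max(m for _, m in durations)
    hi = sum(m for _, m in durations)
    while lo < hi:
        mid = (lo + hi) // 2
        if groups_needed(mid) <= max_groups:
            hi = mid
        else:
            lo = mid + 1

    # Rebuild the runs greedily using the optimal limit.
    groups, current, total = [], "", 0
    for letter, minutes in durations:
        if total + minutes > lo:
            groups.append(current)
            current, total = "", 0
        current += letter
        total += minutes
    groups.append(current)
    return groups
```

For instance, with made-up durations A=7, B=2, C=5, D=10, E=8 minutes and at most 5 groups, it returns `["AB", "C", "D", "E"]`, whose slowest group takes 10 minutes. Since the matrix jobs run in parallel, that maximum, rather than the total, is roughly what sets the wall-clock time.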
The reporting mechanism for this way of running unit tests is covered in the New reporting mechanism section below.

Moving unit tests (phase 2) to a weekly cron workflow
For context, unit tests phase 1 run against JDK 17, whereas unit tests phase 2 run against JDKs 11 and 21.
It's very rare for a test to have a different result on JDK 17 than on JDK 11/21, so running unit tests phase 2 on every commit of every PR is redundant.
Hence, unit tests phase 2 have been removed from the regular PR workflow and added as a weekly cron workflow instead. This ensures that we still run them once a week and can track failures, if any (which we ideally don't expect to have).
Unit tests phase 2 have also been updated to use the new workflow files and the new regex pattern approach, as done for unit tests phase 1.
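The resulting split between PR runs and the weekly cron run can be expressed as a small function over the GitHub event name (`pull_request`, `schedule`, and so on). This Python sketch assumes the scheduled run executes only phase 2 and every other event only phase 1; `jobs_for_event` is a hypothetical name, and the real selection lives in the workflow YAML.

```python
from itertools import product

PHASE_1_JDKS = ("17",)       # run on every PR (per the description)
PHASE_2_JDKS = ("11", "21")  # run on the weekly cron (per the description)


def jobs_for_event(event_name: str, groups: list[str]) -> list[tuple[str, str]]:
    """Return the (jdk, test group) pairs to run for a GitHub event name.

    Assumption: a `schedule` event runs only phase 2, and any other event
    runs only phase 1.
    """
    jdks = PHASE_2_JDKS if event_name == "schedule" else PHASE_1_JDKS
    # One job per JDK and per test group, like a two-axis workflow matrix.
    return list(product(jdks, groups))
```

With the same groups reused for phase 2, keeping it on PRs would have meant 2 extra jobs per group (one per JDK) on every commit; the weekly schedule moves that cost off the PR path.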
New reporting mechanism
There are a few aspects to this:
1. In ci.yml, we pass -Dmaven.test.failure.ignore=true while running unit tests, so that failing tests don't fail the Maven build step itself.
2. A reporting-unit-test-failures job, which downloads all the uploaded test artifacts, thereby gathering all the individual surefire reports in the same place/hierarchy, and then runs a reporter (mikepenz/action-junit-report@v5) against those gathered surefire reports. Sample run links showing how the report looks are included below.
3. A reporting-jacoco-coverage-failures job, which does something similar and then uses create-jacoco-coverage-report.sh to create a JaCoCo report. This new script is mostly derived (with some simplifications) from the existing unit_tests_script.sh, and the overall logic and way of reporting (that is, the final output) are the same as before.

Some examples for the unit test report:
1. The Summary section of the workflow run: https://github.com/Akshat-Jain/druid/actions/runs/13152426940/attempts/1#summary-36703177495
2. The report-unit-test-failures job run itself: https://github.com/Akshat-Jain/druid/actions/runs/13152426940/job/36703177495 (you need to expand the Publish results dropdown).

Attaching sample images for both of the above examples below, for quicker reference:
Image for (1)
Image for (2)
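Surefire writes one `TEST-<class name>.xml` report per test class. When the tests run with `-Dmaven.test.failure.ignore=true`, a later step has to surface the failures. The Python sketch below is a simplified stand-in for that reporting step, not the implementation of mikepenz/action-junit-report: it walks a directory of gathered artifacts, collects `<testcase>` entries that contain a `<failure>` or `<error>` child, and returns a non-zero exit code if any exist. The directory layout and all function names are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedTest:
    classname: str
    name: str
    kind: str  # "failure" (assertion failed) or "error" (unexpected exception)


def failures_in(report_xml) -> list[FailedTest]:
    """Extract failed/errored test cases from one surefire XML report (str or bytes)."""
    root = ET.fromstring(report_xml)
    failed = []
    # iter() finds <testcase> elements whether the root is <testsuite> or a wrapper.
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            if case.find(kind) is not None:
                failed.append(FailedTest(case.get("classname", ""), case.get("name", ""), kind))
    return failed


def collect_failures(reports_root: str) -> list[FailedTest]:
    """Scan every TEST-*.xml below `reports_root`, e.g. all downloaded artifacts."""
    pattern = os.path.join(reports_root, "**", "TEST-*.xml")
    failed = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, "rb") as f:
            failed.extend(failures_in(f.read()))
    return failed


def report(reports_root: str) -> int:
    """Print each failed test and return a process exit code (1 if any failed)."""
    failed = collect_failures(reports_root)
    for t in failed:
        print(f"{t.kind.upper()}: {t.classname}.{t.name}")
    # In this sketch, the non-zero code is what marks the reporting step as failed,
    # since the test step itself ignored failures.
    return 1 if failed else 0
```

Running `sys.exit(report("<artifact download dir>"))` at the end of such a step would fail it only after every group's reports have been gathered, which is what lets one job summarize failures from all the parallel test jobs.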
This PR has: