Add Pathways Recipe Support for Scale Testing #1220
+1,299
−1,028
Description
This PR contains the following changes:
Adds the following two recipes:
a. Benchmarking Pathways and McJAX code.
b. A long-running test on Pathways.
These recipes improve code and recipe sharing between developers during testing and simplify benchmarking by providing a single, well-defined config. They will be used for testing and verification on v6e capacity.
By default, allows all configs to run with Pathways (enable_single_controller, disable zarr3, enable_pathways_goodput, etc.); users can add further configs with pathways_tuning_params.
Allows removing problematic XLA flags for Pathways with pathways_xla_flag_options.
Cleans up many old Pathways-specific configs.
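The interaction between the Pathways defaults, pathways_tuning_params, and XLA flag removal can be illustrated with a minimal sketch. This is not the code from this PR: the helper names (`build_pathways_params`, `remove_xla_flags`), the dictionary key used for "disable zarr3", and the example flag names are all assumptions made for illustration, and the actual structure of pathways_xla_flag_options may differ.

```python
# Illustrative sketch only: helper names, dictionary keys, and flag names are
# assumptions, not the API introduced by this PR.

# Assumed Pathways defaults, based on the settings named in the description.
DEFAULT_PATHWAYS_PARAMS = {
    "enable_single_controller": True,
    "enable_pathways_goodput": True,
    "checkpoint_storage_use_zarr3": False,  # assumed key name for "disable zarr3"
}


def build_pathways_params(pathways_tuning_params=None):
    """Return a copy of the defaults overlaid with user-supplied tuning params."""
    params = dict(DEFAULT_PATHWAYS_PARAMS)
    params.update(pathways_tuning_params or {})
    return params


def remove_xla_flags(xla_flags, flags_to_remove):
    """Drop every '--name=value' token whose name appears in flags_to_remove."""
    kept = []
    for token in xla_flags.split():
        name = token.lstrip("-").split("=", 1)[0]
        if name not in flags_to_remove:
            kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    # A user override replaces one default and leaves the others intact.
    print(build_pathways_params({"enable_pathways_goodput": False}))
    # Placeholder flag names; real XLA flags would go here.
    print(remove_xla_flags(
        "--xla_example_flag_a=true --xla_example_flag_b=4",
        {"xla_example_flag_b"},
    ))
```

In this sketch the defaults are copied before user overrides are applied, so a recipe that changes one setting does not affect other recipes sharing the same defaults.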
Tests
Ran multiple rounds on multiple v6e clusters.
Checklist
Before submitting this PR, please make sure (put X in square brackets):