
Add use of atlas float fields #79

Merged · 5 commits merged into develop from feature/use-atlas-float-fields · Feb 28, 2024
Conversation

@twsearle (Collaborator) commented on Feb 26, 2024

Description

Memory limitations on the HPC make it difficult to fit the larger models, with their many variables, into the available memory.

This change adds a configuration option to specify either float or double for the atlas fields. The default is also updated to float, which ought to approximately halve the memory use of orca-jedi. I have included additional logging to the trace output stream to track memory usage over the course of a run (a sketch of the idea is below).
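
As an illustration of the kind of trace logging meant here, a minimal sketch using the oops logger; the `residentSetSizeMiB` helper, the `/proc/self/statm` mechanism, and the log format are assumptions for illustration, not the code in this PR:

```cpp
#include <fstream>
#include <string>
#include <unistd.h>

#include "oops/util/Logger.h"

// Assumed helper: read the resident set size from /proc/self/statm
// (Linux-specific; first two fields are total and resident pages),
// converting pages to MiB.
double residentSetSizeMiB() {
  long totalPages = 0;
  long residentPages = 0;
  std::ifstream statm("/proc/self/statm");
  statm >> totalPages >> residentPages;
  return residentPages * sysconf(_SC_PAGESIZE) / (1024.0 * 1024.0);
}

// Emit the current memory usage to the trace stream, tagged with the
// stage of the run so the profile can be followed over time.
void traceMemoryUsage(const std::string& tag) {
  oops::Log::trace() << "Memory usage at " << tag << ": "
                     << residentSetSizeMiB() << " MiB" << std::endl;
}
```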

This change also makes the reading of NEMO data more robust to the underlying type of the NetCDF data. There is probably a better way of managing this in future changes: either using the type in the NEMO file to determine the type of the atlas field, or avoiding the current pipeline of reading all the data from the file into a std::vector buffer, MPI-broadcasting that buffer to the other processes, and then filling the atlas field with the specified type (sketched below).
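
For readers unfamiliar with that pipeline, a hedged sketch of its shape; the function and variable names are illustrative, and the real NemoFieldReader handles levels, partitioning, and type dispatch that are omitted here:

```cpp
#include <string>
#include <vector>

#include <netcdf>  // netcdf-cxx4

#include "atlas/array.h"
#include "atlas/field.h"
#include "eckit/mpi/Comm.h"

// Illustrative pipeline: the root task reads the variable from the
// NetCDF file into a std::vector buffer, the buffer is broadcast to all
// other tasks, and every task fills its atlas field at the configured
// precision.
void readBroadcastFill(const std::string& ncPath, const std::string& varName,
                       atlas::Field& field) {
  const eckit::mpi::Comm& comm = eckit::mpi::comm();
  std::vector<double> buffer(field.shape(0));

  if (comm.rank() == 0) {
    netCDF::NcFile ncFile(ncPath, netCDF::NcFile::read);
    ncFile.getVar(varName).getVar(buffer.data());
  }
  comm.broadcast(buffer.begin(), buffer.end(), 0);

  // Fill the field, casting to its datatype (float shown here).
  auto view = atlas::array::make_view<float, 1>(field);
  for (atlas::idx_t i = 0; i < field.shape(0); ++i) {
    view(i) = static_cast<float>(buffer[i]);
  }
}
```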

However, due to limitations in `atlas::interpolation::NonLinear`, all fields must be the same type, and the configuration of the missing-values adjustment to the weights must include the Fortran label of the type after the name (e.g. "-real32"). An atlas change is required to allow one run, with one interpolator, over two fields of different types.

For now, in this change, I am keeping the NetCDF type of the NEMO file variable and the atlas field datatype as separate concepts. The types used inside a run are controlled for now in two places (a combined example is sketched after this list):

  1. via "geometry: nemo variables: - {field precision: double, ...}"
  2. which, when set to float and using missing values as part of interpolation, requires specifying "get values: atlas-interpolator: non_linear: -real32"
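
A minimal sketch of the two settings together in one configuration; the variable name and the missing-if-all-missing treatment are illustrative, and all other keys are elided:

```yaml
geometry:
  nemo variables:
    - name: sea_water_potential_temperature    # illustrative
      field precision: float                   # (1) atlas field datatype
get values:
  atlas-interpolator:
    non_linear: missing-if-all-missing-real32  # (2) Fortran type label appended
```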

Issues

Fixes #78

Dependencies

Mutually dependent on merge of MetOffice/jjdocs/pull/128

Checklist

  • I have updated the unit tests to cover the change
  • New functions are documented briefly via Doxygen comments in the code
  • I have linted my code using cpplint
  • I have run the unit tests
  • I have run mo-bundle to check integration with the rest of JEDI and run the unit tests under all environments

Testing

Tested using adjusted apps from jjdocs/feature/atlas-field-float-precision and the MPI sith branch

Tests were also run using the jjdocs branch and the develop branch of sith.

* Make reading of nemo data more robust to netcdf type
* Add new option to set the atlas field type independently of nemo data
* NOTE: due to limitations in `atlas::interpolation::NonLinear`, all
  fields must be the same type, and the configuration of missing values
  adjustment to the weights must include the fortran label of the type
  after the name (e.g. "-real32").
* An atlas change is required to allow one run, one interpolator, with
  two fields of different type.
@twsearle (Collaborator, Author) commented on Feb 26, 2024

I have run a few experiments with this change, and I find that ocean profile runs now hit the wallclock limit instead of running out of memory. This is an improvement, but not yet conclusive evidence in support of this change.

[Memory profile image: sith_271_spice_ocnd_eorca12_oceansound_multi_20210701T0000Z]

For comparison, this is the memory profile for an orca025 run using floats:

[Memory profile image: sith_271_spice_ocnd_oceansound_multi_20210701T0000Z]

@twsearle self-assigned this on Feb 26, 2024
@twsearle requested a review from s-good on February 26, 2024 15:31
@twsearle (Collaborator, Author) commented
I have now made sure that the MPI buffer is the same type as the NetCDF data and the atlas fields where possible. I am hoping this improves performance a little.

@s-good (Collaborator) left a comment

I left a couple of questions but all looks good to me and I'm happy to approve assuming there are no issues in reproducing the KGO in sith.

Review threads (resolved):
  • src/tests/orca-jedi/test_interpolator.cc
  • src/orca-jedi/nemo_io/NemoFieldReader.cc
@twsearle (Collaborator, Author) commented

> I left a couple of questions but all looks good to me and I'm happy to approve assuming there are no issues in reproducing the KGO in sith.

Great, thanks! I am not sure that everything is working correctly yet, and I am also not sure whether I am getting any performance improvements, but I wanted you to see what I was up to. We would also need to coordinate any change here with a change in jjdocs.

@twsearle marked this pull request as ready for review on February 28, 2024 10:38
@twsearle (Collaborator, Author) commented

I added an atlas issue based on this change, in case there is a better way of solving this problem, now or in the future with further atlas changes: ecmwf/atlas#173

@twsearle (Collaborator, Author) commented on Feb 28, 2024

For further info, I ran this change using a checkerboard distribution of the orca12 grid on 16 processors. It is clear that most of the memory usage now comes from data in the obspace, which ought to scale much better (the stepped section in the first third of the run is the fields being allocated, with the later spike occurring once obs filtering begins and the various filters/variables are constructed). The runtime was also down to around half an hour (the graph is misleading, as I am not logging anything towards the end of the run). However, this run is not accurate, as the orca grid does not yet include halos, which are required for interpolation over the checkerboard distribution.

[Memory profile image: j79-271-checkerboard_xc_ocnd_eorca12_20210701T0000Z]

@twsearle merged commit 597c3e7 into develop on Feb 28, 2024 · 2 checks passed
@twsearle deleted the feature/use-atlas-float-fields branch on February 28, 2024 17:09
@odlomax left a comment

I've added a suggestion to refactor out all the if... else if... blocks that @s-good commented on (a rough sketch of that kind of refactor follows the thread reference below). Otherwise looks good to me.

Review thread (resolved): src/orca-jedi/nemo_io/NemoFieldReader.cc
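
For context, a rough sketch of the shape such a refactor could take, assuming a hypothetical `dispatchOnFieldType` helper; this is not the reviewer's actual suggestion, which is in the resolved thread above:

```cpp
#include <stdexcept>

#include "atlas/array.h"
#include "atlas/field.h"

// Hypothetical helper: call f with a value of the field's element type,
// replacing the repeated "if real64 ... else if real32 ..." blocks at
// each call site.
template <typename Functor>
void dispatchOnFieldType(const atlas::Field& field, Functor&& f) {
  if (field.datatype() == atlas::array::DataType::real64()) {
    f(double{});
  } else if (field.datatype() == atlas::array::DataType::real32()) {
    f(float{});
  } else {
    throw std::invalid_argument("unsupported field datatype: " +
                                field.datatype().str());
  }
}

// Usage: a generic lambda captures the element type once, so the body is
// written a single time instead of once per branch.
// dispatchOnFieldType(field, [&](auto zero) {
//   using T = decltype(zero);
//   auto view = atlas::array::make_view<T, 1>(field);
//   /* ... fill or read the view ... */
// });
```
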
Development

Successfully merging this pull request may close these issues:

  • Reduce memory consumption of orca-jedi (#78)

3 participants