Warmup SharedTrieCache at startup #7533

Open
alexggh opened this issue Feb 11, 2025 · 4 comments

@alexggh
Contributor

alexggh commented Feb 11, 2025

Add logic at the node level that, based on the runtime configuration or maybe a CLI flag, decides that the entire state needs to fit into memory, warms up the SharedTrieCache, and performs some sanity checks to confirm memory usage is within reasonable bounds.

If memory is insufficient, or we are running low on it, the node should abort gracefully.
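
Roughly, the startup flow could look something like this (a sketch only; all names below are illustrative placeholders, not existing APIs):

```rust
// Placeholder threshold for "spare memory within reasonable bounds".
const MINIMUM_SPARE_MEMORY: u64 = 2 * 1024 * 1024 * 1024; // 2 GiB, arbitrary

fn startup_warmup(
    warmup_enabled: bool,
    estimated_state_size: u64,
    available_memory: u64,
    warm_cache: impl FnOnce() -> Result<u64, String>, // returns spare memory after warming
) -> Result<(), String> {
    if !warmup_enabled {
        return Ok(());
    }
    // Pre-check: the state has to fit into memory before we even try.
    if estimated_state_size > available_memory {
        return Err("state does not fit into memory, aborting".into());
    }
    // Warm the SharedTrieCache; the closure reports spare memory afterwards.
    let spare_after_warmup = warm_cache()?;
    // Post-check: make sure spare memory is still within reasonable bounds.
    if spare_after_warmup < MINIMUM_SPARE_MEMORY {
        return Err("running low on memory after warm-up, aborting".into());
    }
    Ok(())
}
```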

@bkchr
Member

bkchr commented Feb 11, 2025

Add logic at the collator level

Just general node level.

Not sure we need some sanity checks. As we read the state, we could warn when we see that the memory usage is too high or abort "nicely".

alexggh changed the title from "Collator warmup SharedTrieCache at startup" to "Warmup SharedTrieCache at startup" on Feb 11, 2025
@alexggh
Contributor Author

alexggh commented Feb 11, 2025

Not sure we need some sanity checks. As we read the state, we could warn when we see that the memory usage is too high or abort "nicely".

Yes, the end result should be that either the node succeeds in warming the SharedTrieCache with spare memory still within reasonable bounds, or, if it can't fit the entire state into the SharedTrieCache or is running low on memory after warming, it aborts nicely.

Updated the description to make it more clear.

@AndreiEres
Contributor

Here is my understanding of what we need to do.

We’re launching smart contracts on AssetHub, so we expect many small reads and writes during contract execution. The current performance is insufficient. We aim to reduce storage access time by fitting the entire state into the trie cache in memory.

The current task involves fitting the state into memory based on a CLI flag or runtime configuration. It requires warming up the state and performing sanity checks to ensure memory usage remains within reasonable limits; otherwise, it should abort gracefully.

Here is a breakdown of the task by parts. The current statements are not rock solid and may change over time.

Configuration. For a proof of concept, we will start with a CLI flag. After successful testing, it may be changed to a runtime configuration.
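
For illustration, the flag could take a shape like this, assuming a clap-derive params struct in the style of the other Substrate CLI options (the struct and flag names are placeholders):

```rust
use clap::Parser;

/// Hypothetical CLI params for the proof of concept.
#[derive(Debug, Clone, Parser)]
pub struct TrieCacheWarmupParams {
    /// Warm up the shared trie cache with the full state at startup.
    #[arg(long)]
    pub warmup_trie_cache: bool,
}
```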

Warm up. Given the CLI flag, we populate the value cache with values from the database, skipping the node cache. If the storage still needs to be initialized, we skip the warm-up because the storage is empty.

Question: which root hashes should we iterate over?
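
As a rough sketch of the warm-up pass (the iterator helper here is hypothetical, standing in for whatever the trie-backed storage exposes):

```rust
type Hash = [u8; 32];

/// Hypothetical warm-up pass: iterating every key/value under the given state
/// root populates the shared trie cache as a side effect of the reads.
fn warmup_state<I>(state_root: Hash, iter_state_pairs: impl Fn(Hash) -> I) -> usize
where
    I: Iterator<Item = (Vec<u8>, Vec<u8>)>,
{
    let mut entries = 0;
    for (_key, _value) in iter_state_pairs(state_root) {
        // The read itself does the warming; we only count entries here.
        entries += 1;
    }
    entries
}
```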

Sanity checks. We can start by stating that the trie cache cannot use more than 50% of the total system memory. Since we load all database records into the cache, we can use the database size as an initial estimate. If this size exceeds the 50% memory budget, we should abort.

Question: how will the cache grow over time?
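
A minimal sketch of that 50% check (both inputs in bytes; the state size estimate, e.g. the database size, is only a first approximation):

```rust
/// Sketch of the 50% budget check; the threshold is a starting point, not final.
fn check_cache_budget(estimated_state_size: u64, total_system_memory: u64) -> Result<(), String> {
    let budget = total_system_memory / 2;
    if estimated_state_size > budget {
        Err(format!(
            "estimated state size {estimated_state_size} exceeds cache budget {budget}"
        ))
    } else {
        Ok(())
    }
}
```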

Aborting. If we don’t have enough memory, it means we already know from the start that the node will underperform. Simply generating an error won’t help, as we can’t rely on node owners to monitor the logs carefully. At this stage, a node shutdown seems like a better solution.

@bkchr, @alexggh, Could you check if this is correct and help with my questions?

@bkchr
Member

bkchr commented Feb 18, 2025

Warm up. In a given CLI configuration, we populate the value cache with values from the database, skipping the node cache. If the storage needs to be initialized, we skip the warm-up because the storage is empty.

We cannot skip the node cache. Firstly, values are only weak references to the actual nodes. Secondly, we need the nodes, for example for proof generation.

On the cache growth question: it is a little bit complicated. To start, we could probably set the maximum size to the entire state size plus some percentage. A more advanced version would try to update the maximum every X blocks based on the block's state size (which may not be that fast to determine, since we need to iterate the entire state trie). But this would also mean that the memory requirement would keep growing over time.
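
For illustration, the simple starting point could be along these lines (the headroom percentage is an arbitrary placeholder, not a recommendation):

```rust
/// Illustrative only: initial cache limit as the state size plus some headroom.
fn initial_cache_limit(state_size_bytes: u64, headroom_percent: u64) -> u64 {
    state_size_bytes + state_size_bytes * headroom_percent / 100
}
```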

When thinking about this, we cannot really guarantee that we always hit the cache, because forks may push some data out of the cache.

For the initial size we also cannot use the database size. The database size covers all the blocks plus all the non-pruned state, which is much bigger than the actual state.
