Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize side table #711

Merged
merged 23 commits into from
Jan 10, 2025
Merged

Conversation

zhouwfang
Copy link
Member

@zhouwfang zhouwfang commented Dec 27, 2024

I'd like to get some feedback about the design: the "section table" is used to improve the performance of these accessors in module.rs, which are used in execution. For these three accessors that return Parser, we can precompute the ParseRange during validation, and create a parser based on the range during execution.

I'm not sure about the other accessors that return types or even Option<ExportDesc>. IIUC, we want to store the section table in flash as the side table. Does that mean we would have to design some encoding rules for these return types?

Update 1

We will only optimize func() and func_type() in module.rs, because the other accessors should be fast.

Update 2

  • Rename the previous SideTableEntry as BranchTableEntry.
  • Create per-function SideTable to include branch table and other function metadata.

#46

@zhouwfang zhouwfang requested a review from ia0 as a code owner December 27, 2024 05:50
@zhouwfang zhouwfang marked this pull request as draft December 27, 2024 05:50
@ia0
Copy link
Member

ia0 commented Dec 27, 2024

On the principle this looks good. From an organization point of view, I would put everything in the side table (which represents pre-computed information on the wasm bytecode). This side table would contain information about how to take branches/jumps, where to find functions, etc. It needs to be serializable with easy random access.

Regarding the other type of accessors, it's probably enough to store their position in the wasm bytecode and deserialize them on demand. This is still O(1). We can probably avoid storing the "end" parser position by always using the end of the wasm bytecode. For functions, we could also store only one position, which would be right before the u32 encoded size, meaning we need to parse_u32 and split_at for each access, but we save 50% storage, so it's some tradeoff.

@zhouwfang zhouwfang force-pushed the improve-module-accessors branch from 1eefb4b to d093a8c Compare January 2, 2025 21:45
@zhouwfang zhouwfang changed the title Add section table Optimize func() and func_type() in module.rs Jan 3, 2025
@zhouwfang
Copy link
Member Author

@ia0 Could you take another look at the design? Thanks.

ia0 pushed a commit that referenced this pull request Jan 3, 2025
Based on experiments, these are the smallest masks to pass CI. In other
words, `SideTableEntry` only requires `u35` at this point. I'll add the
fields from `func_type()` and `func()` in `module.rs` to the side table
in #711, and `SideTableEntry` as `u64` might not be sufficient.

#46

---------

Co-authored-by: Zhou Fang <[email protected]>
Copy link
Member

@ia0 ia0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (generalized) side table should logically be a mapping from function index to function metadata, where a function metadata has the following components:

  • The type (index) of the function.
  • The body (parser) of the function.
  • The "side table" of the function. Let's call this branch table.

We want to serialize this logical value somehow. One idea is to have something like this:

side-table := index flatten-metadata (16-bits aligned)
index := u16 u16 ... u16 (N - 1 times where N is the number of functions in the module)
flatten-metadata := metadata metadata ... metadata (N times where `flatten-metadata[i + 1]` starts at byte position `index[i]`)
metadata := parser:(u32 u32) type:u16 branch-table (if 32-bits aligned)
          | type:u16 parser:(u32 u32) branch-table (otherwise)
branch-table := branch branch ... branch
branch := delta-ip:u16 delta-stp:u16 val-cnt:u8 pop-cnt:u12

crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
@zhouwfang
Copy link
Member Author

zhouwfang commented Jan 5, 2025

metadata := parser:(u32 u32) type:u16 branch-table (if 32-bits aligned)
| type:u16 parser:(u32 u32) branch-table (otherwise)

Could you elaborate on the two cases here? By "otherwise", do you mean 16 bits aligned?

@zhouwfang zhouwfang force-pushed the improve-module-accessors branch from fa1544b to af54e07 Compare January 6, 2025 03:59
@zhouwfang zhouwfang changed the title Optimize func() and func_type() in module.rs Generalize SideTableEntry Jan 6, 2025
@zhouwfang
Copy link
Member Author

zhouwfang commented Jan 6, 2025

@ia0 Please take a look at the generalized SideTable. Let's first agree on the design, and I'll add unit tests later. Thanks.

@zhouwfang zhouwfang requested a review from ia0 January 6, 2025 04:01
@zhouwfang zhouwfang changed the title Generalize SideTableEntry Generalize SideTable Jan 6, 2025
@zhouwfang zhouwfang changed the title Generalize SideTable Generalize side table Jan 6, 2025
@zhouwfang zhouwfang marked this pull request as ready for review January 6, 2025 04:11
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
@ia0
Copy link
Member

ia0 commented Jan 6, 2025

metadata := parser:(u32 u32) type:u16 branch-table (if 32-bits aligned)
| type:u16 parser:(u32 u32) branch-table (otherwise)

Could you elaborate on the two cases here? By "otherwise", do you mean 16 bits aligned?

Yes. Every "item" is at least 16 bits aligned. metadata may either be at least 32-bits aligned in which case the parser comes first, or exactly 16-bits aligned (in which case the type index comes first).

@zhouwfang zhouwfang requested a review from ia0 January 7, 2025 04:28
@zhouwfang
Copy link
Member Author

metadata := parser:(u32 u32) type:u16 branch-table (if 32-bits aligned)
| type:u16 parser:(u32 u32) branch-table (otherwise)

Could you elaborate on the two cases here? By "otherwise", do you mean 16 bits aligned?

Yes. Every "item" is at least 16 bits aligned. metadata may either be at least 32-bits aligned in which case the parser comes first, or exactly 16-bits aligned (in which case the type index comes first).

Why does the order matter? We can read exactly based on how we write, and the alignment is known by design when we write.

crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
@ia0
Copy link
Member

ia0 commented Jan 7, 2025

metadata := parser:(u32 u32) type:u16 branch-table (if 32-bits aligned)
| type:u16 parser:(u32 u32) branch-table (otherwise)

Could you elaborate on the two cases here? By "otherwise", do you mean 16 bits aligned?

Yes. Every "item" is at least 16 bits aligned. metadata may either be at least 32-bits aligned in which case the parser comes first, or exactly 16-bits aligned (in which case the type index comes first).

Why does the order matter? We can read exactly based on how we write, and the alignment is known by design when we write.

For the CPUs we use it probably doesn't matter because they are quite simple. But Rust which supports much more, makes a distinction between aligned and unaligned accesses. But maybe the solution is to just use &[u8] everywhere and use {to,from}_ne_bytes() functions. The code will probably be optimized as expected.

Regarding the order, every metadata has an odd amount of u16 because parser and branch-table have both an even amount of u16. So metadata are alternating between 32-bit aligned or only 16-bit aligned. If we want the u32 to be 32-bit aligned then the order matter (and the order is alternating).

There's the same problem for writing. We need to write with the proper alignment. It is not known, it depends how many branches are in each function before the current function. We could add a u16 padding after some functions to make sure all metadata start 32-bit aligned, so that even indices and odd indices can be used to tell the alignment.

But I think at this point, it's probably better to just not rely on alignment and read all values as &[u8]. All flash controllers we use so far support reading at byte granularity directly from linear memory. So it's just a matter of unaligned reads having the same performance as aligned reads for the CPU we consider.

@zhouwfang zhouwfang requested a review from ia0 January 8, 2025 08:06
crates/interpreter/Cargo.toml Outdated Show resolved Hide resolved
crates/interpreter/src/exec.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
@zhouwfang zhouwfang force-pushed the improve-module-accessors branch from e9e4e31 to 17f13e3 Compare January 9, 2025 01:44
@zhouwfang zhouwfang requested a review from ia0 January 9, 2025 06:31
Copy link
Member

@ia0 ia0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think it looks good now, modulo the redoing of the BranchTableEntry representation (using [u16; 3] and no bitfield).

crates/interpreter/src/exec.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
@zhouwfang zhouwfang requested a review from ia0 January 10, 2025 06:07
ia0
ia0 previously approved these changes Jan 10, 2025
Copy link
Member

@ia0 ia0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM modulo the nits. I'm creating a review commit and merging.

crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Outdated Show resolved Hide resolved
crates/interpreter/src/side_table.rs Show resolved Hide resolved
ia0
ia0 previously approved these changes Jan 10, 2025
@ia0 ia0 merged commit 4c7ce53 into google:dev/fast-interp Jan 10, 2025
20 checks passed
@ia0 ia0 mentioned this pull request Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants