-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement common swizzle operations. #335
base: master
Are you sure you want to change the base?
Conversation
crates/core_simd/src/swizzle.rs
Outdated
/// ``` | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho this should include something like lanewise
in it's name, since I expect Rust to gain a bitwise grev
at some point and we'd want the bitwise simd and scalar integer operations to have matching names (probably gen_rev
or generalized_reverse
or grev
?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For something that effectively implements a very specific xor-striding pattern, general_reverse
is a non-descript name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, grev
is what that particular bitwise op has become named, thanks to RISC-V afaict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grev -- general bit reverse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because an ISA has picked a terrible name doesn't mean we need to copy it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the precedent of u32::count_ones
, it looks like the Rust standard library takes the convention of providing a more meaningful name (count_ones
) while maintaining searchability for the common name popcnt
using #[doc(alias = "popcnt")]
, https://github.com/rust-lang/rust/blob/f1b1ed7e18f1fbe5226a96626827c625985f8285/library/core/src/num/int_macros.rs#L104. I think such an approach could be warranted here too, keeping grev
as an alias. Indeed, grev
is a much less established term than popcnt
, so the precedent from grev is weaker.
Other names I considered:
- butterfly_shuffle. The idea here is that, in the common case of SWAP_MASK being a power of 2, it implements one stage of a butterfly network. https://en.wikipedia.org/wiki/Butterfly_network
- swap_lanes_xor. The "swap lanes" part is pretty self-explanatory. The "xor" part is confusing, however: it suggests that the data bits are being xored, whereas it's actually the lane indices that are being xored.
Currently I lean towards swap_lanes_xor. Open to other suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swizzle_to_xor_indexes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverse_pow2_lane_groups
? Conceptually this operation performs the reversal of blocks of k
lanes within n
-lane groups, where k
and n
are both powers of two, k ≤ n ≤ LANES
, and the choice of k
and n
is determined uniquely up to operation uniqueness by choosing where index 0
will be swizzled to (it’ll be exactly SWAP_MASK
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverse_pow2_lane_groups
?
afaict grev
is actually more powerful than that, it can do any arbitrary combination of those k
-n
reversals for arbitrary k
and n
.
e.g. grev(v, 5)
is equivalent to simd_swizzle!(v, [5, 4, 7, 6, 1, 0, 3, 2])
which is a combination of reversing adjacent pairs (blocks of 1) and swapping blocks of 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for the suggestions. Of the proposals so far, my preference order is:
swizzle_to_xor_indices
butterfly_swizzle
grev
I have gone with swizzle_to_xor_indices
. Let me know what you think.
crates/core_simd/src/swizzle.rs
Outdated
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`. | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concat_to
? Why not just concat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose concat_to
because I imagine we'll want to eventually (when generic_const_exprs
stabilizes) deprecate this function in favor of a function that uses generic_const_exprs
. I wanted to reserve the better name concat
for that new function, which would have signature:
fn concat(self, other: Self) -> Simd<T, {LANES * 2}>
The idea of the concat_to
naming is that most call sites will look like x.concat_to::<8>(y)
or similar, so it can be read as "concatenate x to length 8 using y".
Open to other suggestions. One option would be to call it concat
now, and don't worry about introducing a breaking change in future once the generic_const_exprs
approach works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hm.
crates/core_simd/src/swizzle.rs
Outdated
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`. | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES> | ||
where | ||
LaneCount<DOUBLE_LANES>: SupportedLaneCount, | ||
{ | ||
const fn concat_index<const DOUBLE_LANES: usize>(lanes: usize) -> [Which; DOUBLE_LANES] { | ||
assert!(lanes * 2 == DOUBLE_LANES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking: Are you aware that when you put something in a const fn
, and then the const fn
is evaluated at runtime, that this means an assert!
in it will not be evaluated at compilation time, but rather will be evaluated at runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's only used at compile time, so that doesn't matter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, I see now, it happens in the associated constant. Then that requires monomorphization which puts it at risk of depending on trait and const evaluation details. Hmm. I do want to allow us to use certain post-monomorphization errors but I currently don't fully understand the evaluation patterns that rustc will do to intuit in what cases this will/will not happen.
There are cases where, due to various inputs you can give to the compiler, "dead" code can potentially get resurrected and monomorphized anyways (starting, of course, with -Clink-dead-code
), so I want to better know whether this will trigger on those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guarantee: if you don't instantiate this generic at an invalid input/output type pair, then you won't get a post-monomorphization error. So this is like a type error, except with worse ergonomics because it's raised during monomorphization rather than type checking.
If we had generic_const_exprs
we'd make this a genuine type error rather than monomorphization-time error, by giving this function the type concat_to(self, other: Self) -> Simd<T, {LANES * 2}>
. I've created commit reinerp@1c2972b on a separate branch, to show how this would work. Unfortunately, the test doesn't compile due to what seems like a bug in generic_const_exprs
.
More broadly, I would like to be able to use concat
and split
operations in my own code without requiring the generic_const_exprs
language feature. Hence why I took my current approach for avoiding generic_const_exprs
. From my perspective the generic_const_exprs
feature is much less stable than portable_simd
, and indeed portable_simd
is the only unstable feature I allow in my own code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the very helpful comments! I have switched from slice
to a split
API, and have made a few minor cleanups. Tests should pass now, I believe.
crates/core_simd/src/swizzle.rs
Outdated
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`. | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES> | ||
where | ||
LaneCount<DOUBLE_LANES>: SupportedLaneCount, | ||
{ | ||
const fn concat_index<const DOUBLE_LANES: usize>(lanes: usize) -> [Which; DOUBLE_LANES] { | ||
assert!(lanes * 2 == DOUBLE_LANES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guarantee: if you don't instantiate this generic at an invalid input/output type pair, then you won't get a post-monomorphization error. So this is like a type error, except with worse ergonomics because it's raised during monomorphization rather than type checking.
If we had generic_const_exprs
we'd make this a genuine type error rather than monomorphization-time error, by giving this function the type concat_to(self, other: Self) -> Simd<T, {LANES * 2}>
. I've created commit reinerp@1c2972b on a separate branch, to show how this would work. Unfortunately, the test doesn't compile due to what seems like a bug in generic_const_exprs
.
More broadly, I would like to be able to use concat
and split
operations in my own code without requiring the generic_const_exprs
language feature. Hence why I took my current approach for avoiding generic_const_exprs
. From my perspective the generic_const_exprs
feature is much less stable than portable_simd
, and indeed portable_simd
is the only unstable feature I allow in my own code.
crates/core_simd/src/swizzle.rs
Outdated
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`. | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose concat_to
because I imagine we'll want to eventually (when generic_const_exprs
stabilizes) deprecate this function in favor of a function that uses generic_const_exprs
. I wanted to reserve the better name concat
for that new function, which would have signature:
fn concat(self, other: Self) -> Simd<T, {LANES * 2}>
The idea of the concat_to
naming is that most call sites will look like x.concat_to::<8>(y)
or similar, so it can be read as "concatenate x to length 8 using y".
Open to other suggestions. One option would be to call it concat
now, and don't worry about introducing a breaking change in future once the generic_const_exprs
approach works.
crates/core_simd/src/swizzle.rs
Outdated
/// ``` | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the precedent of u32::count_ones
, it looks like the Rust standard library takes the convention of providing a more meaningful name (count_ones
) while maintaining searchability for the common name popcnt
using #[doc(alias = "popcnt")]
, https://github.com/rust-lang/rust/blob/f1b1ed7e18f1fbe5226a96626827c625985f8285/library/core/src/num/int_macros.rs#L104. I think such an approach could be warranted here too, keeping grev
as an alias. Indeed, grev
is a much less established term than popcnt
, so the precedent from grev is weaker.
Other names I considered:
- butterfly_shuffle. The idea here is that, in the common case of SWAP_MASK being a power of 2, it implements one stage of a butterfly network. https://en.wikipedia.org/wiki/Butterfly_network
- swap_lanes_xor. The "swap lanes" part is pretty self-explanatory. The "xor" part is confusing, however: it suggests that the data bits are being xored, whereas it's actually the lane indices that are being xored.
Currently I lean towards swap_lanes_xor. Open to other suggestions!
Co-authored-by: Jacob Lifshay <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all comments are now addressed. Please take another look.
crates/core_simd/src/swizzle.rs
Outdated
/// ``` | ||
#[inline] | ||
#[must_use = "method returns a new vector and does not mutate the original inputs"] | ||
pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for the suggestions. Of the proposals so far, my preference order is:
swizzle_to_xor_indices
butterfly_swizzle
grev
I have gone with swizzle_to_xor_indices
. Let me know what you think.
Hi folks, I believe I have responded to all review comments so far. My sense is there are some areas where opinions differ between people on this thread on the best path forward, namely:
My sense is that this PR is stuck in review, pending consensus on these decisions. I'm not sure what process your project uses to make decisions in such cases. Could you please advise on the process? I am very excited about the |
I've definitely been neglecting this PR. Some thoughts:
|
concat
andslice
address #277.general_reverse
is useful for horizontal reductions, bitonic sorting, and many similar cases.