Hello,
We have seen an issue where a tensor broadcast from a single-dimension parameter is marked as sharded by the XLA sharding propagator. When this sharded tensor participates in computations with another tensor that has an incompatible sharding spec, additional communication is incurred. This communication is avoidable, since the broadcast tensor's value is just copies of the single-dimension parameter.
HLO example:
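A minimal sketch of the pattern (hypothetical shapes, names, and device assignments, not the original module):

```
HloModule broadcast_sharding_sketch

ENTRY %main (p0: f32[128], p1: f32[256,128]) -> f32[256,128] {
  // Single-dimension parameter, replicated across devices.
  %p0 = f32[128]{0} parameter(0), sharding={replicated}
  // A tensor sharded along dim 0 over 2 devices.
  %p1 = f32[256,128]{1,0} parameter(1), sharding={devices=[2,1]0,1}
  // Sharding propagation copies %p1's spec onto the broadcast, even though
  // every device could rebuild the broadcast locally from replicated %p0.
  %broadcast = f32[256,128]{1,0} broadcast(f32[128]{0} %p0), dimensions={1}
  ROOT %add = f32[256,128]{1,0} add(%broadcast, %p1)
}
```

If %broadcast is later also consumed alongside a tensor sharded differently, e.g. {devices=[1,2]0,1}, the partitioner inserts a resharding collective between the two uses, which a replicated broadcast would avoid.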
This commonly happens when the broadcast tensor is reused in computations with tensors that have different sharding specs, e.g., optimizer parameters doing point-wise computation with a weight update.
We propose changing the sharding spec of a broadcast tensor to replicated whenever the broadcast's input is replicated. This would be done in the SPMD partitioner pass, after shape partitioning is finished.
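In terms of the sketch above, the intent is to pin the broadcast's sharding instead of accepting the propagated one (again hypothetical annotations, a sketch of the intended effect rather than actual partitioner output):

```
// Today (after propagation): the broadcast inherits a consumer's spec.
%broadcast = f32[256,128]{1,0} broadcast(%p0), dimensions={1}, sharding={devices=[2,1]0,1}

// Proposed: since operand %p0 is replicated, mark the broadcast replicated;
// each device materializes its own copy locally, so no resharding collectives
// are needed when the value feeds consumers with incompatible specs.
%broadcast = f32[256,128]{1,0} broadcast(%p0), dimensions={1}, sharding={replicated}
```

Since the broadcast's value is fully determined by its replicated operand, each partition can compute it without any communication.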