[ds-fusion] Add runtime support for host calculation of offsets in ds fusion #20332
Ping for review!
xla/service/gpu/fusions/custom.cc
Outdated
if (callers[0]->opcode() == HloOpcode::kWhile) {
  return callers[0];
}
op_ptr = callers[0];
Is the intent here to go through fusions, conditionals, etc...?
Yes, that is the intent. We keep walking up through the parents (fusion, conditional, command buffer, etc.) until we find the first while loop.
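The traversal described here can be sketched without any XLA types. This is only an illustration: `Opcode`, `Instr`, and `FindEnclosingWhile` are hypothetical stand-ins, and returning `nullptr` when an instruction has zero or several callers is an assumption of the sketch, not something the PR states.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for HLO opcodes and instructions; not the XLA types.
enum class Opcode { kWhile, kFusion, kConditional, kCall };

struct Instr {
  Opcode opcode;
  // Instructions that call the computation containing this instruction.
  std::vector<const Instr*> callers;
};

// Returns the nearest enclosing while instruction, or nullptr if the caller
// chain ends or becomes ambiguous (zero or several callers) before one is
// found. The nullptr policy is an assumption of this sketch.
const Instr* FindEnclosingWhile(const Instr* op) {
  assert(op != nullptr);
  while (op->callers.size() == 1) {
    const Instr* caller = op->callers[0];
    if (caller->opcode == Opcode::kWhile) return caller;
    op = caller;  // Step through a fusion, conditional, call, and so on.
  }
  return nullptr;
}
```

Stopping at the first while keeps the walk tied to the innermost loop, which is the loop whose induction variable the offset would depend on.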
@@ -47,14 +47,28 @@ limitations under the License.
namespace xla {
namespace gpu {

namespace {

std::unique_ptr<Literal>& Indvar(DynamicSliceThunk* thunk) {
This deserves documentation.
if (indvar_update_ != nullptr) {
  Indvar(this) = HloEvaluator()
                     .Evaluate(*indvar_update_, {Indvar(this).get()})
                     ->CloneToUnique();
Why do we need to clone to unique? Couldn't we just move?
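To make the clone-versus-move question concrete, here is a minimal sketch using a hypothetical `Value` type in place of the real literal class. `Value`, `Evaluate`, `NextByClone`, and `NextByMove` are all invented for illustration and do not reflect the actual evaluator API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical value type standing in for a literal; not xla::Literal.
struct Value {
  std::vector<int> data;
  // Deep copy into a new heap allocation.
  std::unique_ptr<Value> CloneToUnique() const {
    return std::make_unique<Value>(*this);
  }
};

// Hypothetical evaluator step: returns the next value by value.
Value Evaluate(const Value& current) {
  Value next = current;
  for (int& x : next.data) ++x;
  return next;
}

// Clone: copies the temporary's buffer, then destroys the temporary.
std::unique_ptr<Value> NextByClone(const Value& current) {
  return Evaluate(current).CloneToUnique();
}

// Move: the temporary's buffer is moved into the new heap object, so no
// element-wise copy is made.
std::unique_ptr<Value> NextByMove(const Value& current) {
  return std::make_unique<Value>(Evaluate(current));
}
```

Both functions produce the same result; the moved version simply avoids one extra copy of the underlying buffer when the evaluated result is a temporary.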
namespace {

std::unique_ptr<Literal>& Indvar(DynamicSliceThunk* thunk) {
  static thread_local absl::flat_hash_map<DynamicSliceThunk*,
Am I correct that this design only works if there is a single offset we can calculate for each DynamicSliceThunk? Presumably, we could have several offset_modules to evaluate per thunk, right?
If there is an invariant that we always have a single offset_module per DynamicSliceThunk, then this needs to be enforced somewhere and documented.
If not, then we need a test with several offset modules, and we need to make sure that this works.
No, this works even if there are multiple offsets. Added test for that. There is only one "induction variable" per thunk. Because the surrounding while loop has to be the same, and so the induction variable has to be the same.
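The "one induction variable per thunk" idea can be sketched as a per-thread map keyed by the thunk's address. This is a simplified illustration: `Thunk` is a hypothetical empty stand-in, the stored value is a plain `int64_t` rather than a literal, and `std::unordered_map` is swapped in for the hash map used in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

struct Thunk {};  // Hypothetical stand-in for DynamicSliceThunk.

// Returns the calling thread's induction-variable slot for `thunk`, creating
// an empty (null) slot on first use. Every offset computation of the same
// thunk reads this one slot, so there is exactly one induction variable per
// thunk and per thread.
std::unique_ptr<int64_t>& Indvar(const Thunk* thunk) {
  static thread_local std::unordered_map<const Thunk*,
                                         std::unique_ptr<int64_t>>
      slots;
  return slots[thunk];
}
```

Because the map is `thread_local`, two threads executing the same thunk each keep their own loop counter, and two thunks on the same thread never share one.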
> No, this works even if there are multiple offsets. Added test for that. There is only one "induction variable" per thunk. Because the surrounding while loop has to be the same, and so the induction variable has to be the same.

But nothing guarantees that a loop has a single induction variable, right? We could be updating offsets based on different induction variables. Or what is it that would enforce the invariant that the offset is unique?
> But nothing guarantees that a loop has a single induction variable, right?

There are checks for this. The dynamic slice fusion rewriter would not fuse this if there was no unique induction variable (here)

> what is it that would enforce the invariant that the offset is unique?

I am not sure I understand this, we are making sure that the offset is a function of the unique induction variable while fusing, and asserting this while lowering to avoid any correctness issues. Can you elaborate on this?
> There are checks for this. The dynamic slice fusion rewriter would not fuse this if there was no unique induction variable (here)

Yeah, the point is that this is not enforced at all in the thunk---so in effect you are assuming here that things behave a certain way in another part of the pipeline, and that this behaviour will never change.
This assumption about how things work needs to be clearly documented in the thunk as well.

> I am not sure I understand this, we are making sure that the offset is a function of the unique induction variable while fusing, and asserting this while lowering to avoid any correctness issues. Can you elaborate on this?

Basically my concern is that we rely on assumed invariants that are neither documented nor guaranteed to be upheld by any static mechanism locally. I.e., when we try to refactor/update this code later, it'll be hard to ensure correctness.
> Yeah, the point is that this is not enforced at all in the thunk---so in effect you are assuming here that things behave a certain way in another part of the pipeline, and that this behaviour will never change.

This should be enforced by the code emitting the thunk. I have added checks for that here. I don't know how we can enforce this within the thunk -- would it make sense for the thunk to accept a while operation, and check this itself, or would simply documenting this be enough? The fact that the thunk accepts only one indvar_init and indvar_update should force the user of this thunk to ensure that there is one induction variable, wouldn't it?
> The fact that the thunk accepts only one indvar_init and indvar_update should force the user of this thunk to ensure that there is one induction variable, wouldn't it?

Not necessarily; an induction variable doesn't strictly need to be a scalar, so indvar_init and indvar_update could return something else.

> would simply documenting this be enough?

I think documenting would be a good step. I think it would also be good to add CHECKs in the constructor of the thunk that the expected invariants are upheld (for instance, that the prototypes are as you expect). Then I think it'll be reasonable.
(The goal here is to make sure we understand the invariants when looking at the thunk code in isolation; we can't predict how this logic may end up getting reused in the future :))
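A constructor-time check of this kind might look like the sketch below. Everything in it is hypothetical: `Prototype`, `PrototypesAreValid`, and `CheckedThunk` are invented names, `assert` stands in for `CHECK`, and the specific invariants (scalar integer results, one parameter for update and offset computations) are assumptions chosen for illustration, not the checks added in the follow-up PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical prototype summary for a host computation; not an XLA type.
struct Prototype {
  int num_params;           // Number of parameters the computation takes.
  bool returns_scalar_int;  // True if the result is a scalar integer.
};

// Returns true when the prototypes match what this sketch assumes:
//  - init takes no parameters and produces the induction variable,
//  - update maps the induction variable to its next value,
//  - every offset computation maps the induction variable to one offset.
bool PrototypesAreValid(const Prototype& init, const Prototype& update,
                        const std::vector<Prototype>& offsets) {
  if (init.num_params != 0 || !init.returns_scalar_int) return false;
  if (update.num_params != 1 || !update.returns_scalar_int) return false;
  for (const Prototype& offset : offsets) {
    if (offset.num_params != 1 || !offset.returns_scalar_int) return false;
  }
  return true;
}

class CheckedThunk {
 public:
  CheckedThunk(Prototype init, Prototype update,
               std::vector<Prototype> offsets)
      : init_(init), update_(update), offsets_(std::move(offsets)) {
    // Stand-in for CHECK: fail at construction rather than at execution.
    assert(PrototypesAreValid(init_, update_, offsets_));
  }

  std::size_t num_offsets() const { return offsets_.size(); }

 private:
  Prototype init_;
  Prototype update_;
  std::vector<Prototype> offsets_;
};
```

Placing the check in the constructor means the invariants are visible when reading the thunk in isolation, regardless of which pass constructs it.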
Added the checks in the PR #20794
Thank you for the detailed review @bchetioui. As the changes for |
ASSERT_EQ(output_offsets.size(), 2);

// The first value of offset must be an HloModule
HloModule** offset_0 = std::get_if<HloModule*>(&output_offsets[0]);
How can we check that the offset computations are actually happening on the host and not on the device?
We are sort of testing this by ensuring that the offset is an HloModule*, and the d2h copy would require a slice on the device. It would crash if the offset was an HloModule* but a device-to-host copy was attempted.
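The variant-based check can be illustrated in isolation with `std::variant` and `std::get_if` from the standard library. The alternative types below (`HostModule`, `DeviceSlice`) and the `Offset` alias are hypothetical stand-ins; the real offset variant in the thunk may hold different alternatives.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

struct HostModule {};   // Hypothetical stand-in for HloModule.
struct DeviceSlice {};  // Hypothetical stand-in for a device buffer slice.

// An offset is a constant, a value read from the device, or a module that is
// evaluated on the host.
using Offset = std::variant<int64_t, DeviceSlice, HostModule*>;

// True only if the offset holds a non-null host module. std::get_if returns
// nullptr when the variant holds any other alternative.
bool IsHostComputed(const Offset& offset) {
  HostModule* const* module = std::get_if<HostModule*>(&offset);
  return module != nullptr && *module != nullptr;
}

// Counts the offsets that are evaluated on the host.
int CountHostComputed(const std::vector<Offset>& offsets) {
  int count = 0;
  for (const Offset& offset : offsets) {
    if (IsHostComputed(offset)) ++count;
  }
  return count;
}
```

Since a variant holds exactly one alternative at a time, an offset that holds a host module cannot also hold a device slice, which is why checking the alternative is a reasonable proxy for "no device-to-host copy" in a test.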
…fsets in ds fusion

Imported from GitHub PR openxla/xla#20332

This patch adds the support for calculating offset on the host at runtime when the offset depends on the loop induction variable. This is done by extracting the offset computation, the induction variable initialization and the induction variable update as independent computations and they are evaluated on the host at runtime. This avoids device-to-host copy for this fusion in these cases.

Copybara import of the project:

--
1b511365f20fcff612b0a69bfe09572d286f247f by Shraiysh Vaishay <[email protected]>:

Add runtime support for host calculation of offsets in ds fusion

This patch adds the support for calculating offset on the host at runtime when the offset depends on the loop induction variable. This is done by extracting the offset computation, the induction variable initialization and the induction variable update as independent computations and they are evaluated on the host at runtime. This avoids device-to-host copy for this fusion in these cases.

--
28d6b62f84bfa14c36e5e49942f5248724715f98 by Shraiysh Vaishay <[email protected]>:

Addressed comments

--
956574290958ce78d2d91ffc518d605a5669dc97 by Shraiysh Vaishay <[email protected]>:

Rebase

--
9a0b89b3382e186b5b15d8432df4037904806939 by Shraiysh Vaishay <[email protected]>:

Rebase

Merging this change closes #20332

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#20332 from shraiysh:ds_fusion_3 9a0b89b3382e186b5b15d8432df4037904806939
PiperOrigin-RevId: 725220980
This patch adds support for calculating the offset on the host at runtime when the offset depends on the loop induction variable. The offset computation, the induction variable initialization, and the induction variable update are extracted as independent computations and evaluated on the host at runtime. This avoids a device-to-host copy for this fusion in these cases.
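The scheme in this description can be sketched as three kinds of host-side callables driven once per loop iteration. This is a simplified model, not the thunk's implementation: `HostOffsetPlan` and `HostOffsetState` are hypothetical names, plain `std::function` objects stand in for the extracted HLO computations, and computing the offsets before advancing the induction variable is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-side plan: one induction variable, any number of offsets.
struct HostOffsetPlan {
  std::function<int64_t()> indvar_init;            // Initial loop counter.
  std::function<int64_t(int64_t)> indvar_update;   // Counter for next trip.
  std::vector<std::function<int64_t(int64_t)>> offsets;  // Offset(indvar).
};

class HostOffsetState {
 public:
  explicit HostOffsetState(HostOffsetPlan plan)
      : plan_(std::move(plan)), indvar_(plan_.indvar_init()) {}

  // Computes every offset for the current iteration on the host, then
  // advances the induction variable. No device memory is read.
  std::vector<int64_t> Step() {
    assert(plan_.indvar_update != nullptr);
    std::vector<int64_t> result;
    result.reserve(plan_.offsets.size());
    for (const auto& offset : plan_.offsets) result.push_back(offset(indvar_));
    indvar_ = plan_.indvar_update(indvar_);
    return result;
  }

 private:
  HostOffsetPlan plan_;  // Declared first so indvar_ can be initialized from it.
  int64_t indvar_;
};
```

For example, with an initial value of 0, an update of `i + 1`, and offsets `i` and `2 * i + 1`, successive calls to `Step()` yield `{0, 1}` and then `{1, 3}`, all derived from the single counter.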
@bchetioui thanks for approving this request. I rebased it with main because of the recent changes to |