[ds-fusion] Add runtime support for host calculation of offsets in ds fusion #20332

Closed
wants to merge 4 commits into from


@shraiysh (Contributor) commented Dec 9, 2024

This patch adds support for calculating offsets on the host at runtime when the offset depends on the loop induction variable. It does this by extracting the offset computation, the induction variable initialization, and the induction variable update as independent computations, which are then evaluated on the host at runtime. In these cases, this avoids a device-to-host copy for the fusion.
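A minimal sketch of the idea, assuming the three extracted computations can be modeled as plain host functions over a scalar `int64_t` induction variable. `HostOffsetPlan` and `ComputeOffsets` are hypothetical names used only for illustration; the actual thunk evaluates HLO modules with `HloEvaluator` rather than calling C++ functions.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model of the three computations extracted from the while loop.
// In the real implementation these are HLO modules evaluated on the host.
struct HostOffsetPlan {
  std::function<int64_t()> indvar_init;           // induction variable before the first iteration
  std::function<int64_t(int64_t)> indvar_update;  // next value, given the current one
  std::function<int64_t(int64_t)> offset;         // slice offset, given the induction variable
};

// Simulates `num_iterations` executions of the fused op inside the loop body and
// returns the offset used on each iteration. Everything runs on the host, so no
// device-to-host copy is needed to learn the offset.
std::vector<int64_t> ComputeOffsets(const HostOffsetPlan& plan, int num_iterations) {
  std::vector<int64_t> offsets;
  int64_t indvar = plan.indvar_init();
  for (int i = 0; i < num_iterations; ++i) {
    offsets.push_back(plan.offset(indvar));  // offset for this iteration
    indvar = plan.indvar_update(indvar);     // advance, as the loop body would
  }
  return offsets;
}
```

For example, a loop with `i = 0`, `i += 1`, and `offset = 2 * i` yields offsets `0, 2, 4` over three iterations, all without touching device memory.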

@shraiysh requested a review from bchetioui December 9, 2024 22:09
@shraiysh self-assigned this Dec 9, 2024
@shraiysh force-pushed the ds_fusion_3 branch 3 times, most recently from 3f2863d to 0096be0, December 11, 2024 18:23
@shraiysh (Contributor Author):

Ping for review!

@shraiysh changed the title Add runtime support for host calculation of offsets in ds fusion [ds-fusion] Add runtime support for host calculation of offsets in ds fusion Dec 20, 2024
if (callers[0]->opcode() == HloOpcode::kWhile) {
return callers[0];
}
op_ptr = callers[0];

Member:

Is the intent here to go through fusions, conditionals, etc...?

@shraiysh (Contributor Author), Jan 13, 2025:

Yes, that is the intent. Until we find the first while loop, we keep walking up to the callers (fusions, conditionals, command buffers, etc.).
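The walk described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `Opcode`, `Node`, and `FindEnclosingWhile` are hypothetical simplifications of `HloOpcode`, `HloInstruction`, and the call graph, and bailing out when a computation does not have exactly one caller is an assumption (the quoted snippet only shows `callers[0]` being used).

```cpp
#include <vector>

// Hypothetical stand-in for HloOpcode; only "while vs. not while" matters here.
enum class Opcode { kWhile, kFusion, kConditional, kCall };

// Hypothetical, simplified node. `callers` lists the instructions that call the
// computation containing this node (in XLA this information comes from the call graph).
struct Node {
  Opcode opcode;
  std::vector<Node*> callers;
};

// Walks up through enclosing callers (fusions, conditionals, ...) until the first
// while loop is found. Returns nullptr if some computation has zero or several
// callers, because the enclosing loop is then not uniquely determined.
Node* FindEnclosingWhile(Node* op) {
  while (op != nullptr && op->callers.size() == 1) {
    Node* caller = op->callers[0];
    if (caller->opcode == Opcode::kWhile) return caller;  // first enclosing loop
    op = caller;                                          // keep climbing
  }
  return nullptr;
}
```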

@@ -47,14 +47,28 @@ limitations under the License.
namespace xla {
namespace gpu {

namespace {

std::unique_ptr<Literal>& Indvar(DynamicSliceThunk* thunk) {

Member:

This deserves documentation.
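One way the requested documentation could look, sketched with hypothetical stand-in types (`Thunk`, `ScalarLiteral`) instead of `DynamicSliceThunk` and `Literal`. The documented assumptions in the comment are inferred from this review thread, not taken from the PR. The sketch uses `std::unordered_map` because references to its elements stay valid when other keys are inserted; `absl::flat_hash_map`, used in the original snippet, does not provide that pointer stability across a rehash.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-ins for DynamicSliceThunk and Literal.
struct Thunk {};
struct ScalarLiteral { int64_t value; };

// Returns a reference to the induction-variable slot owned by `thunk` on the
// calling thread.
//
// Assumptions, documented so the thunk can be read in isolation:
//  * A thunk tracks exactly one induction variable (that of its enclosing while
//    loop), so one slot per thunk suffices even if it has several offsets.
//  * The map is thread_local, so threads executing the same thunk concurrently
//    each see their own value and no locking is needed.
//  * The slot starts out empty (nullptr); the caller must initialize it (for
//    example from the init computation) before reading it.
std::unique_ptr<ScalarLiteral>& Indvar(Thunk* thunk) {
  static thread_local std::unordered_map<Thunk*, std::unique_ptr<ScalarLiteral>> indvars;
  return indvars[thunk];  // default-constructs an empty slot on first access
}
```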

if (indvar_update_ != nullptr) {
Indvar(this) = HloEvaluator()
.Evaluate(*indvar_update_, {Indvar(this).get()})
->CloneToUnique();

Member:

Why do we need to clone to unique? Couldn't we just move?
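The difference the reviewer is pointing at can be illustrated with a hypothetical `CountingLiteral` that counts deep copies. In XLA, `Literal::CloneToUnique()` makes a copy; since `HloEvaluator::Evaluate` appears to return the literal by value (wrapped in a status-or), moving it into the heap allocation, with something like `std::make_unique<Literal>(std::move(result).value())`, would presumably avoid that copy. That XLA form is an assumption; the sketch below only demonstrates the general C++ point.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical literal type that counts how often its payload is deep-copied.
struct CountingLiteral {
  static inline int copies = 0;  // number of deep copies performed so far
  std::vector<int> data;

  explicit CountingLiteral(std::vector<int> d) : data(std::move(d)) {}
  CountingLiteral(const CountingLiteral& other) : data(other.data) { ++copies; }
  CountingLiteral(CountingLiteral&& other) noexcept = default;
};

// Mirrors `Evaluate(...)->CloneToUnique()`: deep-copies the evaluator's result.
std::unique_ptr<CountingLiteral> StoreByClone(CountingLiteral result) {
  return std::make_unique<CountingLiteral>(result);  // copy constructor runs
}

// The alternative suggested in review: move the result into the allocation.
std::unique_ptr<CountingLiteral> StoreByMove(CountingLiteral result) {
  return std::make_unique<CountingLiteral>(std::move(result));  // no deep copy
}
```

Because the evaluator's result is a temporary that is discarded right after the assignment, moving loses nothing and saves one allocation and copy per loop iteration.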

namespace {

std::unique_ptr<Literal>& Indvar(DynamicSliceThunk* thunk) {
static thread_local absl::flat_hash_map<DynamicSliceThunk*,

Member:

Am I correct that this design only works if there is a single offset we can calculate for each DynamicSliceThunk?

Presumably, we could have several offset_modules to evaluate per thunk, right?

If there is an invariant that we always have a single offset_module per DynamicSliceThunk, then this needs to be enforced somewhere and documented.

If not, then we need a test with several offset modules, and we need to make sure that this works.

@shraiysh (Contributor Author):

No, this works even if there are multiple offsets; I added a test for that. There is only one "induction variable" per thunk, because the surrounding while loop has to be the same, and so the induction variable has to be the same.
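The author's argument can be sketched as follows, assuming each offset computation is a function of the one induction variable only. `OffsetFn` and `EvaluateOffsets` are hypothetical names for illustration: several offsets per thunk are fine because they all read the same value.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical: each offset computation maps the (single) induction variable to one offset.
using OffsetFn = std::function<int64_t(int64_t)>;

// Evaluates every offset of a thunk for the current iteration. All offsets read
// the same induction variable, which is why one induction-variable slot per
// thunk suffices regardless of how many offsets there are.
std::vector<int64_t> EvaluateOffsets(const std::vector<OffsetFn>& offset_fns, int64_t indvar) {
  std::vector<int64_t> result;
  result.reserve(offset_fns.size());
  for (const OffsetFn& fn : offset_fns) result.push_back(fn(indvar));
  return result;
}
```

The reviewer's follow-up concern is exactly the case this sketch rules out by construction: offsets driven by *different* loop-carried values.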

Member:

> No, this works even if there are multiple offsets; I added a test for that. There is only one "induction variable" per thunk, because the surrounding while loop has to be the same, and so the induction variable has to be the same.

But nothing guarantees that a loop has a single induction variable, right? We could be updating offsets based on different induction variables. Or what is it that would enforce the invariant that the offset is unique?

@shraiysh (Contributor Author), Jan 17, 2025:

> But nothing guarantees that a loop has a single induction variable, right?

There are checks for this. The dynamic slice fusion rewriter would not fuse this if there was no unique induction variable (here)

> what is it that would enforce the invariant that the offset is unique?

I am not sure I understand this. We are making sure that the offset is a function of the unique induction variable while fusing, and asserting this while lowering to avoid any correctness issues. Can you elaborate on this?

Member:

> There are checks for this. The dynamic slice fusion rewriter would not fuse this if there was no unique induction variable (here)

Yeah, the point is that this is not enforced at all in the thunk, so in effect you are assuming here that things behave a certain way in another part of the pipeline, and that this behaviour will never change.

This assumption about how things work needs to be clearly documented in the thunk as well.

> I am not sure I understand this. We are making sure that the offset is a function of the unique induction variable while fusing, and asserting this while lowering to avoid any correctness issues. Can you elaborate on this?

Basically, my concern is that we rely on assumed invariants that are neither documented nor guaranteed to be upheld locally by any static mechanism. That is, when we try to refactor or update this code later, it'll be hard to ensure correctness.

@shraiysh (Contributor Author):

> Yeah, the point is that this is not enforced at all in the thunk, so in effect you are assuming here that things behave a certain way in another part of the pipeline, and that this behaviour will never change.

This should be enforced by the code emitting the thunk; I have added checks for that here. I don't know how we can enforce this within the thunk. Would it make sense for the thunk to accept a while operation and check this itself, or would simply documenting this be enough? The fact that the thunk accepts only one indvar_init and indvar_update should force the user of this thunk to ensure that there is one induction variable, wouldn't it?

@bchetioui (Member), Jan 21, 2025:

> The fact that the thunk accepts only one indvar_init and indvar_update should force the user of this thunk to ensure that there is one induction variable, wouldn't it?

Not necessarily; an induction variable doesn't strictly need to be a scalar, so indvar_init and indvar_update could return something else.

> would simply documenting this be enough?

I think documenting would be a good step. It would also be good to add CHECKs in the constructor of the thunk that the expected invariants are upheld, for instance that the prototypes are as you expect. Then I think it'll be reasonable.

(The goal here is to make sure we understand the invariants when looking at the thunk code in isolation; we can't predict how this logic may end up getting reused in the future :))
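A rough sketch of the kind of constructor-time checks suggested above. Everything here is hypothetical: `Prototype` is a simplified signature descriptor with shapes as strings, and the specific rules (init takes no parameters, the induction variable is an integer scalar, update maps it to itself, each offset takes only the induction variable and yields an integer scalar) are assumptions drawn from this discussion, not the checks that landed in #20794. A real constructor would `CHECK`-fail; the sketch returns an error message so it can be exercised.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified description of a computation's signature.
struct Prototype {
  std::vector<std::string> param_shapes;  // e.g. {"s64[]"}
  std::string result_shape;               // e.g. "s64[]"
};

bool IsIntegerScalar(const std::string& shape) { return shape == "s32[]" || shape == "s64[]"; }

// Returns an error message if an invariant the thunk relies on is violated,
// or std::nullopt if all of them hold.
std::optional<std::string> ValidatePrototypes(const Prototype& indvar_init,
                                              const Prototype& indvar_update,
                                              const std::vector<Prototype>& offsets) {
  if (!indvar_init.param_shapes.empty()) return "indvar_init must take no parameters";
  const std::string& indvar = indvar_init.result_shape;
  // The reviewer's point: an induction variable need not be a scalar, so say so explicitly.
  if (!IsIntegerScalar(indvar)) return "induction variable must be an integer scalar";
  if (indvar_update.param_shapes != std::vector<std::string>{indvar} ||
      indvar_update.result_shape != indvar)
    return "indvar_update must map the induction variable to itself";
  for (const Prototype& offset : offsets) {
    if (offset.param_shapes != std::vector<std::string>{indvar})
      return "each offset must take only the induction variable";
    if (!IsIntegerScalar(offset.result_shape)) return "each offset must produce an integer scalar";
  }
  return std::nullopt;
}
```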

@shraiysh (Contributor Author):

Added the checks in PR #20794.

@shraiysh (Contributor Author):

Thank you for the detailed review, @bchetioui. As the changes for dynamic_slice_thunk.(cc|h) have been split into #20794, I will address those comments there. I have tried to address the remaining comments; please let me know if I have missed something. Opening this for review again (all changes except the ones in dynamic_slice_thunk).

@shraiysh requested a review from bchetioui January 13, 2025 22:15
ASSERT_EQ(output_offsets.size(), 2);

// The first value of offset must be an HloModule
HloModule** offset_0 = std::get_if<HloModule*>(&output_offsets[0]);

@shraiysh (Contributor Author):

> How can we check that the offset computations are actually happening on the host and not on the device?

We are sort of testing this by ensuring that the offset is an HloModule*. A device-to-host copy would require a slice on the device, so it would crash if the offset were an HloModule* but a device-to-host copy were still attempted.
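The test relies on the offset being stored in a `std::variant` and checks the host-computed alternative with `std::get_if<HloModule*>`. A sketch of that pattern with hypothetical alternatives (`DeviceSlice`, `HostModule`, and the constant `int64_t` case are illustrative stand-ins, not the thunk's actual offset type) shows why the alternative held determines whether a device-to-host copy is needed at all.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for the kinds of offsets a dynamic-slice thunk might see.
struct DeviceSlice { int buffer_index; };  // offset lives in device memory (needs a D2H copy)
struct HostModule { int module_id; };      // offset is computed on the host from the induction variable
using Offset = std::variant<int64_t, DeviceSlice, HostModule*>;

// Reports whether resolving `offset` would require a device-to-host copy.
// Only the device-resident alternative does; constants and host-evaluated
// modules are resolved entirely on the host.
bool NeedsDeviceToHostCopy(const Offset& offset) {
  return std::holds_alternative<DeviceSlice>(offset);
}
```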

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 10, 2025
…fsets in ds fusion

Imported from GitHub PR openxla/xla#20332

This patch adds the support for calculating offset on the host at runtime when the offset depends on the loop induction variable. This is done by extracting the offset computation, the induction variable initialization and the induction variable update as independent computations and they are evaluated on the host at runtime. This avoids device-to-host copy for this fusion in these cases.
Copybara import of the project:

--
1b511365f20fcff612b0a69bfe09572d286f247f by Shraiysh Vaishay <[email protected]>:

Add runtime support for host calculation of offsets in ds fusion

This patch adds the support for calculating offset on the host at
runtime when the offset depends on the loop induction variable. This is
done by extracting the offset computation, the induction variable
initialization and the induction variable update as independent
computations and they are evaluated on the host at runtime. This avoids
device-to-host copy for this fusion in these cases.

--
28d6b62f84bfa14c36e5e49942f5248724715f98 by Shraiysh Vaishay <[email protected]>:

Addressed comments

--
956574290958ce78d2d91ffc518d605a5669dc97 by Shraiysh Vaishay <[email protected]>:

Rebase

--
9a0b89b3382e186b5b15d8432df4037904806939 by Shraiysh Vaishay <[email protected]>:

Rebase

Merging this change closes #20332

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#20332 from shraiysh:ds_fusion_3 9a0b89b3382e186b5b15d8432df4037904806939
PiperOrigin-RevId: 725220980
@shraiysh requested a review from bchetioui February 10, 2025 17:49
@shraiysh (Contributor Author):

@bchetioui, thanks for approving this pull request. I rebased it on main because of the recent changes to CreateExecutable. Could you check it once again?

copybara-service bot pushed a commit that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit that referenced this pull request Feb 11, 2025
…fsets in ds fusion

Imported from GitHub PR #20332

This patch adds the support for calculating offset on the host at runtime when the offset depends on the loop induction variable. This is done by extracting the offset computation, the induction variable initialization and the induction variable update as independent computations and they are evaluated on the host at runtime. This avoids device-to-host copy for this fusion in these cases.
Copybara import of the project:

--
5c85fe7 by Shraiysh Vaishay <[email protected]>:

Add runtime support for host calculation of offsets in ds fusion

This patch adds the support for calculating offset on the host at
runtime when the offset depends on the loop induction variable. This is
done by extracting the offset computation, the induction variable
initialization and the induction variable update as independent
computations and they are evaluated on the host at runtime. This avoids
device-to-host copy for this fusion in these cases.

--
b5573b0 by Shraiysh Vaishay <[email protected]>:

Addressed comments

--
decde73 by Shraiysh Vaishay <[email protected]>:

Rebase

--
f98d9dc by Shraiysh Vaishay <[email protected]>:

Rebase

Merging this change closes #20332

FUTURE_COPYBARA_INTEGRATE_REVIEW=#20332 from shraiysh:ds_fusion_3 f98d9dc
PiperOrigin-RevId: 725575581
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 11, 2025
copybara-service bot pushed a commit that referenced this pull request Feb 12, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 12, 2025
copybara-service bot pushed a commit that referenced this pull request Feb 13, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 13, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Feb 13, 2025
…fsets in ds fusion

Imported from GitHub PR openxla/xla#20332

This patch adds the support for calculating offset on the host at runtime when the offset depends on the loop induction variable. This is done by extracting the offset computation, the induction variable initialization and the induction variable update as independent computations and they are evaluated on the host at runtime. This avoids device-to-host copy for this fusion in these cases.
Copybara import of the project:

--
5c85fe7140d462c3282e422c2d9090028afd6eb4 by Shraiysh Vaishay <[email protected]>:

Add runtime support for host calculation of offsets in ds fusion

This patch adds the support for calculating offset on the host at
runtime when the offset depends on the loop induction variable. This is
done by extracting the offset computation, the induction variable
initialization and the induction variable update as independent
computations and they are evaluated on the host at runtime. This avoids
device-to-host copy for this fusion in these cases.

--
b5573b0ceb00d302634c0d95afb9d1ff7d697e64 by Shraiysh Vaishay <[email protected]>:

Addressed comments

--
decde73b33ba03a74ce896918c9cf9aaff13ddca by Shraiysh Vaishay <[email protected]>:

Rebase

--
f98d9dc15b6d0b9b7a2368f55de607a65da1c56e by Shraiysh Vaishay <[email protected]>:

Rebase

Merging this change closes #20332

PiperOrigin-RevId: 726440397
3 participants