Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIEX] Delay metalizing of multi-slot until iterative scheduling is converged #182

Open
wants to merge 1 commit into
base: aie-public
Choose a base branch
from

Conversation

krishnamtibrewala
Copy link
Collaborator

@krishnamtibrewala krishnamtibrewala commented Sep 6, 2024

This PR allows Multi-Slot Instr. to be used during iterative scheduling of loop.
Before this PR we were materializing Multi-Slot Instr. to selected OpCode/Slot after first iteration of iterative scheduling.
Now we wait until PostRA scheduling is converged to an acceptable schedule and we have decided to commit the schedule and move to next MBB.

  • When is the materialization triggered now? : When we leave a MBB.
  • Does this change depending on the region type? : The changes are agnostic to region type.
  • Could you think of the case where the materialization is not triggered before moving on to another MBB? : None that I can think of

Note : Given the information of what Alternate opcode/desc was selected is stored in Hazard Recognizer for a region.
And by the time we come to the end of MBB ( i.e leaveMBB() ) we do not have access to the instance of those Hazard Recognizer, therefore we need to make the Alternate opcode/desc part of the BlockState

@krishnamtibrewala
Copy link
Collaborator Author

@martien-de-jong , @andcarminati.
Kindly review and provide an early feedback toward the approach

@andcarminati
Copy link
Collaborator

Hi @krishnamtibrewala, nice work! Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads ups).

Copy link
Collaborator

@martien-de-jong martien-de-jong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a test example where we see improvement?

llvm/lib/Target/AIE/AIEInterBlockScheduling.h Outdated Show resolved Hide resolved
llvm/lib/Target/AIE/AIEInterBlockScheduling.h Outdated Show resolved Hide resolved
@krishnamtibrewala
Copy link
Collaborator Author

Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads ups).

@andcarminati I tried but I did not see any change, still investigating why.

Could we have a test example where we see improvement?

@martien-de-jong still figuring out.

Based on discussion with @gbossu we were expecting some impact but with current implementation QoR have no change.

$wh10 = VMOV_mv_w $wl0
JNZ $r3, %bb.1
DelayedSchedBarrier
bb.2:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attached is a diff, I am still trying to figure out how things are interacting.
Also will try to come up with a smaller test case.

image

@andcarminati
Copy link
Collaborator

As a general advice, I think we should have a class to manage the description handling. We can encapsulate and use it in HazardRecognizer and InterBlockScheduling.

@@ -263,14 +266,22 @@ class PipelineExtractor : public PipelineScheduleVisitor {
// Prologue and epilogue obtain copies.
MachineInstr *ToBeEmitted =
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should prefer not using CloneMachineInstr, since this is not added to MBB and other structure like BlockState

Copy link
Collaborator

@martien-de-jong martien-de-jong Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, using CloneMachineInstr is essentially what the post pipeliner needs. When we call this extraction, everything in the schedule has been frozen. We can materialize the pseudos just before calling the extractor.
The original instructions reside in the loop block's region. We materialize them first, then cloning them will see the correct descriptor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see a better way to emit an exact copy of an instruction into another basic block, please let me know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had discussed that offline with Krishnam. The post-pipeliner should only deal with "finalized" instructions when materializing the schedule, i.e. no multi-slot opcodes anywhere. So cloning is fine at this stage, because state like AlternateOpcodes is outdated (and using it should be forbidden with asserts if possible).

@krishnamtibrewala krishnamtibrewala changed the title [AIEX] Re-assign multi-slot instructions during iterative scheduling [AIEX] Delay metalizing of multi-slot until iterative scheduling is converged Oct 14, 2024
@krishnamtibrewala krishnamtibrewala force-pushed the aie-loop-multiOpcode branch 2 times, most recently from 02d0b50 to 8bf356f Compare October 14, 2024 11:24
@krishnamtibrewala krishnamtibrewala force-pushed the aie-loop-multiOpcode branch 2 times, most recently from 7a241b2 to fdab2af Compare October 22, 2024 18:32
@gbossu
Copy link
Collaborator

gbossu commented Nov 4, 2024

Could you summarize what this PR does? Maybe in the PR description. I'm particularly interested in:

  • When is the materialization triggered now?
  • Does this change depending on the region type?
  • Could you think of the case where the materialization is not triggered before moving on to another MBB?

@@ -616,34 +632,15 @@ void AIEPostRASchedStrategy::leaveRegion(const SUnit &ExitSU) {
assert(BS.getCurrentRegion().Bundles.empty());
BS.addBundles(TopBundles);
BS.addBundles(BotBundles);
if (!ReAssignMultiSlotInstr)
materializeMultiSlotInstrs();
Copy link
Collaborator

@gbossu gbossu Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep the materialization call where it used to be? Is there any reason why you now do it after computing the Bundles?

@@ -330,14 +331,19 @@ class PipelineExtractor : public PipelineScheduleVisitor {
// Prologue and epilogue obtain copies.
MachineInstr *ToBeEmitted =
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI);
CurrentBundle.add(ToBeEmitted);
if (auto AltDesc = AlternateDesc.getSelectedDescriptor(MI);
AltDesc.has_value())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just if (auto SelectedDesc = AlternateDesc.getSelectedDescriptor(MI))?

@@ -330,14 +331,19 @@ class PipelineExtractor : public PipelineScheduleVisitor {
// Prologue and epilogue obtain copies.
MachineInstr *ToBeEmitted =
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave a note that so far we force the prologue/epilogue copies to have the same descriptor as the "original" instruction in the steady state.

@@ -269,7 +273,8 @@ void AIEPostRASchedStrategy::initializeBotScoreBoard(ScoreboardTrust Trust) {
/// by starting in the earliest possible cycle, -Depth
auto InsertInCycle = [=](MachineInstr &MI, int Cycle) {
BotHazardRec->emitInScoreboard(
MI.getDesc(), BotHazardRec->getMemoryBanks(&MI), MI.operands(),
*BotHazardRec->getSelectedAltDescs().getDesc(&MI),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This BotHazardRec->emitInScoreboard(...) will be called with instructions that have already been materialized. So here I would actually assert that MI has no "selected descriptor"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants