-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIEX] Delay metalizing of multi-slot until iterative scheduling is converged #182
base: aie-public
Are you sure you want to change the base?
Conversation
@martien-de-jong , @andcarminati. |
Hi @krishnamtibrewala, nice work! Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads ups). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a test example where we see improvement?
46df116
to
54977ed
Compare
Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads ups).
Could we have a test example where we see improvement?
Based on discussion with @gbossu we were expecting some impact but with current implementation QoR have no change. |
54977ed
to
12024f9
Compare
$wh10 = VMOV_mv_w $wl0 | ||
JNZ $r3, %bb.1 | ||
DelayedSchedBarrier | ||
bb.2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general advice, I think we should have a class to manage the description handling. We can encapsulate and use it in HazardRecognizer and InterBlockScheduling. |
98514f1
to
4033a60
Compare
4033a60
to
aaf0aa8
Compare
@@ -263,14 +266,22 @@ class PipelineExtractor : public PipelineScheduleVisitor { | |||
// Prologue and epilogue obtain copies. | |||
MachineInstr *ToBeEmitted = | |||
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should prefer not using CloneMachineInstr
, since this is not added to MBB and other structure like BlockState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, using CloneMachineInstr is essentially what the post pipeliner needs. When we call this extraction, everything in the schedule has been frozen. We can materialize the pseudos just before calling the extractor.
The original instructions reside in the loop block's region. We materialize them first, then cloning them will see the correct descriptor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see a better way to emit an exact copy of an instruction into another basic block, please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had discussed that offline with Krishnam. The post-pipeliner should only deal with "finalized" instructions when materializing the schedule, i.e. no multi-slot opcodes anywhere. So cloning is fine at this stage, because state like AlternateOpcodes
is outdated (and using it should be forbidden with asserts if possible).
aaf0aa8
to
904d379
Compare
904d379
to
49f3ad1
Compare
02d0b50
to
8bf356f
Compare
7a241b2
to
fdab2af
Compare
Could you summarize what this PR does? Maybe in the PR description. I'm particularly interested in:
|
fdab2af
to
74da8ee
Compare
@@ -616,34 +632,15 @@ void AIEPostRASchedStrategy::leaveRegion(const SUnit &ExitSU) { | |||
assert(BS.getCurrentRegion().Bundles.empty()); | |||
BS.addBundles(TopBundles); | |||
BS.addBundles(BotBundles); | |||
if (!ReAssignMultiSlotInstr) | |||
materializeMultiSlotInstrs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep the materialization call where it used to be? Is there any reason why you now do it after computing the Bundles?
@@ -330,14 +331,19 @@ class PipelineExtractor : public PipelineScheduleVisitor { | |||
// Prologue and epilogue obtain copies. | |||
MachineInstr *ToBeEmitted = | |||
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI); | |||
CurrentBundle.add(ToBeEmitted); | |||
if (auto AltDesc = AlternateDesc.getSelectedDescriptor(MI); | |||
AltDesc.has_value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: just if (auto SelectedDesc = AlternateDesc.getSelectedDescriptor(MI))
?
@@ -330,14 +331,19 @@ class PipelineExtractor : public PipelineScheduleVisitor { | |||
// Prologue and epilogue obtain copies. | |||
MachineInstr *ToBeEmitted = | |||
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe leave a note that so far we force the prologue/epilogue copies to have the same descriptor as the "original" instruction in the steady state.
@@ -269,7 +273,8 @@ void AIEPostRASchedStrategy::initializeBotScoreBoard(ScoreboardTrust Trust) { | |||
/// by starting in the earliest possible cycle, -Depth | |||
auto InsertInCycle = [=](MachineInstr &MI, int Cycle) { | |||
BotHazardRec->emitInScoreboard( | |||
MI.getDesc(), BotHazardRec->getMemoryBanks(&MI), MI.operands(), | |||
*BotHazardRec->getSelectedAltDescs().getDesc(&MI), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This BotHazardRec->emitInScoreboard(...)
will be called with instructions that have already been materialized. So here I would actually assert that MI
has no "selected descriptor"
This PR allows Multi-Slot Instr. to be used during iterative scheduling of loop.
Before this PR we were materializing Multi-Slot Instr. to selected OpCode/Slot after first iteration of iterative scheduling.
Now we wait until PostRA scheduling is converged to an acceptable schedule and we have decided to commit the schedule and move to next MBB.
Note : Given the information of what Alternate opcode/desc was selected is stored in Hazard Recognizer for a region.
And by the time we come to the end of MBB ( i.e leaveMBB() ) we do not have access to the instance of those Hazard Recognizer, therefore we need to make the Alternate opcode/desc part of the
BlockState