Jit: Conditional Escape Analysis and Cloning #111473
Conversation
Enhance escape analysis to determine if an object escapes only under failed GDV tests. If so, clone to create a path of code so that the object doesn't escape, and then stack allocate the object. More details in the included document. Contributes to dotnet#108913
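The idea can be illustrated with a small, hypothetical C++ analogy (not JIT code; all names here are invented for illustration): an object that escapes only when a guard fails forces heap allocation on every path, but once the paths are split ("cloned"), the guarded fast path can prove the object does not escape and keep it on the stack.

```cpp
#include <cassert>
#include <vector>

// Stand-in for "the object escaped": a global that outlives the call.
static std::vector<int*> g_escaped;

int sumFast(int n)
{
    // Fast path after cloning: the accumulator provably does not escape,
    // so it can live on the stack.
    int box = 0;
    for (int i = 1; i <= n; i++)
    {
        box += i;
    }
    return box;
}

int sumSlow(int n)
{
    // Slow path: the object escapes into a global, so it must stay on the heap.
    int* box = new int(0);
    for (int i = 1; i <= n; i++)
    {
        *box += i;
    }
    g_escaped.push_back(box);
    return *box;
}

int sum(int n, bool guardHolds)
{
    // Models the GDV test: the guard selects the cloned fast path or the
    // original path where the object escapes.
    return guardHolds ? sumFast(n) : sumSlow(n);
}
```

Before cloning, the single allocation site would be reachable from both outcomes of the guard, so the conservative answer is "escapes"; the transformation makes the fast path's non-escape provable.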
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@dotnet/jit-contrib PTAL. Not sure who wants to sign up to review this; let me know. SPMI is not going to be useful here. We can try MihuBot but that may not show much either.
Nice writeup! I haven't looked at the code yet, though you should probably get someone else to look at it
docs/design/coreclr/jit/DeabstractionAndConditionalEscapeAnalysis.md
One of the libraries failures is that we're not updating the enclosing EH extents properly.
It should be possible to reorder these and put the clone after, but the EH lexicality constraints have been a consistent source of trouble.
@EgorBo this is the PR I was mentioning earlier
Hmm, still not getting the EH right.
Failures on Linux libraries tests to look into.
/azp run runtime-coreclr pgo, runtime-coreclr pgostress, runtime-coreclr libraries-pgo
Azure Pipelines successfully started running 4 pipeline(s).
Looking at outerloop failures.
So more things to investigate.
Yes, I have a fix for this in #111684
/azp run runtime-coreclr libraries-pgo
Azure Pipelines successfully started running 1 pipeline(s).
libraries-pgo testing is still ragged... I think the failures are unrelated, but there are way too many to check. Will wait for the CI issues to be sorted, merge up, and try again.
/azp run runtime-coreclr libraries-pgo
Azure Pipelines successfully started running 1 pipeline(s).
I think the remaining libraries-pgo failures are unrelated. @amanasifkhalid still seeing some inconsistent profiles after loop inversion.
src/coreclr/jit/objectalloc.cpp

```cpp
GenTree* const   tree   = *use;
unsigned const   lclNum = tree->AsLclVarCommon()->GetLclNum();
LclVarDsc* const varDsc = m_compiler->lvaGetDesc(lclNum);
```
Is this used?
Seems not.
```cpp
for (unsigned p = 0; p < m_numPseudoLocals; p++)
{
    unsigned const pseudoLocal = p + comp->lvaCount;
```
Does the base analysis actually need to allocate an index for all locals in the first place? Shouldn't it be sufficient to allocate indices for only `TYP_REF`, `TYP_BYREF`, `TYP_I_IMPL`, and `TYP_STRUCT` locals?
Yeah, it is using bigger BVs than necessary. We would need to create some convenient dense numbering scheme. I can revisit this when I look at field-sensitive approaches perhaps.
We could probably just reuse `LclVarDsc::lvVarIndex` for the dense numbering scheme. Doing this later sounds good to me.
It also seems like there should be some kind of limit on the number of tracked locals here, since this analysis is inherently super-linear.
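As a rough illustration of the dense-numbering idea being discussed (a hypothetical standalone sketch; these are not the JIT's actual types or data structures), tracked locals can be given compact indices so bit vectors are sized by the tracked count rather than by the total local count:

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for local variable types; only some types can participate
// in escape analysis.
enum class VarType { Int, Float, Ref, ByRef, Struct };

static bool isTracked(VarType t)
{
    return (t == VarType::Ref) || (t == VarType::ByRef) || (t == VarType::Struct);
}

// Returns a map from local number to dense index, or -1 for untracked
// locals. Bit vectors can then be sized by the number of tracked locals.
std::vector<int> buildDenseIndices(const std::vector<VarType>& locals)
{
    std::vector<int> dense(locals.size(), -1);
    int next = 0;
    for (size_t i = 0; i < locals.size(); i++)
    {
        if (isTracked(locals[i]))
        {
            dense[i] = next++;
        }
    }
    return dense;
}
```

A cap on `next` would give the tracked-locals limit mentioned above: once the budget is exhausted, remaining locals are conservatively treated as escaping.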
src/coreclr/jit/objectalloc.cpp

```cpp
    return nullptr;
}

GenTree* const tree = jumpTree->AsOp()->gtOp1;
```
Nit (here and also below):

```diff
- GenTree* const tree = jumpTree->AsOp()->gtOp1;
+ GenTree* const tree = jumpTree->gtGetOp1();
```
src/coreclr/jit/objectalloc.cpp

```cpp
}
else
{
    JITDUMP("... no pseudo local?\n");
```
Seems this would be more readable with some early outs.
Revised
```cpp
{
    assert(comp->m_dfsTree != nullptr);
    assert(comp->m_domTree == nullptr);
    comp->m_domTree = FlowGraphDominatorTree::Build(comp->m_dfsTree);
```
Is there any benefit to waiting until we actually want to use it to compute it? Or do we know at this point we are going to end up using it?
Yeah we pretty much know we're going to use it.
```cpp
BitVecTraits traits(comp->compBasicBlockID, comp);
BitVec visitedBlocks(BitVecOps::MakeEmpty(&traits));
toVisit.Push(allocBlock);
```
Seems this can use post order traits.
```cpp
JITDUMP("walking through " FMT_BB "\n", visitBlock->bbNum);

for (BasicBlock* const succ : visitBlock->Succs())
```
`VisitRegularSuccs` would be better to use here.
Should this actually use `VisitAllSuccs`? It seems this needs to take into account any possible control flow path if it's trying to validate some form of post dominance.
See the note below: since we're in an inlined GDV region, and we're validating that all blocks are in the same EH region, any EH successor would bypass assignment and so would not flow into the cloned code we're going to create.
Makes sense.
```cpp
if (BitVecOps::IsMember(&traits, visitedBlocks, succ->bbID))
{
    continue;
}
toVisit.Push(succ);
```
Can we add to `visitedBlocks` here to avoid pushing the same block onto `toVisit` unnecessarily?
I should probably revise this bit, but would like to hold off.
Ok with me to do this later. There is another graph walk below that can use the same kind of enhancement.
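The suggested enhancement amounts to marking a block visited at push time rather than pop time, so the same node can never sit on the worklist twice. A minimal standalone sketch of such a walk (invented names on a plain adjacency list, not the JIT's BasicBlock/BitVec machinery):

```cpp
#include <stack>
#include <unordered_set>
#include <vector>

// Visit every block reachable from 'start'. Blocks are numbered 0..N-1 and
// 'succs[b]' lists the successors of block b.
std::vector<int> reachableFrom(int start, const std::vector<std::vector<int>>& succs)
{
    std::vector<int>        order;
    std::unordered_set<int> visited;
    std::stack<int>         toVisit;

    toVisit.push(start);
    visited.insert(start); // mark at push time, not pop time

    while (!toVisit.empty())
    {
        int block = toVisit.top();
        toVisit.pop();
        order.push_back(block);

        for (int succ : succs[block])
        {
            // insert(...).second is true only for a first-time insertion,
            // so each block is pushed at most once.
            if (visited.insert(succ).second)
            {
                toVisit.push(succ);
            }
        }
    }
    return order;
}
```

With the check done at pop time instead, a block with many predecessors could be pushed once per incoming edge before its first pop, which is the redundancy the review comment points at.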
```cpp
        }
        toVisit.Push(succ);
    }
}
```
Is this loop missing some check for "sink" blocks like `BBJ_RETURN` that cannot reach `defBlock`? What would happen in those cases? Unnecessary cloning?
I suppose so.
But with the current hinting the only allocation sites we look at are those that are part of an inlinee on the fast side of a GDV, so with normal flow they can't really avoid reaching the bottom of the GDV diamond.
We also know the allocation site is not in a loop (via our lexical analysis) or we wouldn't consider it eligible in the first place. Might be nice to replace that with a DFS based check but I don't know if we have a general "this block is in a cycle" utility.
So the only other possibility for unexpected flow is some kind of exception. If the exception is handled within the fast GDV side we would expect it to assign some kind of return value; otherwise the `GetEnumerator` we've inlined is defective. If it's not handled, or the `GetEnumerator` is defective, then if the program tries to use the enumerator it should get a null ref (or whatever the reaching def is for the enumerator var).
So I don't think there is anything wasteful going on here, but if we generalize the analysis to consider other kinds of conditional allocations then we'll need to reconsider.
Makes sense, thanks.
src/coreclr/jit/objectalloc.cpp

```cpp
auto latestStmt = [](Statement* stmt1, Statement* stmt2) {
    if (stmt1 == stmt2)
    {
        return stmt1;
    }

    Statement* cursor1 = stmt1->GetNextStmt();
    Statement* cursor2 = stmt2->GetNextStmt();

    while (true)
    {
        if ((cursor1 == stmt2) || (cursor2 == nullptr))
        {
            return stmt2;
        }

        if ((cursor2 == stmt1) || (cursor1 == nullptr))
        {
            return stmt1;
        }

        cursor1 = cursor1->GetNextStmt();
        cursor2 = cursor2->GetNextStmt();
    }
};
```
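The helper above decides which of two statements in the same list comes later by walking a cursor forward from each in lockstep: whichever cursor first reaches the other statement (or the list end) settles the order, without needing positional indices. A standalone model of that two-cursor trick (hypothetical `Stmt` type, not JIT code):

```cpp
#include <cstddef>

// Toy singly linked list node standing in for a statement list entry.
struct Stmt
{
    Stmt* next = nullptr;
};

// Returns whichever of 'a' and 'b' appears later in the list containing both.
Stmt* latest(Stmt* a, Stmt* b)
{
    if (a == b)
    {
        return a;
    }

    Stmt* ca = a->next;
    Stmt* cb = b->next;

    while (true)
    {
        // a's cursor reached b (b is after a), or b's cursor hit the end
        // (b started closer to the end): either way b is later.
        if ((ca == b) || (cb == nullptr))
        {
            return b;
        }
        if ((cb == a) || (ca == nullptr))
        {
            return a;
        }
        ca = ca->next;
        cb = cb->next;
    }
}
```

Both cursors advance together, so the walk stops after at most the distance between the two inputs rather than scanning the whole list from the head.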
Extract this and the other versions into a `Compiler` method?
Did so.
src/coreclr/jit/objectalloc.cpp

```cpp
    }
}

// todo: proper check for same block
```
Still needed?
No
Sorry I missed this; taking a look.
```cpp
// Verify this appearance is under the same guard
//
if ((info.m_local == lclNum) && (pseudoGuardInfo->m_local == lclNum) &&
    (info.m_type == pseudoGuardInfo->m_type))
```
Does the `m_type == pseudoGuardInfo->m_type` check work correctly for NAOT? From what I can tell one is an actual compile time class handle from `GenTreeAllocObj::gtAllocObjClsHnd`, while the other is the constant that we were pointing out is not a compile time handle above. Perhaps that one needs to be changed to use `GenTreeIntCon::gtCompileTimeHandle`.
Could be -- I will take a look.
That's why I pointed out that we should never cast an IntCon (including a VN) to CLASS_HANDLE, and should use a helper instead (feel free to rename it) 🙂. I suspect it shouldn't be too hard to eliminate the compile time handle for classes, just as we don't need it for many other handle types.
Yes, it should use the compile time handle from the icon. It is currently hard to reach this opt in R2R/NAOT because it requires GDV and hence PGO.
I'll have a PR up in a bit with this and some of the other deferred fixes.
I guess for R2R/NAOT the priority may not be that high given that we now have late devirt inlining. It should already unblock many cases of stack allocating an enumerator, unless the `MoveNext` is too large to inline.
* main:
  - JIT: Set PGO data inconsistent when flow disappears in cast expansion (dotnet#112147)
  - [H/3] Fix handling H3_NO_ERROR (dotnet#112125)
  - Change some workflows using `pull_request` to use `pull_request_target` instead (dotnet#112161)
  - Annotate ConfiguredCancelableAsyncEnumerable<T> with allows ref struct and update extensions (dotnet#111953)
  - Delete copy of performance pipelines in previous location (dotnet#112113)
  - Optimize BigInteger.Divide (dotnet#96895)
  - Use current STJ in HostModel and remove unnecessary audit suppressions (dotnet#109852)
  - JIT: Unify handling of InstParam argument during inlining (dotnet#112119)
  - Remove unneeded DiagnosticSource content (dotnet#112116)
  - Improve compare-and-branch sequences produced by Emitter (dotnet#111797)
  - Jit: Conditional Escape Analysis and Cloning (dotnet#111473)
  - Re-enable HKDF-SHA3 on Azure Linux
  - Remove fstream usage from corehost (dotnet#111859)