Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit: Conditional Escape Analysis and Cloning #111473

Merged
merged 27 commits into from
Feb 4, 2025

Conversation

AndyAyersMS
Copy link
Member

Enhance escape analysis to determine if an object escapes only under failed GDV tests. If so, clone to create a path of code so that the object doesn't escape, and then stack allocate the object.

More details in the included document.

Contributes to #108913

Enhance escape analysis to determine if an object escapes only under
failed GDV tests. If so, clone to create a path of code so that
the object doesn't escape, and then stack allocate the object.

More details in the included document.

Contributes to dotnet#108913
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 15, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jan 15, 2025

@dotnet/jit-contrib PTAL

Not sure who wants to sign up to review this. Let me know.

SPMI is not going to be useful here. We can try MihuBot but that may not show much either.

Copy link
Member

@amanasifkhalid amanasifkhalid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice writeup! I haven't looked at the code yet, though you should probably get someone else to look at it

@AndyAyersMS
Copy link
Member Author

One of the libraries failures is that we're not updating the enclosing EH extents properly.

Assertion failed 'found && "BBJ_EHFINALLYRET predecessor of block that doesn't follow a BBJ_CALLFINALLY!"' in 'System.IO.File:InternalWriteAllLines(System.IO.StreamWriter,System.Collections.Generic.IEnumerable`1[System.String])' during 'Allocate Objects' (IL size 84; hash 0xfef048f0; Tier1)

fgCloneTryRegion assumes the cloned try will be lexically after the original, so any EH extent that ended at the original now ends at the clone. But here we're often putting the clone before the original, and so we get the enclosing extent wrong.

It should be possible to reorder these and put the clone after, but the EH lexicality constraints have been a consistent source of trouble.

@AndyAyersMS
Copy link
Member Author

@EgorBo this is the PR I was mentioning earlier

@AndyAyersMS
Copy link
Member Author

Hmm, still not getting the EH right.

@AndyAyersMS
Copy link
Member Author

Failures on linux libraries tests to look into.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr pgo, runtime-coreclr pgostress, runtime-coreclr libraries-pgo,
runtime-coreclr jitstress

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@AndyAyersMS
Copy link
Member Author

Looking at outerloop failures

So more things to investigate

@amanasifkhalid
Copy link
Member

inconsistent profile data at Invert Loops (@amanasifkhalid are these known?) for System.Text.Json.Tests and System.Linq.Tests (possibly related)

Yes, I have a fix for this in #111684

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

libraries-pgo testing still ragged... think the failures are unrelated but there are way too many to check. Will wait for CI issues to be sorted, merge up, and try again.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

I think the remaining libraries-pgo failures are unrelated.

@amanasifkhalid still seeing some inconsistent profiles after loop inversion.

unsigned const lclNum = tree->AsLclVarCommon()->GetLclNum();
GenTree* const tree = *use;
unsigned const lclNum = tree->AsLclVarCommon()->GetLclNum();
LclVarDsc* const varDsc = m_compiler->lvaGetDesc(lclNum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not.


for (unsigned p = 0; p < m_numPseudoLocals; p++)
{
unsigned const pseudoLocal = p + comp->lvaCount;
Copy link
Member

@jakobbotsch jakobbotsch Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the base analysis actually need to allocate an index for all locals in the first place? Shouldn't it be sufficient to allocate indices for only TYP_REF, TYP_BYREF, TYP_I_IMPL and TYP_STRUCT locals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is using bigger BVs than necessary. We would need to create some convenient dense numbering scheme. I can revisit this when I look at field-sensitive approaches perhaps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably just reuse LclVarDsc::lvVarIndex for the dense numbering scheme. Doing this later sounds good to me.

It also seems like there should be some kind of limit on the number of tracked locals here, since this analysis is inherently super-linear.

return nullptr;
}

GenTree* const tree = jumpTree->AsOp()->gtOp1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (here and also below)

Suggested change
GenTree* const tree = jumpTree->AsOp()->gtOp1;
GenTree* const tree = jumpTree->gtGetOp1();

}
else
{
JITDUMP("... no pseudo local?\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this would be more readable with some early outs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised

{
assert(comp->m_dfsTree != nullptr);
assert(comp->m_domTree == nullptr);
comp->m_domTree = FlowGraphDominatorTree::Build(comp->m_dfsTree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any benefit to waiting until we actually want to use it to compute it? Or do we know at this point we are going to end up using it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we pretty much know we're going to use it.

Comment on lines +2429 to +2431
BitVecTraits traits(comp->compBasicBlockID, comp);
BitVec visitedBlocks(BitVecOps::MakeEmpty(&traits));
toVisit.Push(allocBlock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this can use post order traits.


JITDUMP("walking through " FMT_BB "\n", visitBlock->bbNum);

for (BasicBlock* const succ : visitBlock->Succs())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VisitRegularSuccs would be better to use here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this actually use VisitAllSuccs? It seems this needs to take into account any possible control flow path if it's trying to validate some form of post dominance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See note below, since we're in an inlined GDV region, and we're validing all blocks are in the same EH region, any EH successor would bypass assignment and so would not flow into the cloned code we're going to create.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Comment on lines +2465 to +2469
if (BitVecOps::IsMember(&traits, visitedBlocks, succ->bbID))
{
continue;
}
toVisit.Push(succ);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add to visitedBlocks here to avoid unnecessarily pushing the same block on toVisit unnecessarily?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably revise this bit, but would like to hold off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok with me to do this later. There is another graph walk below that can use the same kind of enhancement.

}
toVisit.Push(succ);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this loop missing some check for "sink" blocks like BBJ_RETURN that cannot reach defBlock? What would happen in those cases? Unnecessary cloning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so.

But with the current hinting the only allocation sites we look at are those that are part of an inlinee on the fast side of a GDV, so with normal flow they can't really avoid reaching the bottom of the GDV diamond.

We also know the allocation site is not in a loop (via our lexical analysis) or we wouldn't consider it eligible in the first place. Might be nice to replace that with a DFS based check but I don't know if we have a general "this block is in a cycle" utility.

So the only other possibility for unexpected flow is some kind of exception. If the exception is handled within the fast GDV side we would expect it to assign some kind of return value, otherwise the GetEnumerator we've inlined is defective. If it's not handled or the GetEnumerator is defective then if the program tries to use the enumerator it should get a null ref (or whatever the reaching def is for the enumerator var).

So I don't think there is anything wasteful going on here, but if we generalize the analysis to consider other kinds of conditional allocations then we'll need to reconsider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.

Comment on lines 2491 to 2515
auto latestStmt = [](Statement* stmt1, Statement* stmt2) {
if (stmt1 == stmt2)
{
return stmt1;
}

Statement* cursor1 = stmt1->GetNextStmt();
Statement* cursor2 = stmt2->GetNextStmt();

while (true)
{
if ((cursor1 == stmt2) || (cursor2 == nullptr))
{
return stmt2;
}

if ((cursor2 == stmt1) || (cursor1 == nullptr))
{
return stmt1;
}

cursor1 = cursor1->GetNextStmt();
cursor2 = cursor2->GetNextStmt();
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract this and the other versions into a Compiler method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did so.

}
}

// todo: proper check for same block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No

@amanasifkhalid
Copy link
Member

@amanasifkhalid still seeing some inconsistent profiles after loop inversion.

Sorry I missed this; taking a look

@AndyAyersMS AndyAyersMS merged commit f6c74b8 into dotnet:main Feb 4, 2025
114 checks passed

// Verify this appearance is under the same guard
//
if ((info.m_local == lclNum) && (pseudoGuardInfo->m_local == lclNum) && (info.m_type == pseudoGuardInfo->m_type))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the m_type == pseudoGuardInfo->m_type check work correctly for NAOT? From what I can tell one is an actual compile time class handle from GenTreeAllocObj::gtAllocObjClsHnd, while the other is the constant that we were pointing out is not a compile time handle above. Perhaps that one needs to be changed to use GenTreeIntCon::gtCompileTimeHandle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be -- I will take a look.

Copy link
Member

@EgorBo EgorBo Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why I pointed out that we should never cast IntCon (including VN) to CLASS_HANDLE and use a helper instead (feel free to rename it)🙂. I suspect it shouldn't be too hard to eliminate compile time handle for classes just like we don't need it in many other handle types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should use the compile time handle from the icon. It is currently hard to reach this opt in R2R/NAOT because it requires GDV and hence PGO.

I'll have a PR up in a bit with this and some of the other deferred fixes.

Copy link
Contributor

@hez2010 hez2010 Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for R2R/NAOT the priority may be not that high given that now we have late devirt inlining. It should already unblock many cases of stack allocating an enumerator, unless the MoveNext is too large to inline.

grendello added a commit to grendello/runtime that referenced this pull request Feb 5, 2025
* main:
  JIT: Set PGO data inconsistent when flow disappears in cast expansion (dotnet#112147)
  [H/3] Fix handling H3_NO_ERROR (dotnet#112125)
  Change some workflows using `pull_request` to use `pull_request_target` instead (dotnet#112161)
  Annotate ConfiguredCancelableAsyncEnumerable T with allows ref struct and update extensions (dotnet#111953)
  Delete copy of performance pipelines in previous location (dotnet#112113)
  Optimize BigInteger.Divide (dotnet#96895)
  Use current STJ in HostModel and remove unnecessary audit suppressions (dotnet#109852)
  JIT: Unify handling of InstParam argument during inlining (dotnet#112119)
  Remove unneeded DiagnosticSource content (dotnet#112116)
  Improve compare-and-branch sequences produced by Emitter (dotnet#111797)
  Jit: Conditional Escape Analysis and Cloning (dotnet#111473)
  Re-enable HKDF-SHA3 on Azure Linux
  Remove fstream usage from corehost (dotnet#111859)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants