JIT: Relax unrolling for structs with gc refs #112227

Merged (3 commits) on Feb 6, 2025

EgorBo (Member) commented Feb 6, 2025

Currently, when we copy a block containing GC references from anywhere to a non-heap location, we can use SIMD if the whole copy is placed in a no-GC region. This is valid because nobody else is expected to read the destination on the stack concurrently (that would be UB). The existing code is a bit conservative here: it only does this for blocks of up to 64 bytes, and for larger blocks it avoids SIMD in favor of a slow rep movsq or a bulk write barrier call.

This is primarily needed to address possible performance regressions that #112060 may introduce, but it should be an improvement regardless of whether that PR lands.

NOTE: arm64 already has similar logic.

Example (I use ref structs here, but it works for regular structs as well):

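using System.Runtime.CompilerServices;

class Benchmarks
{
    void Foo(ref MyStruct a, ref MyStruct b)
    {
        // Copies 16 string references (128 bytes on a 64-bit target).
        a = b;
    }
}

[InlineArray(16)]
ref struct MyStruct
{
    public string _element0;
}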

Main:

; Assembly listing for method Benchmarks:Foo(byref,byref):this (FullOpts)
       sub      rsp, 40
       vzeroupper 
       cmp      byte  ptr [rdx], dl
       mov      rcx, rdx
       cmp      byte  ptr [r8], r8b
       mov      rdx, r8
       mov      r8d, 128
       call     [CORINFO_HELP_BULK_WRITEBARRIER] ;; the JIT knows no write barrier is needed, but still calls the helper for large sizes
       nop      
       add      rsp, 40
       ret      
; Total bytes of code 36

PR:

; Assembly listing for method Benchmarks:Foo(byref,byref):this (FullOpts)
G_M54163_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
G_M54163_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0104 {rdx r8}, byref, nogc
                            ; byrRegs +[rdx r8]
       vmovdqu32 zmm0, zmmword ptr [r8]
       vmovdqu32 zmmword ptr [rdx], zmm0
       vmovdqu32 zmm0, zmmword ptr [r8+0x40]
       vmovdqu32 zmmword ptr [rdx+0x40], zmm0
G_M54163_IG03:        ; bbWeight=1, epilog, nogc, extend
       vzeroupper 
       ret      
; Total bytes of code 30

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 6, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo (Member, Author) commented Feb 6, 2025

@MihuBot

EgorBo (Member, Author) commented Feb 6, 2025

PTAL @jakobbotsch @dotnet/jit-contrib. Diffs: we replace either rep movsq (very slow) or a bulk write barrier call (slow) with an unrolled SIMD copy. This is needed for #112060 because, with the current conservative behavior, we could end up with two bulk write barrier calls.

EgorBo requested a review from jakobbotsch February 6, 2025 12:02
EgorBo merged commit 666bb9d into dotnet:main Feb 6, 2025
109 of 112 checks passed
EgorBo deleted the fix-unrolling-for-gc-structs branch February 6, 2025 13:04