
Realm performance worse when threads are allocated correctly #1267

Open
TimothyGu opened this issue May 23, 2022 · 1 comment

TimothyGu commented May 23, 2022

Hi @streichler,

The overall setting is the same as in #1266.

With -ll:ocpu 1 -ll:othr 9 -ll:util 1 on Sapling, the Realm runtime prints many warnings:

[0 - 7f3af9792d00]    0.000178 {4}{threads}: reservation ('dedicated worker (generic) #2') cannot be satisfied
[1 - 7f982b32dd00]    0.000174 {4}{threads}: reservation ('utility proc 1d00010000000000') cannot be satisfied
[2 - 7f5e4563bd00]    0.000217 {4}{threads}: reservation ('dedicated worker (generic) #2') cannot be satisfied
...

@rohany and I tried a few different ways to allow Realm to allocate threads correctly. Compare:

# othr=9 util=1 (implicitly, cpu=1) -- many "reservation cannot be satisfied" warnings
$ mpirun -H c0001:2,c0002:2,c0003:2,c0004:2 --bind-to socket \
    /scratch2/tigu/taco/distal/build/bin/chemTest-05-20 -n 99 -tblis -gx 4 -gy 2 \
    -ll:ocpu 1 -ll:othr 9 -ll:util 1 -ll:nsize 10G -ll:ncsize 0 \
    -lg:prof 8 -lg:prof_logfile prof99-socket-%.log.gz

# othr=8 util=1 (implicitly, cpu=1) -- no warnings
$ mpirun -H c0001:2,c0002:2,c0003:2,c0004:2 --bind-to socket \
    /scratch2/tigu/taco/distal/build/bin/chemTest-05-20 -n 99 -tblis -gx 4 -gy 2 \
    -ll:ocpu 1 -ll:othr 8 -ll:util 1 -ll:nsize 10G -ll:ncsize 0 \
    -lg:prof 8 -lg:prof_logfile prof99-socket-othr8-%.log.gz

# othr=9 util=1 cpu=0 -- no warnings
$ mpirun -H c0001:2,c0002:2,c0003:2,c0004:2 --bind-to socket \
    /scratch2/tigu/taco/distal/build/bin/chemTest-05-20 -n 99 -tblis -gx 4 -gy 2 \
    -ll:ocpu 1 -ll:othr 9 -ll:cpu 0 -ll:util 1 -ll:nsize 10G -ll:ncsize 0 \
    -lg:prof 8 -lg:prof_logfile prof99-socket-cpu0-%.log.gz

We confirmed through -ll:show_rsrv that the latter two commands reserve threads correctly. However, both of them perform worse than the original configuration. Profiles are available:

It makes sense at some level that -ll:othr 8 performs worse than -ll:othr 9: we are taking away a whole core from the computation. (Indeed, the average leaf computation time increases from about 230 ms to 260 ms.) But it's not clear to us why -ll:cpu 0 doesn't help, and in fact hinders, the performance. From the original profile, the CPU processors don't seem to be doing much work anyway, and with -ll:cpu 0 the leaf computation time is about the same as the original (230 ms), but the compute graph is a lot more ragged than without -ll:cpu 0.
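The pattern of which configurations warn can be sketched with a simple thread count. This is a hedged sketch: the figure of 10 usable cores per socket is an assumption inferred from which runs print "reservation cannot be satisfied", not a confirmed fact about Sapling, and it ignores any internal Realm worker threads.

```shell
# Hypothetical core accounting per socket (with -ll:ocpu 1 throughout).
# Assumption: 10 usable cores per socket, inferred from which runs warn.
cores_per_socket=10
for cfg in "9 1 1" "8 1 1" "9 1 0"; do
  set -- $cfg   # $1=othr, $2=util, $3=cpu
  total=$(( 1 * $1 + $2 + $3 ))
  if [ "$total" -le "$cores_per_socket" ]; then
    status=fits
  else
    status=over-subscribed
  fi
  echo "othr=$1 util=$2 cpu=$3 -> $total threads ($status)"
done
```

Under that assumption, othr=9 util=1 cpu=1 asks for 11 pinned threads on 10 cores, while the other two configurations ask for exactly 10, matching the observed warnings.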

@manopapad (Contributor) commented:

My hunch here was that you're only leaving 1 core for all Legion/Realm meta-work, as explained in #1266 (comment).

However, @TimothyGu's results from #1266 (comment) do not support this. It might still be useful to look at profiles for the runs in this comment.

Also, since your experience has been that other OpenMP libraries are better behaved than TBLIS, it would be a good idea to verify whether this behavior occurs with those other libraries as well.

Finally, it would be interesting to see what happens when you repeat these experiments:

  • othr=9 util=1 (implicitly, cpu=1)
  • othr=8 util=1 (implicitly, cpu=1)
  • othr=9 util=1 cpu=0

with REALM_SYNTHETIC_CORE_MAP="", which disables Realm's thread pinning.
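For concreteness, a re-run of the first configuration with pinning disabled might look like the following. This is a sketch based on the commands earlier in the issue; the `-x` flag (which exports an environment variable to all ranks under Open MPI) is an assumption about the MPI launcher in use, and the profiling flags are dropped here for brevity.

```shell
# Sketch: repeat the othr=9 util=1 run with Realm thread pinning disabled.
# -x REALM_SYNTHETIC_CORE_MAP propagates the (empty) variable to all ranks
# under Open MPI; adjust for other launchers.
REALM_SYNTHETIC_CORE_MAP="" mpirun -H c0001:2,c0002:2,c0003:2,c0004:2 --bind-to socket \
    -x REALM_SYNTHETIC_CORE_MAP \
    /scratch2/tigu/taco/distal/build/bin/chemTest-05-20 -n 99 -tblis -gx 4 -gy 2 \
    -ll:ocpu 1 -ll:othr 9 -ll:util 1 -ll:nsize 10G -ll:ncsize 0
```

The same prefix applies unchanged to the othr=8 and cpu=0 variants.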
