rpc,server: investigate regression caused by #138368 #139339

Draft: wants to merge 1 commit into master


@tbg (Member) commented Jan 17, 2025

Branch here.

We observed that #138368 reliably lowers performance by ~1% as measured by time/op in BenchmarkSysbench/SQL/3node/oltp_read_write.

This commit (server: give Node.BatchStream its own impl) not only claws back this 1% regression, it achieves roughly another 1% improvement on top (comparing "PR plus my commit" against "master before the PR"). Remarkably, the change in this commit is a partial revert of a refactor introduced in the PR. Nothing in the PR should have made performance better: if anything, it introduced another setting to check, and that setting is off. So it is surprising that the commit gains 1.73% over master:

old:  f9df57e Merge #137984
new:  826b922 server: give Node.BatchStream its own impl
args: benchdiff "--old" "lastmerge" "./pkg/sql/tests" "-b" "-r" "Sysbench/SQL/3node/oltp_read_write" "-d" "1000x" "-c" "20"
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-24    11.7ms ± 1%    11.5ms ± 1%  -1.73%  (p=0.000 n=19+20)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-24    2.20MB ± 3%    2.19MB ± 3%    ~     (p=0.968 n=20+20)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-24     10.9k ± 2%     10.9k ± 2%    ~     (p=0.703 n=20+20)

To narrow this down further, I wanted to see whether PR+commit vs. PR alone produces a ~2% speed-up. I didn't have to run this for very long; it clearly does (the p-value is essentially zero):

old:  b0ea57c server: fix drpc end-to-end test
new:  826b922 server: give Node.BatchStream its own impl
args: benchdiff "--old" "HEAD~" "./pkg/sql/tests" "-b" "-r" "Sysbench/SQL/3node/oltp_read_write" "-d" "1000x" "-c" "20"

running benchmarks:
name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_write-24    11.9ms ± 1%    11.6ms ± 1%  -2.33%  (p=0.000 n=8+8)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_write-24    2.17MB ± 4%    2.19MB ± 2%    ~     (p=0.959 n=8+8)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_write-24     10.8k ± 2%     10.8k ± 2%    ~     (p=0.627 n=8+8)

I don't know what to make of this yet, but I'm reminded of this segment of Emery Berger's talk "Performance Matters".

I'm nearing the end of my day, but leaving this here as food for thought for now.

Closes #139022.

avoid calling batchStreamImpl due to suspected overhead
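For readers who want a concrete picture of the two shapes involved, here is a minimal, hypothetical Go sketch. It contrasts a shared helper (standing in for batchStreamImpl) that receives the per-request handler as a function value with a dedicated method (standing in for Node.BatchStream's "own impl") that calls the handler directly. Every name in it (batchStream, sliceStream, batchStreamShared, node.batchStreamDirect, node.batch) is invented for illustration and is not CockroachDB's actual code. The sketch only shows where the extra indirection lives; it does not establish that this indirection explains the measured difference.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Hypothetical stand-ins for the real batch request/response types.
type batchRequest struct{ id int }
type batchResponse struct{ id int }

// batchStream is a hypothetical bidirectional stream abstraction.
type batchStream interface {
	Recv() (*batchRequest, error)
	Send(*batchResponse) error
}

// sliceStream is an in-memory batchStream used only for this illustration.
type sliceStream struct {
	reqs []*batchRequest
	sent []*batchResponse
}

func (s *sliceStream) Recv() (*batchRequest, error) {
	if len(s.reqs) == 0 {
		return nil, io.EOF
	}
	req := s.reqs[0]
	s.reqs = s.reqs[1:]
	return req, nil
}

func (s *sliceStream) Send(resp *batchResponse) error {
	s.sent = append(s.sent, resp)
	return nil
}

// batchStreamShared plays the role of a shared helper: the handler arrives
// as a function value, so each request goes through an indirect call.
func batchStreamShared(stream batchStream, handle func(*batchRequest) (*batchResponse, error)) (int, error) {
	handled := 0
	for {
		req, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			return handled, nil
		} else if err != nil {
			return handled, err
		}
		resp, err := handle(req)
		if err != nil {
			return handled, err
		}
		if err := stream.Send(resp); err != nil {
			return handled, err
		}
		handled++
	}
}

// node is a hypothetical server type.
type node struct{}

// batch is a trivial handler that echoes the request ID.
func (n *node) batch(req *batchRequest) (*batchResponse, error) {
	return &batchResponse{id: req.id}, nil
}

// batchStreamDirect is the "own impl" variant: the same loop, but it calls
// n.batch directly instead of through a function value.
func (n *node) batchStreamDirect(stream batchStream) (int, error) {
	handled := 0
	for {
		req, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			return handled, nil
		} else if err != nil {
			return handled, err
		}
		resp, err := n.batch(req)
		if err != nil {
			return handled, err
		}
		if err := stream.Send(resp); err != nil {
			return handled, err
		}
		handled++
	}
}

func main() {
	n := &node{}
	shared := &sliceStream{reqs: []*batchRequest{{id: 1}, {id: 2}}}
	direct := &sliceStream{reqs: []*batchRequest{{id: 1}, {id: 2}}}
	a, _ := batchStreamShared(shared, n.batch)
	b, _ := n.batchStreamDirect(direct)
	fmt.Println(a, b) // prints "2 2": both loops handle the same requests
}
```

Both variants are behaviorally identical. Keep in mind that the Go compiler may inline or devirtualize some calls, so source-level indirection does not map one-to-one onto runtime cost.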


blathers-crl bot commented Jan 17, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member)

This change is Reviewable

@RaduBerinde (Member) left a comment

LGTM

I was not able to reproduce the gains with a similar change, but this looks better regardless.

I'd refactor the other caller of batchStreamImpl as well; since there's only one, there's no need for all the interface stuff.

Successfully merging this pull request may close these issues.

rpc: investigate regression caused by #138368