
rpc: investigate regression caused by #138368 #139022

Open
tbg opened this issue Jan 14, 2025 · 2 comments · May be fixed by #139339
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months

Comments

@tbg
Member

tbg commented Jan 14, 2025

See #138368 (comment).

The PR causes a >1% latency regression in BenchmarkSysbench/oltp_read_write. An initial cursory investigation did not reveal the cause, and allocations appear identical. We want to understand and fix this regression.

Epic: CRDB-42584

Jira issue: CRDB-46476

@tbg tbg added branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months labels Jan 14, 2025
@tbg tbg self-assigned this Jan 14, 2025
@tbg tbg added the T-kv KV Team label Jan 14, 2025
@exalate-issue-sync exalate-issue-sync bot removed the T-kv KV Team label Jan 14, 2025
@RaduBerinde RaduBerinde self-assigned this Jan 15, 2025
@RaduBerinde
Member

I tried inlining batchStreamImpl and doing away with the func and the intermediary interfaces, and I tried commenting out the drpc-related code in the dialer, but I could not reliably produce an improvement. I looked at profiles but couldn't find any of the changed code responsible for anything significant. It is possible that it's just due to importing a new library and having more code (and perhaps realigning certain hot code blocks w.r.t. cachelines, something PGO in go1.23 is supposed to help with). I will try again, looking at exactly the merge commit and its parent, and perhaps importing the change piece by piece.

Unrelated, but I did notice that according to the profile, shouldUseBatchStreamPoolClient accounts for about 0.5% of CPU time, and we should improve that:
[profile screenshot]

@RaduBerinde
Member

It's hard to get accurate results since the difference is smaller than the usual standard deviation. But I think reverting all the client-side code gets part of it back (maybe 0.2-0.3%), probably just a handful of otherwise insignificant changes. Even after reverting most of the server-side code I still see some regression. I will stop spending time on this for now :)

@RaduBerinde RaduBerinde removed their assignment Jan 15, 2025
@tbg tbg linked a pull request Jan 17, 2025 that will close this issue