
perf: improve batch performance #303

Merged
merged 2 commits into from
Apr 5, 2024


Sora233
Contributor

@Sora233 Sora233 commented Mar 22, 2024

Currently, the linear scan over Batch.pendingWrites makes large batch puts impractical.
Each Put scans all pending writes, so putting N keys into a batch is O(N^2) overall.
(This may not be limited to Put; I haven't checked the other operations yet.)

rosedb/batch.go

Lines 121 to 126 in a776163

for i := len(b.pendingWrites) - 1; i >= 0; i-- {
	if bytes.Equal(key, b.pendingWrites[i].Key) {
		record = b.pendingWrites[i]
		break
	}
}
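
To put a rough number on the cost of that loop: assuming every key in the batch is distinct (as with the strconv.Itoa(i) keys in the reproduction program in this description) and that Put runs this scan on every call, the i-th Put (counting from 0) compares against all i earlier entries before giving up. A back-of-the-envelope count of bytes.Equal calls for N puts is then:

```latex
\[
\sum_{i=0}^{N-1} i \;=\; \frac{N(N-1)}{2},
\qquad
N = 10^{5} \;\Rightarrow\; \frac{10^{5}\,(10^{5}-1)}{2} \approx 5 \times 10^{9}
\]
```

At N = 10^7 the same formula gives about 5 × 10^13 comparisons, which is 10^4 times more work.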

Example to reproduce:

package main

import (
	"fmt"
	"os"
	"strconv"
	"time"

	"github.com/rosedblabs/rosedb/v2"
)

func main() {
	opts := rosedb.DefaultOptions
	db, err := rosedb.Open(opts)
	if err != nil {
		panic(err)
	}
	// Close the database, then remove its data directory, on exit.
	defer os.RemoveAll(opts.DirPath)
	defer db.Close()

	batch := db.NewBatch(rosedb.DefaultBatchOptions)

	start := time.Now()

	for i := 0; i < 1e5; i++ {
		batch.Put([]byte(strconv.Itoa(i)), []byte("xxx"))
	}
	err = batch.Commit()
	if err != nil {
		panic(err)
	}
	fmt.Println(time.Since(start))
}

This takes more than 22 seconds on my PC.

Solution:

I added a pendingWritesMap to speed up finding a key in pendingWrites.
I also added a benchmark for batch operations.
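
A minimal sketch of the idea, not the code in this PR: the pending writes stay in a slice in insertion order, and a map from key to slice position replaces the backwards scan. The names record, indexedBatch, lookup, and put are hypothetical; only pendingWrites and pendingWritesMap come from the description above, and the map's type here (map[string]int) is an assumption.

```go
package main

// record is a hypothetical stand-in for a pending write.
type record struct {
	Key   []byte
	Value []byte
}

// indexedBatch keeps pending writes in insertion order, plus a map
// from key to the record's position in the slice.
type indexedBatch struct {
	pendingWrites    []*record
	pendingWritesMap map[string]int // assumed type, for illustration only
}

func newIndexedBatch() *indexedBatch {
	return &indexedBatch{pendingWritesMap: make(map[string]int)}
}

// lookup returns the pending record for key, or nil if there is none.
// It is a single map access instead of a scan over the slice.
func (b *indexedBatch) lookup(key []byte) *record {
	if i, ok := b.pendingWritesMap[string(key)]; ok {
		return b.pendingWrites[i]
	}
	return nil
}

// put overwrites the pending record for key if one exists,
// otherwise appends a new record and indexes its position.
func (b *indexedBatch) put(key, value []byte) {
	if r := b.lookup(key); r != nil {
		r.Value = value
		return
	}
	// string(key) copies the key bytes into the map.
	b.pendingWritesMap[string(key)] = len(b.pendingWrites)
	b.pendingWrites = append(b.pendingWrites, &record{Key: key, Value: value})
}
```

With this layout each put costs one expected-constant-time map operation, so N puts are roughly O(N) overall, at the price of holding a second copy of every key in the map.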

Before:

goos: linux
goarch: amd64
pkg: github.com/rosedblabs/rosedb/v2/benchmark
cpu: AMD Ryzen 7 5800X3D 8-Core Processor
BenchmarkBatchPutGet
BenchmarkBatchPutGet/batchPut
BenchmarkBatchPutGet/batchPut-16                   24601             86200 ns/op         11499 B/op         12 allocs/op
BenchmarkBatchPutGet/batchGet
BenchmarkBatchPutGet/batchGet-16                 1789173               658.6 ns/op         135 B/op          4 allocs/op

After:

goos: linux
goarch: amd64
pkg: github.com/rosedblabs/rosedb/v2/benchmark
cpu: AMD Ryzen 7 5800X3D 8-Core Processor
BenchmarkBatchPutGet
BenchmarkBatchPutGet/batchPut
BenchmarkBatchPutGet/batchPut-16                   64142             19151 ns/op         11317 B/op         13 allocs/op
BenchmarkBatchPutGet/batchGet
BenchmarkBatchPutGet/batchGet-16                 1770322               677.3 ns/op         135 B/op          4 allocs/op

@roseduan
Collaborator

Thanks, but I am not sure this is necessary. Although it improves performance a little, it also takes more memory to hold the pending map.

@Sora233
Contributor Author

Sora233 commented Mar 23, 2024

If you increase 1e5 to 1e7 in the example, the program will likely run for more than an hour.
Maybe a better trade-off is to use the map only when there are more than 100 (or some other reasonable number of) pending writes.
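
A rough sketch of that trade-off, under stated assumptions: small batches keep the plain backwards scan and allocate no map, and the map is built once when the number of pending writes reaches a cut-over. The names entry, hybridBatch, find, put, and mapThreshold are hypothetical, and 100 is only the example figure mentioned above, not a measured value.

```go
package main

import "bytes"

// mapThreshold is a hypothetical cut-over point; 100 is just an example.
const mapThreshold = 100

// entry is a hypothetical stand-in for a pending write.
type entry struct{ Key, Value []byte }

type hybridBatch struct {
	pending []*entry
	index   map[string]int // stays nil until len(pending) reaches mapThreshold
}

// find uses the map once it exists and the linear scan before that.
func (b *hybridBatch) find(key []byte) *entry {
	if b.index != nil {
		if i, ok := b.index[string(key)]; ok {
			return b.pending[i]
		}
		return nil
	}
	for i := len(b.pending) - 1; i >= 0; i-- {
		if bytes.Equal(key, b.pending[i].Key) {
			return b.pending[i]
		}
	}
	return nil
}

// put overwrites an existing pending entry or appends a new one,
// building the index the first time the threshold is reached.
func (b *hybridBatch) put(key, value []byte) {
	if e := b.find(key); e != nil {
		e.Value = value
		return
	}
	b.pending = append(b.pending, &entry{Key: key, Value: value})
	switch {
	case b.index != nil:
		b.index[string(key)] = len(b.pending) - 1
	case len(b.pending) >= mapThreshold:
		// One-time build over everything appended so far.
		b.index = make(map[string]int, len(b.pending))
		for i, e := range b.pending {
			b.index[string(e.Key)] = i
		}
	}
}
```

Below the threshold a batch pays no extra memory; above it, lookups stop growing with the batch size.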

@roseduan
Collaborator

If you increase 1e5 to 1e7 in the example, the program will likely run for more than an hour. Maybe a better trade-off is to use the map only when there are more than 100 (or some other reasonable number of) pending writes.

I have rethought the issue. We can keep the map to improve performance, but we do not need to store the key itself in the map,
because large keys would increase the memory cost.

So we can store the key's hash in the map instead, like map[hash(key)][index]. The hash function does not need to be persistent, because it is only needed while the batch exists; once the batch is committed or rolled back, the map is no longer used.
So we can use an in-memory hash algorithm for this; see the usage in badger: https://github.com/dgraph-io/badger/blob/main/txn.go#L396
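
A minimal sketch of a hash-keyed index, not the merged implementation and not badger's code: Go's standard hash/maphash (maphash.Bytes needs Go 1.19 or later) stands in for the in-memory hash, since its seed is random per process and therefore non-persistent. The names pendingRecord, hashedBatch, byHash, lookup, and put are hypothetical. Storing a slice of indexes per hash and comparing the full key on lookup is an assumption made here so that two keys with the same hash cannot be confused.

```go
package main

import (
	"bytes"
	"hash/maphash"
)

// pendingRecord is a hypothetical stand-in for a pending write.
type pendingRecord struct{ Key, Value []byte }

type hashedBatch struct {
	seed    maphash.Seed // per-batch seed; never written to disk
	pending []*pendingRecord
	byHash  map[uint64][]int // key hash -> positions in pending
}

func newHashedBatch() *hashedBatch {
	return &hashedBatch{
		seed:   maphash.MakeSeed(),
		byHash: make(map[uint64][]int),
	}
}

// lookup checks only the records whose key hash matches, and confirms
// each candidate with a full key comparison to rule out collisions.
func (b *hashedBatch) lookup(key []byte) *pendingRecord {
	for _, i := range b.byHash[maphash.Bytes(b.seed, key)] {
		if bytes.Equal(key, b.pending[i].Key) {
			return b.pending[i]
		}
	}
	return nil
}

// put overwrites an existing pending record or appends a new one and
// records its position under the key's hash.
func (b *hashedBatch) put(key, value []byte) {
	if r := b.lookup(key); r != nil {
		r.Value = value
		return
	}
	h := maphash.Bytes(b.seed, key)
	b.byHash[h] = append(b.byHash[h], len(b.pending))
	b.pending = append(b.pending, &pendingRecord{Key: key, Value: value})
}
```

Compared with a map keyed by the key itself, the map here holds an 8-byte hash plus slice positions per entry, so its size does not grow with key length.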

@Sora233
Contributor Author

Sora233 commented Apr 2, 2024

So we can use an in-memory hash algorithm for this; see the usage in badger: https://github.com/dgraph-io/badger/blob/main/txn.go#L396

I have implemented this hash mechanism, following the code you linked.

@roseduan roseduan merged commit 78f11b1 into rosedblabs:main Apr 5, 2024
2 checks passed