test: improve TestConcurrentReadAndWrite to verify read/write linearizability
#455
Conversation
concurrent_test.go (Outdated):

    	return rs
    }

    func analyzeHistoryRecords(rs historyRecords) error {
This is not a linearization validation, as it doesn't account for operation execution time. It depends on Txid for ordering, which makes it very limited. For example, it doesn't detect stale reads: View can just return an old transaction id even though a newer transaction was already committed.
I'm not sure what concurrency issues this approach is meant to find, or whether it can find any. Please let me know if you need help with integrating the linearizability checker https://github.com/anishathalye/porcupine like in the robustness tests.
Depending on Txid is enough, because in bboltDB the txid is strictly incremental. Any data written by a transaction should be visible to a following reading transaction; otherwise it must be an issue. My understanding is that porcupine might generate false alarms, but this test will never generate a false alarm unless there is a test bug. Also, we are testing only the local boltDB library rather than a distributed system, so I don't see any reason to use porcupine.
Note that there is no stale read: a read transaction should always see the same data during the transaction's lifecycle, which is exactly what repeatable read means. Otherwise it must be an issue.
In summary, there are two points:
- A read transaction should always see the same data view during its lifecycle.
- Any data written by a writing transaction should be visible to any following reading transaction (with txid >= the previous writing txid).
Also, try to check whether concurrent transaction operations will result in data corruption.
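The second point above can be sketched as a standalone validation pass over a sorted history. This is a minimal sketch, not the PR's actual `analyzeHistoryRecords`: the `historyRecord` fields and the single-key-per-record shape are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// historyRecord is a hypothetical record of one transaction's observation.
type historyRecord struct {
	txid  uint64
	key   string
	value string
	write bool // true for a writing transaction, false for a read
}

// validateHistory checks that every read returns the value written by the
// latest writing transaction with txid <= the reader's txid.
func validateHistory(records []historyRecord) error {
	// Sort by txid; in bbolt txids are strictly incremental.
	sort.Slice(records, func(i, j int) bool { return records[i].txid < records[j].txid })

	lastWritten := map[string]string{}
	for _, r := range records {
		if r.write {
			lastWritten[r.key] = r.value
			continue
		}
		if expected, ok := lastWritten[r.key]; ok && r.value != expected {
			return fmt.Errorf("txid %d read %q=%q, want %q (stale read)",
				r.txid, r.key, r.value, expected)
		}
	}
	return nil
}

func main() {
	records := []historyRecord{
		{txid: 1, key: "k", value: "v1", write: true},
		{txid: 2, key: "k", value: "v1"}, // consistent read
		{txid: 3, key: "k", value: "v2", write: true},
		{txid: 4, key: "k", value: "v1"}, // stale read: should have seen v2
	}
	fmt.Println(validateHistory(records))
}
```

The first point (a stable view within one read transaction) would be checked separately, by recording every observation a single read transaction makes and asserting they are identical.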
You are mistaking serializability and linearizability. https://jepsen.io/consistency
Txid gives you serializability, but linearizability is a real-time property, not bound to any counter.

> My understanding is that porcupine might generate false alarm, but in this test it will never generate false alarm unless it's a test bug

No, depending on the history, porcupine might take a long time to validate it, causing the test to time out. However, assuming the test is correctly written and the operation history doesn't include a lot of failed requests, it should not be a problem.
> You are mistaking serializability and linearizability.

Note that in bboltDB, only one writing transaction is allowed at a time. So in practice serializability is the same thing as linearizability.
Note I am planning to add multiple writing transactions in a followup PR.
I can rename all *linearizable* to *serializable* when I start to support multiple writing transactions. Thanks for pointing this out.
I will not be opposed to testing just serializability, but I think you need to add some additional validation so it makes sense.
Linearizability bounds function execution by client-observed time. Serializability no longer cares about time, but cares about the order: operations observed by a client should be bound by order, meaning each client should observe an increase in transaction ID.
I know that you do validation on transaction ID, but not on the order observed by each client. I would recommend adding client-side validation that the transaction ID is always growing for each client, before we sort it.
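The suggested per-client check can be sketched as follows. This is an illustrative sketch only; the `clientRecord` type and field names are assumptions, not the test's actual types.

```go
package main

import "fmt"

// clientRecord pairs a client ID with the txid that client observed
// (hypothetical names for illustration).
type clientRecord struct {
	client int
	txid   uint64
}

// checkMonotonicPerClient verifies that, in the order each client issued its
// operations, the observed txid never decreases. This must run before any
// global sort by txid, because sorting destroys the per-client order.
func checkMonotonicPerClient(history []clientRecord) error {
	last := map[int]uint64{}
	for _, r := range history {
		if prev, ok := last[r.client]; ok && r.txid < prev {
			return fmt.Errorf("client %d observed txid go backwards: %d after %d",
				r.client, r.txid, prev)
		}
		last[r.client] = r.txid
	}
	return nil
}

func main() {
	history := []clientRecord{
		{client: 1, txid: 3},
		{client: 2, txid: 4},
		{client: 1, txid: 5},
		{client: 2, txid: 2}, // txid decreased for client 2
	}
	fmt.Println(checkMonotonicPerClient(history))
}
```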
> I would recommend adding client side validation that transaction ID is always growing for each client, before we sort it.

Agreed, it's a good idea. Actually we also saw such an issue (the txid somehow isn't strictly incremental); see #402 (comment).
Let me do it in a separate PR. This PR is already too big.
Oops, I mistook Serializable vs Sequential. Still, validating sequential consistency here would be useful and is fine to do in a followup.

Addressed some of your comments, and I will resolve the other comments in a followup PR, thx. @serathius PTAL, cc @ptabor as well.
…ablity Signed-off-by: Benjamin Wang <[email protected]>
…ion history locally firstly Signed-off-by: Benjamin Wang <[email protected]>
@ptabor @serathius PTAL, thx
    	t.Errorf("The history records are not linearizable:\n %v", err)
    }

    saveDataIfFailed(t, db, records)
Defer it to ensure it is executed even on a fatal error or panic?
If the test somehow panics, the records haven't been collected yet. Fixing that needs a big change; let me think about it separately. thx
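For reference, the deferred-save shape being suggested could look like the sketch below. It only works under the assumption (noted above as the hard part) that records are appended incrementally as the test runs, rather than collected at the end; `saveDataIfFailed` and `runTest` here are hypothetical stand-ins, not the test's actual helpers.

```go
package main

import "fmt"

// saveDataIfFailed is a stand-in for the test helper (hypothetical body):
// it would dump the DB and collected records for offline debugging.
func saveDataIfFailed(records []string) {
	fmt.Printf("saving %d records for debugging\n", len(records))
}

// runTest shows the suggested shape: deferring the save ensures it runs even
// if the body panics, provided records is appended to as the test progresses.
func runTest() (err error) {
	var records []string
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("test panicked: %v", r)
		}
		if err != nil {
			saveDataIfFailed(records) // runs even on panic, with partial records
		}
	}()

	records = append(records, "op1", "op2")
	panic("simulated failure mid-test")
}

func main() {
	fmt.Println(runTest())
}
```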
Signed-off-by: Benjamin Wang <[email protected]>
There are a lot of followups, which is not to my liking.
Still, as a whole the PR is an improvement. Please make sure to do the followups, as postponing work is a bad example for other contributors. We usually ask contributors to do all cleanups in the PR itself.
Thank you! The BoltDB test (to reproduce the data corruption issue) is my top priority. Some of the followups are already part of my planned work.
Based on my previous experience, I am definitely not the blocker; instead, lack of active review is.
Validate the linearizability: each reading transaction should read the same value written by the previous writing transaction on the same key.