Race condition false positive (?) (seq_cst fences involved). #26
Hi @RafaelGago1, I don't remember that standard requirement; it's possible that it was added after Relacy was implemented.
So it seems confirmed that this is a false positive. The excerpt appears in this C++11 draft, 29.3 paragraph 4. EDIT: I have confirmed that it is also present in 29.4 paragraph 4 of the final C++11 document (ISO/IEC 14882:2011(E)). Notice that the next paragraphs make it bidirectional, so any store before the fence on that thread is also "promoted" to seq_cst at the point the fence is sequenced. This example also relies on this to "promote" the preceding release store on "data_read_seq_" to seq_cst.
The wording here implies that this is memory-order independent. If you remember something about this codebase: is this an easy change or a big refactor? Unrelated, but thanks for pointing this out: it turns out that the code can be modified to invoke the callbacks/tasks with the underlying nodes in a quiescent state if, instead of retaining the node's memory (to avoid leaving the list empty), a stub node is used (as in your MPSC intrusive queue). That way the nodes can reclaim their memory inside the task's callback itself, at the price of an additional XCHG per combined batch (implementation hint: there is one case where one node's execution has to be delegated from a combiner to an already-detected next combiner to avoid leaving the list empty). It also turns out that the callback can be signaled when the node seems to be the last of a batch of combined nodes. So, if you didn't see it coming, that's exactly the combining mechanism I'm using for this left-right implementation instead of a mutex or a combining array. That modified combiner lock, based on the algorithm you pointed out, can flip the instance when the last-of-batch bool is set on the callback, and the combiner's nodes are even reused to apply the modification to the mirrored instance of the data structure protected by left-right. Fits like a glove :) EDIT3: grammar
Today I had some spare time to look at this. I very soon realized that this wasn't a one-hour modification and I'd probably be unable to pull it off without help, as from what I have seen this isn't just a single localized change. What I found is that there are two "#defines" in the codebase related to seq_cst: "RL_IMPROVED_SEQ_CST_FENCE" and "RL_IMPROVED_SEQ_CST_RMW". Both are enabled for Java. With those enabled the sample program works as expected. One thing that is pretty suspicious is that "RL_IMPROVED_SEQ_CST_RMW" emits a full seq_cst fence after each RMW operation. That seems too heavy-handed for C++, as an RMW on a single variable should be weaker than a seq_cst fence.
Unfortunately I don't remember much about the code now. Isn't this requirement what makes the Peterson algorithm with a seq_cst fence work? As far as I remember that test worked. And I would assume we can even remove
Well, I'm on a 3-week vacation without a computer and today I had a bad night's sleep, so I don't have many brain cycles to spare, or even an environment suitable for thinking. What I can see (and today I see very little) is that the linked Peterson example probably doesn't exercise the requirement that a relaxed load after a seq_cst fence sees at least the last seq_cst (RMW) store before the fence, or newer. I think that a test for this exact case can be as simple as having 2 threads, where one thread does a relaxed store, a seq_cst RMW and a final relaxed store (e.g. 1, 2, 3); then, once it is ensured that the 3 stores have been sequenced, a different thread runs a seq_cst fence + relaxed load and asserts that the load saw the value of the RMW or later (2 or 3). Just one variable involved, one property tested, and it doesn't require thinking about Peterson on a cognitively bad day :) Edit: from what I read in the sources it seems that the number of history values on each variable is three, so the test above should fail.
Then an opposite test for the stores: a relaxed store followed by a seq_cst fence is always seen by a seq_cst load on another thread as long as the fence is sequenced. I instinctively think that this part is already working in the Peterson algorithm. With these, 2 of the paragraphs in the standard about seq_cst fences would be covered. For the relaxed + fence + relaxed interactions I think there were changes in C++20, but we use something older than that at my daily job so I haven't looked into it. Edit: the C++20 proposal that I think went through: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0668r5.html
0001-Add-test-for-relaxed-load-after-a-seq_cst-fence.zip |
In the test B is not seq_cst, so it's not in S:
Does it still fail if the load is seq_cst? Also, is there a similar statement in later standard revisions? C++11 was refined and fixed a lot. It makes little sense for Relacy to comply with C++11 at this point (e.g. compilers won't comply).
Sorry. I copied the wrong excerpt to the code comment. On this thread I had it correct. This is what is tested: "For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order."
I'm not an expert in the standard and it has been a while, but as far as I remember from reading third parties, later revisions only strengthen seq_cst fences, so on newer standards a seq_cst fence makes the stores before the fence visible to every load after the fence, no matter the memory order. In the test case everything could be relaxed and a value of 2 or later would still be guaranteed. In C++11 the writes have to be seq_cst. I remember reading something about Itanium, but the details are fuzzy. As I use C++17 and the wording has changed a lot, I still haven't made the effort to grasp the changes. EDIT: Found the proposal https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html
I suspect that Relacy is issuing a false positive.
I'm doing a basic textbook implementation of left-right.
https://hal.science/hal-01207881/document
Here is a minimum viable implementation of left right for relacy.
https://gist.github.com/RafaelGago1/4a75e25dd29e0881016dc3db19f21aa2
There are two implementations toggled by the "USE_SEQ_CST_FENCE" define: one that uses multiple seq_cst loads and stores on modification, and another that uses a release store, a seq_cst fence and an acquire fence. The goal of the second is to allow distributing the refcount so as not to kill reader scalability.
The first one is the canonical implementation and it's obviously correct. The second one is detected as a race by Relacy when I think it shouldn't be. I have detailed comments in the gist above about why I think it should work. To reproduce the error a single counter per side is enough; just changing the code to use fences reproduces the false positive.
This is the simplified log, with the variables annotated:
```
thread 0:
[20] 0: <data_read_seq> atomic load, value=0, order=relaxed, in Modify, left_right.cpp:48
[21] 0: <value[1]> load, value=0, in operator(), left_right.cpp:147
[22] 0: <value[1]> store, value=1, in operator(), left_right.cpp:147
[23] 0: <data_read_seq> atomic store, value=1, (prev value=0), order=release, in Modify, left_right.cpp:57
[26] 0: seq_cst fence, in Modify, left_right.cpp:65
[27] 0: <refcounter[1]> atomic load, value=0, order=relaxed, in RefCountIsZero, left_right.cpp:87
[28] 0: <refcount_seq> atomic store, value=1, (prev value=0), order=relaxed, in Modify, left_right.cpp:70
[29] 0: <refcounter[0]> atomic load, value=0 [NOT CURRENT], order=relaxed, in RefCountIsZero, left_right.cpp:87
[30] 0: acquire fence, in Modify, left_right.cpp:75
[31] 0: <value[0]> load, value=0, in operator(), left_right.cpp:147
[32] 0: <value[0]> store, value=0, in operator(), left_right.cpp:147
[33] 0: DATA RACE (data race detected), in operator(), left_right.cpp:147
thread 1:
[16] 1: <data_read_seq> atomic load, value=0, order=relaxed, in Read, left_right.cpp:36
[17] 1: <refcounter[0]> fetch_add, prev=0, arg=1, new=1, order=seq_cst, in Read, left_right.cpp:38
[18] 1: <data_read_seq> atomic load, value=0, order=seq_cst, in Read, left_right.cpp:39
[19] 1: <value[0]> load, value=0, in operator(), left_right.cpp:149
[24] 1: <refcounter[0]> fetch_sub, prev=1, arg=1, new=0, order=release, in Read, left_right.cpp:41
[25] 1: [THREAD FINISHED], in fiber_proc_impl, relacy/relacy/context.hpp:457
```
The issue here is that at step 29 it makes a [NOT CURRENT] load on "refcounter[0]" returning 0. I think that a [NOT CURRENT] load of 0 is not possible at that point: if it returns 0 there, that 0 should be the current value.
If I'm correct, the oldest possible value for "refcounter[0]" at step 29 is the one it had at its last seq_cst store before the seq_cst fence (step 26); that would be step 17, the seq_cst fetch_add, hence 1. So it should see either "1" [NOT CURRENT] or a current 0.
Standard quote:
For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order.
This is from an old standard version, but I think that newer only strengthen the wording.