Race condition false positive (?) (seq_cst fences involved). #26

Open
RafaelGago1 opened this issue Dec 19, 2024 · 10 comments

Comments

@RafaelGago1

RafaelGago1 commented Dec 19, 2024

I suspect that Relacy is issuing a false positive.

I'm doing a basic textbook implementation of left-right:
https://hal.science/hal-01207881/document

Here is a minimal implementation of left-right for Relacy:
https://gist.github.com/RafaelGago1/4a75e25dd29e0881016dc3db19f21aa2

There are two implementations, toggled by the "USE_SEQ_CST_FENCE" define: one that uses multiple seq_cst loads and stores on modification, and another that uses a release store, a seq_cst fence and an acquire fence. The goal of the second is to keep the refcount distributed so it doesn't kill reader scalability.

The first one is the canonical implementation and it's obviously correct. The second one is detected as a race by Relacy when I think it shouldn't be. The gist above has detailed comments on why I think it should work. A single counter per side is enough to reproduce the error; just changing the code to use fences reproduces the false positive.
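
For orientation, here is a heavily distilled sketch of the shape of the fence-based variant. The helper names (writer_wait, reader_enter, reader_exit) and the simplified sequence handling are mine, not the gist's; it only illustrates where the fences sit relative to the refcount operations.

```cpp
#include <atomic>

// Distilled sketch of the fence-based variant (not the gist's exact code);
// as noted above, a single counter per side is enough to reproduce the issue.
std::atomic<unsigned> data_read_seq{0};           // which side new readers use
std::atomic<unsigned> refcounter[2] = {{0}, {0}}; // per-side reader refcounts

// Writer: publish the new read side, then wait for the old side to drain.
void writer_wait(unsigned prev_side)              // hypothetical helper
{
    unsigned seq = data_read_seq.load(std::memory_order_relaxed);
    data_read_seq.store(seq + 1, std::memory_order_release);
    // The seq_cst fence is meant to promote the release store above and to
    // order it against the readers' seq_cst fetch_add on the refcounter.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    // A relaxed load after the fence should observe at least the last seq_cst
    // modification of the counter preceding the fence in the total order S.
    while (refcounter[prev_side].load(std::memory_order_relaxed) != 0)
        ; // spin until readers of the previous side are gone
    std::atomic_thread_fence(std::memory_order_acquire);
}

// Reader: register on the currently published side with a seq_cst RMW.
unsigned reader_enter()
{
    unsigned side = data_read_seq.load(std::memory_order_relaxed) & 1u;
    refcounter[side].fetch_add(1, std::memory_order_seq_cst);
    // The real algorithm re-checks data_read_seq here; omitted in this sketch.
    return side;
}

void reader_exit(unsigned side)
{
    refcounter[side].fetch_sub(1, std::memory_order_release);
}
```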

This is the simplified log, with the variables annotated:

```
thread 0:
[20] 0: <data_read_seq> atomic load, value=0, order=relaxed, in Modify, left_right.cpp:48
[21] 0: <value[1]> load, value=0, in operator(), left_right.cpp:147
[22] 0: <value[1]> store, value=1, in operator(), left_right.cpp:147
[23] 0: <data_read_seq> atomic store, value=1, (prev value=0), order=release, in Modify, left_right.cpp:57
[26] 0: seq_cst fence, in Modify, left_right.cpp:65
[27] 0: <refcounter[1]> atomic load, value=0, order=relaxed, in RefCountIsZero, left_right.cpp:87
[28] 0: <refcount_seq> atomic store, value=1, (prev value=0), order=relaxed, in Modify, left_right.cpp:70
[29] 0: <refcounter[0]> atomic load, value=0 [NOT CURRENT], order=relaxed, in RefCountIsZero, left_right.cpp:87
[30] 0: acquire fence, in Modify, left_right.cpp:75
[31] 0: <value[0]> load, value=0, in operator(), left_right.cpp:147
[32] 0: <value[0]> store, value=0, in operator(), left_right.cpp:147
[33] 0: DATA RACE (data race detected), in operator(), left_right.cpp:147

thread 1:
[16] 1: <data_read_seq> atomic load, value=0, order=relaxed, in Read, left_right.cpp:36
[17] 1: <refcounter[0]> fetch_add , prev=0, arg=1, new=1, order=seq_cst, in Read, left_right.cpp:38
[18] 1: <data_read_seq> atomic load, value=0, order=seq_cst, in Read, left_right.cpp:39
[19] 1: <value[0]> load, value=0, in operator(), left_right.cpp:149
[24] 1: <refcounter[0]> fetch_sub , prev=1, arg=1, new=0, order=release, in Read, left_right.cpp:41
[25] 1: [THREAD FINISHED], in fiber_proc_impl, relacy/relacy/context.hpp:457
```

The issue here is that at step 29 it makes a [NOT CURRENT] load of "refcounter[0]" that returns 0. I think a [NOT CURRENT] load of 0 is not possible at that point: if the load returns 0 there, it has to be the current value.

If I'm correct, the oldest value the load at step 29 can observe for "refcounter[0]" is the one written by its last seq_cst store before the seq_cst fence (step 26), which is step 17, the seq_cst fetch_add, hence 1. So the load should see either 1 [NOT CURRENT] or 0 as the current value.

Standard quote:
For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order.

This is from an old revision of the standard, but I think newer revisions only strengthen the wording.
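
To make the argument concrete, the reported interleaving boils down to the following two-thread pattern (plain C++ <atomic>; the variable names mirror the annotated log, everything else is stripped away, and the small main is only there so the sketch compiles):

```cpp
#include <atomic>
#include <thread>

std::atomic<unsigned> refcounter0{0};
std::atomic<unsigned> data_read_seq{0};

void thread1_reader()
{
    refcounter0.fetch_add(1, std::memory_order_seq_cst);      // step [17]
    data_read_seq.load(std::memory_order_seq_cst);            // step [18]
    refcounter0.fetch_sub(1, std::memory_order_release);      // step [24]
}

void thread0_writer()
{
    data_read_seq.store(1, std::memory_order_release);        // step [23]
    std::atomic_thread_fence(std::memory_order_seq_cst);      // step [26]
    unsigned r = refcounter0.load(std::memory_order_relaxed); // step [29]
    // In the interleaving Relacy reported, the fetch_add at [17] precedes the
    // fence at [26] in the total order S, so this load has to observe the 1
    // written at [17] or something later in the modification order, i.e. the
    // 0 written at [24] -- which would then be the current value, never a
    // stale pre-[17] 0.
    (void)r;
}

int main()
{
    std::thread t1(thread1_reader), t0(thread0_writer);
    t1.join();
    t0.join();
}
```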

@dvyukov
Owner

dvyukov commented Dec 19, 2024

Hi @RafaelGago1,

I don't remember that standard requirement; it's possible that it was added after Relacy was implemented.
The original semantics of seq_cst fences were rather weak.

@RafaelGago1 RafaelGago1 changed the title Is this a race condition false positive? (seq_cst fences involved). Race condition false positive (?) (seq_cst fences involved). Dec 19, 2024
@RafaelGago1
Author

RafaelGago1 commented Dec 19, 2024

So it seems confirmed that this is a false positive.

The excerpt seems to be present in this C++11 draft, 29.3 paragraph 4:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf
Web format
https://timsong-cpp.github.io/cppwp/n3337/atomics.order

EDIT: I have confirmed that this is also present in 29.4 paragraph 4 in the final C++11 document (ISO/IEC 14882:2011(E)).
EDIT2: Even if not perfect, it's still very impressive what you managed here that early :) I have been using it for years and it is still extremely useful in 2024. Without it I wouldn't have managed to implement some algorithms and learn along the way.

Notice that the next paragraphs make it bidirectional, so any store before the fence on that thread is also "promoted" to seq_cst at the point where the fence is sequenced. This example also relies on that to "promote" the preceding release store on "data_read_seq_" to seq_cst.

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there is a memory_order_seq_cst fence X such that A is sequenced before X and B follows X in S, then B observes either the effects of A or a later modification of M in its modification order.

The wording here implies that this is memory-order independent.

If you remember something about this codebase, is this an easy change or a big refactor?

Unrelated, but thanks for pointing out this:
https://groups.google.com/g/lock-free/c/EDOXJTWYqGY

It turns out that that code can be modified to invoke the callbacks/tasks with the underlying nodes in a quiescent state if, instead of retaining the node's memory (to avoid leaving the list empty), a stub node is used (as in your MPSC intrusive queue). That way the nodes can reclaim their memory inside the task's callback itself.

That comes at the price of an additional XCHG per combined batch (implementation hint: there is one case where a node's execution has to be delegated from a combiner to an already-detected next combiner to avoid leaving the list empty).

It also turns out that the callback can be signaled when the node seems to be the last of a batch of combined nodes.

So if you didn't see it coming, that's exactly the combining mechanism I'm using for this left-right implementation instead of a mutex or a combining array.

That modified combiner lock, based on the algorithm you pointed to, can flip the instance when the last-of-batch flag is set on the callback, and the combiner's nodes are even reused to apply the modification to the mirrored instance of the data structure protected by left-right. Fits like a glove :)

EDIT3: grammar

@RafaelGago1
Author

RafaelGago1 commented Dec 20, 2024

@dvyukov

Today I had some spare time to look at this. I very soon realized that this wasn't a one-hour modification and that I'd probably be unable to pull it off without help; from what I have seen, this isn't just a single localized change.

What I found is that there are two "#defines" available in the codebase related to seq_cst: "RL_IMPROVED_SEQ_CST_FENCE" and "RL_IMPROVED_SEQ_CST_RMW". Both of these are enabled for Java.

With those enabled the sample program works as expected. One suspicious thing is that "RL_IMPROVED_SEQ_CST_RMW" issues a full seq_cst fence after each RMW operation. That seems too heavy-handed for C++, as a seq_cst RMW on a single variable should be weaker than a seq_cst fence.
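
For reference, this is how I enable them; I'm assuming a bare define before the first Relacy include is all that's needed (defining them on the compiler command line should work just as well, and the include path depends on your setup):

```cpp
// Assumption: the macros only need to be visible before the first Relacy
// include; -DRL_IMPROVED_SEQ_CST_FENCE etc. on the command line also works.
#define RL_IMPROVED_SEQ_CST_FENCE 1
#define RL_IMPROVED_SEQ_CST_RMW 1
#include <relacy/relacy_std.hpp> // adjust the path to your checkout
```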

@dvyukov
Owner

dvyukov commented Dec 22, 2024

Unfortunately I don't remember much about the code now.

Isn't this requirement what makes the Peterson algorithm with seq_cst fences work?
https://github.com/dvyukov/relacy/blob/master/example/peterson/peterson.cpp#L119-L146

As far as I remember that test worked. And I would assume we could even remove the turn variable and just check that at least one thread observes the store of the other thread.

@RafaelGago1
Author

RafaelGago1 commented Dec 22, 2024

Well, I'm on a 3-week vacation without a computer and I had a bad night's sleep today, so I don't have a lot of brain cycles to spare or even an environment suitable for thinking.

What I can see (and today I see very little) is that the linked Peterson example probably doesn't exercise the requirement that a relaxed load after a seq_cst fence sees at least the last seq_cst (RMW) store before the fence, or something newer.

I think a test for this exact case can be as simple as two threads, where one thread does a relaxed store, a seq_cst RMW and a final relaxed store (e.g. values 1, 2, 3); then, once it is ensured that the three stores have been sequenced, the other thread runs a seq_cst fence plus a relaxed load and asserts that the load saw the value of the RMW or later (2 or 3). Just one variable involved, one property tested, and it doesn't require thinking about Peterson on a cognitively bad day :)

Edit: from what I read in the sources, it seems the number of history values kept per variable is three, so the test above should fail.
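
Something along these lines is what I have in mind, written here with plain <atomic> and std::thread for clarity rather than as a Relacy test_suite; the seq_cst "done" guard is just my way of ensuring the three stores precede the fence in S before asserting:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> done{0}; // guard: ensures the stores precede the fence in S

void writer()
{
    x.store(1, std::memory_order_relaxed);     // store "1"
    x.fetch_add(1, std::memory_order_seq_cst); // seq_cst RMW, x becomes 2
    x.store(3, std::memory_order_relaxed);     // store "3"
    done.store(1, std::memory_order_seq_cst);  // publish "all three sequenced"
}

void checker()
{
    if (done.load(std::memory_order_seq_cst) == 1)
    {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        int y = x.load(std::memory_order_relaxed);
        // The seq_cst RMW precedes the fence in S (via the guard), so the
        // relaxed load must see its result (2) or a later value (3).
        assert(y >= 2);
    }
}

int main()
{
    std::thread t0(writer), t1(checker);
    t0.join();
    t1.join();
}
```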

@RafaelGago1
Author

RafaelGago1 commented Dec 22, 2024

Then an opposite test for the stores: a relaxed store followed by a seq_cst fence is always seen by a seq_cst load on another thread, as long as the fence is sequenced. I instinctively think that this part already works in the Peterson algorithm.

With these, two of the paragraphs in the standard about seq_cst fences would be covered. For the relaxed + fence + relaxed interactions I think there were changes in C++20, but we use older standards than that at my day job, so I haven't looked into them.

Edit: the C++20 proposal that I think went through: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0668r5.html
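
The store-side counterpart could look like this, again as plain <atomic>; the seq_cst "published" guard is mine, only there to ensure the fence precedes the seq_cst load in S:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> published{0}; // guard: orders the fence before the load in S

void storer()
{
    x.store(1, std::memory_order_relaxed);         // the store under test
    std::atomic_thread_fence(std::memory_order_seq_cst);
    published.store(1, std::memory_order_seq_cst); // sequenced after the fence
}

void observer()
{
    if (published.load(std::memory_order_seq_cst) == 1)
    {
        // The fence precedes this seq_cst load in S (via the guard), so the
        // load must observe the relaxed store of 1 or a later modification.
        int y = x.load(std::memory_order_seq_cst);
        assert(y == 1);
    }
}

int main()
{
    std::thread t0(storer), t1(observer);
    t0.join();
    t1.join();
}
```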

@RafaelGago1
Author

RafaelGago1 commented Jan 13, 2025

0001-Add-test-for-relaxed-load-after-a-seq_cst-fence.zip
Minimal test confirming the issue is attached to this message. It can be applied with "git am".

@dvyukov
Owner

dvyukov commented Jan 14, 2025

+// "For atomic operations A and B on an atomic object M, where A modifies M and 
+// B takes its value, if there is a memory_order_seq_cst fence X such that A is
+// sequenced before X and B follows X in S, then B observes either the effects 
+// of A or a later modification of M in its modification order".

In the test B is not seq_cst, so it's not in S:

```
+            int y = x.load(std::memory_order_relaxed);
```

Does it still fail if the load is seq_cst?

Also, is there a similar statement in later standard revisions? C++11 was refined and fixed a lot. It makes little sense for Relacy to comply with C++11 at this point (e.g. compilers won't comply).

@RafaelGago1
Author

RafaelGago1 commented Feb 7, 2025

Sorry, I copied the wrong excerpt into the code comment. In this thread I had it right. This is what is tested:

"For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order."

@RafaelGago1
Author

RafaelGago1 commented Feb 7, 2025

I'm not an expert in the standard and it has been a while, but as far as I remember from reading third parties, later revisions only strengthen seq_cst fences, so on newer standards a seq_cst fence makes stores before the fence visible to every load after the fence, regardless of memory order.

In the test case everything could be relaxed and a value of 2 or later would still be guaranteed. In C++11 the writes have to be seq_cst. I remember reading something about Itanium, but the details are fuzzy.

Since I use C++17 and the wording changed a lot, I still haven't made the effort to grasp the changes.

EDIT: Found the proposal https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html
EDIT2: Sorry for the late reply BTW. I missed the notification.
