
Soundness of CriticalSectionDefaultMutex implementation #513

Closed
peterkrull opened this issue Jan 31, 2025 · 4 comments · Fixed by #514
Labels
kind/bug Something isn't working

Comments

@peterkrull (Contributor)

I am currently experiencing deadlocks with the WaitQueue when using the cs-backed mutex on a single-core system with interrupts. The stack trace shows execution stuck in the SpinLock inside the CriticalSectionDefaultMutex, which is a place the cs-backed mutex should never spin on a single-core system.

I looked at the code:

https://github.com/hawkw/mycelium/blob/main/maitake-sync/src/blocking/default_mutex.rs#L294-L326

Specifically, the ScopedRawMutex trait function

```rust
fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
    self.0.with_lock(|| critical_section::with(|_cs| f()))
}
```

with the SpinLock::with_lock being

```rust
fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
    self.lock();
    // Using a drop guard ensures that the mutex is unlocked when this
    // function exits, even if `f()` panics.
    let _unlock = Unlock(self);
    f()
}
```
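As an aside, the drop-guard pattern mentioned in that comment works roughly like the following sketch. The `SpinLock` fields and `unlock` method here are illustrative assumptions, not the actual maitake-sync definitions:

```rust
use core::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for the crate's internal spin lock.
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    fn unlock(&self) {
        // Release the lock so other contexts can acquire it.
        self.locked.store(false, Ordering::Release);
    }
}

// Guard whose `Drop` impl releases the lock. Because `Drop` runs on every
// exit path, including a panic unwinding out of `f()`, the lock is never
// leaked.
struct Unlock<'a>(&'a SpinLock);

impl Drop for Unlock<'_> {
    fn drop(&mut self) {
        self.0.unlock();
    }
}
```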

Is it not incorrect to lock the spin lock first and only enter the critical section afterwards? If an interrupt fires between locking the SpinLock and entering the critical section around f(), the interrupt context gets stuck in the spin loop trying to acquire the lock, which can never be released because the interrupted holder cannot run again until the handler returns.
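To make the interleaving concrete, here is the failing sequence on a single core, annotated against the snippets above (illustrative, not actual crate code):

```rust
// Single-core timeline of the suspected deadlock:
//
//   thread context:
//     self.lock();                        // spin lock now held
//                                         // <-- interrupt fires here,
//                                         //     before the critical
//                                         //     section is entered
//   interrupt context:
//     mutex.with_lock(...);               // calls self.lock() again and
//                                         // spins forever: the holder is
//                                         // the interrupted code, which
//                                         // cannot resume on this core
//   thread context (never reached):
//     critical_section::with(|_cs| f());
```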

@hawkw added the kind/bug label on Jan 31, 2025
@hawkw (Owner) commented Jan 31, 2025

Agh, I think you're right; we need the critical_section::with to be outside of self.0.with_lock rather than inside it. It should be an easy fix, at least. If you're interested in submitting a patch, I'll happily merge it and publish a point release; otherwise I can fix it myself.
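For reference, a minimal sketch of the reordering described above, reusing the names from the snippets earlier in the thread (the actual fix landed in #514 and may differ in detail):

```rust
fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
    // Enter the critical section (disabling interrupts) *before* taking
    // the spin lock. An interrupt can then never observe the lock held
    // by code it preempted, so the handler cannot spin forever.
    critical_section::with(|_cs| self.0.with_lock(f))
}
```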

@peterkrull (Contributor, Author)

Will do, just a moment

@peterkrull (Contributor, Author)

Just a gentle bump :) Also for #515

@hawkw (Owner) commented Feb 6, 2025

Alright, published maitake-sync v0.2.1 with the fix for this, thanks for your help!
