Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pitfalls of using rdtsc and possible improvements of Trace::timestamp() #5430

Open
atopia opened this issue Jan 21, 2025 · 0 comments
Open

Comments

@atopia
Copy link
Contributor

atopia commented Jan 21, 2025

The discussion at atopia@272e77a#r151401339 has brought up some possible room for improvement regarding the Trace::timestamp() method and highlighted some pitfalls regarding the non-serializing nature of rdtsc that I want to outline here mostly for documentation:

With respect to later instructions, rdtsc in the lfence/rdtsc sequence may be dispatched for execution simultaneously with later instructions. This behavior may not be desirable if you also want to precisely time these later instructions as well. This is generally not a problem because the reservation station scheduler prioritizes older uops for dispatching as long as there are no structural hazards.

Seeing that contrary to the Linux kernel (cf. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be261ffce6f13229dad50f59c5e491f933d3167f ) we don't set lfence to be dispatch serializing on AMD across our supported kernels and the limited usefulness of rdtsc for measuring virtualized code execution, the current use of cpuid for serialization in Trace::timestamp() is probably fine. If at some point we need a more precise measurement, we could set the MSR on AMD and switch to a full lfence; rdtsc; lfence (or alternatively rdtscp; lfence) sequence for maximum reliability with minimal measurement overhead.

atopia added a commit to atopia/genode that referenced this issue Jan 21, 2025
While implementing TSC calibration in genodelabs#5215, the issue of properly serializing
TSC reads came up. Some learnings of the discussion were noted in genodelabs#5430.

Using `cpuid` for serialization as in Trace::timestamp() is portable,
but will cause VM exits on VMX and SVM and is therefore unsuitable to
retain a roughly working calibration loop while running virtualized.
On the other hand on most AMD systems, dispatch serializing `lfence`
needs to be explicitly enabled via a non-architectural MSR.

Enable setting up dispatch serializing lfence on AMD systems and always
serialize rdtsc accesses in Hw::Tsc::rdtsc() for maximum reliability.

Issues genodelabs#5215, genodelabs#5430
atopia referenced this issue in atopia/genode Jan 21, 2025
To get the Time Stamp Counter's frequency, hw relied on a complex and
incomplete algorithm.

Since this is a one-time initialization issue, move TSC calibration to
bootstrap and implement it using the ACPI timer.

Issue genodelabs#5215
atopia added a commit to atopia/genode that referenced this issue Jan 22, 2025
Since rdtsc() provides ordered timestamps now, we should reordering of
statements by the compiler too.

Issues genodelabs#5215, genodelabs#5430
chelmuth pushed a commit that referenced this issue Jan 22, 2025
While implementing TSC calibration in #5215, the issue of properly serializing
TSC reads came up. Some learnings of the discussion were noted in #5430.

Using `cpuid` for serialization as in Trace::timestamp() is portable,
but will cause VM exits on VMX and SVM and is therefore unsuitable to
retain a roughly working calibration loop while running virtualized.
On the other hand on most AMD systems, dispatch serializing `lfence`
needs to be explicitly enabled via a non-architectural MSR.

Enable setting up dispatch serializing lfence on AMD systems and always
serialize rdtsc accesses in Hw::Tsc::rdtsc() for maximum reliability.

Issues #5215, #5430
chelmuth pushed a commit that referenced this issue Jan 23, 2025
Since rdtsc() provides ordered timestamps now, we should reordering of
statements by the compiler too.

Issues #5215, #5430
chelmuth pushed a commit that referenced this issue Jan 30, 2025
While implementing TSC calibration in #5215, the issue of properly serializing
TSC reads came up. Some learnings of the discussion were noted in #5430.

Using `cpuid` for serialization as in Trace::timestamp() is portable,
but will cause VM exits on VMX and SVM and is therefore unsuitable to
retain a roughly working calibration loop while running virtualized.
On the other hand on most AMD systems, dispatch serializing `lfence`
needs to be explicitly enabled via a non-architectural MSR.

Enable setting up dispatch serializing lfence on AMD systems and always
serialize rdtsc accesses in Hw::Tsc::rdtsc() for maximum reliability.

Issues #5215, #5430
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant