arm64/vmm: Preserve PSR_C64 when injecting an exception #2255

markjdb · 2024-11-27T22:12:56Z

I'm not sure if this might be simplistic, but it resolves a problem I see with breakpoint injection from bhyve's gdb stub. This arises when the debugger has installed a breakpoint, and the guest triggers a breakpoint exception some other way, e.g., a dtrace FBT probe.
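
For readers who want the change spelled out, here is a minimal sketch of the state computation involved in injecting a synchronous exception at EL1, written as plain C so it compiles outside the kernel. It is a model under stated assumptions, not the code in this PR: the helper names `inject_vector_offset` and `inject_new_pstate` are hypothetical, and the `PSR_C64` bit position is assumed and should be checked against the Morello headers. The mode, DAIF, and NZCV values are the architectural AArch64 SPSR encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural AArch64 SPSR fields. */
#define PSR_M_MASK  UINT64_C(0x0000000f)
#define PSR_M_EL1t  UINT64_C(0x00000004)
#define PSR_M_EL1h  UINT64_C(0x00000005)
#define PSR_M_32    UINT64_C(0x00000010)  /* set for AArch32 modes */
#define PSR_DAIF    UINT64_C(0x000003c0)  /* D, A, I, F mask bits */
#define PSR_FLAGS   UINT64_C(0xf0000000)  /* N, Z, C, V */
/* ASSUMPTION: position of Morello's C64 bit; verify against real headers. */
#define PSR_C64     UINT64_C(0x04000000)

/*
 * Hypothetical helper: offset into the EL1 vector table for a synchronous
 * exception taken from the mode recorded in old_spsr.
 */
static uint64_t
inject_vector_offset(uint64_t old_spsr)
{
	uint64_t mode = old_spsr & (PSR_M_MASK | PSR_M_32);

	if (mode == PSR_M_EL1t)
		return (0x000);
	if (mode == PSR_M_EL1h)
		return (0x200);
	if ((mode & PSR_M_32) == 0)
		return (0x400);		/* 64-bit EL0 */
	return (0x600);			/* 32-bit EL0 */
}

/*
 * Hypothetical helper: the PSTATE the guest's handler starts with.
 * The condition flags and C64 are carried over from the interrupted
 * state; interrupts are masked and the mode is forced to EL1h.
 */
static uint64_t
inject_new_pstate(uint64_t old_spsr)
{
	assert((old_spsr >> 32) == 0);	/* the model covers the low 32 bits */
	return ((old_spsr & (PSR_FLAGS | PSR_C64)) | PSR_DAIF | PSR_M_EL1h);
}
```

In this model, leaving `PSR_C64` out of the kept mask means the handler always starts with C64 clear, whatever the interrupted state was.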

sys/arm64/vmm/vmm_arm64.c

jrtc27 · 2024-11-27T22:26:30Z

This looks to be the bug @kwitaszczyk was running into, and the cause of the problem aligns with what @bsdjhb and I had managed to ascertain at the PI meeting.

jrtc27 · 2024-11-27T22:37:35Z

The handling of vbar_el1 also looks a bit dodgy. If CPACR_EL1.CEN[0] is 0 then CVBAR_EL1 is just interpreted as VBAR_EL1 by the architecture when trapping to EL1 (setting PCC's address to it), which means we need to derive a capability for tf_elr from elr_el1.

jrtc27 · 2024-11-27T23:39:55Z

> The handling of vbar_el1 also looks a bit dodgy. If CPACR_EL1.CEN[0] is 0 then CVBAR_EL1 is just interpreted as VBAR_EL1 by the architecture when trapping to EL1 (setting PCC's address to it), which means we need to derive a capability for tf_elr from elr_el1.

This should be testable by running GDB against a plain FreeBSD VM, setting a breakpoint from GDB and triggering a breakpoint from within the VM, just as for the SPSR C64 issue except with a FreeBSD guest. I'm not sure how exactly I expect it to break, whether it'll get stuck in a trap loop or end up in bhyve. Hopefully at least it doesn't wedge the host, which should be true as long as some of this code is preemptible... otherwise there are other ways a malicious guest could trigger the same kinds of issues even with correct handling here.

kwitaszczyk · 2024-11-29T10:04:35Z

> This looks to be the bug @kwitaszczyk was running into, and the cause of the problem aligns with what @bsdjhb and I had managed to ascertain at the PI meeting.

Unfortunately, in my case the host kernel doesn't seem to enter the if (hypctx->has_exception) block. It does enter it, but only once the guest kernel ends up in kdb_enter() after the panic.

bsdjhb · 2024-11-29T15:07:52Z

Can you add a trace to see if vmmops_setreg is ever called to write a value to VM_REG_GUEST_CPSR?

kwitaszczyk · 2024-12-02T15:20:09Z

> Can you add a trace to see if vmmops_setreg is ever called to write a value to VM_REG_GUEST_CPSR?

It doesn't seem vmmops_setreg() is ever called for this purpose. I've added

diff --git a/sys/arm64/vmm/vmm_arm64.c b/sys/arm64/vmm/vmm_arm64.c
index a66d5ed1ba97..23efd3e4887d 100644
--- a/sys/arm64/vmm/vmm_arm64.c
+++ b/sys/arm64/vmm/vmm_arm64.c
@@ -1447,6 +1447,12 @@ vmmops_setreg(void *vcpui, int reg, uintcap_t val)
 #endif
                *(uintcap_t *)regp = val;
                break;
+       case VM_REG_GUEST_CPSR:
+               printf("%s:%d\nval=%lu\nelr_el1=%#lp\ntf_elr=%#lp\nspsr=0x%lx\n",
+                   __func__, __LINE__, (uint64_t)val, (void *)hypctx->elr_el1,
+                   (void *)hypctx->tf.tf_elr, hypctx->tf.tf_spsr);
+               *(uint64_t *)regp = (uint64_t)val;
+               break;
        default:
                *(uint64_t *)regp = (uint64_t)val;
                break;

and I don't get anything in the serial console.

bsdjhb

I do think there are other things to check (CEN) that Jess noted, but this is certainly an improvement over what is there now.

markjdb · 2024-12-18T22:01:32Z

> I do think there are other things to check (CEN) that Jess noted, but this is certainly an improvement over what is there now.

I have a patch to address that comment, but had been holding off on pushing it until I could test with hybrid kernels. Now I'm looking at an apparent regression with the VHE merge after I rebased onto dev; hopefully it won't take too long to fix.

bsdjhb · 2025-01-25T15:56:58Z

Is this blocked on the VHE regression or can this be merged as-is?

markjdb · 2025-02-12T18:02:25Z

> The handling of vbar_el1 also looks a bit dodgy. If CPACR_EL1.CEN[0] is 0 then CVBAR_EL1 is just interpreted as VBAR_EL1 by the architecture when trapping to EL1 (setting PCC's address to it), which means we need to derive a capability for tf_elr from elr_el1.
>
> This should be testable by running GDB against a plain FreeBSD VM, setting a breakpoint from GDB and triggering a breakpoint from within the VM, just as for the SPSR C64 issue except with a FreeBSD guest. I'm not sure how exactly I expect it to break, whether it'll get stuck in a trap loop or end up in bhyve. Hopefully at least it doesn't wedge the host, which should be true as long as some of this code is preemptible... otherwise there are other ways a malicious guest could trigger the same kinds of issues even with correct handling here.

The latest PR addresses this. Without the patch, the bhyve vcpu threads end up in a loop somewhere, but the VM is killable and the bug doesn't bring down the host.

sys/arm64/vmm/vmm_arm64.c

Setting MDSCR_EL1.KDE in the guest is not required, and it causes the host to freeze when single-stepping a guest, for reasons that I don't understand. The problem appeared only after we started using VHE by default.

When the CPU is configured to trap upon execution of Morello-specific instructions or access of Morello-specific control registers, i.e., vmm is executing a vanilla aarch64 guest, we need to derive a capability from elr_el1, since upon exception entry the capability value of PCC is set to VBAR_ELx.

jrtc27 · 2025-02-13T18:54:30Z

Re MDSCR_EL1.KDE, the relevant points are:

- nVHE calls into EL2 via a trap, which sets all of PSTATE.{D,A,I,F}.
- With VHE the host is already in EL2, so it just makes a function call directly to enter_guest, and only PSTATE.{I,F} are set (from vmmops_run).
- There's a window where we have MDSCR_EL1.{SS,KDE} set whilst still in the host.
- If either MDCR_EL2.TDE or HCR_EL2.TGE is set, EL_D is EL2, and so non-breakpoint exceptions at EL2 are enabled thanks to MDSCR_EL1.KDE (breakpoints are always enabled regardless of its value).
- Therefore we effectively end up single-stepping the host.

I think though this also means that breakpoints are enabled in this window for VHE, so were one to place a guest breakpoint at an address that is in the window where we're world switching we'd see that breakpoint be hit in the host. Single-stepping with MDSCR_EL1.KDE just makes this issue more obvious because it immediately takes effect wherever you are.

Now, I agree that MDSCR_EL1.KDE is unnecessary here. EL_D is always going to be EL2 already thanks to MDCR_EL2.TDE, so there's no need to set it for single-stepping EL1. However, the fact that it breaks things also shows that there is this other bug with regard to PSTATE.D; MDSCR_EL1.KDE's value should be entirely irrelevant. I believe that, were one to set PSTATE.D for the VHE case, as happens implicitly from the trap in the nVHE case, the single-step issue would also be fixed.
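
The argument above can be modeled in a few lines of C. This is a simplified sketch of the architectural rule for when debug exceptions other than Breakpoint Instruction exceptions are enabled at the current exception level; it ignores Secure state, the OS Lock, and other conditions, and the function names `debug_target_el` and `debug_exceptions_enabled` are hypothetical, not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * EL_D, the exception level debug exceptions are routed to: EL2 when
 * either MDCR_EL2.TDE or HCR_EL2.TGE is set, otherwise EL1.
 */
static unsigned
debug_target_el(bool tde, bool tge)
{
	return ((tde || tge) ? 2 : 1);
}

/*
 * Simplified rule for debug exceptions other than Breakpoint Instruction
 * exceptions (e.g. Software Step):
 *  - below EL_D they are enabled;
 *  - at EL_D they are enabled only if MDSCR_EL1.KDE is set and
 *    PSTATE.D is clear;
 *  - above EL_D they are disabled.
 */
static bool
debug_exceptions_enabled(unsigned cur_el, unsigned el_d, bool kde,
    bool pstate_d)
{
	assert(cur_el <= 3 && (el_d == 1 || el_d == 2));

	if (cur_el < el_d)
		return (true);
	if (cur_el > el_d)
		return (false);
	return (kde && !pstate_d);
}
```

Under this model, the nVHE world switch (EL2, PSTATE.D set by the trap) never takes a step exception in the host, while the VHE path (EL2, PSTATE.D clear, KDE set) does, which is the window described above.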

jrtc27 reviewed Nov 27, 2024

sys/arm64/vmm/vmm_arm64.c (outdated, resolved)

markjdb force-pushed the dev-vmm-exinject branch from 2210edc to 4fea6a9 on November 27, 2024 22:40

markjdb force-pushed the dev-vmm-exinject branch from 4fea6a9 to 5788478 on December 18, 2024 20:10

bsdjhb approved these changes Dec 18, 2024

markjdb force-pushed the dev-vmm-exinject branch from 5788478 to 4641655 on February 12, 2025 17:51

jrtc27 reviewed Feb 12, 2025

sys/arm64/vmm/vmm_arm64.c (outdated, resolved)

markjdb and others added 3 commits February 12, 2025 19:21

arm64/vmm: Don't set MDSCR_KDE in the guest when single stepping

4cd354b

It's not required, and causes the host to freeze when single-stepping a guest, for reasons that I don't understand. The problem appeared only after we started using VHE by default.

arm64/vmm: Preserve PSR_C64 when injecting an exception

0d15d77

markjdb force-pushed the dev-vmm-exinject branch from 4641655 to 0d15d77 on February 12, 2025 19:24

bsdjhb approved these changes Feb 13, 2025

jrtc27 approved these changes Feb 13, 2025
