Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/torvalds/linux.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-11-02KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTLSean Christopherson
Ignore guest CPUID for host userspace writes to the DEBUGCTL MSR, KVM's ABI is that setting CPUID vs. state can be done in any order, i.e. KVM allows userspace to stuff MSRs prior to setting the guest's CPUID that makes the new MSR "legal". Keep the vmx_get_perf_capabilities() check for guest writes, even though it's technically unnecessary since the vCPU's PERF_CAPABILITIES is consulted when refreshing LBR support. A future patch will clean up vmx_get_perf_capabilities() to avoid the RDMSR on every call, at which point the paranoia will incur no meaningful overhead. Note, prior to vmx_get_perf_capabilities() checking that the host fully supports LBRs via x86_perf_get_lbr(), KVM effectively relied on intel_pmu_lbr_is_enabled() to guard against host userspace enabling LBRs on platforms without full support. Fixes: c646236344e9 ("KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-5-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()Sean Christopherson
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only caller. Setting bits only to clear them a few instructions later is rather silly, and splitting the logic makes things seem more complicated than they actually are. Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single reference to the pair of bits. The extra layer of indirection provides no meaningful value and makes it unnecessarily tedious to understand what KVM is doing. No functional change. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-4-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailableEmanuele Giuseppe Esposito
Clear enable_sgx if ENCLS-exiting is not supported, i.e. if SGX cannot be virtualized. When KVM is loaded, adjust_vmx_controls checks that the bit is available before enabling the feature; however, other parts of the code check enable_sgx and not clearing the variable caused two different bugs, mostly affecting nested virtualization scenarios. First, because enable_sgx remained true, SECONDARY_EXEC_ENCLS_EXITING would be marked available in the capability MSR that are accessed by a nested hypervisor. KVM would then propagate the control from vmcs12 to vmcs02 even if it isn't supported by the processor, thus causing an unexpected VM-Fail (exit code 0x7) in L1. Second, vmx_set_cpu_caps() would not clear the SGX bits when hardware support is unavailable. This is a much less problematic bug as it only happens if SGX is soft-disabled (available in the processor but hidden in CPUID) or if SGX is supported for bare metal but not in the VMCS (will never happen when running on bare metal, but can theoertically happen when running in a VM). Last but not least, this ensures that module params in sysfs reflect KVM's actual configuration. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128 Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221025123749.2201649-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: nVMX: Make an event request when pending an MTF nested VM-ExitSean Christopherson
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run through inject_pending_event() and thus vmx_check_nested_events() prior to re-entering the guest. MTF currently works by virtue of KVM's hack that calls kvm_check_nested_events() from kvm_vcpu_running(), but that hack will be removed in the near future. Until that call is removed, the patch introduces no real functional change. Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behaviorSean Christopherson
Document the oddities of ICEBP interception (trap-like #DB is intercepted as a fault-like exception), and how using VMX's inner "skip" helper deliberately bypasses the pending MTF and single-step #DB logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-24-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Morph pending exceptions to pending VM-Exits at queue timeSean Christopherson
Morph pending exceptions to pending VM-Exits (due to interception) when the exception is queued instead of waiting until nested events are checked at VM-Entry. This fixes a longstanding bug where KVM fails to handle an exception that occurs during delivery of a previous exception, KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow paging), and KVM determines that the exception is in the guest's domain, i.e. queues the new exception for L2. Deferring the interception check causes KVM to esclate various combinations of injected+pending exceptions to double fault (#DF) without consulting L1's interception desires, and ends up injecting a spurious #DF into L2. KVM has fudged around the issue for #PF by special casing emulated #PF injection for shadow paging, but the underlying issue is not unique to shadow paging in L0, e.g. if KVM is intercepting #PF because the guest has a smaller maxphyaddr and L1 (but not L0) is using shadow paging. Other exceptions are affected as well, e.g. if KVM is intercepting #GP for one of SVM's workaround or for the VMware backdoor emulation stuff. The other cases have gone unnoticed because the #DF is spurious if and only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would have injected #DF anyways. The hack-a-fix has also led to ugly code, e.g. bailing from the emulator if #PF injection forced a nested VM-Exit and the emulator finds itself back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves the async #PF in L2 mess; no need to set a magic flag and token, simply queue a #PF nested VM-Exit. Deal with event migration by flagging that a pending exception was queued by userspace and check for interception at the next KVM_RUN, e.g. so that KVM does the right thing regardless of the order in which userspace restores nested state vs. event state. When "getting" events from userspace, simply drop any pending excpetion that is destined to be intercepted if there is also an injected exception to be migrated. Ideally, KVM would migrate both events, but that would require new ABI, and practically speaking losing the event is unlikely to be noticed, let alone fatal. The injected exception is captured, RIP still points at the original faulting instruction, etc... So either the injection on the target will trigger the same intercepted exception, or the source of the intercepted exception was transient and/or non-deterministic, thus dropping it is ok-ish. Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0") Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-22-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Make kvm_queued_exception a properly named, visible structSean Christopherson
Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in anticipation of adding a second instance in kvm_vcpu_arch to handle exceptions that occur when vectoring an injected exception and are morphed to VM-Exit instead of leading to #DF. Opportunistically take advantage of the churn to rename "nr" to "vector". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-15-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exceptionSean Christopherson
Rename the kvm_x86_ops hook for exception injection to better reflect reality, and to align with pretty much every other related function name in KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-14-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCSSean Christopherson
Deliberately truncate the exception error code when shoving it into the VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12). Intel CPUs are incapable of handling 32-bit error codes and will never generate an error code with bits 31:16, but userspace can provide an arbitrary error code via KVM_SET_VCPU_EVENTS. Failure to drop the bits on exception injection results in failed VM-Entry, as VMX disallows setting bits 31:16. Setting the bits on VM-Exit would at best confuse L1, and at worse induce a nested VM-Entry failure, e.g. if L1 decided to reinject the exception back into L2. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-3-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_configVitaly Kuznetsov
Like other host VMX control MSRs, MSR_IA32_VMX_MISC can be cached in vmcs_config to avoid the need to re-read it later, e.g. from cpu_has_vmx_intel_pt() or cpu_has_vmx_shadow_vmcs(). No (real) functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-33-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRsVitaly Kuznetsov
Using raw host MSR values for setting up nested VMX control MSRs is incorrect as some features need to disabled, e.g. when KVM runs as a nested hypervisor on Hyper-V and uses Enlightened VMCS or when a workaround for IA32_PERF_GLOBAL_CTRL is applied. For non-nested VMX, this is done in setup_vmcs_config() and the result is stored in vmcs_config. Use it for setting up allowed-1 bits in nested VMX MSRs too. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-32-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of ↵Vitaly Kuznetsov
setup_vmcs_config() As a preparation to reusing the result of setup_vmcs_config() for setting up nested VMX control MSRs, move LOAD_IA32_PERF_GLOBAL_CTRL errata handling to vmx_vmexit_ctrl()/vmx_vmentry_ctrl() and print the warning from hardware_setup(). While it seems reasonable to not expose LOAD_IA32_PERF_GLOBAL_CTRL controls to L1 hypervisor on buggy CPUs, such change would inevitably break live migration from older KVMs where the controls are exposed. Keep the status quo for now, L1 hypervisor itself is supposed to take care of the errata. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-30-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: VMX: Replace some Intel model numbers with mnemonicsJim Mattson
Intel processor code names are more familiar to many readers than their decimal model numbers. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-29-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setupSean Christopherson
Clear the CR3 and INVLPG interception controls at runtime based on whether or not EPT is being _used_, as opposed to clearing the bits at setup if EPT is _supported_ in hardware, and then restoring them when EPT is not used. Not mucking with the base config will allow using the base config as the starting point for emulating the VMX capability MSRs. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-28-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Add missing CPU based VM execution controls to vmcs_configVitaly Kuznetsov
As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, add the CPU based VM execution controls which KVM doesn't use but supports for nVMX to KVM_OPT_VMX_CPU_BASED_VM_EXEC_CONTROL and filter them out in vmx_exec_control(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-27-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Add missing VMEXIT controls to vmcs_configVitaly Kuznetsov
As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, add the VMEXIT controls which KVM doesn't use but supports for nVMX to KVM_OPT_VMX_VM_EXIT_CONTROLS and filter them out in vmx_vmexit_ctrl(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-26-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of ↵Vitaly Kuznetsov
setup_vmcs_config() As a preparation to reusing the result of setup_vmcs_config() in nested VMX MSR setup, move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering to vmx_exec_control(). No functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-25-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Extend VMX controls macro shenanigansVitaly Kuznetsov
When VMX controls macros are used to set or clear a control bit, make sure that this bit was checked in setup_vmcs_config() and thus is properly reflected in vmcs_config. Opportunistically drop pointless "< 0" check for adjust_vmx_controls()'s return value. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-24-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVMSean Christopherson
Don't toggle VM_ENTRY_IA32E_MODE in 32-bit kernels/KVM and instead bug the VM if KVM attempts to run the guest with EFER.LMA=1. KVM doesn't support running 64-bit guests with 32-bit hosts. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-23-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING in ↵Vitaly Kuznetsov
setup_vmcs_config() SECONDARY_EXEC_ENCLS_EXITING is the only control which is conditionally added to the 'optional' checklist in setup_vmcs_config() but the special case can be avoided by always checking for its presence first and filtering out the result later. Note: the situation when SECONDARY_EXEC_ENCLS_EXITING is present but cpu_has_sgx() is false is possible when SGX is "soft-disabled", e.g. if software writes MCE control MSRs or there's an uncorrectable #MC. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-22-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING in setup_vmcs_config()Vitaly Kuznetsov
CPU_BASED_{INTR,NMI}_WINDOW_EXITING controls are toggled dynamically by vmx_enable_{irq,nmi}_window, handle_interrupt_window(), handle_nmi_window() but setup_vmcs_config() doesn't check their existence. Add the check and filter the controls out in vmx_exec_control(). Note: KVM explicitly supports CPUs without VIRTUAL_NMIS and all these CPUs are supposedly lacking NMI_WINDOW_EXITING too. Adjust cpu_has_virtual_nmis() accordingly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-21-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config()Vitaly Kuznetsov
VM_ENTRY_IA32E_MODE control is toggled dynamically by vmx_set_efer() and setup_vmcs_config() doesn't check its existence. On the contrary, nested_vmx_setup_ctls_msrs() doesn set it on x86_64. Add the missing check and filter the bit out in vmx_vmentry_ctrl(). No (real) functional change intended as all existing CPUs supporting long mode and VMX are supposed to have it. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-20-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Get rid of eVMCS specific VMX controls sanitizationVitaly Kuznetsov
With the updated eVMCSv1 definition, there's no known 'problematic' controls which are exposed in VMX control MSRs but are not present in eVMCSv1: all known Hyper-V versions either don't expose the new fields by not setting bits in the VMX feature controls or support the new eVMCS revision. Get rid of VMX control MSRs filtering for KVM on Hyper-V. Note: VMX control MSRs filtering for Hyper-V on KVM (nested_evmcs_filter_control_msr()) stays as even the updated eVMCSv1 definition doesn't have all the features implemented by KVM and some fields are still missing. Moreover, nested_evmcs_filter_control_msr() has to support the original eVMCSv1 version when VMM wishes so. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-17-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: nVMX: Support PERF_GLOBAL_CTRL with enlightened VMCSVitaly Kuznetsov
Enlightened VMCS v1 got updated and now includes the required fields for loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT features. For KVM on Hyper-V enablement, KVM can just observe VMX control MSRs and use the features (with or without eVMCS) when possible. Hyper-V on KVM is messier as Windows 11 guests fail to boot if the controls are advertised and a new PV feature flag, CPUID.0x4000000A.EBX BIT(0), is not set. Honor the Hyper-V CPUID feature flag to play nice with Windows guests. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220830133737.1539624-16-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: nVMX: Treat eVMCS as enabled for guest iff Hyper-V is also enabledSean Christopherson
When querying whether or not eVMCS is enabled on behalf of the guest, treat eVMCS as enable if and only if Hyper-V is enabled/exposed to the guest. Note, flows that come from the host, e.g. KVM_SET_NESTED_STATE, must NOT check for Hyper-V being enabled as KVM doesn't require guest CPUID to be set before most ioctls(). Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-7-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Do not declare vmread_error() asmlinkageUros Bizjak
There is no need to declare vmread_error() asmlinkage, its arguments can be passed via registers for both 32-bit and 64-bit targets. Function argument registers are considered call-clobbered registers, they are saved in the trampoline just before the function call and restored afterwards. Dropping "asmlinkage" patch unifies trampoline function argument handling between 32-bit and 64-bit targets and improves generated code for 32-bit targets. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20220817144045.3206-1-ubizjak@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: Add extra information in kvm_page_fault trace pointWonhyuk Yang
Currently, kvm_page_fault trace point provide fault_address and error code. However it is not enough to find which cpu and instruction cause kvm_page_faults. So add vcpu id and instruction pointer in kvm_page_fault trace point. Cc: Baik Song An <bsahn@etri.re.kr> Cc: Hong Yeon Kim <kimhy@etri.re.kr> Cc: Taeung Song <taeung@reallinux.co.kr> Cc: linuxgeek@linuxgeek.io Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com> Link: https://lore.kernel.org/r/20220510071001.87169-1-vvghjk1234@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19KVM: VMX: Heed the 'msr' argument in msr_write_intercepted()Jim Mattson
Regardless of the 'msr' argument passed to the VMX version of msr_write_intercepted(), the function always checks to see if a specific MSR (IA32_SPEC_CTRL) is intercepted for write. This behavior seems unintentional and unexpected. Modify the function so that it checks to see if the provided 'msr' index is intercepted for write. Fixes: 67f4b9969c30 ("KVM: nVMX: Handle dynamic MSR intercept toggling") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220810213050.2655000-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-07-28Revert "KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"Paolo Bonzini
Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"), KVM has taken ownership of the "load IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls, trying to set these bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports MPX, and clear otherwise. The intent of the patch was to apply it to L0 in order to work around L1 kernels that lack the fix in commit 691bd4340bef ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS", 2017-07-04): by hiding the control bits from L0, L1 hides BNDCFGS from KVM_GET_MSR_INDEX_LIST, and the L1 bug is neutralized even in the lack of commit 691bd4340bef. This was perhaps a sensible kludge at the time, but a horrible idea in the long term and in fact it has not been extended to other CPUID bits like these: X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, SECONDARY_EXEC_TSC_SCALING X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES These days it's sort of common knowledge that any MSR in KVM_GET_MSR_INDEX_LIST must allow *at least* setting it with KVM_SET_MSR to a default value, so it is unlikely that something like commit 5f76f6f5ff96 will be needed again. So revert it, at the potential cost of breaking L1s with a 6 year old kernel. While in principle the L0 owner doesn't control what runs on L1, such an old hypervisor would probably have many other bugs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bitsSean Christopherson
Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Restrict get_mt_mask() to a u8, use KVM_X86_OP_OPTIONAL_RET0Sean Christopherson
Restrict get_mt_mask() to a u8 and reintroduce using a RET0 static_call for the SVM implementation. EPT stores the memtype information in the lower 8 bits (bits 6:3 to be precise), and even returns a shifted u8 without an explicit cast to a larger type; there's no need to return a full u64. Note, RET0 doesn't play nice with a u64 return on 32-bit kernels, see commit bf07be36cd88 ("KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask"). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220714153707.3239119-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Add dedicated helper to get CPUID entry with significant indexSean Christopherson
Add a second CPUID helper, kvm_find_cpuid_entry_index(), to handle KVM queries for CPUID leaves whose index _may_ be significant, and drop the index param from the existing kvm_find_cpuid_entry(). Add a WARN in the inner helper, cpuid_entry2_find(), to detect attempts to retrieve a CPUID entry whose index is significant without explicitly providing an index. Using an explicit magic number and letting callers omit the index avoids confusion by eliminating the myriad cases where KVM specifies '0' as a dummy value. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: VMX: Update PT MSR intercepts during filter change iff PT in host+guestSean Christopherson
Update the Processor Trace (PT) MSR intercepts during a filter change if and only if PT may be exposed to the guest, i.e. only if KVM is operating in the so called "host+guest" mode where PT can be used simultaneously by both the host and guest. If PT is in system mode, the host is the sole owner of PT and the MSRs should never be passed through to the guest. Luckily the missed check only results in unnecessary work, as select RTIT MSRs are passed through only when RTIT tracing is enabled "in" the guest, and tracing can't be enabled in the guest when KVM is in system mode (writes to guest.MSR_IA32_RTIT_CTL are disallowed). Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20220712015838.1253995-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-06-27KVM: VMX: Prevent RSB underflow before vmenterJosh Poimboeuf
On VMX, there are some balanced returns between the time the guest's SPEC_CTRL value is written, and the vmenter. Balanced returns (matched by a preceding call) are usually ok, but it's at least theoretically possible an NMI with a deep call stack could empty the RSB before one of the returns. For maximum paranoia, don't allow *any* returns (balanced or otherwise) between the SPEC_CTRL write and the vmenter. [ bp: Fix 32-bit build. ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Fix IBRS handling after vmexitJosh Poimboeuf
For legacy IBRS to work, the IBRS bit needs to be always re-written after vmexit, even if it's already on. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Prevent guest RSB poisoning attacks with eIBRSJosh Poimboeuf
On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Convert launched argument to flagsJosh Poimboeuf
Convert __vmx_vcpu_run()'s 'launched' argument to 'flags', in preparation for doing SPEC_CTRL handling immediately after vmexit, which will need another flag. This is much easier than adding a fourth argument, because this code supports both 32-bit and 64-bit, and the fourth argument on 32-bit would have to be pushed on the stack. Note that __vmx_vcpu_run_flags() is called outside of the noinstr critical section because it will soon start calling potentially traceable functions. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/kvm/vmx: Make noinstr cleanPeter Zijlstra
The recent mmio_stale_data fixes broke the noinstr constraints: vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section make it all happy again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-24KVM: x86: Enable CMCI capability by default and handle injected UCNA errorsJue Wang
This patch enables MCG_CMCI_P by default in kvm_mce_cap_supported. It reuses ioctl KVM_X86_SET_MCE to implement injection of UnCorrectable No Action required (UCNA) errors, signaled via Corrected Machine Check Interrupt (CMCI). Neither of the CMCI and UCNA emulations depends on hardware. Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-8-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: VMX: Refactor 32-bit PSE PT creation to avoid using MMU macroSean Christopherson
Compute the number of PTEs to be filled for the 32-bit PSE page tables using the page size and the size of each entry. While using the MMU's PT32_ENT_PER_PAGE macro is arguably better in isolation, removing VMX's usage will allow a future namespacing cleanup to move the guest page table macros into paging_tmpl.h, out of the reach of code that isn't directly related to shadow paging. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614233328.3896033-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Move "apicv_active" into "struct kvm_lapic"Sean Christopherson
Move the per-vCPU apicv_active flag into KVM's local APIC instance. APICv is fully dependent on an in-kernel local APIC, but that's not at all clear when reading the current code due to the flag being stored in the generic kvm_vcpu_arch struct. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update()Sean Christopherson
Drop the unused @vcpu parameter from hwapic_isr_update(). AMD/AVIC is unlikely to implement the helper, and VMX/APICv doesn't need the vCPU as it operates on the current VMCS. The result is somewhat odd, but allows for a decent amount of (future) cleanup in the APIC code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: nVMX: Update vmcs12 on BNDCFGS write, not at vmcs02=>vmcs12 syncSean Christopherson
Update vmcs12->guest_bndcfgs on intercepted writes to BNDCFGS from L2 instead of waiting until vmcs02 is synchronized to vmcs12. KVM always intercepts BNDCFGS accesses, so the only way the value in vmcs02 can change is via KVM's explicit VMWRITE during emulation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614215831.3762138-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: nVMX: Rename nested.vmcs01_* fields to nested.pre_vmenter_*Sean Christopherson
Rename the fields in struct nested_vmx used to snapshot pre-VM-Enter values to reflect that they can hold L2's values when restoring nested state, e.g. if userspace restores MSRs before nested state. As crazy as it seems, restoring MSRs before nested state actually works (because KVM goes out if it's way to make it work), even though the initial MSR writes will hit vmcs01 despite holding L2 values. Add a related comment to vmx_enter_smm() to call out that using the common VM-Exit and VM-Enter helpers to emulate SMI and RSM is wrong and broken. The few MSRs that have snapshots _could_ be fixed by taking a snapshot prior to the forced VM-Exit instead of at forced VM-Enter, but that's just the tip of the iceberg as the rather long list of MSRs that aren't snapshotted (hello, VM-Exit MSR load list) can't be handled this way. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614215831.3762138-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-15KVM: VMX: Skip filter updates for MSRs that KVM is already interceptingSean Christopherson
When handling userspace MSR filter updates, recompute interception for possible passthrough MSRs if and only if KVM wants to disabled interception. If KVM wants to intercept accesses, i.e. the associated bit is set in vmx->shadow_msr_intercept, then there's no need to set the intercept again as KVM will intercept the MSR regardless of userspace's wants. No functional change intended, the call to vmx_enable_intercept_for_msr() really is just a gigantic nop. Suggested-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220610214140.612025-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "While last week's pull request contained miscellaneous fixes for x86, this one covers other architectures, selftests changes, and a bigger series for APIC virtualization bugs that were discovered during 5.20 development. The idea is to base 5.20 development for KVM on top of this tag. ARM64: - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation) RISC-V: - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry x86-64: - Fix error in page tables with MKTME enabled - Dirty page tracking performance test extended to running a nested guest - Disable APICv/AVIC in cases that it cannot implement correctly" [ This merge also fixes a misplaced end parenthesis bug introduced in commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") pointed out by Sean Christopherson ] Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/ * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits) KVM: selftests: Restrict test region to 48-bit physical addresses when using nested KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 KVM: selftests: Clean up LIBKVM files in Makefile KVM: selftests: Link selftests directly with lib object files KVM: selftests: Drop unnecessary rule for STATIC_LIBS KVM: selftests: Add a helper to check EPT/VPID capabilities KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h KVM: selftests: Refactor nested_map() to specify target level KVM: selftests: Drop stale function parameter comment for nested_map() KVM: selftests: Add option to create 2M and 1G EPT mappings KVM: selftests: Replace x86_page_size with PG_LEVEL_XX KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking KVM: x86: disable preemption while updating apicv inhibition KVM: x86: SVM: fix avic_kick_target_vcpus_fast KVM: x86: SVM: remove avic's broken code that updated APIC ID KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base KVM: x86: document AVIC/APICv inhibit reasons KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs ...
2022-06-14Merge tag 'x86-bugs-2022-06-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MMIO stale data fixes from Thomas Gleixner: "Yet another hw vulnerability with a software mitigation: Processor MMIO Stale Data. They are a class of MMIO-related weaknesses which can expose stale data by propagating it into core fill buffers. Data which can then be leaked using the usual speculative execution methods. Mitigations include this set along with microcode updates and are similar to MDS and TAA vulnerabilities: VERW now clears those buffers too" * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mmio: Print SMT warning KVM: x86/speculation: Disable Fill buffer clear within guests x86/speculation/mmio: Reuse SRBDS mitigation for SBDS x86/speculation/srbds: Update SRBDS mitigation selection x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data x86/speculation/mmio: Enable CPU Fill buffer clearing on idle x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data x86/speculation: Add a common function for MD_CLEAR mitigation update x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug Documentation: Add documentation for Processor MMIO Stale Data
2022-06-09Merge branch 'kvm-5.20-early'Paolo Bonzini
s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to show tests x86: * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Rewrite gfn-pfn cache refresh * Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>