Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/torvalds/linux.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2022-06-23Merge tag 'kvmarm-fixes-5.19-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.19, take #2 - Fix a regression with pKVM when kmemleak is enabled - Add Oliver Upton as an official KVM/arm64 reviewer
2022-06-17KVM: arm64: Prevent kmemleak from accessing pKVM memoryQuentin Perret
Commit a7259df76702 ("memblock: make memblock_find_in_range method private") changed the API using which memory is reserved for the pKVM hypervisor. However, memblock_phys_alloc() differs from the original API in terms of kmemleak semantics -- the old one didn't report the reserved regions to kmemleak while the new one does. Unfortunately, when protected KVM is enabled, all kernel accesses to pKVM-private memory result in a fatal exception, which can now happen because of kmemleak scans: $ echo scan > /sys/kernel/debug/kmemleak [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290! [ 34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000 [ 34.991813] Kernel panic - not syncing: HYP panic: [ 34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800 [ 34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000 [ 34.991813] VCPU:0000000000000000 [ 34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102 [ 34.994059] Hardware name: linux,dummy-virt (DT) [ 34.994452] Call trace: [ 34.994641] dump_backtrace.part.0+0xcc/0xe0 [ 34.994932] show_stack+0x18/0x6c [ 34.995094] dump_stack_lvl+0x68/0x84 [ 34.995276] dump_stack+0x18/0x34 [ 34.995484] panic+0x16c/0x354 [ 34.995673] __hyp_pgtable_total_pages+0x0/0x60 [ 34.995933] scan_block+0x74/0x12c [ 34.996129] scan_gray_list+0xd8/0x19c [ 34.996332] kmemleak_scan+0x2c8/0x580 [ 34.996535] kmemleak_write+0x340/0x4a0 [ 34.996744] full_proxy_write+0x60/0xbc [ 34.996967] vfs_write+0xc4/0x2b0 [ 34.997136] ksys_write+0x68/0xf4 [ 34.997311] __arm64_sys_write+0x20/0x2c [ 34.997532] invoke_syscall+0x48/0x114 [ 34.997779] el0_svc_common.constprop.0+0x44/0xec [ 34.998029] do_el0_svc+0x2c/0xc0 [ 34.998205] el0_svc+0x2c/0x84 [ 34.998421] el0t_64_sync_handler+0xf4/0x100 [ 34.998653] el0t_64_sync+0x18c/0x190 [ 34.999252] SMP: stopping secondary CPUs [ 35.000034] Kernel Offset: disabled [ 35.000261] CPU features: 0x800,00007831,00001086 [ 35.000642] Memory Limit: none [ 35.001329] ---[ end Kernel panic - not syncing: HYP panic: [ 35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800 [ 35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000 [ 35.001329] VCPU:0000000000000000 ]--- Fix this by explicitly excluding the hypervisor's memory pool from kmemleak like we already do for the hyp BSS. Cc: Mike Rapoport <rppt@kernel.org> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private") Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220616161135.3997786-1-qperret@google.com
2022-06-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "While last week's pull request contained miscellaneous fixes for x86, this one covers other architectures, selftests changes, and a bigger series for APIC virtualization bugs that were discovered during 5.20 development. The idea is to base 5.20 development for KVM on top of this tag. ARM64: - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation) RISC-V: - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry x86-64: - Fix error in page tables with MKTME enabled - Dirty page tracking performance test extended to running a nested guest - Disable APICv/AVIC in cases that it cannot implement correctly" [ This merge also fixes a misplaced end parenthesis bug introduced in commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") pointed out by Sean Christopherson ] Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/ * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits) KVM: selftests: Restrict test region to 48-bit physical addresses when using nested KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 KVM: selftests: Clean up LIBKVM files in Makefile KVM: selftests: Link selftests directly with lib object files KVM: selftests: Drop unnecessary rule for STATIC_LIBS KVM: selftests: Add a helper to check EPT/VPID capabilities KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h KVM: selftests: Refactor nested_map() to specify target level KVM: selftests: Drop stale function parameter comment for nested_map() KVM: selftests: Add option to create 2M and 1G EPT mappings KVM: selftests: Replace x86_page_size with PG_LEVEL_XX KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking KVM: x86: disable preemption while updating apicv inhibition KVM: x86: SVM: fix avic_kick_target_vcpus_fast KVM: x86: SVM: remove avic's broken code that updated APIC ID KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base KVM: x86: document AVIC/APICv inhibit reasons KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs ...
2022-06-14Merge tag 'x86-bugs-2022-06-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MMIO stale data fixes from Thomas Gleixner: "Yet another hw vulnerability with a software mitigation: Processor MMIO Stale Data. They are a class of MMIO-related weaknesses which can expose stale data by propagating it into core fill buffers. Data which can then be leaked using the usual speculative execution methods. Mitigations include this set along with microcode updates and are similar to MDS and TAA vulnerabilities: VERW now clears those buffers too" * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mmio: Print SMT warning KVM: x86/speculation: Disable Fill buffer clear within guests x86/speculation/mmio: Reuse SRBDS mitigation for SBDS x86/speculation/srbds: Update SRBDS mitigation selection x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data x86/speculation/mmio: Enable CPU Fill buffer clearing on idle x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data x86/speculation: Add a common function for MD_CLEAR mitigation update x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug Documentation: Add documentation for Processor MMIO Stale Data
2022-06-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Fixes all over the place, most notably fixes for latent bugs in drivers that got exposed by suppressing interrupts before DRIVER_OK, which in turn has been done by 8b4ec69d7e09 ("virtio: harden vring IRQ")" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: um: virt-pci: set device ready in probe() vdpa: make get_vq_group and set_group_asid optional virtio: Fix all occurences of the "the the" typo vduse: Fix NULL pointer dereference on sysfs access vringh: Fix loop descriptors check in the indirect cases vdpa/mlx5: clean up indenting in handle_ctrl_vlan() vdpa/mlx5: fix error code for deleting vlan virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed vdpa/mlx5: Fix syntax errors in comments virtio-rng: make device ready before making request
2022-06-11Merge tag 'loongarch-fixes-5.19-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen. "Fix build errors and a stale comment" * tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Remove MIPS comment about cycle counter LoongArch: Fix copy_thread() build errors LoongArch: Fix the !CONFIG_SMP build
2022-06-11um: virt-pci: set device ready in probe()Vincent Whitchurch
Call virtio_device_ready() to make this driver work after commit b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the virtqueues in the probe function. (The virtio core sets the device ready when probe returns.) Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ") Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Johannes Berg <johannes@sipsolutions.net>
2022-06-10Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - SME save/restore for EFI fix - incorrect logic for detecting the need for saving/restoring the FFR state. - SME fix for a CPU ID field value. - Sysreg generation awk script fix (comparison operator). - Some typos in documentation or comments and silence a sparse warning (missing prototype). * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Add kasan_hw_tags_enable() prototype to silence sparse arm64/sme: Fix EFI save/restore arm64/fpsimd: Fix typo in comment arm64/sysreg: Fix typo in Enum element regex arm64/sme: Fix SVE/SME typo in ABI documentation arm64/sme: Fix tests for 0b1111 value ID registers
2022-06-10arm64: Add kasan_hw_tags_enable() prototype to silence sparseCatalin Marinas
This function is only called from assembly, no need for a prototype declaration in a header file. In addition, add #ifdef around the function since it is only used when CONFIG_KASAN_HW_TAGS. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com>
2022-06-10Merge tag 'for-linus-5.19a-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a small cleanup removing "export" of an __init function - a small series adding a new infrastructure for platform flags - a series adding generic virtio support for Xen guests (frontend side) * tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: unexport __init-annotated xen_xlate_map_ballooned_pages() arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices xen/grant-dma-iommu: Introduce stub IOMMU driver dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops xen/virtio: Enable restricted memory access using Xen grant mappings xen/grant-dma-ops: Add option to restrict memory access under Xen xen/grants: support allocating consecutive grants arm/xen: Introduce xen_setup_dma_ops() virtio: replace arch_has_restricted_virtio_memory_access() kernel: add platform_has() infrastructure
2022-06-10arm64/sme: Fix EFI save/restoreMark Brown
The EFI save/restore code is confused. When saving the check for saving FFR is inverted due to confusion with the streaming mode check, and when restoring we check if we need to restore FFR by checking the percpu efi_sm_state without the required wrapper rather than based on the combination of FA64 support and streaming mode. Fixes: e0838f6373e5 ("arm64/sme: Save and restore streaming mode over EFI runtime calls") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220602124132.3528951-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10arm64/fpsimd: Fix typo in commentXiang wangx
Delete the redundant word 'in'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Link: https://lore.kernel.org/r/20220610070543.59338-1-wangxiang@cdjrlc.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10arm64/sysreg: Fix typo in Enum element regexAlejandro Tafalla
In the awk script, there was a typo with the comparison operator when checking if the matched pattern is inside an Enum block. This prevented the generation of the whole sysreg-defs.h header. Fixes: 66847e0618d7 ("arm64: Add sysreg header generation scripting") Signed-off-by: Alejandro Tafalla <atafalla@dnyon.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220609204220.12112-1-atafalla@dnyon.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-09Merge tag 'powerpc-5.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - On 32-bit fix overread/overwrite of thread_struct via ptrace PEEK/POKE. - Fix softirqs not switching to the softirq stack since we moved irq_exit(). - Force thread size increase when KASAN is enabled to avoid stack overflows. - On Book3s 64 mark more code as not to be instrumented by KASAN to avoid crashes. - Exempt __get_wchan() from KASAN checking, as it's inherently racy. - Fix a recently introduced crash in the papr_scm driver in some configurations. - Remove include of <generated/compile.h> which is forbidden. Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner, He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras, Sachin Sant, Vaibhav Jain, and Wanming Hu. * tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Fix overread/overwrite of thread_struct via ptrace powerpc/book3e: get rid of #include <generated/compile.h> powerpc/kasan: Force thread size increase with KASAN powerpc/papr_scm: don't requests stats with '0' sized stats buffer powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK powerpc/kasan: Silence KASAN warnings in __get_wchan() powerpc/kasan: Mark more real-mode code as not to be instrumented
2022-06-09Merge tag 'net-5.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-09gcc-12: disable '-Warray-bounds' universally for nowLinus Torvalds
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people disabled the '-Warray-bounds' warning for gcc-12, because the new logic in gcc would cause warnings for their use of the S390_lowcore macro, which accesses absolute pointers. It turns out gcc-12 has many other issues in this area, so this takes that s390 warning disable logic, and turns it into a kernel build config entry instead. Part of the intent is that we can make this all much more targeted, and use this conflig flag to disable it in only particular configurations that cause problems, with the s390 case as an example: select GCC12_NO_ARRAY_BOUNDS and we could do that for other configuration cases that cause issues. Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more targeted way, and disable the warning only for particular uses: again the s390 case as an example: KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) but this ends up just doing it globally in the top-level Makefile, since the current issues are spread fairly widely all over: KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds We'll try to limit this later, since the gcc-12 problems are rare enough that *much* of the kernel can be built with it without disabling this warning. Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSEPaolo Bonzini
Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/putMaxim Levitsky
Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: disable preemption while updating apicv inhibitionMaxim Levitsky
Currently nothing prevents preemption in kvm_vcpu_update_apicv. On SVM, If the preemption happens after we update the vcpu->arch.apicv_active, the preemption itself will 'update' the inhibition since the AVIC will be first disabled on vCPU unload and then enabled, when the current task is loaded again. Then we will try to update it again, which will lead to a warning in __avic_vcpu_load, that the AVIC is already enabled. Fix this by disabling preemption in this code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: fix avic_kick_target_vcpus_fastMaxim Levitsky
There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also don’t use ‘index’ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: remove avic's broken code that updated APIC IDMaxim Levitsky
AVIC is now inhibited if the guest changes the apic id, and therefore this code is no longer needed. There are several ways this code was broken, including: 1. a vCPU was only allowed to change its apic id to an apic id of an existing vCPU. 2. After such change, the vCPU whose apic id entry was overwritten, could not correctly change its own apic id, because its own entry is already overwritten. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: document AVIC/APICv inhibit reasonsMaxim Levitsky
These days there are too many AVIC/APICv inhibit reasons, and it doesn't hurt to have some documentation for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRsYuan Yao
Assign shadow_me_value, not shadow_me_mask, to PAE root entries, a.k.a. shadow PDPTRs, when host memory encryption is supported. The "mask" is the set of all possible memory encryption bits, e.g. MKTME KeyIDs, whereas "value" holds the actual value that needs to be stuffed into host page tables. Using shadow_me_mask results in a failed VM-Entry due to setting reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to physical addresses with non-zero MKTME bits sending to_shadow_page() into the weeds: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state. BUG: unable to handle page fault for address: ffd43f00063049e8 PGD 86dfd8067 P4D 0 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm] kvm_mmu_free_roots+0xd1/0x200 [kvm] __kvm_mmu_unload+0x29/0x70 [kvm] kvm_mmu_unload+0x13/0x20 [kvm] kvm_arch_destroy_vm+0x8a/0x190 [kvm] kvm_put_kvm+0x197/0x2d0 [kvm] kvm_vm_release+0x21/0x30 [kvm] __fput+0x8e/0x260 ____fput+0xe/0x10 task_work_run+0x6f/0xb0 do_exit+0x327/0xa90 do_group_exit+0x35/0xa0 get_signal+0x911/0x930 arch_do_signal_or_restart+0x37/0x720 exit_to_user_mode_prepare+0xb2/0x140 syscall_exit_to_user_mode+0x16/0x30 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask") Signed-off-by: Yuan Yao <yuan.yao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20220608012015.19566-1-yuan.yao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09Merge tag 'kvmarm-fixes-5.19-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.19, take #1 - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation)
2022-06-09Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini
into HEAD KVM/riscv fixes for 5.19, take #1 - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry
2022-06-09powerpc/32: Fix overread/overwrite of thread_struct via ptraceMichael Ellerman
The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process. To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion. The kernel then maps that index to a particular register in its own data structures and gets/sets the value. The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels. The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area. Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness. To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced. Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array. On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields than can be overwritten, including some pointers. As such it may be exploitable. It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced: https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/ Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special-case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future. Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled. Fixes: 87fec0514f61 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds") Cc: stable@vger.kernel.org # v3.13+ Reported-by: Ariel Miculas <ariel.miculas@belden.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
2022-06-09KVM: arm64: Drop stale commentMarc Zyngier
The layout of 'struct kvm_vcpu_arch' has evolved significantly since the initial port of KVM/arm64, so remove the stale comment suggesting that a prefix of the structure is used exclusively from assembly code. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-7-will@kernel.org
2022-06-09KVM: arm64: Remove redundant hyp_assert_lock_held() assertionsWill Deacon
host_stage2_try() asserts that the KVM host lock is held, so there's no need to duplicate the assertion in its wrappers. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-6-will@kernel.org
2022-06-09KVM: arm64: Extend comment in has_vhe()Will Deacon
has_vhe() expands to a compile-time constant when evaluated from the VHE or nVHE code, alternatively checking a static key when called from elsewhere in the kernel. On face value, this looks like a case of premature optimization, but in fact this allows symbol references on VHE-specific code paths to be dropped from the nVHE object. Expand the comment in has_vhe() to make this clearer, hopefully discouraging anybody from simplifying the code. Cc: David Brazdil <dbrazdil@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-5-will@kernel.org
2022-06-09KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHEWill Deacon
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode() only returns KVM_MODE_PROTECTED on systems where the feature is available. Cc: David Brazdil <dbrazdil@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
2022-06-09KVM: arm64: Handle all ID registers trapped for a protected VMMarc Zyngier
A protected VM accessing ID_AA64ISAR2_EL1 gets punished with an UNDEF, while it really should only get a zero back if the register is not handled by the hypervisor emulation (as mandated by the architecture). Introduce all the missing ID registers (including the unallocated ones), and have them to return 0. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-3-will@kernel.org
2022-06-09KVM: arm64: Return error from kvm_arch_init_vm() on allocation failureWill Deacon
If we fail to allocate the 'supported_cpus' cpumask in kvm_arch_init_vm() then be sure to return -ENOMEM instead of success (0) on the failure path. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-2-will@kernel.org
2022-06-09RISC-V: KVM: fix typos in commentsJulia Lawall
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-06-08arm64/sme: Fix tests for 0b1111 value ID registersMark Brown
For both ID_AA64SMFR0_EL1.I16I64 and ID_AA64SMFR0_EL1.I8I32 we check for the presence of the feature by looking for a specific ID value of 0x4 but should instead be checking for the value 0xf defined by the architecture. This had no practical effect since we are looking for values >= our define and the only valid values in the architecture are 0b0000 and 0b1111 so we would detect things appropriately with the architecture as it stands even with the incorrect defines. Signed-off-by: Mark Brown <broonie@kernel.org> Fixes: b4adc83b0770 ("arm64/sme: System register and exception syndrome definitions") Link: https://lore.kernel.org/r/20220607165128.2833157-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: - syzkaller NULL pointer dereference - TDP MMU performance issue with disabling dirty logging - 5.14 regression with SVM TSC scaling - indefinite stall on applying live patches - unstable selftest - memory leak from wrong copy-and-paste - missed PV TLB flush when racing with emulation * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: do not report a vCPU as preempted outside instruction boundaries KVM: x86: do not set st->preempted when going back to user space KVM: SVM: fix tsc scaling cache logic KVM: selftests: Make hyperv_clock selftest more stable KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm() KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots() entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set KVM: Don't null dereference ops->destroy
2022-06-08KVM: arm64: Warn if accessing timer pending state outside of vcpu contextMarc Zyngier
A recurrent bug in the KVM/arm64 code base consists in trying to access the timer pending state outside of the vcpu context, which makes zero sense (the pending state only exists when the vcpu is loaded). In order to avoid more embarassing crashes and catch the offenders red-handed, add a warning to kvm_arch_timer_get_input_level() and return the state as non-pending. This avoids taking the system down, and still helps tracking down silly bugs. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220607131427.1164881-4-maz@kernel.org
2022-06-08KVM: arm64: Replace vgic_v3_uaccess_read_pending with vgic_uaccess_read_pendingMarc Zyngier
Now that GICv2 has a proper userspace accessor for the pending state, switch GICv3 over to it, dropping the local version, moving over the specific behaviours that CGIv3 requires (such as the distinction between pending latch and line level which were never enforced with GICv2). We also gain extra locking that isn't really necessary for userspace, but that's a small price to pay for getting rid of superfluous code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220607131427.1164881-3-maz@kernel.org
2022-06-08KVM: x86: do not report a vCPU as preempted outside instruction boundariesPaolo Bonzini
If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86: do not set st->preempted when going back to user spacePaolo Bonzini
Similar to the Xen path, only change the vCPU's reported state if the vCPU was actually preempted. The reason for KVM's behavior is that for example optimistic spinning might not be a good idea if the guest is doing repeated exits to userspace; however, it is confusing and unlikely to make a difference, because well-tuned guests will hardly ever exit KVM_RUN in the first place. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08LoongArch: Remove MIPS comment about cycle counterJason A. Donenfeld
This comment block was taken originally from the MIPS architecture code, where indeed there are particular assumptions one can make regarding SMP and !SMP and cycle counters. On LoongArch, however, the rdtime family of functions is always available. As Xuerui wrote: The rdtime family of instructions is in fact guaranteed to be available on LoongArch; LoongArch's subsets all contain them, even the 32-bit "Primary" subset intended for university teaching -- they provide the rdtimeh.w and rdtimel.w pair of instructions that access the same 64-bit counter. So this commit simply removes the incorrect comment block. Link: https://lore.kernel.org/lkml/e78940bc-9be2-2fe7-026f-9e64a1416c9f@xen0n.name/ Fixes: b738c106f735 ("LoongArch: Add other common headers") Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-06-08LoongArch: Fix copy_thread() build errorsHuacai Chen
Commit c5febea0956fd387 ("fork: Pass struct kernel_clone_args into copy_thread") change the prototype of copy_thread(), while commit 5bd2e97c868a8a44 ("fork: Generalize PF_IO_WORKER handling") change the structure of kernel_clone_args. They cause build errors, so fix it. Fixes: 5bd2e97c868a8a44 ("fork: Generalize PF_IO_WORKER handling") Fixes: c5febea0956fd387 ("fork: Pass struct kernel_clone_args into copy_thread") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-06-08LoongArch: Fix the !CONFIG_SMP buildHuacai Chen
1, We assume arch/loongarch/include/asm/smp.h be included in include/ linux/smp.h is valid and the reverse inclusion isn't. So remove the <linux/smp.h> in arch/loongarch/include/asm/smp.h. 2, arch/loongarch/include/asm/smp.h is only needed when CONFIG_SMP, and setup.c include it only because it need plat_smp_setup(). So, reorganize setup.c & smp.h, and then remove <asm/smp.h> in setup.c. 3, Fix cacheinfo.c and percpu.h build error by adding the missing header files when !CONFIG_SMP. 4, Fix acpi.c build error by adding CONFIG_SMP guards. 5, Move irq_stat definition from smp.c to irq.c and fix its declaration. 6, Select CONFIG_SMP for CONFIG_NUMA, similar as other architectures do. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-06-07bpf, arm64: Clear prog->jited_len along prog->jitedEric Dumazet
syzbot reported an illegal copy_to_user() attempt from bpf_prog_get_info_by_fd() [1] There was no repro yet on this bug, but I think that commit 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") is exposing a prior bug in bpf arm64. bpf_prog_get_info_by_fd() looks at prog->jited_len to determine if the JIT image can be copied out to user space. My theory is that syzbot managed to get a prog where prog->jited_len has been set to 43, while prog->bpf_func has ben cleared. It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering this particular warning. I thought find_vma_area(NULL) would not find a vm_struct. As we do not hold vmap_area_lock spinlock, it might be possible that the found vm_struct was garbage. [1] usercopy: Kernel memory exposure attempt detected from vmalloc (offset 792633534417210172, size 43)! kernel BUG at mm/usercopy.c:101! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 25002 Comm: syz-executor.1 Not tainted 5.18.0-syzkaller-10139-g8291eaafed36 #0 Hardware name: linux,dummy-virt (DT) pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : usercopy_abort+0x90/0x94 mm/usercopy.c:101 lr : usercopy_abort+0x90/0x94 mm/usercopy.c:89 sp : ffff80000b773a20 x29: ffff80000b773a30 x28: faff80000b745000 x27: ffff80000b773b48 x26: 0000000000000000 x25: 000000000000002b x24: 0000000000000000 x23: 00000000000000e0 x22: ffff80000b75db67 x21: 0000000000000001 x20: 000000000000002b x19: ffff80000b75db3c x18: 00000000fffffffd x17: 2820636f6c6c616d x16: 76206d6f72662064 x15: 6574636574656420 x14: 74706d6574746120 x13: 2129333420657a69 x12: 73202c3237313031 x11: 3237313434333533 x10: 3336323937207465 x9 : 657275736f707865 x8 : ffff80000a30c550 x7 : ffff80000b773830 x6 : ffff80000b773830 x5 : 0000000000000000 x4 : ffff00007fbbaa10 x3 : 0000000000000000 x2 : 0000000000000000 x1 : f7ff000028fc0000 x0 : 0000000000000064 Call trace: usercopy_abort+0x90/0x94 mm/usercopy.c:89 check_heap_object mm/usercopy.c:186 [inline] __check_object_size mm/usercopy.c:252 [inline] __check_object_size+0x198/0x36c mm/usercopy.c:214 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_to_user include/linux/uaccess.h:159 [inline] bpf_prog_get_info_by_fd.isra.0+0xf14/0xfdc kernel/bpf/syscall.c:3993 bpf_obj_get_info_by_fd+0x12c/0x510 kernel/bpf/syscall.c:4253 __sys_bpf+0x900/0x2150 kernel/bpf/syscall.c:4956 __do_sys_bpf kernel/bpf/syscall.c:5021 [inline] __se_sys_bpf kernel/bpf/syscall.c:5019 [inline] __arm64_sys_bpf+0x28/0x40 kernel/bpf/syscall.c:5019 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0x44/0xec arch/arm64/kernel/syscall.c:142 do_el0_svc+0xa0/0xc0 arch/arm64/kernel/syscall.c:206 el0_svc+0x44/0xb0 arch/arm64/kernel/entry-common.c:624 el0t_64_sync_handler+0x1ac/0x1b0 arch/arm64/kernel/entry-common.c:642 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581 Code: aa0003e3 d00038c0 91248000 97fff65f (d4210000) Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220531215113.1100754-1-eric.dumazet@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07KVM: SVM: fix tsc scaling cache logicMaxim Levitsky
SVM uses a per-cpu variable to cache the current value of the tsc scaling multiplier msr on each cpu. Commit 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") broke this caching logic. Refactor the code so that all TSC scaling multiplier writes go through a single function which checks and updates the cache. This fixes the following scenario: 1. A CPU runs a guest with some tsc scaling ratio. 2. New guest with different tsc scaling ratio starts on this CPU and terminates almost immediately. This ensures that the short running guest had set the tsc scaling ratio just once when it was set via KVM_SET_TSC_KHZ. Due to the bug, the per-cpu cache is not updated. 3. The original guest continues to run, it doesn't restore the msr value back to its own value, because the cache matches, and thus continues to run with a wrong tsc scaling ratio. Fixes: 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty loggingBen Gardon
Currently disabling dirty logging with the TDP MMU is extremely slow. On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with the shadow MMU. When disabling dirty logging, zap non-leaf parent entries to allow replacement with huge pages instead of recursing and zapping all of the child, leaf entries. This reduces the number of TLB flushes required. and reduces the disable dirty log time with the TDP MMU to ~3 seconds. Opportunistically add a WARN() to catch GFNs that are mapped at a higher level than their max level. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220525230904.1584480-1-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()Jan Beulich
As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs and clobbering of "cc" don't work well together. The compiler appears to mean to reject such, but doesn't - in its upstream form - quite manage to yet for "cc". Furthermore two similar macros don't clobber "cc", and clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler always assumes status flags to be clobbered there. Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses") Signed-off-by: Jan Beulich <jbeulich@suse.com> Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()Shaoqin Huang
When freeing obsolete previous roots, check prev_roots as intended, not the current root. Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com> Fixes: 527d5cd7eece ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped") Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: arm64: Don't read a HW interrupt pending state in user contextMarc Zyngier
Since 5bfa685e62e9 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW"), we're able to source the pending bit for an interrupt that is stored either on the physical distributor or on a device. However, this state is only available when the vcpu is loaded, and is not intended to be accessed from userspace. Unfortunately, the GICv2 emulation doesn't provide specific userspace accessors, and we fallback with the ones that are intended for the guest, with fatal consequences. Add a new vgic_uaccess_read_pending() accessor for userspace to use, build on top of the existing vgic_mmio_read_pending(). Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: 5bfa685e62e9 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW") Link: https://lore.kernel.org/r/20220607131427.1164881-2-maz@kernel.org Cc: stable@vger.kernel.org
2022-06-07KVM: arm64: Fix inconsistent indentingsunliming
Fix the following smatch warnings: arch/arm64/kvm/vmid.c:62 flush_context() warn: inconsistent indenting Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220602024805.511457-1-sunliming@kylinos.cn