.. SPDX-License-Identifier: GPL-2.0

Protected virtual machines (pKVM)
=================================

Introduction
------------

Protected KVM (pKVM) is a KVM/arm64 extension which uses the two-stage
translation capability of the Armv8 MMU to isolate guest memory from the host
system. This allows for the creation of a confidential computing environment
without relying on whizz-bang features in hardware, while still leaving room
for complementary technologies such as memory encryption and hardware-backed
attestation.

The major implementation change brought about by pKVM is that the hypervisor
code running at EL2 is now largely independent of (and isolated from) the rest
of the host kernel running at EL1. Additional hypercalls are therefore
introduced to manage manipulation of guest stage-2 page tables, creation of VM
data structures and reclamation of memory on teardown. An immediate consequence
of this change is that the host itself runs with an identity mapping enabled
at stage-2, providing the hypervisor code with a mechanism to restrict host
access to an arbitrary physical page.

Enabling pKVM
-------------

The pKVM hypervisor is enabled by booting the host kernel at EL2 with
``kvm-arm.mode=protected`` on the kernel command line. Once enabled, VMs can be
spawned in either protected or non-protected state, although the hypervisor is
still responsible for managing most of the VM metadata in either case.

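As a rough userspace sanity check, the sketch below looks for the option on the
running kernel's command line. It is illustrative only: the function names are
made up for this example, and finding the option in ``/proc/cmdline`` shows
that protected mode was requested, not that the hypervisor actually
initialised (for instance, if the kernel was not entered at EL2).

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* True if c terminates or separates a command-line token. */
    static bool is_token_boundary(char c)
    {
        return c == '\0' || c == ' ' || c == '\t' || c == '\n';
    }

    /*
     * Hypothetical helper: return true if the whitespace-separated command
     * line contains the exact token "kvm-arm.mode=protected".
     */
    bool cmdline_requests_pkvm(const char *cmdline)
    {
        static const char opt[] = "kvm-arm.mode=protected";
        const size_t len = sizeof(opt) - 1;
        const char *p = cmdline;

        assert(cmdline != NULL);

        while ((p = strstr(p, opt)) != NULL) {
            /* Reject matches embedded in a longer token. */
            bool starts = (p == cmdline) || is_token_boundary(p[-1]);
            bool ends = is_token_boundary(p[len]);

            if (starts && ends)
                return true;
            p += len;
        }
        return false;
    }

    /*
     * Hypothetical helper: returns 1 if the host command line requests
     * protected mode, 0 if it does not, -1 if /proc/cmdline is unreadable.
     */
    int host_cmdline_requests_pkvm(void)
    {
        char buf[4096];
        size_t n;
        FILE *f = fopen("/proc/cmdline", "r");

        if (!f)
            return -1;
        n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[n] = '\0';
        return cmdline_requests_pkvm(buf) ? 1 : 0;
    }

The string match is deliberately strict and only recognises the spelling used
in this document.
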
Limitations
-----------

Enabling pKVM places some significant limitations on KVM guests, regardless of
whether they are spawned in protected state. It is therefore recommended to
enable pKVM only if protected VMs are required, with non-protected state acting
primarily as a debug and development aid.

If you're still keen, then here is an incomplete list of caveats that apply
to all VMs running under pKVM:

- Guest memory cannot be file-backed (with the exception of shmem/memfd) and is
  pinned as it is mapped into the guest. This prevents the host from
  swapping-out, migrating, merging or generally doing anything useful with the
  guest pages. It also requires that the VMM has either ``CAP_IPC_LOCK`` or
  sufficient ``RLIMIT_MEMLOCK`` to account for this pinned memory.

- GICv2 is not supported and therefore GICv3 hardware is required in order
  to expose a virtual GICv3 to the guest.

- Read-only memslots are unsupported and therefore dirty logging cannot be
  enabled.

- Memslot configuration is fixed once a VM has started running, with subsequent
  move or deletion requests being rejected with ``-EPERM``.

- There are probably many others.

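The ``RLIMIT_MEMLOCK`` requirement from the first caveat can be checked before
any guest memory is mapped. What follows is a minimal sketch, assuming the VMM
knows the total guest memory size up front. The helper names are hypothetical,
and the check ignores both ``CAP_IPC_LOCK`` and any memory the process has
already locked, so treat a positive answer as a hint rather than a guarantee.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/resource.h>

    /*
     * Hypothetical helper: true if an RLIMIT_MEMLOCK soft limit is large
     * enough to pin guest_bytes of guest memory.
     */
    bool memlock_limit_covers(rlim_t soft_limit, uint64_t guest_bytes)
    {
        if (soft_limit == RLIM_INFINITY)
            return true;
        return (uint64_t)soft_limit >= guest_bytes;
    }

    /*
     * Hypothetical helper: returns 1 if the calling process's current soft
     * limit covers guest_bytes, 0 if it does not, -1 if getrlimit() fails.
     */
    int current_memlock_covers(uint64_t guest_bytes)
    {
        struct rlimit lim;

        if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
            return -1;
        return memlock_limit_covers(lim.rlim_cur, guest_bytes) ? 1 : 0;
    }

A VMM could call ``current_memlock_covers()`` with the planned guest size and
report a clear error early, instead of failing later when pages are pinned.
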
Since the host is unable to tear down the hypervisor when pKVM is enabled,
hibernation (``CONFIG_HIBERNATION``) and kexec (``CONFIG_KEXEC``) will fail
with ``-EBUSY``.

If you are not happy with these limitations, then please don't enable pKVM :)

VM creation
-----------

When pKVM is enabled, protected VMs can be created by specifying the
``KVM_VM_TYPE_ARM_PROTECTED`` flag in the machine type identifier parameter
passed to ``KVM_CREATE_VM``.

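A minimal sketch of such a call is shown below. The helper names are
hypothetical, and the fallback value given for ``KVM_VM_TYPE_ARM_PROTECTED`` is
an assumption for building against uapi headers that predate pKVM; check it
against the ``<linux/kvm.h>`` shipped with your kernel. The IPA size is encoded
with the existing ``KVM_VM_TYPE_ARM_IPA_SIZE()`` macro, where zero selects the
default size.

.. code-block:: c

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    /*
     * ASSUMPTION: fallback for older headers only. Verify this value against
     * your kernel's <linux/kvm.h> before relying on it.
     */
    #ifndef KVM_VM_TYPE_ARM_PROTECTED
    #define KVM_VM_TYPE_ARM_PROTECTED (1UL << 31)
    #endif

    /*
     * Hypothetical helper: build the machine type identifier from the IPA
     * size (low bits, 0 for the default) and the protected flag.
     */
    unsigned long pkvm_machine_type(unsigned int ipa_bits)
    {
        return KVM_VM_TYPE_ARM_IPA_SIZE(ipa_bits) | KVM_VM_TYPE_ARM_PROTECTED;
    }

    /*
     * Hypothetical helper: returns a VM file descriptor on success, or -1
     * with errno set by open() or ioctl().
     */
    int create_protected_vm(unsigned int ipa_bits)
    {
        int kvm_fd, vm_fd, saved_errno;

        kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm_fd < 0)
            return -1;

        vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, pkvm_machine_type(ipa_bits));

        /* Preserve the ioctl() error across close(). */
        saved_errno = errno;
        close(kvm_fd);
        errno = saved_errno;

        return vm_fd;
    }

On a host where pKVM is not enabled, the ``KVM_CREATE_VM`` call is expected to
fail, so callers should always check the return value and ``errno``.
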
Protected VMs are instantiated according to a fixed vCPU configuration
described by the ID register definitions in
``arch/arm64/include/asm/kvm_pkvm.h``. Only a subset of the architectural
features that may be available to the host are exposed to the guest and the
capabilities advertised by ``KVM_CHECK_EXTENSION`` are limited accordingly,
with the vCPU registers being initialised to their architecturally-defined
values.

Where not defined by the architecture, the registers of a protected vCPU
are reset to zero with the exception of the PC and X0, which can be set
either by the ``KVM_SET_ONE_REG`` interface or by a call to PSCI ``CPU_ON``.

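For the ``KVM_SET_ONE_REG`` path, the sketch below sets the PC and X0 of a vCPU
before it first runs. The register IDs follow the arm64 core register encoding
described in the KVM API documentation, where the low bits index ``struct
kvm_regs`` in 32-bit words. The ``EX_`` constants and the helper names are
hypothetical and are spelled out locally so that the example also builds on
non-arm64 hosts; on arm64, the uapi headers provide equivalent macros.

.. code-block:: c

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Core register class, mirroring the arm64 uapi encoding. */
    #define EX_KVM_REG_ARM_CORE (0x0010ULL << 16)

    /* Offsets into struct kvm_regs, counted in 32-bit words. */
    enum {
        EX_CORE_X0 = 0,   /* regs.regs[0] */
        EX_CORE_PC = 64,  /* regs.pc, after 31 GPRs and SP (32 * 8 bytes) */
    };

    /* Build the 64-bit register ID for a 64-bit arm64 core register. */
    uint64_t arm64_core_reg_id(unsigned int word_index)
    {
        return KVM_REG_ARM64 | KVM_REG_SIZE_U64 | EX_KVM_REG_ARM_CORE |
               word_index;
    }

    /* Write one core register; returns 0 on success, -1 with errno set. */
    int set_core_reg(int vcpu_fd, unsigned int word_index, uint64_t value)
    {
        struct kvm_one_reg reg = {
            .id = arm64_core_reg_id(word_index),
            .addr = (uint64_t)(uintptr_t)&value,
        };

        return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
    }

    /* Set the entry point and first argument of a vCPU. */
    int set_boot_state(int vcpu_fd, uint64_t entry_pc, uint64_t x0)
    {
        if (set_core_reg(vcpu_fd, EX_CORE_PC, entry_pc) < 0)
            return -1;
        return set_core_reg(vcpu_fd, EX_CORE_X0, x0);
    }

Secondary vCPUs brought up by the guest through PSCI ``CPU_ON`` receive their
PC and X0 from that call instead, so a VMM would typically only need something
like ``set_boot_state()`` for the boot vCPU.
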
VM runtime
----------

By default, memory pages mapped into a protected guest are inaccessible to the
host and any attempt by the host to access such a page will result in the
injection of an abort at EL1 by the hypervisor. For accesses originating from
EL0, the host will then terminate the current task with a ``SIGSEGV``.

pKVM exposes additional hypercalls to protected guests, primarily for the
purpose of establishing shared-memory regions with the host for communication
and I/O. These hypercalls are documented in hypercalls.rst.