On 6/2/25 6:34 PM, Christoph Schlameuss wrote:
All modern IBM Z and LinuxONE machines offer support for the
Extended System Control Area (ESCA). The ESCA has been available since
the z114/z196, released in 2010.
KVM needs to allocate and manage the SCA for guest VMs. Prior to this
change the SCA was set up as a Basic SCA (BSCA), supporting a maximum of
64 vCPUs, when initializing the VM. When the 65th vCPU was added, the
SCA had to be converted to an ESCA.
Instead of allocating a BSCA and upgrading it for PV or when adding the
65th vCPU, we can always allocate the ESCA directly upon VM creation.
This simplifies the code in multiple places and completely removes the
need to convert an existing SCA.
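As a rough sketch of what this can look like at VM creation (illustrative
only; the helper name kvm_s390_sca_alloc and the exact allocation flags are
assumptions, not the literal patch code):

static int kvm_s390_sca_alloc(struct kvm *kvm)
{
        /* One allocation covering all KVM_S390_ESCA_CPU_SLOTS vCPUs (4 pages). */
        kvm->arch.sca = alloc_pages_exact(sizeof(struct esca_block),
                                          GFP_KERNEL_ACCOUNT | __GFP_ZERO);
        if (!kvm->arch.sca)
                return -ENOMEM;
        rwlock_init(&kvm->arch.sca_lock);
        return 0;
}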
In cases where the ESCA is not supported (z10 and earlier), the use of
SCA entries, and with that SIGP interpretation, is disabled for VMs.
This increases the number of exits from the VM in multiprocessor
scenarios and thus decreases performance.
The same is true for VSIE, where SIGP is currently disabled and thus no
SCA entries are used.
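The gate this relies on is kvm_s390_use_sca_entries(), sketched here with
an assumed condition (the exact sclp flags checked are an assumption, not
quoted from the patch):

int kvm_s390_use_sca_entries(void)
{
        /*
         * Without SIGP interpretation the entries are never touched by
         * hardware, so leaving them unset keeps SIGP fully intercepted.
         */
        return sclp.has_sigpif && sclp.has_esca;
}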
The only downside of the change is that we will always allocate 4 pages
for a 248-vCPU ESCA instead of a single page for the BSCA per VM.
In return we can delete a bunch of checks and special handling depending
on the SCA type, as well as the whole BSCA-to-ESCA conversion.
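For scale (illustrative only, assuming the usual 4 KiB page size), that is
16 KiB instead of 4 KiB per VM, a bound a build-time assertion along these
lines would capture:

BUILD_BUG_ON(sizeof(struct esca_block) > 4 * PAGE_SIZE); /* 4 pages, 16 KiB */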
With that behavior change we no longer reference a bsca_block in
kvm->arch.sca; it will always be an esca_block instead.
By typing the sca as esca_block we can simplify access to the sca and
get rid of some helpers while making the code clearer.
KVM_MAX_VCPUS is also moved to kvm_host_types.h to allow its use in
future type definitions.
Signed-off-by: Christoph Schlameuss <schlameuss@xxxxxxxxxxxxx>
---
Changes in v4:
- Squash patches into single patch
- Revert KVM_CAP_MAX_VCPUS to return KVM_CAP_MAX_VCPU_ID (255) again
- Link to v3: https://lore.kernel.org/r/20250522-rm-bsca-v3-0-51d169738fcf@xxxxxxxxxxxxx
Changes in v3:
- do not enable sigp for guests when kvm_s390_use_sca_entries() is false
- consistently use kvm_s390_use_sca_entries() instead of sclp.has_sigpif
- Link to v2: https://lore.kernel.org/r/20250519-rm-bsca-v2-0-e3ea53dd0394@xxxxxxxxxxxxx
Changes in v2:
- properly apply checkpatch --strict (Thanks Claudio)
- some small comment wording changes
- rebased
- Link to v1: https://lore.kernel.org/r/20250514-rm-bsca-v1-0-6c2b065a8680@xxxxxxxxxxxxx
---
arch/s390/include/asm/kvm_host.h | 7 +-
arch/s390/include/asm/kvm_host_types.h | 2 +
arch/s390/kvm/gaccess.c | 10 +-
arch/s390/kvm/interrupt.c | 71 ++++----------
arch/s390/kvm/kvm-s390.c | 167 ++++++---------------------------
arch/s390/kvm/kvm-s390.h | 9 +-
6 files changed, 58 insertions(+), 208 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index cb89e54ada257eb4fdfe840ff37b2ea639c2d1cb..2a2b557357c8e40c82022eb338c3e98aa8f03a2b 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -27,8 +27,6 @@
#include <asm/isc.h>
#include <asm/guarded_storage.h>
-#define KVM_MAX_VCPUS 255
-
#define KVM_INTERNAL_MEM_SLOTS 1
/*
@@ -631,9 +629,8 @@ struct kvm_s390_pv {
struct mmu_notifier mmu_notifier;
};
-struct kvm_arch{
- void *sca;
- int use_esca;
+struct kvm_arch {
+ struct esca_block *sca;
rwlock_t sca_lock;
debug_info_t *dbf;
struct kvm_s390_float_interrupt float_int;
diff --git a/arch/s390/include/asm/kvm_host_types.h b/arch/s390/include/asm/kvm_host_types.h
index 1394d3fb648f1e46dba2c513ed26e5dfd275fad4..9697db9576f6c39a6689251f85b4b974c344769a 100644
--- a/arch/s390/include/asm/kvm_host_types.h
+++ b/arch/s390/include/asm/kvm_host_types.h
@@ -6,6 +6,8 @@
#include <linux/atomic.h>
#include <linux/types.h>
+#define KVM_MAX_VCPUS 256
Why are we doing the whole 256 - 1 game?
+
#define KVM_S390_BSCA_CPU_SLOTS 64
Can't you remove that now?
#define KVM_S390_ESCA_CPU_SLOTS 248
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index f6fded15633ad87f6b02c2c42aea35a3c9164253..ee37d397d9218a4d33c7a33bd877d0b974ca9003 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -112,7 +112,7 @@ int ipte_lock_held(struct kvm *kvm)
int rc;
read_lock(&kvm->arch.sca_lock);
- rc = kvm_s390_get_ipte_control(kvm)->kh != 0;
+ rc = kvm->arch.sca->ipte_control.kh != 0;
read_unlock(&kvm->arch.sca_lock);
return rc;
}
[...]
-static int sca_switch_to_extended(struct kvm *kvm);
static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta)
{
@@ -631,11 +630,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_NR_VCPUS:
case KVM_CAP_MAX_VCPUS:
case KVM_CAP_MAX_VCPU_ID:
- r = KVM_S390_BSCA_CPU_SLOTS;
+ /*
+ * Return the same value for KVM_CAP_MAX_VCPUS and
+ * KVM_CAP_MAX_VCPU_ID to pass the kvm_create_max_vcpus selftest.
+ */
+ r = KVM_S390_ESCA_CPU_SLOTS;
We're not doing this to pass the test, we're doing this to adhere to the
KVM API. Yes, the API document explains it with one indirection, but it
is in there.
The whole KVM_CAP_MAX_VCPU_ID problem will pop up again in the future
since we can't change the cap's name. We'll have to live with it.
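For reference, the relationship userspace (and the kvm_create_max_vcpus
selftest) relies on can be probed like this; hypothetical snippet, with
kvm_fd being any open /dev/kvm or VM file descriptor:

#include <assert.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void check_vcpu_caps(int kvm_fd)
{
        int max_vcpus = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
        int max_vcpu_id = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPU_ID);

        /* Every valid vCPU id must fit below the advertised id limit. */
        assert(max_vcpu_id >= max_vcpus);
}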