On 8/21/25 19:10, Michael Kelley wrote:
From: Mukesh R <mrathor@xxxxxxxxxxxxxxxxxxx> Sent: Thursday, August 21, 2025 1:50 PM
On 8/21/25 12:24, Michael Kelley wrote:
From: Mukesh R <mrathor@xxxxxxxxxxxxxxxxxxx> Sent: Wednesday, August 20, 2025 7:58 PM
On 8/20/25 17:31, Mukesh R wrote:
On 4/15/25 11:07, mhkelley58@xxxxxxxxx wrote:
From: Michael Kelley <mhklinux@xxxxxxxxxxx>
<snip>
IMHO, this is an unnecessary change that just obfuscates the code. With the
status quo one has the advantage of seeing exactly what is going on: one can
use the args any which way, change the batch size any which way, and is thus
flexible.
I started this patch set in response to some errors in open coding the
use of hyperv_pcpu_input/output_arg, to see if helper functions could
regularize the usage and reduce the likelihood of future errors. Balancing
the pluses and minuses of the result, in my view the helper functions are
an improvement, though not overwhelmingly so. Others may see the
tradeoffs differently, and as such I would not go to the mat in arguing the
patches must be taken. But if we don't take them, we need to go back and
clean up minor errors and inconsistencies in the open coding at some
existing hypercall call sites.
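To make the tradeoff concrete, this is the open-coded pattern the helpers
are meant to replace. The sketch is illustrative, not code from the series:
HVCALL_EXAMPLE and struct hv_input_example are made-up names:

  /* Open-coded pattern repeated at existing call sites (simplified) */
  unsigned long flags;
  struct hv_input_example *input;  /* hypothetical input struct */
  u64 status;

  local_irq_save(flags);
  input = *this_cpu_ptr(hyperv_pcpu_input_arg);
  memset(input, 0, sizeof(*input));
  /* ... fill in hypercall-specific fields ... */
  status = hv_do_hypercall(HVCALL_EXAMPLE, input, NULL);
  local_irq_restore(flags);

A helper does the irq save, the per-cpu pointer fetch, and the zeroing in
one place, which is exactly where the open-coded sites have picked up small
errors.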
Yes, definitely. Assuming Nuno knows what issues you are referring to,
I'll work with him to get them addressed asap. Thanks for noticing them.
If Nuno is not aware, I'll ping you for more info.
With time these functions only get more complicated and error prone. The
RAM savings are minimal, this makes analyzing crash dumps harder, and in
some cases, like in your patch 3/7, it disables unnecessarily in the error case:
- if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
- pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
- HV_MAX_MODIFY_GPA_REP_COUNT);
+ local_irq_save(flags); <<<<<<<
...
FWIW, this error case is not disabled. It is checked a few lines further down as:
I meant disabled interrupts. The check moves to after interrupts are
disabled, so it runs "disabled" in traditional OS terminology :).
Got it. But why is it a problem to make this check with interrupts disabled?
You are creating disabling overhead where that overhead previously
did not exist.
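The status quo pays nothing for the check because it runs before interrupts
are disabled, roughly (the -EINVAL is from memory):

  if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
          pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
                 HV_MAX_MODIFY_GPA_REP_COUNT);
          return -EINVAL;
  }

  local_irq_save(flags);
  /* ... per-cpu input page setup and the rep hypercall ... */
  local_irq_restore(flags);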
The check is just for robustness and should never be true, since
hv_mark_gpa_visibility() is called from only one place that already ensures
the PFN count won't overflow a single page.
+ if (count > batch_size) {
+ pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
+ batch_size);
So, this is a NAK from me, sorry.
Furthermore, this makes us lose the ability to permanently map
input/output pages in the hypervisor. So, Wei, kindly undo.
Could you elaborate on "lose the ability to permanently map
input/output pages in the hypervisor"? What specifically can't be
done and why?
Input and output are always mapped at a fixed GPA/SPA to avoid the
hypervisor having to map/unmap them every time.
OK. But how does this patch set impede doing a fixed mapping?
The output address can vary depending on the hypercall, instead of always
being at a fixed address:
*(void **)output = space + offset; <<<<<<
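That is, with a shared page the output GPA moves with the size of the
input, while dedicated pages give the hypervisor one constant GPA per vCPU
that it can map once. A rough sketch of the two shapes (the offset math is
illustrative):

  /* Shared page (this series): output GPA depends on the input size */
  input  = *this_cpu_ptr(hyperv_pcpu_input_arg);
  output = (void *)input + ALIGN(input_size, 8);  /* varies per hypercall */

  /* Dedicated pages: output GPA is fixed and mappable once */
  input  = *this_cpu_ptr(hyperv_pcpu_input_arg);
  output = *this_cpu_ptr(hyperv_pcpu_output_arg);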
Wouldn't that fixed mapping be done at the time the page or pages
are allocated, and then be transparent to hypercall call sites?
<snip>
+/*
+ * Allocate one page that is shared between input and output args, which is
+ * sufficient for all current hypercalls. If a future hypercall requires
That is incorrect. We have iommu map hypercalls that will use up the entire
page for input. More are coming as we implement RAM withdrawal from the
hypervisor.
At least some form of RAM withdrawal is already implemented upstream as
hv_call_withdraw_memory(). The hypercall has a very small input using the
hv_setup_in() helper, but the output list of PFNs must go to a separately
allocated page so it can be retained with interrupts enabled while
__free_page() is called. The use of this separate output page predates the
introduction of the hv_setup_in() helper.
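Simplified, and with field names from memory, the shape is:

  /* The output PFN list gets its own page, not the per-cpu arg page */
  struct hv_output_withdraw_memory *output;
  struct page *page;
  unsigned long flags;
  u64 status;
  int i;

  page = alloc_page(GFP_KERNEL);
  if (!page)
          return -ENOMEM;
  output = page_address(page);

  local_irq_save(flags);
  /* tiny fixed-size input built in the per-cpu arg page ('input', elided) */
  status = hv_do_rep_hypercall(HVCALL_WITHDRAW_MEMORY, count, 0,
                               input, output);
  local_irq_restore(flags);

  /* with interrupts back on, the dedicated output page is still valid */
  for (i = 0; i < hv_repcomp(status); i++)
          __free_page(pfn_to_page(output->gpa_page_list[i]));
  __free_page(page);

If the output went to the shared per-cpu page instead, it could not be used
after local_irq_restore() without first copying it out.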
Yeah, I am talking about hypervisor memory that the loader gives it, plus
what it accumulates for VMs over their lifetime. We are opening the
floodgates, so you will see lots of patches very soon.
For iommu map hypercalls, what do the input and output look like? Is the
paradigm different from the typical small fixed portion plus a variable size
array of values that are fed into a rep hypercall? Is there also a large amount
of output from the hypercall? Just curious if there's a case that's fundamentally
different from the current set of hypercalls.
Patches are coming soon, but at a high level, the hypercall includes a list
of SPAs that the hypervisor will map into the iommu. These lists can get
large. We will be exploring better ways to pass them, perhaps multiple
pages, not sure yet, but for now it's a single page.
To be clear, if the iommu hypercall does not produce any output, this patch
set uses the entire per-cpu hypercall arg page for input. For example,
Good
hv_mark_gpa_visibility() uses the entire page for input, which is mostly an
array of PFNs.
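The batch size falls out of the page geometry, and callers loop over larger
sets. A simplified sketch of the arithmetic (the loop is illustrative, not
the exact upstream code):

  /* PFNs that fit in one input page after the fixed header */
  batch_size = (HV_HYP_PAGE_SIZE - sizeof(*input)) /
               sizeof(input->gpa_page_list[0]);

  for (done = 0; done < pagecount; done += this_batch) {
          this_batch = min(batch_size, pagecount - done);
          ret = hv_mark_gpa_visibility(this_batch, pfn + done, visibility);
          if (ret)
                  return ret;
  }

The same arithmetic would apply to an iommu map hypercall whose input is
mostly an SPA list.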
Using multiple input pages is definitely a new paradigm, on both the
hypervisor and guest sides, and that will need additional infrastructure,
with or without this patch set.
Right. With this patch set, when the code is modified to support that,
every hypercall is affected rather than just one. That means one must test
every hypercall.
I'm just trying to understand where there are real technical blockers vs.
concern about the style and the encapsulation the helpers impose.
Well, no technical blockers that can't be resolved, but the style and
obfuscation that the helpers impose are a big concern.
Thanks,
-Mukesh