Re: [PATCHv2 00/12] TDX: Enable Dynamic PAMT

Kiryl Shutsemau <kas@xxxxxxxxxx> · Thu, 14 Aug 2025 11:55:57 +0100

On Thu, Aug 14, 2025 at 12:14:40AM +0000, Edgecombe, Rick P wrote:
> On Wed, 2025-08-13 at 16:31 -0700, Dave Hansen wrote:
> > On 8/13/25 15:43, Edgecombe, Rick P wrote:
> > > I redid the test. Boot 10 TDs with 16GB of ram, run userspace to fault in memory
> > > from 4 threads until OOM, then shutdown. TDs were split between two sockets. It
> > > ended up with 1136 contentions of the global lock, 4ms waiting.
> > 
> > 4ms out of how much CPU time?
> 
> The whole test took about 60s wall time (minus the time of some manual steps).
> I'll have to automate it a bit more. But 4ms seemed safely in the "small"
> category.
> 
> > 
> > Also, contention is *NOT* necessarily bad here. Only _false_ contention.
> > 
> > The whole point of the lock is to ensure that there aren't two different
> > CPUs trying to do two different things to the same PAMT range at the
> > same time.
> > 
> > If there are, one of them *HAS* to wait. It can wait lots of different
> > ways, but it has to wait. That wait will show up as spinlock contention.
> > 
> > Even if the global lock went away, that 4ms of spinning might still be
> > there.
> 
> I assumed it was mostly real contention because the the refcount check outside
> the lock should prevent the majority of "two threads operating on the same 2MB
> region" collisions. The code is roughly:
> 
> 1:
>    if (atomic_inc_not_zero(2mb_pamt_refcount))
> 	return <it's mapped>;
> 2:
>    <global lock>
>    if (atomic_read(2mb_pamt_refcount) != 0) {
> 3:
> 	atomic_inc(2mb_pamt_refcount);
> 	<global unlock>
> 	return <it's mapped>;
>    }
>    <seamcall>
>    <global unlock>
> 4:
> 
> (similar pattern on the unmapping)
> 
> So it will only be valid contention if two threads try to fault in the *same* 2MB
> DPAMT region *and* lose that race around 1-3, but invalid contention if threads try
> to execute 2-4 at the same time for any different 2MB regions.
> 
> Let me go verify.

Note that in absence of the global lock here, concurrent PAMT.ADD would
also trigger some cache bouncing during pamt_walk() on taking shared
lock on 1G PAMT entry and exclusive lock on 2M entries in the same
cache (4 PAMT_2M entries per cache line). This is hidden by the global
lock.

You would not recover full contention time by removing the global lock.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov