On Mon, Jun 30, 2025 at 11:06 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> On Mon, Jun 30, 2025 at 10:22:26PM -0700, Vishal Annapurve wrote:
> > On Mon, Jun 30, 2025 at 10:04 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 01, 2025 at 05:45:54AM +0800, Edgecombe, Rick P wrote:
> > > > On Mon, 2025-06-30 at 12:25 -0700, Ackerley Tng wrote:
> > > > > > So for this we can do something similar. Have the arch/x86 side of TDX grow a
> > > > > > new tdx_buggy_shutdown(). Have it do an all-cpu IPI to kick CPUs out of
> > > > > > SEAMMODE, wbivnd, and set a "no more seamcalls" bool. Then any SEAMCALLs after
> > > > > > that will return a TDX_BUGGY_SHUTDOWN error, or similar. All TDs in the system
> > > > > > die. Zap/cleanup paths return success in the buggy shutdown case.
> > > > > >
> > > > >
> > > > > Do you mean that on unmap/split failure:
> > > >
> > > > Maybe Yan can clarify here. I thought the HWpoison scenario was about TDX module
> > > My thinking is to set HWPoison to private pages whenever KVM_BUG_ON() was hit in
> > > TDX. i.e., when the page is still mapped in S-EPT but the TD is bugged on and
> > > about to tear down.
> > >
> > > So, it could be due to KVM or TDX module bugs, which retries can't help.
> > >
> > > > bugs. Not TDX busy errors, demote failures, etc. If there are "normal" failures,
> > > > like the ones that can be fixed with retries, then I think HWPoison is not a
> > > > good option though.
> > > >
> > > > > there is a way to make 100%
> > > > > sure all memory becomes re-usable by the rest of the host, using
> > > > > tdx_buggy_shutdown(), wbinvd, etc?
> > >
> > > Not sure about this approach. When TDX module is buggy and the page is still
> > > accessible to guest as private pages, even with no-more SEAMCALLs flag, is it
> > > safe enough for guest_memfd/hugetlb to re-assign the page to allow simultaneous
> > > access in shared memory with potential private access from TD or TDX module?
> >
> > If no more seamcalls are allowed and all cpus are made to exit SEAM
> > mode then how can there be potential private access from TD or TDX
> > module?
> Not sure. As Kirill said "TDX module has creative ways to corrupt it"
> https://lore.kernel.org/all/zlxgzuoqwrbuf54wfqycnuxzxz2yduqtsjinr5uq4ss7iuk2rt@qaaolzwsy6ki/.

I would assume that would be true only if TDX module logic is allowed to
execute. Otherwise it would be useful to understand these "creative"
ways better.
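
To make the assumption concrete, here is a rough userspace model of the
"no more seamcalls" gate proposed earlier in the thread. It is only a
sketch, not kernel code. Only the names tdx_buggy_shutdown() and
TDX_BUGGY_SHUTDOWN come from the thread. The error value, the seamcall()
wrapper, fake_module_seamcall(), tdx_zap_private_page() and the counter
are hypothetical, and the all-CPU IPI and wbinvd steps are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: the thread proposes only the name TDX_BUGGY_SHUTDOWN. */
#define TDX_SUCCESS        0ULL
#define TDX_BUGGY_SHUTDOWN 0xdeadULL

/* The "no more seamcalls" bool. A real kernel version would need
 * proper ordering against the IPI; a plain bool suffices for this model. */
static bool tdx_no_more_seamcalls;

/* Counts calls that actually reached the (fake) TDX module. */
static unsigned int seamcalls_issued;

/* Hypothetical stand-in for the SEAMCALL instruction; always succeeds. */
uint64_t fake_module_seamcall(uint64_t fn)
{
	(void)fn;
	seamcalls_issued++;
	return TDX_SUCCESS;
}

/* Hypothetical wrapper: refuse to enter the module once the flag is set. */
uint64_t seamcall(uint64_t fn)
{
	if (tdx_no_more_seamcalls)
		return TDX_BUGGY_SHUTDOWN;
	return fake_module_seamcall(fn);
}

/* In the proposal this would first IPI all CPUs out of SEAM mode and
 * run wbinvd. Here it only sets the flag. */
void tdx_buggy_shutdown(void)
{
	tdx_no_more_seamcalls = true;
}

/* Hypothetical zap/cleanup path: reports success in the buggy shutdown
 * case so that teardown can proceed. Returns 0 on success, -1 on error. */
int tdx_zap_private_page(uint64_t fn)
{
	uint64_t err = seamcall(fn);

	if (err == TDX_BUGGY_SHUTDOWN)
		return 0;
	return err == TDX_SUCCESS ? 0 : -1;
}
```

In this model, once tdx_buggy_shutdown() has run, no call reaches
fake_module_seamcall() again, which is the property my question relies
on. The sketch gates only host-initiated entries into the module. It
says nothing about whether the module could still affect memory by
other means, which is the part that needs to be understood better.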