On Thu, 3 Jul 2025 15:33:16 +0200
Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:

> On 03/07/2025 14:15, David Laight wrote:
> > On Thu, 3 Jul 2025 13:39:57 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> >> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
> >>> On Tue, 1 Jul 2025 12:58:31 +0300
> >>> "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> >>>> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> >>>> index a508e4a8c66a..47b613690f84 100644
> >>>> --- a/arch/x86/lib/clear_page_64.S
> >>>> +++ b/arch/x86/lib/clear_page_64.S
> >>>> @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> >>>>  EXPORT_SYMBOL_GPL(clear_page_erms)
> >>>>  
> >>>>  /*
> >>>> - * Default clear user-space.
> >>>> + * Default memset.
> >>>>   * Input:
> >>>>   * rdi destination
> >>>> + * rsi scratch
> >>>>   * rcx count
> >>>> - * rax is zero
> >>>> + * al is value
> >>>>   *
> >>>>   * Output:
> >>>>   * rcx: uncleared bytes or 0 if successful.
> >>>> + * rdx: clobbered
> >>>>   */
> >>>>  SYM_FUNC_START(rep_stos_alternative)
> >>>>  	ANNOTATE_NOENDBR
> >>>> +
> >>>> +	movzbq %al, %rsi
> >>>> +	movabs $0x0101010101010101, %rax
> >>>> +
> >>>> +	/* RDX:RAX = RAX * RSI */
> >>>> +	mulq %rsi
> >>>
> >>> NAK - you can't do that here.
> >>> Neither %rsi nor %rdx can be trashed.
> >>> The function has a very explicit calling convention.
> 
> That's why we have the clobbers... see below
> 
> >> What calling convention? We change the only caller to conform to this.
> > 
> > The one that is implicit in:
> > 
> >>>> +	asm volatile("1:\n\t"
> >>>> +		     ALT_64("rep stosb",
> >>>> +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> >>>> +		     "2:\n\t"
> >>>> +		     _ASM_EXTABLE_UA(1b, 2b)
> >>>> +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> >>>> +		     : "a" ((uint8_t)v)
> > 
> > The called function is only allowed to change the registers that
> > 'rep stosb' uses - except it can access (but not change)
> > all of %rax - not just %al.
> > 
> > See: https://godbolt.org/z/3fnrT3x9r
> > In particular note that 'do_mset' must not change %rax.
> > 
> > This is very specific and is done so that the compiler can use
> > all the registers.
> 
> I feel like you trimmed off the clobbers from the asm() in the context
> above. For reference, it is:
> 
> +		     : "memory", _ASM_SI, _ASM_DX);

I'm sure they weren't there...

Enough clobbers will 'un-break' it - but that isn't the point.
Linus will reject the patch if he reads it.

The whole point of the function is that it is as direct a replacement
for 'rep stos/movsb' as possible.

> I'm not saying this can't be optimized, but that doesn't seem to be your
> complaint -- you say "the called function is only allowed to change
> ...", but this is not true when we have the clobbers, right?

You can't change %rax either - not without a clobber.
Oh, and even with your version you only have clobbers for %rsi and %rdx.
There is no need to use both %rsi and %rdx.

The performance is a different problem.
And the extra clobbers are likely to matter.
x86 really doesn't have many registers.

	David

> This is exactly what I fixed with my v7 fixlet to this patch:
> 
> https://lore.kernel.org/all/1b96b0ca-5c14-4271-86c1-c305bf052b16@xxxxxxxxxx/
> 
> Vegard
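
Aside for readers following the thread: the byte-broadcast trick in
Kirill's patch (movzbq/movabs/mulq) relies on multiplication by
0x0101010101010101 copying a byte into all eight lanes of a 64-bit
word. A minimal standalone C sketch of the same idea (the function
name is illustrative, not the kernel's):

    #include <stdint.h>
    #include <stdio.h>

    /* Replicate a byte across a 64-bit word, as mulq does in the patch. */
    static uint64_t broadcast_byte(uint8_t v)
    {
            return (uint64_t)v * 0x0101010101010101ULL;
    }

    int main(void)
    {
            /* 0xab -> 0xabababababababab */
            printf("%#llx\n", (unsigned long long)broadcast_byte(0xab));
            return 0;
    }

Note that the product fits in 64 bits for any byte value (0xff gives
exactly 0xffffffffffffffff), so the high half that mulq writes to %rdx
is always zero - %rdx is clobbered but carries no information.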
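
The calling-convention point David makes can be illustrated with a
simplified, non-kernel version of the asm() in question (the function
name is hypothetical; the real kernel code adds ALTERNATIVE patching
and an exception table on top):

    #include <stddef.h>
    #include <stdint.h>

    static inline void mset_bytes(void *addr, uint8_t v, size_t len)
    {
            /*
             * Only rcx ("+c") and rdi ("+D") are declared as modified;
             * rax ("a") is an input only, and nothing else is clobbered.
             * If "rep stosb" is patched into a call, the callee must
             * preserve everything except rcx/rdi - trashing rax, rsi or
             * rdx silently corrupts values the compiler may be keeping
             * live in those registers.
             */
            asm volatile("rep stosb"
                         : "+c" (len), "+D" (addr)
                         : "a" (v)
                         : "memory");
    }

Adding clobbers (as Vegard's fixlet does for _ASM_SI and _ASM_DX)
restores correctness for those registers, but each extra clobber takes
a register away from the compiler at every call site - which is
David's objection. And since rax is an input operand, it cannot simply
be listed as a clobber; it would have to become an in/out ("+a")
operand instead.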