Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 3 Jul 2025 13:39:57 +0300
"Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:

> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
> > On Tue,  1 Jul 2025 12:58:31 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> >   
> > > Extract memcpy and memset functions from copy_user_generic() and
> > > __clear_user().
> > > 
> > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > whenever necessary. LASS requires them to handle text_poke.  
> > 
> > Except they contain the fault handlers so aren't generic calls.  
> 
> That's true. I will add a comment to clarify it.

They need renaming.

...
> > > diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> > > index a508e4a8c66a..47b613690f84 100644
> > > --- a/arch/x86/lib/clear_page_64.S
> > > +++ b/arch/x86/lib/clear_page_64.S
> > > @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> > >  EXPORT_SYMBOL_GPL(clear_page_erms)
> > >  
> > >  /*
> > > - * Default clear user-space.
> > > + * Default memset.
> > >   * Input:
> > >   * rdi destination
> > > + * rsi scratch
> > >   * rcx count
> > > - * rax is zero
> > > + * al is value
> > >   *
> > >   * Output:
> > >   * rcx: uncleared bytes or 0 if successful.
> > > + * rdx: clobbered
> > >   */
> > >  SYM_FUNC_START(rep_stos_alternative)
> > >  	ANNOTATE_NOENDBR
> > > +
> > > +	movzbq %al, %rsi
> > > +	movabs $0x0101010101010101, %rax
> > > +
> > > +	/* RDX:RAX = RAX * RSI */
> > > +	mulq %rsi  
> > 
> > NAK - you can't do that here.
> > Neither %rsi nor %rdx can be trashed.
> > The function has a very explicit calling convention.  
> 
> What calling convention? We change the only caller to confirm to this.

The one that is implicit in:

> > > +	asm volatile("1:\n\t"
> > > +		     ALT_64("rep stosb",
> > > +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > > +		     "2:\n\t"
> > > +		     _ASM_EXTABLE_UA(1b, 2b)
> > > +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> > > +		     : "a" ((uint8_t)v)

The called function is only allowed to change the registers that
'rep stosb' uses - except it can access (but not change)
all of %rax - not just %al.

See: https://godbolt.org/z/3fnrT3x9r
In particular note that 'do_mset' must not change %rax.

This is very specific and is done so that the compiler can use
all the registers.

> > It is also almost certainly a waste of time.
> > Pretty much all the calls will be for a constant 0x00.
> > Rename it all memzero() ...  
> 
> text_poke_memset() is not limited to zeroing.

But you don't want the overhead of extending the constant
on all the calls - never mind reserving %rdx to do it.
Maybe define a function that requires the caller to have
done the 'dirty work' - so any code that wants memzero()
just passes zero.
Or do the multiply in the C code where it will get optimised
away for constant zero.
You do get the multiply for the 'rep stosb' case - but that
is always going to be true unless you complicate things further.  

	David






[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux