Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset

On 03/07/2025 14:15, David Laight wrote:
> On Thu, 3 Jul 2025 13:39:57 +0300
> "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
>> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
>>> On Tue,  1 Jul 2025 12:58:31 +0300
>>> "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
>>>> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
>>>> index a508e4a8c66a..47b613690f84 100644
>>>> --- a/arch/x86/lib/clear_page_64.S
>>>> +++ b/arch/x86/lib/clear_page_64.S
>>>> @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
>>>>   EXPORT_SYMBOL_GPL(clear_page_erms)
>>>>  /*
>>>> - * Default clear user-space.
>>>> + * Default memset.
>>>>    * Input:
>>>>    * rdi destination
>>>> + * rsi scratch
>>>>    * rcx count
>>>> - * rax is zero
>>>> + * al is value
>>>>    *
>>>>    * Output:
>>>>    * rcx: uncleared bytes or 0 if successful.
>>>> + * rdx: clobbered
>>>>    */
>>>>   SYM_FUNC_START(rep_stos_alternative)
>>>>   	ANNOTATE_NOENDBR
>>>> +
>>>> +	movzbq %al, %rsi
>>>> +	movabs $0x0101010101010101, %rax
>>>> +
>>>> +	/* RDX:RAX = RAX * RSI */
>>>> +	mulq %rsi
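
The movzbq/movabs/mulq sequence above replicates the byte in %al across all
eight bytes of %rax (mulq leaves the high half of the product, here always
zero, in %rdx, which is why %rdx ends up clobbered). A minimal C sketch of
the same computation, purely illustrative and not part of the patch:

/* Splat one byte across a 64-bit word, as the mulq sequence does in %rax. */
static unsigned long long splat_byte(unsigned char v)
{
	return (unsigned long long)v * 0x0101010101010101ULL;
}

e.g. splat_byte(0xab) == 0xababababababababULL.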

>>> NAK - you can't do that here.
>>> Neither %rsi nor %rdx can be trashed.
>>> The function has a very explicit calling convention.

That's why we have the clobbers... see below.

>> What calling convention? We change the only caller to conform to this.

> The one that is implicit in:
>
> +	asm volatile("1:\n\t"
> +		     ALT_64("rep stosb",
> +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> +		     "2:\n\t"
> +		     _ASM_EXTABLE_UA(1b, 2b)
> +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> +		     : "a" ((uint8_t)v)
>
> The called function is only allowed to change the registers that
> 'rep stosb' uses - except it can access (but not change)
> all of %rax - not just %al.
>
> See: https://godbolt.org/z/3fnrT3x9r
> In particular note that 'do_mset' must not change %rax.
>
> This is very specific and is done so that the compiler can use
> all the registers.
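
The convention being described here is essentially what the compiler infers
from an asm statement with no extra clobbers. A simplified stand-alone sketch
of that case, assuming plain GNU C inline asm and leaving out the kernel's
ALT_64/exception-table machinery (the function name is made up):

/*
 * Sketch only: with these constraints and no further clobbers, the
 * compiler assumes nothing changes except %rcx, %rdi and memory, so an
 * out-of-line helper substituted for "rep stosb" would have to preserve
 * every other register, even though it may read all of %rax.
 */
static inline void memset_sketch(void *addr, unsigned char v, unsigned long len)
{
	asm volatile("rep stosb"
		     : "+c" (len), "+D" (addr)
		     : "a" (v)
		     : "memory");
}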

I feel like you trimmed off the clobbers from the asm() in the context
above. For reference, it is:

+		     : "memory", _ASM_SI, _ASM_DX);

I'm not saying this can't be optimized, but that doesn't seem to be your
complaint -- you say "the called function is only allowed to change
...", but this is not true when we have the clobbers, right?
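
For comparison, the same sketch as above with the quoted clobbers spelled
out; this is what tells the compiler that the called helper may use %rsi and
%rdx as scratch (again illustrative only, not the kernel code):

static inline void memset_sketch_clobbers(void *addr, unsigned char v, unsigned long len)
{
	/*
	 * Listing "rsi" and "rdx" as clobbers keeps the compiler from
	 * holding live values in those registers across the asm, so
	 * rep_stos_alternative is free to trash them.
	 */
	asm volatile("rep stosb"	/* or: call rep_stos_alternative */
		     : "+c" (len), "+D" (addr)
		     : "a" (v)
		     : "memory", "rsi", "rdx");
}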

This is exactly what I fixed with my v7 fixlet to this patch:

https://lore.kernel.org/all/1b96b0ca-5c14-4271-86c1-c305bf052b16@xxxxxxxxxx/


Vegard



