arm64: Align __arch_clear_user() to 16 bytes as per upstream

With significant code churn, the 'read' result from callbench can
regress from 4000 ns to 7000 ns, despite no changes directly affecting
the code paths exercised by callbench. This can also happen when playing
with compiler options that affect the kernel's size.

Upon further investigation, it turns out that /dev/zero, which callbench
uses for its file benchmarks, makes heavy use of clear_user() when
accessed via read(). When the regression occurs, __arch_clear_user()
goes from being 16-byte aligned to being 4-byte aligned.
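
For reference, the hot path can be reproduced with nothing more than
timed read() calls against /dev/zero, since each such read zeroes the
user buffer through clear_user(). A minimal C sketch of this kind of
microbenchmark follows; the buffer size and iteration count are
illustrative assumptions, not callbench's actual parameters:

  #include <fcntl.h>
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[4096];
      struct timespec start, end;
      long i, iters = 100000, ns;
      int fd = open("/dev/zero", O_RDONLY);

      if (fd < 0) {
          perror("open");
          return 1;
      }

      clock_gettime(CLOCK_MONOTONIC, &start);
      for (i = 0; i < iters; i++) {
          /* read() from /dev/zero zeroes buf via clear_user(),
           * i.e. __arch_clear_user() on arm64 */
          if (read(fd, buf, sizeof(buf)) != sizeof(buf)) {
              perror("read");
              return 1;
          }
      }
      clock_gettime(CLOCK_MONOTONIC, &end);

      ns = (end.tv_sec - start.tv_sec) * 1000000000L +
           (end.tv_nsec - start.tv_nsec);
      printf("avg read() latency: %ld ns\n", ns / iters);
      return 0;
  }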

A recent upstream change to arm64's clear_user() implementation, commit
344323e0428b ("arm64: Rewrite __arch_clear_user()"), mentions this:
  Apparently some folks examine large reads from /dev/zero closely enough
  to notice the loop being hot, so align it per the other critical loops
  (presumably around a typical instruction fetch granularity).

As such, make __arch_clear_user() 16-byte aligned to fix the regression
and match upstream.
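
The alignment in question can be checked from userspace by looking up
the symbol in /proc/kallsyms and testing its address modulo 16. A
sketch, assuming kptr_restrict is 0 (or root), since restricted
kernels report zeroed addresses:

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      unsigned long long addr;
      char line[256], type, name[128];
      FILE *f = fopen("/proc/kallsyms", "r");

      if (!f) {
          perror("fopen");
          return 1;
      }

      while (fgets(line, sizeof(line), f)) {
          /* each line is "<address> <type> <name> [module]" */
          if (sscanf(line, "%llx %c %127s", &addr, &type, name) == 3 &&
              !strcmp(name, "__arch_clear_user")) {
              printf("__arch_clear_user @ %#llx (%s16-byte aligned)\n",
                     addr, addr % 16 ? "not " : "");
              break;
          }
      }
      fclose(f);
      return 0;
  }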

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

@@ -19,6 +19,7 @@
  *
  * Alignment fixed up by hardware.
  */
+	.p2align 4
 SYM_FUNC_START(__arch_clear_user)
 	mov	x2, x1			// save the size for fixup return
 	subs	x1, x1, #8