arm64: Align __arch_clear_user() to 16 bytes as per upstream
With significant code churn, the 'read' result from callbench can regress from 4000 ns to 7000 ns, despite no changes directly affecting the code paths exercised by callbench. This can also happen when playing with compiler options that affect the kernel's size.

Upon further investigation, it turns out that /dev/zero, which callbench uses for its file benchmarks, makes heavy use of clear_user() when accessed via read(). When the regression occurs, __arch_clear_user() goes from being 16-byte aligned to being 4-byte aligned.

A recent upstream change to arm64's clear_user() implementation, commit 344323e0428b ("arm64: Rewrite __arch_clear_user()"), mentions this:

    Apparently some folks examine large reads from /dev/zero closely
    enough to notice the loop being hot, so align it per the other
    critical loops (presumably around a typical instruction fetch
    granularity).

As such, make __arch_clear_user() 16-byte aligned to fix the regression and match upstream.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
parent 9bf1253932
commit 01b5714d66

1 changed file with 1 addition and 0 deletions
@@ -19,6 +19,7 @@
  *
  * Alignment fixed up by hardware.
  */
+	.p2align 4
 SYM_FUNC_START(__arch_clear_user)
 	mov	x2, x1			// save the size for fixup return
 	subs	x1, x1, #8