Commit graph

3 commits

Author SHA1 Message Date
Clement Courbet
d4b05cdad5 sched: Optimize __calc_delta()
A significant portion of __calc_delta() time is spent in the loop
shifting a u64 by 32 bits. Use `fls` instead of iterating.

This is ~7x faster on benchmarks.

The generic `fls` implementation (`generic_fls`) is still ~4x faster
than the loop.
Architectures that have a better implementation will make use of it. For
example, on x86 we get an additional factor 2 in speed without dedicated
implementation.

On GCC, the asm versions of `fls` are about the same speed as the
builtin. On Clang, the versions that use fls are more than twice as
slow as the builtin. This is because the way the `fls` function is
written, clang puts the value in memory:
https://godbolt.org/z/EfMbYe. This bug is filed at
https://bugs.llvm.org/show_bug.cgi?idI406.

```
name                                   cpu/op
BM_Calc<__calc_delta_loop>             9.57ms Â=B112%
BM_Calc<__calc_delta_generic_fls>      2.36ms Â=B113%
BM_Calc<__calc_delta_asm_fls>          2.45ms Â=B113%
BM_Calc<__calc_delta_asm_fls_nomem>    1.66ms Â=B112%
BM_Calc<__calc_delta_asm_fls64>        2.46ms Â=B113%
BM_Calc<__calc_delta_asm_fls64_nomem>  1.34ms Â=B115%
BM_Calc<__calc_delta_builtin>          1.32ms Â=B111%
```

Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210303224653.2579656-1-joshdon@google.com
2024-11-19 18:05:19 +01:00
darkhz
bf2ac59ec9 sched/uclamp: Fix incorrect uclamp.latency_sensitive setting
This patch fixes the latency_sensitive flag for all cpuset cgroups, and
the value present in the uclamp.latency_sensitive node directly
corresponds to the task_group's latency_sensitive value.

Prior to this patch, this was not the case. The
uclamp_latency_sensitive() function applied values only to the cpu
cgroup subsys instead of the required cpuset cgroup subsys, as a
result of which the latency_sensitive value remained zero for all
taskgroups irrespective of its setting.

Also, fix a situation where latency_sensitive is enabled for the
cpuset's root cgroup, in which case all tasks will have their value
as 1, which in turn will enable prefer_idle for all tasks. This is
undesired and may cause high battery drain.
2024-11-17 17:38:14 +01:00
Gabriel2392
7ed7ee9edf Import A536BXXU9EXDC 2024-06-15 16:02:09 -03:00