sched/psi: report zeroes for CPU full at the system level
Martin found the /proc/pressure/cpu output confusing and found no hint about the CPU "full" line in the psi documentation.

  % cat /proc/pressure/cpu
  some avg10=0.92 avg60=0.91 avg300=0.73 total=933490489
  full avg10=0.22 avg60=0.23 avg300=0.16 total=358783277

The PSI_CPU_FULL state was introduced by commit e7fcd7622823 ("psi: Add PSI_CPU_FULL state"), mainly for the cgroup level, but it is also counted at the system level as a side effect.

Naturally, the FULL state doesn't exist for the CPU resource at the system level. These "full" numbers can come from CPU idle schedule latency. For example, t1 is the time when a task wakes up on an idle CPU, and t2 is the time when the CPU picks it and switches to it. The delta (t2 - t1) is accounted as CPU_FULL state.

Another case in which all processes can be stalled is when all cgroups are throttled at the same time, which is unlikely to happen.

Anyway, the CPU_FULL metric is meaningless and confusing at the system level. So this patch reports zeroes for CPU full at the system level and updates the psi documentation accordingly.

Fixes: e7fcd7622823 ("psi: Add PSI_CPU_FULL state")
Reported-by: Martin Steigerwald <Martin.Steigerwald@proact.de>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220408121914.82855-1-zhouchengming@bytedance.com
(cherry picked from commit 890d550d7dbac7a31ecaa78732aa22be282bb6b8)
(cherry picked from commit f5187de2b75d019739caec97e8e7886a27e8554c)
(cherry picked from commit 669718aaede6df28e37f6a11c0e257d18f050b1a)
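As a rough illustration of the line format quoted in the commit message, the following user-space sketch parses one "some"/"full" line and checks whether a "full" line carries only zeroes. The names `struct psi_line`, `psi_parse_line()` and `psi_full_is_zero()` are hypothetical helpers made up for this example; they are not part of the kernel or of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct psi_line {
	char kind[8];                 /* "some" or "full" */
	double avg10, avg60, avg300;  /* stall ratios in percent */
	unsigned long long total;     /* total stall time in microseconds */
};

/* Parse one pressure line; returns 0 on success, -1 on malformed input. */
static int psi_parse_line(const char *s, struct psi_line *out)
{
	int n;

	assert(s != NULL && out != NULL);
	n = sscanf(s, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		   out->kind, &out->avg10, &out->avg60, &out->avg300,
		   &out->total);
	return n == 5 ? 0 : -1;
}

/* Nonzero if this is a "full" line whose averages and total are all zero. */
static int psi_full_is_zero(const struct psi_line *l)
{
	return strcmp(l->kind, "full") == 0 &&
	       l->avg10 == 0.0 && l->avg60 == 0.0 && l->avg300 == 0.0 &&
	       l->total == 0;
}
```

Fed the "full" line from the sample output above, `psi_full_is_zero()` returns 0. On a kernel carrying this patch, the system-level CPU "full" line would be expected to parse as all zeroes.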
parent 91839ef7d5
commit 09558667e8
2 changed files with 13 additions and 11 deletions
@@ -37,11 +37,7 @@ Pressure interface
 Pressure information for each resource is exported through the
 respective file in /proc/pressure/ -- cpu, memory, and io.
 
-The format for CPU is as such::
+The format is as such::
 
-	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
-
-and for memory and IO::
-
 	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
 	full avg10=0.00 avg60=0.00 avg300=0.00 total=0
@@ -58,6 +54,9 @@ situation from a state where some tasks are stalled but the CPU is
 still doing productive work. As such, time spent in this subset of the
 stall state is tracked separately and exported in the "full" averages.
 
+CPU full is undefined at the system level, but has been reported
+since 5.13, so it is set to zero for backward compatibility.
+
 The ratios (in %) are tracked as recent trends over ten, sixty, and
 three hundred second windows, which gives insight into short term events
 as well as medium and long term trends. The total absolute stall time
@@ -1114,14 +1114,17 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 	mutex_unlock(&group->avgs_lock);
 
 	for (full = 0; full < 2 - (res == PSI_CPU); full++) {
-		unsigned long avg[3];
-		u64 total;
+		unsigned long avg[3] = { 0, };
+		u64 total = 0;
 		int w;
 
-		for (w = 0; w < 3; w++)
-			avg[w] = group->avg[res * 2 + full][w];
-		total = div_u64(group->total[PSI_AVGS][res * 2 + full],
-				NSEC_PER_USEC);
+		/* CPU FULL is undefined at the system level */
+		if (!(group == &psi_system && res == PSI_CPU && full)) {
+			for (w = 0; w < 3; w++)
+				avg[w] = group->avg[res * 2 + full][w];
+			total = div_u64(group->total[PSI_AVGS][res * 2 + full],
+					NSEC_PER_USEC);
+		}
 
 		seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
 			   full ? "full" : "some",
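The zeroing logic in the psi_show() hunk can be modeled in user space roughly as follows. This is only a sketch under stated assumptions: `enum res`, `struct group_model` and `format_pressure_line()` are hypothetical stand-ins invented for this example, the averages are stored as plain hundredths of a percent instead of the kernel's fixed-point representation, and no locking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's resource enum (order assumed). */
enum res { RES_IO, RES_MEM, RES_CPU, NR_RES };

/* Hypothetical user-space model of a psi group, not the kernel struct. */
struct group_model {
	bool is_system;                   /* models group == &psi_system */
	unsigned long avg[NR_RES * 2][3]; /* hundredths of a percent (assumed) */
	uint64_t total_ns[NR_RES * 2];    /* stall totals in nanoseconds */
};

/* Format one "some"/"full" line into buf; returns snprintf()'s result. */
static int format_pressure_line(char *buf, size_t len,
				const struct group_model *g,
				enum res res, int full)
{
	/* Start from zeroes so a skipped copy reports 0.00 and total=0. */
	unsigned long avg[3] = { 0, };
	uint64_t total = 0;
	int w;

	assert(full == 0 || full == 1);

	/* CPU FULL is undefined at the system level: keep the zeroes. */
	if (!(g->is_system && res == RES_CPU && full)) {
		for (w = 0; w < 3; w++)
			avg[w] = g->avg[res * 2 + full][w];
		total = g->total_ns[res * 2 + full] / 1000; /* ns -> us */
	}

	return snprintf(buf, len,
			"%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
			full ? "full" : "some",
			avg[0] / 100, avg[0] % 100,
			avg[1] / 100, avg[1] % 100,
			avg[2] / 100, avg[2] % 100,
			(unsigned long long)total);
}
```

With `is_system` set, a CPU "full" request produces an all-zero line regardless of the stored values, while the same data in a non-system (cgroup-like) group is printed unchanged. Zero-initializing the locals and skipping the copy keeps the output format identical in both cases, which matches the backward-compatibility goal stated in the documentation hunk.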