We don't want sched_nr_migrate to be too high, as that would impact
real-time latencies as the load balancing process for SCHED_OTHER
disallows IRQ interrupts.
Instead of making this a constant value, let's use an amount that
relates to the CPU performance of a device.
Consider that a device with a very weak 4 core processor needs to load
balance 32 tasks from the most busy-but-idle CPU. That balancing would
take significantly longer than a device with 8 cores.
On the contrary view, that same 8 core device balancing 32 tasks would
have no issues with SCHED_OTHER performance, however, real-time tasks
would suffer more jitter than is necessary.
Let's only balance as many tasks as there are CPUs in a device for
optimal SCHED_OTHER performance and SCHED_FIFO / SCHED_RR latency.