commit edbbaae42a56f9a2b39c52ef2504dfb3fb0a7858 upstream.
Currently, whenever a caller provides an affinity hint for an interrupt,
the allocation code uses it to calculate the NUMA node and copies the
cpumask into irq_desc::affinity.
If the affinity for the interrupt is not marked 'managed' then the startup
of the interrupt ignores irq_desc::affinity and uses the system default
affinity mask.
Prevent this by setting the IRQD_AFFINITY_SET flag for the interrupt in the
allocator, which causes irq_setup_affinity() to use irq_desc::affinity on
interrupt startup if the mask contains an online CPU.
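The shape of the fix in the descriptor allocator, condensed into a
sketch (surrounding logic elided; illustrative, not the verbatim diff):

  /* A non-managed affinity hint is now marked as explicitly set, so
   * irq_setup_affinity() uses irq_desc::affinity at startup instead
   * of falling back to the system default mask. */
  if (affinity) {
          if (affinity->is_managed)
                  flags = IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN;
          else
                  flags = IRQD_AFFINITY_SET;
          mask = &affinity->mask;
  }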
[ tglx: Massaged changelog ]
Fixes: 45ddcecbfa94 ("genirq: Use affinity hint in irqdesc allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20240806072044.837827-1-shayd@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 393e1280f765661cf39785e967676a4e57324126 ]
In order to let a const irqchip be fed to the irqchip layer, adjust
the various prototypes. An extra cast in irq_set_chip() is required
to avoid a warning.
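The gist of the change, sketched below (locking and error paths
elided; only the prototype and the cast matter here):

  int irq_set_chip(unsigned int irq, const struct irq_chip *chip)
  {
          struct irq_desc *desc = irq_to_desc(irq);

          if (!desc)
                  return -EINVAL;

          /* irq_data::chip stays non-const, hence the explicit cast. */
          desc->irq_data.chip = (struct irq_chip *)(chip ?: &no_irq_chip);
          return 0;
  }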
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220209162607.1118325-3-maz@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f8863bfb5ca500ea1c7669b16b1931ba27fce20 ]
In preparation for moving the reference to the device used for
runtime power management, add a new 'dev' field to the irqdomain
structure for that exact purpose.
The irq_chip_pm_{get,put}() helpers are made aware of the dual
location via a new private helper.
No functional change intended.
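A condensed sketch of the private helper (illustrative): it prefers
the chip's device and falls back to the new irqdomain field:

  static struct device *irq_get_parent_device(struct irq_data *data)
  {
          if (data->chip->parent_device)
                  return data->chip->parent_device;

          if (data->domain)
                  return data->domain->dev;

          return NULL;
  }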
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Bartosz Golaszewski <brgl@bgdev.pl>
Link: https://lore.kernel.org/r/20220201120310.878267-2-maz@kernel.org
Stable-dep-of: 33b1c47d1fc0 ("irqchip/imx-irqsteer: Handle runtime power management correctly")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 68cbd415dd4b9c5b9df69f0f091879e56bf5907a upstream.
A proper task_work_cancel() API that actually cancels a specific
callback, and not *any* callback pointing to a given function, is
going to be needed for perf event freeing. Do the appropriate rename
to prepare for that.
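After the rename, the function-pointer based variant keeps its old
semantics under the more precise name (sketch of the resulting
prototype):

  /* Cancels the first queued work whose function matches @func,
   * not one specific callback instance. */
  struct callback_head *task_work_cancel_func(struct task_struct *task,
                                              task_work_func_t func);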
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240621091601.18227-2-frederic@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even with an affinity mask that has multiple CPUs set, IRQs always run
on the first CPU in their affinity mask. Drivers that register an IRQ
affinity notifier (such as pm_qos) will therefore operate under an
incorrect assumption of where an IRQ actually runs.
Fix the IRQ affinity mask deception by forcing it to only contain one
set CPU.
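A minimal sketch of the clamping, assuming a small helper in the
affinity-setting path (helper name hypothetical):

  /* Reduce a requested mask to its first CPU so the stored affinity
   * matches where the IRQ will actually run. */
  static void irq_clamp_affinity(struct cpumask *mask)
  {
          unsigned int first = cpumask_first(mask);

          cpumask_clear(mask);
          cpumask_set_cpu(first, mask);
  }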
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
On ARM, IRQs are executed on the first CPU inside the affinity mask, so
setting an affinity mask with more than one CPU set is deceptive and
causes issues with pm_qos. To fix this, only set the CPU0 bit inside the
affinity mask, since that's where IRQs will run by default.
This is a follow-up to "kernel: Don't allow IRQ affinity masks to have
more than one CPU".
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
An IRQ affinity notifier getting overwritten can point to some annoying
issues which need to be resolved, like multiple pm_qos objects being
registered to the same IRQ. Print out a warning when this happens to aid
debugging.
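A sketch of the warning inside irq_set_affinity_notifier()
(illustrative placement and message):

  old_notify = desc->affinity_notify;
  if (old_notify && notify)
          pr_warn("IRQ %u: affinity notifier %ps overwritten by %ps\n",
                  irq, old_notify->notify, notify->notify);
  desc->affinity_notify = notify;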
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The effective affinity mask causes a lot of bugs by virtue of many
set_irq_affinity handlers only setting an effective affinity mask for an
IRQ's parent but not the IRQ itself. Since this is a widespread issue that
would require manual fixing on every different SoC, just disable the
effective affinity mask altogether and use the first CPU in the
configured affinity mask instead.
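With the effective mask compiled out, consumers can be pointed at the
first CPU of the configured mask; a hypothetical helper illustrates
the intent:

  static unsigned int irq_actual_cpu(struct irq_data *d)
  {
          /* Where the IRQ really runs once the effective mask is gone. */
          return cpumask_first(irq_data_get_affinity_mask(d));
  }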
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
commit a6c11c0a5235fb144a65e0cb2ffd360ddc1f6c32 upstream.
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call, so that apicd->prev_vector is
reclaimed if the interrupt is currently, or was previously, affine to
the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
after emitting a warning, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
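The reordering in migrate_one_irq(), condensed (comments trimmed; not
the verbatim diff):

  /* Reclaim a possibly stale prev_vector first; it cannot wait for
   * the interrupt to fire again, since this CPU is going away. */
  irq_force_complete_move(desc);

  if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {
          /* Abort a pending move if the dying CPU is the sole target. */
          irq_fixup_move_pending(desc, false);
          return false;
  }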
Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e7afb2eb7b2a7c81e9f608cbdf74a07606fd1b5 upstream.
irq_remove_generic_chip() calculates the Linux interrupt numbers for
removing the handler and interrupt chip from gc::irq_base plus the bit
positions of the set bits in the @msk argument.
When the generic chip is part of an irq domain, i.e. created with a call
to irq_alloc_domain_generic_chips(), gc::irq_base does not contain the
base Linux interrupt number. It contains the base hardware interrupt
number for this chip. It is set to 0 for the first chip in the domain,
0 + N for the next chip, where N is the number of hardware interrupts
per chip.
That means the Linux interrupt number cannot be calculated based on
gc::irq_base for irqdomain based chips without a domain map lookup, which
is currently missing.
Rework the code to take the irqdomain case into account and calculate
the Linux interrupt number by an irqdomain lookup of the domain specific
hardware interrupt number.
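The per-bit lookup in irq_remove_generic_chip(), condensed to a
sketch:

  for (i = 0; msk; msk >>= 1, i++) {
          if (!(msk & 0x01))
                  continue;

          /* Domain based chips store hwirq numbers in gc::irq_base. */
          if (gc->domain) {
                  virq = irq_find_mapping(gc->domain, gc->irq_base + i);
                  if (!virq)
                          continue;
          } else {
                  virq = gc->irq_base + i;
          }

          /* Remove handler first. That will mask the irq line. */
          irq_set_handler(virq, NULL);
  }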
[ tglx: Massage changelog. Reshuffle the logic and add a proper comment. ]
Fixes: cfefd21e693d ("genirq: Add chip suspend and resume callbacks")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231024150335.322282-1-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a0b0bad10587ae2948a7c36ca4ffc206007fbcf3 ]
When a CPU is about to be offlined, x86 validates that all active
interrupts which are targeted to this CPU can be migrated to the remaining
online CPUs. If not, the offline operation is aborted.
The validation uses irq_matrix_allocated() to retrieve the number of
vectors which are allocated on the outgoing CPU. The returned count
also includes vectors which are associated with managed interrupts.
That's overaccounting because managed interrupts are:
- not migrated when the affinity mask of the interrupt targets only
the outgoing CPU
- migrated to another CPU, but in that case the vector is already
pre-allocated on the potential target CPUs and must not be taken into
account.
As a consequence the check whether the remaining online CPUs have enough
capacity for migrating the allocated vectors from the outgoing CPU might
fail incorrectly.
Let irq_matrix_allocated() return only the number of allocated non-managed
interrupts to make this validation check correct.
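The fix itself is a one-liner, condensed:

  unsigned int irq_matrix_allocated(struct irq_matrix *m)
  {
          struct cpumap *cm = this_cpu_ptr(m->maps);

          /* Report only non-managed allocations; managed vectors are
           * pre-allocated on their target CPUs anyway. */
          return cm->allocated - cm->managed_allocated;
  }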
[ tglx: Amend changelog and fixup kernel-doc comment ]
Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator")
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231020072522.557846-1-yu.c.chen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
This is a simple IRQ balancer that polls every X number of milliseconds and
moves IRQs from the most interrupt-heavy CPU to the least interrupt-heavy
CPUs until the heaviest CPU is no longer the heaviest. IRQs are only moved
from one source CPU to any number of destination CPUs per balance run.
Balancing is skipped if the gap between the most interrupt-heavy CPU and
the least interrupt-heavy CPU is below the configured threshold of
interrupts.
The heaviest IRQs are targeted for migration in order to reduce the number
of IRQs to migrate. If moving an IRQ would reduce overall balance, then it
won't be migrated.
The most interrupt-heavy CPU is calculated by scaling the number of new
interrupts on that CPU to the CPU's current capacity. This way, interrupt
heaviness takes into account factors such as thermal pressure and time
spent processing interrupts rather than just the sheer number of them. This
also makes SBalance aware of CPU asymmetry, where different CPUs can have
different performance capacities and be proportionally balanced.
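One balance pass, sketched in C; every helper name below is a
hypothetical stand-in, not the real implementation:

  /* Hypothetical helpers, declared only to show the shape: */
  static int heaviest_cpu(void);
  static int lightest_cpu(void);
  static unsigned int cpu_irq_load(int cpu);   /* scaled by capacity */
  static int pick_heaviest_irq(int cpu);
  static bool move_would_unbalance(int irq, int cpu);
  static void move_irq_to(int irq, int cpu);

  static void sbalance_pass(unsigned int threshold)
  {
          int src = heaviest_cpu(), dst = lightest_cpu();

          /* Skip balancing when the spread is under the threshold. */
          if (cpu_irq_load(src) - cpu_irq_load(dst) < threshold)
                  return;

          /* Move the heaviest IRQs off the single source CPU, spread
           * across any number of destinations, until it is no longer
           * the heaviest. Refuse moves that would worsen the balance. */
          while (heaviest_cpu() == src) {
                  int irq = pick_heaviest_irq(src);

                  if (irq < 0 || move_would_unbalance(irq, dst))
                          break;
                  move_irq_to(irq, dst);
                  dst = lightest_cpu();
          }
  }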
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The msm_irqbalance service in userspace constantly migrates IRQs between
CPUs according to its whims, which is not desired. All of the IRQs have
a sane affinity (CPU0 if unimportant, CPU4-7 otherwise), so prevent
userspace from tampering with that.
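One way to sketch that policy (hypothetical; the real patch may gate
this differently) is to turn the procfs affinity write into a no-op:

  static ssize_t irq_affinity_proc_write(struct file *file,
                  const char __user *buffer, size_t count, loff_t *pos)
  {
          /* Pretend success; keep the kernel-chosen affinity. */
          return count;
  }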
Signed-off-by: Sultan Alsawaf <sultanxda@gmail.com>
(cherry picked from commit 6cedf3c9b1f8c962d19ce4151ca5caaff69e3c6a)
(cherry picked from commit 8fc0013ba4094fd8fe95fb0d23af0936347060f4)