e648baf242
commit 5ef1dc40ffa6a6cb968b0fdc43c3a61727a9e950 upstream.

The s390 common I/O layer (CIO) returns an unexpected -EBUSY return code
when drivers try to start I/O while a path-verification (PV) process is
pending. This can lead to failed device initialization attempts with
symptoms like broken network connectivity after boot.

Fix this by replacing the -EBUSY return code with a deferred condition
code 1 reply to make path-verification handling consistent from a
driver's point of view.

The problem can be reproduced semi-regularly using the following process,
while repeating steps 2-3 as necessary (example assumes an OSA device with
bus-IDs 0.0.a000-0.0.a002 on CHPID 0.02):

1. echo 0.0.a000,0.0.a001,0.0.a002 >/sys/bus/ccwgroup/drivers/qeth/group
2. echo 0 > /sys/bus/ccwgroup/devices/0.0.a000/online
3. echo 1 > /sys/bus/ccwgroup/devices/0.0.a000/online ; \
   echo on > /sys/devices/css0/chp0.02/status

Background information:

The common I/O layer starts path-verification I/Os when it receives
indications about changes in a device path's availability. This occurs
for example when hardware events indicate a change in channel-path
status, or when a manual operation such as a CHPID vary or configure
operation is performed.

If a driver attempts to start I/O while a PV is running, CIO reports a
successful I/O start (ccw_device_start() return code 0). Then, after
completion of PV, CIO synthesizes an interrupt response that indicates an
asynchronous status condition that prevented the start of the I/O
(deferred condition code 1).

If a PV indication arrives while a device is busy with driver-owned I/O,
PV is delayed until after I/O completion was reported to the driver's
interrupt handler. To ensure that PV can be started eventually, CIO
reports a device busy condition (ccw_device_start() return code -EBUSY)
if a driver tries to start another I/O while PV is pending.

In some cases this -EBUSY return code causes device drivers to consider
a device not operational, resulting in failed device initialization.

Note: The code that introduced the problem was added in 2003. Symptoms
started appearing with the following CIO commit that causes a PV
indication when a device is removed from the cio_ignore list after the
associated parent subchannel device was probed, but before online
processing of the CCW device has started:

2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")

During boot, the cio_ignore list is modified by the cio_ignore dracut
module [1] as well as Linux vendor-specific systemd service scripts [2].
When combined, this commit and boot scripts cause a frequent occurrence
of the problem during boot.

[1] https://github.com/dracutdevs/dracut/tree/master/modules.d/81cio_ignore
[2] https://github.com/SUSE/s390-tools/blob/master/cio_ignore.service

Cc: stable@vger.kernel.org # v5.15+
Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
Tested-By: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
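The difference between the two behaviors described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `model_dev`, `start_io_old`, `start_io_new`, `next_interrupt`, and `driver_bring_up` are hypothetical names invented for illustration, not kernel interfaces, and the real state handling in CIO is considerably more involved. The model assumes a driver that treats any non-zero start return code as "device not operational" and that retries when it sees a deferred condition code 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum irq_status { IRQ_IO_DONE, IRQ_DEFERRED_CC1 };

struct model_dev {
    bool pv_pending;     /* path verification requested but not yet run */
    bool fake_io_queued; /* a start was accepted while PV was pending */
};

/* Behavior before the fix: refuse the start while PV is pending. */
static int start_io_old(struct model_dev *dev)
{
    return dev->pv_pending ? -EBUSY : 0;
}

/* Behavior after the fix: report success, answer later with deferred cc 1. */
static int start_io_new(struct model_dev *dev)
{
    if (dev->pv_pending)
        dev->fake_io_queued = true;
    return 0;
}

/* What the driver's interrupt handler would see next in this model. */
static enum irq_status next_interrupt(struct model_dev *dev)
{
    if (dev->pv_pending) {
        dev->pv_pending = false; /* PV runs to completion first */
        if (dev->fake_io_queued) {
            dev->fake_io_queued = false;
            return IRQ_DEFERRED_CC1; /* the I/O was never actually started */
        }
    }
    return IRQ_IO_DONE;
}

/*
 * Assumed driver policy: give up on any start error, retry on deferred
 * cc 1. Returns true if the modeled device initialization succeeded.
 */
static bool driver_bring_up(struct model_dev *dev,
                            int (*start_io)(struct model_dev *))
{
    assert(dev != NULL && start_io != NULL);
    for (int attempt = 0; attempt < 3; attempt++) {
        if (start_io(dev) != 0)
            return false; /* e.g. -EBUSY taken as "not operational" */
        if (next_interrupt(dev) == IRQ_DEFERRED_CC1)
            continue;     /* start did not happen, try again */
        return true;      /* I/O completed normally */
    }
    return false;
}
```

In this model, calling `driver_bring_up()` with `pv_pending` set fails when paired with `start_io_old` and succeeds on the second attempt when paired with `start_io_new`, which mirrors the motivation given for replacing -EBUSY with a deferred condition code 1 reply.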
||
---|---|---|
.. | ||
airq.c | ||
blacklist.c | ||
blacklist.h | ||
ccwgroup.c | ||
ccwreq.c | ||
chp.c | ||
chp.h | ||
chsc.c | ||
chsc.h | ||
chsc_sch.c | ||
chsc_sch.h | ||
cio.c | ||
cio.h | ||
cio_debug.h | ||
cmf.c | ||
crw.c | ||
css.c | ||
css.h | ||
device.c | ||
device.h | ||
device_fsm.c | ||
device_id.c | ||
device_ops.c | ||
device_pgid.c | ||
device_status.c | ||
eadm_sch.c | ||
eadm_sch.h | ||
fcx.c | ||
idset.c | ||
idset.h | ||
io_sch.h | ||
ioasm.c | ||
ioasm.h | ||
isc.c | ||
itcw.c | ||
Makefile | ||
orb.h | ||
qdio.h | ||
qdio_debug.c | ||
qdio_debug.h | ||
qdio_main.c | ||
qdio_setup.c | ||
qdio_thinint.c | ||
scm.c | ||
trace.c | ||
trace.h | ||
vfio_ccw_async.c | ||
vfio_ccw_chp.c | ||
vfio_ccw_cp.c | ||
vfio_ccw_cp.h | ||
vfio_ccw_drv.c | ||
vfio_ccw_fsm.c | ||
vfio_ccw_ops.c | ||
vfio_ccw_private.h | ||
vfio_ccw_trace.c | ||
vfio_ccw_trace.h |