kernel_samsung_a53x

Author	SHA1	Message	Date
ztc1997	136bbfd757	block: Add default I/O scheduler option	2024-11-19 17:43:55 +01:00
Paolo Valente	fbbabdb3bc	block, bfq: use half slice_idle as a threshold to check short ttime The value of the I/O plugging (idling) timeout is also used as the think-time threshold to decide whether a process has a short think time. In this respect, a good value of this timeout for rotational drives is on the order of several ms. Yet, this is often too long a time interval to be effective as a think-time threshold. This commit mitigates this problem considerably (according to tests) by halving the threshold. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit b5f74ecacc3139ef873e69acc3aba28083ecc416) (cherry picked from commit b1511c438e8a5668e6be04ad9107d6695332756c) (cherry picked from commit 389992d9dc78340676248d0f01c7569b3db950ed) (cherry picked from commit 49919eface6f4391cda0e77bcaad3e2786cbbab3) (cherry picked from commit 87b015de51122ea9b5d9e56b846ae945db8444f0) (cherry picked from commit 6ada34cdc94c89e97926a2d001412ecc027e1392) (cherry picked from commit 2782bcc2919dd2a0a1d461d36c22338e67bc6327)	2024-11-19 17:43:46 +01:00
Paolo Valente	fe945719eb	block, bfq: increase time window for waker detection Tests on slower machines showed the current window to be far too small. This commit increases it. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit ab1fb47e33dc7754a7593181ffe0742c7105ea9a) (cherry picked from commit 0d1663f1922c5f6fb3a4b3cc5a3a861c765a3704) (cherry picked from commit 85d9e1637a38d0cfdeba4e3847f1797dcd18da5d) (cherry picked from commit 6bd707bb9a60e2bf0e680a271208f6c82a331571) (cherry picked from commit 43755e08d048ccd6f3b2a3bbd34bea4a71c5bc12) (cherry picked from commit b1a8cce9e99277ce53da20ab603473ad6c3e95d1) (cherry picked from commit 74d27133a3261a296ddd98e9ff09d89bfab797bb)	2024-11-19 17:43:43 +01:00
Paolo Valente	7034a03ec0	block, bfq: do not raise non-default weights BFQ heuristics try to detect interactive I/O, and raise the weight of the queues containing such I/O. Yet, if the user also changes the weight of a queue (i.e., the user changes the ioprio of the process associated with that queue), then it is most likely better to prevent BFQ heuristics from silently changing the same weight. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 91b896f65d32610d6d58af02170b15f8d37a7702) (cherry picked from commit cbbd2f045e60073978fe1b721c0953cd8762ecbb) (cherry picked from commit 88b650c71f7d0d30ac2fa215a139d7a48d069cd9) (cherry picked from commit 9a4725f0341c71a9b4f50f2d203f9740029e42e5) (cherry picked from commit a2c57345ffa5404cefd3d43e2fd4e4492ac7c6e0) (cherry picked from commit df56458ca85c681d163d879b832f868ed5044c8e) (cherry picked from commit dfc085aad98db2bcabd2c438fcd722a90303e6cb)	2024-11-19 17:43:40 +01:00
Paolo Valente	4b23f1e69b	block, bfq: do not expire a queue when it is the only busy one This commit preserves I/O-dispatch plugging for a special symmetric case that may suddenly turn asymmetric: the case where only one bfq_queue, say bfqq, is busy. In this case, not expiring bfqq does not harm any other queue in terms of service guarantees. In contrast, it avoids the following unlucky sequence of events: (1) bfqq is expired, (2) a new queue with a lower weight than bfqq (or more than one such queue) becomes busy, (3) the new queue is served until a new request arrives for bfqq, (4) when bfqq is finally served, there are so many requests of the new queue in the drive that the pending requests for bfqq take a long time to be served. In particular, event (2) may cause even already-dispatched requests of bfqq to be delayed inside the drive. So, to avoid this series of events, the scenario is preventively declared asymmetric also when bfqq is the only busy queue. By doing so, I/O-dispatch plugging is performed for bfqq. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 2391d13ed484df1515f0025458e1f82317823fab) (cherry picked from commit 79827eb41d8fb0f838a2c592775a8e63caeb7c57) (cherry picked from commit 41720669259995fb7f064fc0f988c9d228750b37) (cherry picked from commit 07d273c955ea2c34a42f6de0f1e3f1bfb00c6ce1) (cherry picked from commit 8034c856b8fcafbef405eedddc12bb0625e52a42) (cherry picked from commit f49083d304bda30647196b550a109f528c8266dc) (cherry picked from commit 8a597f0ab5e7e83bfa426d071185c3d3ce5fa535)	2024-11-19 17:43:34 +01:00
Paolo Valente	5238084cd8	block, bfq: save also injection state on queue merging To prevent injection information from being lost on bfq_queue merging, also the amount of service that a bfq_queue receives must be saved and restored when the bfq_queue is merged and split, respectively. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 5a5436b98d5cd2714feaaa579cec49dd7f7057bb) (cherry picked from commit 9372e98dc77c7f2ebbb808a60abb01f30d70d0bc) (cherry picked from commit e6a5b66cfe56495f26182cfd2340e3336bb4b2b4) (cherry picked from commit c579a3634d163ed05cc4ac258411f03db969926e) (cherry picked from commit 359f87d07390f687634185b0dd9d6f106fb5afdd) (cherry picked from commit d1d1f1336ed77b83e98d26175e196b45a28958f4) (cherry picked from commit 0ff8068594640924e0cffe27d8b0273bb80d74ca)	2024-11-19 17:43:15 +01:00
Paolo Valente	0769622634	block, bfq: save also weight-raised service on queue merging To prevent weight-raising information from being lost on bfq_queue merging, also the amount of service that a bfq_queue receives must be saved and restored when the bfq_queue is merged and split, respectively. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit e673914d52f913584cc4c454dfcff2e8eb04533f) (cherry picked from commit 48f3cf9bb6ae73de3e8e6cad2e50c6e70a6cd33f) (cherry picked from commit d947cf3f8bcbcbe2dd8f5eec82e83a35198f874b) (cherry picked from commit 39b91f1f22265c70cdc48916ac694dad6c21c191) (cherry picked from commit 421c82648e46467d29dc0b5cd5522f00a026083d) (cherry picked from commit e9eecde7c67303c1dc87864c10c372019d609b0b) (cherry picked from commit 41d4c63679c36dc63b4cc9be301ec8d8d518d33f)	2024-11-19 17:43:10 +01:00
Paolo Valente	5267faf794	block, bfq: fix switch back from soft-rt weitgh-raising A bfq_queue may happen to be deemed soft real-time while it is still enjoying interactive weight-raising. If this happens because of a false positive, then the bfq_queue is likely to lose its soft real-time status soon. Upon losing such a status, the bfq_queue must get back its interactive weight-raising, if its interactive period is not over yet. But this case is not handled. This commit corrects this error. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit d1f600fa4732dac36c71a03b790f0c829a076475) (cherry picked from commit db891a7d6aed6cc37d681d2bbf6c9bd697059281) (cherry picked from commit 647b877a9a8493df84a1d4abd94be089c8fed49b) (cherry picked from commit 7eda6de0bbbfa1d05b8888b697d9b7aeffe4d64e) (cherry picked from commit c1e076d9f4688c77dfa0f859060ae1f27a8d889e) (cherry picked from commit db0058abb7534aeb0abebe01c65659aa3886de78) (cherry picked from commit 40bc06529a2053ca0caf2053dd6f2a27bf7af916)	2024-11-19 17:42:58 +01:00
Paolo Valente	8b47ef547b	block, bfq: re-evaluate convenience of I/O plugging on rq arrivals Upon an I/O-dispatch attempt, BFQ may detect that it is better to plug I/O dispatch and wait for a new request to arrive for the currently in-service queue. But the arrival of a new request for an empty bfq_queue, and thus the switch of that bfq_queue from idle to busy, may change the scenario and make plugging either no longer needed for service guarantees or no longer convenient for throughput. In this case, keeping I/O dispatch plugged would certainly lower throughput. To address this issue, this commit re-evaluates plugging when such a request arrives, and stops plugging I/O if continuing to plug is no longer worthwhile. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 7f1995c27b19060dbdff23442f375e3097c90707) (cherry picked from commit 12ec5a8ca2486d06f880d41751383c0d9549ba49) (cherry picked from commit 64c6efc5ccb01edf553487aff312c0b7110cb30f) (cherry picked from commit 3e04c1949f447a8166fa6d6343bd5332d8c12a4b) (cherry picked from commit 40a263c36cf2094311e8189b6e9173360a808b12) (cherry picked from commit 61a02ce46503671c747e550a13972ca8abaf5030) (cherry picked from commit 3707ff2d32dccd807b8e5e6885f07f3874c71180)	2024-11-19 17:42:55 +01:00
Pavel Begunkov	f029d24207	splice: don't generate zero-len segement bvecs iter_file_splice_write() may spawn bvec segments of zero length. In preparation for prohibiting them, filter them out by hand at the splice level. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 0f1d344feb534555a0dcd0beafb7211a37c5355e) (cherry picked from commit 4c72fdc13bd20d10f59b8145627312814583a945) (cherry picked from commit cba6a18da1cc8144a07ba6a4b03e8e8dc8d24428) (cherry picked from commit 54a17499483118cd3c92feb747c88207ce30e9ce) (cherry picked from commit 4dec661d05c16a8e62dd833262ff68ce3e466770) (cherry picked from commit fe99d86b681099f662b2b01155b02b8476ff428d) (cherry picked from commit aa033460cd26157fe81e829e4744b3396a09860b)	2024-11-19 17:42:24 +01:00
Pavel Begunkov	3c61c6aa45	bvec/iter: disallow zero-length segment bvecs Zero-length bvec segments are allowed in general, but they are not handled by bio and further down the block layer, so they are filtered out there. This inconsistency may be confusing and may prevent optimisations. As zero-length segments are useless and the places that were generating them have been patched, declare them disallowed. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 9b2e0016d04c6542ace0128eb82ecb3b10c97e43) (cherry picked from commit 87afbd40acbb99860f846ad6f199e62e93be96c2) (cherry picked from commit f0677085687d50b5ecd6e7a2e19e4aff23251cb6) (cherry picked from commit affb154c088db678d4a541f8a4080fa5088cb10b) (cherry picked from commit 9b383b80e8432af1d0421acf9287076db26996d7) (cherry picked from commit f643066fcac50220888ecfe9b86c5d895d621648) (cherry picked from commit d2f588cf9664d76f78287142f505e4f375503ae6)	2024-11-19 17:42:21 +01:00
Christoph Hellwig	8ae63d0654	target/file: allocate the bvec array as part of struct target_core_file_cmd This saves one memory allocation, and ensures the bvecs aren't freed before the AIO completion. This will allow the lower level code to be optimized so that it can avoid allocating another bvec array. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit ecd7fba0ade1d6d8d49d320df9caf96922a376b2) (cherry picked from commit 272d2ea22b0e3da786a506896e36d3a586e6c252) (cherry picked from commit 83ff0aa1cc08c329feb0748c575810b3ce8c0077) (cherry picked from commit d0dc27fcc3f57d556ce4468a060e54f25c7b91b0) (cherry picked from commit 847a30a99fc4b11c9e6cf2ec049ca20a6da9c769) (cherry picked from commit 3799ad215edeb9276c4d16150a33de916cfa4ea1) (cherry picked from commit ee8f417b3276049e4f0bbadf4c4524f071de2361)	2024-11-19 17:42:15 +01:00
Pavel Begunkov	f6172ea41b	iov_iter: optimise bvec iov_iter_advance() iov_iter_advance() is heavily used, but implemented through generic means. For bvecs there is a specifically crafted function for that, so use bvec_iter_advance() instead, it's faster and slimmer. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 54c8195b4ebe10af66b49ab9c809bc16939555fc) (cherry picked from commit 8cac76228025fb022b1bb15e100efae8acde0425) (cherry picked from commit c8b0dff6b5ac38ff23605bdae1c5bf62766d0fa3) (cherry picked from commit 5bbff4ddbd3f87ddb409753269fa933109a99a7f) (cherry picked from commit 689d9157a0b58f95cb2641a17226b023a1fb226a) (cherry picked from commit 0df724cafe05ae311556249c7df0c2cd00e05007) (cherry picked from commit ba5d942df07c03782ab2aa2b2dd1f7b96b3b5c52)	2024-11-19 17:42:10 +01:00
Jan Kara	b93af2c415	bfq: Use 'ttime' local variable Use local variable 'ttime' instead of dereferencing bfqq. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 28c6def009192b673f92ea357dfb535ba15e00a4) (cherry picked from commit bb2a213aa0a2b717c3a6e7848c6f82656d80897f) (cherry picked from commit 2e0cfffb9a6da88cb1a786fb95618bfa714fea32) (cherry picked from commit caff780963fdfda0ab456c24027298482d745b2f) (cherry picked from commit b893b660ea8e998b760d48faeed2834e483158ad) (cherry picked from commit 7e3d952af5fdcf6b02d01d55dbf658fbc2d67f41) (cherry picked from commit 033b49f66e3808fead9e65e7c9417f26d423374f)	2024-11-19 17:42:05 +01:00
Joseph Qi	fdcb87e105	block/bfq: update comments and default value in docs for fifo_expire Correct the comments, since bfq_fifo_expire[0] is for async requests, while bfq_fifo_expire[1] is for sync requests. Also update the docs: according to the source code, the default fifo_expire_async is 250ms and fifo_expire_sync is 125ms. Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 4168a8d27ed3a00f160e7f885c956f060d2a0741) (cherry picked from commit a31ff2eb7d7cfa8331e513bb282f304117f18a77) (cherry picked from commit a78637befaa4106f9858b3ad8e3273960d3de82b) (cherry picked from commit bd8e7d3845c7a3b602aee361c7e3d0b5764ce060) (cherry picked from commit a8543954accfadbb9a1cf1f64c6b3749ee3a629b) (cherry picked from commit 960981f44b77dcd0d4e786aaef72d39057ccfc03) (cherry picked from commit 50cfb4b6c1c2e4a3778f66510fee7a2e86e053f2)	2024-11-19 17:41:49 +01:00
Paolo Valente	8eb5a42575	block, bfq: always inject I/O of queues blocked by wakers Suppose that I/O dispatch is plugged to wait for new I/O for the in-service bfq_queue, say bfqq. Suppose then that there is a further bfq_queue woken by bfqq, and that this woken queue has pending I/O. A woken queue does not steal bandwidth from bfqq, because it soon runs out of I/O if bfqq is not served. So there is virtually no risk of bandwidth loss for bfqq if this woken queue has I/O dispatched while bfqq is waiting for new I/O. In contrast, this extra I/O injection boosts throughput. This commit performs this extra injection. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-2-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 2ec5a5c48373d4bc2f0699f86507a65bf0b9df35) (cherry picked from commit 0750db9767232fc2e4850868e526f4b02ecfb247) (cherry picked from commit 8676f43249bbb0478a8b18bd87703da59902dbfd) (cherry picked from commit df655d250f253a2f8a6792569108f30a04b7b894) (cherry picked from commit d76168c1c3805a2c948e7ff60c8eb341e2ff0013) (cherry picked from commit f213ae4e575f8ed67ae065fe80d06dc957f0b068) (cherry picked from commit eb1ff3ab6d66081fbaf007c6cfc1a5e841719c0c)	2024-11-19 17:41:42 +01:00
Jan Kara	9abe5bf065	bfq: Provide helper to generate bfqq name Instead of having a helper that formats the bfqq pid, provide a helper that generates the full bfqq name as used in the traces. It saves some code duplication and will save more in the coming tracepoints. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 582f04e19ad7b41df993c669805e48a01bcd9c5b) (cherry picked from commit e030e88a4c2e220366f3db1af33d72d9638f93b5) (cherry picked from commit e925a5fdce15f914ec2386b03bf64242792acce0) (cherry picked from commit 9265a0e6952305932aa2b5caf2183387859dcfce) (cherry picked from commit 41794de36673c11faca8c57625dfa50b76edde20) (cherry picked from commit 5e830976b50a9f0a2c927b02f921f0d6ae796183) (cherry picked from commit b5344876556e4a62cac7905bf11ca7ccf8d16d6d)	2024-11-19 17:41:18 +01:00
Yahu Gao	1297c45dcc	block/bfq_wf2q: correct weight to ioprio The return value is ioprio * BFQ_WEIGHT_CONVERSION_COEFF or 0. What we want is ioprio or 0. Correct this by changing the calculation. Signed-off-by: Yahu Gao <gaoyahu19@gmail.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220107065859.25689-1-gaoyahu19@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit bcd2be763252f3a4d5fc4d6008d4d96c601ee74b) (cherry picked from commit 81806db867a17e49d37b1d556dd39f4da5227f56) (cherry picked from commit aed9dbfda208b30130c64bac55570e2f89084d2b) (cherry picked from commit 7158b54afec4b986d52cc646a5dffc30eac6dc19) (cherry picked from commit fb4f80f773e0fc89f372c7afda9c8e9794849f67) (cherry picked from commit 5ad409c78ed2bfca202490fa13f0a93c49f21382)	2024-11-19 17:40:48 +01:00
Jan Kara	c72c9473f5	blk: Fix lock inversion between ioc lock and bfqd lock Lockdep complains about lock inversion between ioc->lock and bfqd->lock: bfqd -> ioc: put_io_context+0x33/0x90 -> ioc->lock grabbed blk_mq_free_request+0x51/0x140 blk_put_request+0xe/0x10 blk_attempt_req_merge+0x1d/0x30 elv_attempt_insert_merge+0x56/0xa0 blk_mq_sched_try_insert_merge+0x4b/0x60 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed blk_mq_sched_insert_requests+0xd6/0x2b0 blk_mq_flush_plug_list+0x154/0x280 blk_finish_plug+0x40/0x60 ext4_writepages+0x696/0x1320 do_writepages+0x1c/0x80 __filemap_fdatawrite_range+0xd7/0x120 sync_file_range+0xac/0xf0 ioc->bfqd: bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed put_io_context_active+0x78/0xb0 -> ioc->lock grabbed exit_io_context+0x48/0x50 do_exit+0x7e9/0xdd0 do_group_exit+0x54/0xc0 To avoid this inversion, we change blk_mq_sched_try_insert_merge() to not free the merged request but rather leave that up to the caller, similarly to blk_mq_sched_try_merge(). And in bfq_insert_requests() we make sure to free all the merged requests after dropping bfqd->lock. Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210623093634.27879-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit fd2ef39cc9a6b9c4c41864ac506906c52f94b06a) (cherry picked from commit 786e392c4a7bd2559bdc1a1c6ac28d8b612a0735) (cherry picked from commit aa8e3e1451bde73dff60f1e5110b6a3cb810e35b) (cherry picked from commit 4deef6abb13a82b148c583d9ab37374c876fe4c2) (cherry picked from commit 1988f864ec1c494bb54e5b9df1611195f6d923f2) (cherry picked from commit 9dc0074b0dd8960f9e06dc1494855493ff53eb68) (cherry picked from commit c937983724111bb4526e34da0d5c6c8aea1902af)	2024-11-19 17:40:26 +01:00
Johannes Weiner	90c0c9aa4a	cgroup: rstat: punt root-level optimization to individual controllers Current users of the rstat code can source root-level statistics from the native counters of their respective subsystem, allowing them to forego aggregation at the root level. This optimization is currently implemented inside the generic rstat code, which doesn't track the root cgroup and doesn't invoke the subsystem flush callbacks on it. However, the memory controller cannot do this optimization, because cgroup1 breaks out memory specifically for the local level, including at the root level. In preparation for the memory controller switching to rstat, move the optimization from rstat core to the controllers. Afterwards, rstat will always track the root cgroup for changes and invoke the subsystem callbacks on it; and it's up to the subsystem to special-case and skip aggregation of the root cgroup if it can source this information through other, cheaper means. This is the case for the io controller and the cgroup base stats. In their respective flush callbacks, check whether the parent is the root cgroup, and if so, skip the unnecessary upward propagation. The extra cost of tracking the root cgroup is negligible: on stat changes, we actually remove a branch that checks for the root. The queueing for a flush touches only per-cpu data, and only the first stat change since a flush requires a (per-cpu) lock. 
Link: https://lkml.kernel.org/r/20210209163304.77088-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit dc26532aed0ab25c0801a34640d1f3b9b9098a48) (cherry picked from commit 69da183fcd0112af130879a1c93113a941e2241b) (cherry picked from commit ddf1013871482b246147e71a04c865c1be5cf74d) (cherry picked from commit 30fcd52e18dd1d508b1b22f7c660ac22de734f67) (cherry picked from commit 19c9a1b9d9ae9a4f359deaf89101f9013254f43d) (cherry picked from commit 0b4286aea9bb0a6ea6acb723f8396e476044190b)	2024-11-19 17:40:21 +01:00
Nahuel Gómez	a4d33f6631	block: ssg-iosched: adapt to new patches ../block/ssg-iosched.c:684:41: error: too few arguments to function call, expected 3, have 2 684 | if (blk_mq_sched_try_insert_merge(q, rq)) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ../block/blk-mq-sched.h:15:6: note: 'blk_mq_sched_try_insert_merge' declared here 15 | bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 16 | struct list_head *free); | ~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-11-19 17:40:09 +01:00
Nahuel Gómez	82eba12440	exynos-pm: fix build without CONFIG_SEC_PM_DEBUG We remove the checks to allow the function to be used anyway. ../drivers/soc/samsung/exynos-pm/exynos-pm.c:107:10: error: declaration of 'struct wakeup_stat_name' will not be visible outside of this function [-Werror,-Wvisibility] 107 | struct wakeup_stat_name *ws_names) | ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:114:18: error: incomplete definition of type 'struct wakeup_stat_name' 114 | name = ws_names->name[bit]; | ~~~~~~~~^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:107:10: note: forward declaration of 'struct wakeup_stat_name' 107 | struct wakeup_stat_name *ws_names) | ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:131:25: error: no member named 'ws_names' in 'struct exynos_pm_info' 131 | if (unlikely(!pm_info->ws_names)) | ~~~~~~~ ^ ../include/linux/compiler.h:78:42: note: expanded from macro 'unlikely' 78 | # define unlikely(x) __builtin_expect(!!(x), 0) | ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:143:51: error: no member named 'ws_names' in 'struct exynos_pm_info' 143 | exynos_show_wakeup_reason_sysint(wss, &pm_info->ws_names[i]); | ~~~~~~~ ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:465:11: error: no member named 'ws_names' in 'struct exynos_pm_info' 465 | pm_info->ws_names = kzalloc(sizeof(*pm_info->ws_names) * n, GFP_KERNEL); | ~~~~~~~ ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:465:47: error: no member named 'ws_names' in 'struct exynos_pm_info' 465 | pm_info->ws_names = kzalloc(sizeof(*pm_info->ws_names) * n, GFP_KERNEL); | ~~~~~~~ ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:466:16: error: no member named 'ws_names' in 'struct exynos_pm_info' 466 | if (!pm_info->ws_names) | ~~~~~~~ ^ ../drivers/soc/samsung/exynos-pm/exynos-pm.c:478:14: error: no member named 'ws_names' in 'struct exynos_pm_info' 478 | pm_info->ws_names[idx].name, size); | ~~~~~~~ ^ 8 errors generated. Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-11-19 17:39:21 +01:00
Nahuel Gómez	7059d8baa3	kernel: sysctl: add init protection to common mm-related nodes The protected nodes are: * dirty_ratio * dirty_background_ratio * dirty_bytes * dirty_background_bytes * dirty_expire_centisecs * dirty_writeback_centisecs * swappiness This approach is inspired by [1] and makes use of the node tampering blacklist. [1]: `239efdc263` Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-11-19 17:39:17 +01:00
Nahuel Gómez	bfb3710a7c	mm: new writeback and swappiness values from Ktweak Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2024-11-19 17:39:12 +01:00
Adam W. Willis	bcec04dde1	mm: apply init protection Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> Change-Id: I1a1928fec9efeb29203a94644388c3ca48e7d96e [TogoFire]: adapt to k5.4. Signed-off-by: TogoFire <togofire@mailfence.com>	2024-11-19 17:39:06 +01:00
Uladzislau Rezki	d0dc26b405	workqueue: Make queue_rcu_work() use call_rcu_flush() Earlier commits in this series allow battery-powered systems to build their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option. This Kconfig option causes call_rcu() to delay its callbacks in order to batch them. This means that a given RCU grace period covers more callbacks, thus reducing the number of grace periods, in turn reducing the amount of energy consumed, which increases battery lifetime, which can be a very good thing. This is not a subtle effect: In some important use cases, the battery lifetime is increased by more than 10%. This CONFIG_RCU_LAZY=y option is available only for CPUs that offload callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y. Delaying callbacks is normally not a problem because most callbacks do nothing but free memory. If the system is short on memory, a shrinker will kick all currently queued lazy callbacks out of their laziness, thus freeing their memory in short order. Similarly, the rcu_barrier() function, which blocks until all currently queued callbacks are invoked, will also kick lazy callbacks, thus enabling rcu_barrier() to complete in a timely manner. However, there are some cases where laziness is not a good option. For example, synchronize_rcu() invokes call_rcu(), and blocks until the newly queued callback is invoked. It would not be good for synchronize_rcu() to block for ten seconds, even on an idle system. Therefore, synchronize_rcu() invokes call_rcu_flush() instead of call_rcu(). The arrival of a non-lazy call_rcu_flush() callback on a given CPU kicks any lazy callbacks that might be already queued on that CPU. After all, if there is going to be a grace period, all callbacks might as well get full benefit from it. 
Yes, this could be done the other way around by creating a call_rcu_lazy(), but earlier experience with this approach and feedback at the 2022 Linux Plumbers Conference shifted the approach to call_rcu() being lazy with call_rcu_flush() for the few places where laziness is inappropriate. And another call_rcu() instance that cannot be lazy is the one in queue_rcu_work(), given that callers to queue_rcu_work() are not necessarily OK with long delays. Therefore, make queue_rcu_work() use call_rcu_flush() in order to revert to the old behavior. Signed-off-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2024-11-19 17:37:56 +01:00
Sultan Alsawaf	fa6b06bf46	sched/fair: Always update CPU capacity when load balancing Limiting CPU capacity updates, which are quite cheap, results in worse balancing decisions during opportunistic balancing (e.g., SD_BALANCE_WAKE). This causes opportunistic placement decisions to be skewed using stale CPU capacity data, and when a CPU isn't idling much, its capacity suffers from even more staleness since the only exception to the 100 ms capacity update ratelimit is a CPU exiting idle. Since the capacity updates are cheap, always do it when load balancing in order to improve opportunistic task placement decisions. Change-Id: If1d451ce742fd093010057e31e71012d47fad70a Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>	2024-11-19 17:34:49 +01:00
Joel Fernandes (Google)	323a4009a4	rcu: Avoid unnecessary softirq when system is idle When there are no callbacks pending on an idle system, I noticed that RCU softirq is continuously firing. During this the cpu_no_qs is set to false, and core_needs_qs is set to true indefinitely. This causes rcu_process_callbacks to be repeatedly called, even though the node corresponding to the CPU has that CPU's mask bit cleared and the system is idle. I believe the race is when such mask clearing is done during idle CPU scan of the quiescent state forcing stage in the kthread instead of the softirq. Since the rnp mask is cleared, but the flags on the CPU's rdp are not cleared, the CPU thinks it still needs to report to core RCU. Cure this by clearing the core_needs_qs flag when the CPU detects that its node is already updated which will avoid the unwanted softirq raises to the benefit of real-time systems. Test: Ran rcutorture for various tree RCU configs. Change-Id: Iee374d1dcdc74ecc5e6816a99be51feddd876931 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: mydongistiny <jaysonedson@gmail.com>	2024-11-19 17:34:20 +01:00
Tyler Nijmeh	35df832f54	mm: oom_kill: Do not dump tasks by default Dumping tasks takes the RCU and task locks just to print debugging information, which at certain log levels is not even displayed. Disable this by default. Change-Id: I952dba176f955239061acc7b178d88fceff8ecdf Signed-off-by: RyuujiX <saputradenny712@gmail.com> Signed-off-by: onettboots <blackcocopet@gmail.com>	2024-11-19 17:33:46 +01:00
Panchajanya1999	0ab2f838a5	tcp: Force the TCP no-delay option for everything Forcing TCP no-delay disables Nagle's algorithm, which coalesces small outgoing packets so that they are sent together. With it disabled, each packet is sent as soon as it is ready, which can improve latency. Read https://brooker.co.za/blog/2024/05/09/nagle.html for details. Signed-off-by: prathamdubey2005 <134331217+prathamdubey2005@users.noreply.github.com>	2024-11-19 17:33:40 +01:00
gustavoss	9a9f44e174	Optimized Console FrameBuffer for upto 70% increase in Performance Signed-off-by: Joe Maples <joe@frap129.org> Signed-off-by: John Vincent <git@tenseventyseven.cf>	2024-11-19 17:30:21 +01:00
Ksawlii	9b077df9ac	Revert "net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD()" This reverts commit `97f5298d5c`.	2024-11-19 14:52:14 +01:00
Greg Kroah-Hartman	b2c91c005b	Linux 5.10.223 Link: https://lore.kernel.org/r/20240725142733.262322603@linuxfoundation.org Tested-by: ChromeOS CQ Test <chromeos-kernel-stable-merge@google.com> Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: kernelci.org bot <bot@kernelci.org> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:53 +01:00
Si-Wei Liu	fadb305e68	tap: add missing verification for short frame commit ed7f2afdd0e043a397677e597ced0830b83ba0b3 upstream. The cited commit missed to check against the validity of the frame length in the tap_get_user_xdp() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tap_get_user_xdp()-->skb_set_network_header() may assume the size is more than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tap_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted. This is to drop any frame shorter than the Ethernet header size just like how tap_get_user() does. CVE: CVE-2024-41090 Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:53 +01:00
Dongli Zhang	4ab91c0fbc	tun: add missing verification for short frame commit 049584807f1d797fc3078b68035450a9769eb5c3 upstream. The cited commit missed to check against the validity of the frame length in the tun_xdp_one() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tun_xdp_one-->eth_type_trans() may access the Ethernet header although it can be less than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tun_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted for IFF_TAP. This is to drop any frame shorter than the Ethernet header size just like how tun_get_user() does. CVE: CVE-2024-41091 Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:53 +01:00
Jann Horn	dadacb32b4	filelock: Fix fcntl/close race recovery compat path commit f8138f2ad2f745b9a1c696a05b749eabe44337ea upstream. When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:53 +01:00
Shengjiu Wang	9493326651	ALSA: pcm_dmaengine: Don't synchronize DMA channel when DMA is paused commit 88e98af9f4b5b0d60c1fe7f7f2701b5467691e75 upstream. When suspended, the DMA channel may enter PAUSE state if dmaengine_pause() is supported by DMA. At this state, dmaengine_synchronize() should not be called, otherwise the DMA channel can't be resumed successfully. Fixes: e8343410ddf0 ("ALSA: dmaengine: Synchronize dma channel after drop()") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/1721198693-27636-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:52 +01:00
Seunghun Han	b6bbfcc48c	ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360 commit d7063c08738573fc2f3296da6d31a22fa8aa843a upstream. Samsung Galaxy Book Pro 360 (13" 2022 NT935QDB-KC71S) with codec SSID 144d:c1a4 requires the same workaround to enable the speaker amp as other Samsung models with the ALC298 codec. Signed-off-by: Seunghun Han <kkamagui@gmail.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20240718080908.8677-1-kkamagui@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Edson Juliano Drosdeck	45ee25ef66	ALSA: hda/realtek: Enable headset mic on Positivo SU C1400 commit 8fc1e8b230771442133d5cf5fa4313277aa2bb8b upstream. Positivo SU C1400 is equipped with ALC256, and it needs ALC269_FIXUP_ASPIRE_HEADSET_MIC quirk to make its headset mic work. Signed-off-by: Edson Juliano Drosdeck <edson.drosdeck@gmail.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20240712180642.22564-1-edson.drosdeck@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
lei lu	fac2aa0b41	jfs: don't walk off the end of ealist commit d0fa70aca54c8643248e89061da23752506ec0d4 upstream. Add a check before visiting the members of ea to make sure each ea stays within the ealist. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
lei lu	1494ceb608	ocfs2: add bounds checking to ocfs2_check_dir_entry() commit 255547c6bb8940a97eea94ef9d464ea5967763fb upstream. This adds sanity checks for ocfs2_dir_entry to make sure all members of ocfs2_dir_entry don't stray beyond valid memory region. Link: https://lkml.kernel.org/r/20240626104433.163270-1-llfamsec@gmail.com Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Paolo Abeni	c246f2bb3d	net: relax socket state check at accept time. commit 26afda78cda3da974fd4c287962c169e9462c495 upstream. Christoph reported the following splat: WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0 Modules linked in: CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759 Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80 RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293 RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64 R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000 R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800 FS: 000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786 do_accept+0x435/0x620 net/socket.c:1929 __sys_accept4_file net/socket.c:1969 [inline] __sys_accept4+0x9b/0x110 net/socket.c:1999 __do_sys_accept net/socket.c:2016 [inline] __se_sys_accept net/socket.c:2013 [inline] __x64_sys_accept+0x7d/0x90 net/socket.c:2013 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x4315f9 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300 R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055 </TASK> The reproducer invokes shutdown() before entering the listener status. After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets"), the above causes the child to reach the accept syscall in FIN_WAIT1 status. Eric noted we can relax the existing assertion in __inet_accept() Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490 Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Dan Carpenter	aeaf2ac6b3	drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq() commit 6769a23697f17f9bf9365ca8ed62fe37e361a05a upstream. The "instance" variable needs to be signed for the error handling to work. Fixes: 8b2faf1a4f3b ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou <bob.zhou@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Gabriel Krisman Bertazi	54d36cabde	ext4: fix error code saved on super block during file system abort commit 124e7c61deb27d758df5ec0521c36cf08d417f7a upstream. ext4_abort will eventually call ext4_errno_to_code, which translates the errno to an EXT4_ERR specific error. This means that ext4_abort expects an errno. By using EXT4_ERR_ here, it gets misinterpreted (as an errno), and ends up saving EXT4_ERR_EBUSY on the superblock during an abort, which makes no sense. ESHUTDOWN will get properly translated to EXT4_ERR_SHUTDOWN, so use that instead. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/20211026173302.84000-1-krisman@collabora.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Bart Van Assche	7be7fa0f35	scsi: core: Fix a use-after-free commit 8fe4ce5836e932f5766317cb651c1ff2a4cd0506 upstream. There are two .exit_cmd_priv implementations. Both implementations use resources associated with the SCSI host. Make sure that these resources are still available when .exit_cmd_priv is called by waiting inside scsi_remove_host() until the tag set has been freed. This commit fixes the following use-after-free: ================================================================== BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp] Read of size 8 at addr ffff888100337000 by task multipathd/16727 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db kasan_report+0xab/0x120 srp_exit_cmd_priv+0x27/0xd0 [ib_srp] scsi_mq_exit_request+0x4d/0x70 blk_mq_free_rqs+0x143/0x410 __blk_mq_free_map_and_rqs+0x6e/0x100 blk_mq_free_tag_set+0x2b/0x160 scsi_host_dev_release+0xf3/0x1a0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_device_dev_release_usercontext+0x4c1/0x4e0 execute_in_process_context+0x23/0x90 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_disk_release+0x3f/0x50 device_release+0x54/0xe0 kobject_put+0xa5/0x120 disk_release+0x17f/0x1b0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 dm_put_table_device+0xa3/0x160 [dm_mod] dm_put_device+0xd0/0x140 [dm_mod] free_priority_group+0xd8/0x110 [dm_multipath] free_multipath+0x94/0xe0 [dm_multipath] dm_table_destroy+0xa2/0x1e0 [dm_mod] __dm_destroy+0x196/0x350 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x2c2/0x590 [dm_mod] dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/r/20220826002635.919423-1-bvanassche@acm.org Fixes: 65ca846a5314 ("scsi: core: Introduce {init,exit}_cmd_priv()") Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Li Zhijian <lizhijian@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> [mheyne: fixed contextual conflicts: - drivers/scsi/hosts.c: due to missing commit 973dac8a8a14 ("scsi: core: Refine how we set tag_set NUMA node") - drivers/scsi/scsi_sysfs.c: due to missing commit 6f8191fdf41d ("block: simplify disk shutdown") - drivers/scsi/scsi_scan.c: due to missing commit 59506abe5e34 ("scsi: core: Inline scsi_mq_alloc_queue()")] Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Jason Xing	a4599f9d80	bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue commit 6648e613226e18897231ab5e42ffc29e63fa3365 upstream. Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which syzbot reported [1]. [1] BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1: sk_psock_stop_verdict net/core/skmsg.c:1257 [inline] sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843 sk_psock_put include/linux/skmsg.h:459 [inline] sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648 unix_release+0x4b/0x80 net/unix/af_unix.c:1048 __sock_release net/socket.c:659 [inline] sock_close+0x68/0x150 net/socket.c:1421 __fput+0x2c1/0x660 fs/file_table.c:422 __fput_sync+0x44/0x60 fs/file_table.c:507 __do_sys_close fs/open.c:1556 [inline] __se_sys_close+0x101/0x1b0 fs/open.c:1541 __x64_sys_close+0x1f/0x30 fs/open.c:1541 do_syscall_64+0xd3/0x1d0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0: sk_psock_data_ready include/linux/skmsg.h:464 [inline] sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555 sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606 sk_psock_verdict_apply net/core/skmsg.c:1008 [inline] sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202 unix_read_skb net/unix/af_unix.c:2546 [inline] unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682 sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223 unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x140/0x180 net/socket.c:745 ____sys_sendmsg+0x312/0x410 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x1e9/0x280 net/socket.c:2667 __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674 do_syscall_64+0xd3/0x1d0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 value changed: 0xffffffff83d7feb0 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G W 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer similarly due to no protection of saved_data_ready. Here is another different caller causing the same issue because of the same reason. So we should protect it with sk_callback_lock read lock because the writer side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);". To avoid errors that could happen in future, I move those two pairs of lock into the sk_psock_data_ready(), which is suggested by John Fastabend. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com Signed-off-by: Jason Xing <kernelxing@tencent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d Link: https://lore.kernel.org/all/20240329134037.92124-1-kerneljasonxing@gmail.com Link: https://lore.kernel.org/bpf/20240404021001.94815-1-kerneljasonxing@gmail.com Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Daniel Borkmann	33c88d138d	bpf: Fix overrunning reservations in ringbuf commit cfa1a2329a691ffd991fcf7248a57d752e712881 upstream. The BPF ring buffer internally is implemented as a power-of-2 sized circular buffer, with two logical and ever-increasing counters: consumer_pos is the consumer counter to show which logical position the consumer consumed the data, and producer_pos which is the producer counter denoting the amount of data reserved by all producers. Each time a record is reserved, the producer that "owns" the record will successfully advance producer counter. In user space each time a record is read, the consumer of the data advanced the consumer counter once it finished processing. Both counters are stored in separate pages so that from user space, the producer counter is read-only and the consumer counter is read-write. One aspect that simplifies and thus speeds up the implementation of both producers and consumers is how the data area is mapped twice contiguously back-to-back in the virtual memory, allowing to not take any special measures for samples that have to wrap around at the end of the circular buffer data area, because the next page after the last data page would be first data page again, and thus the sample will still appear completely contiguous in virtual memory. Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for book-keeping the length and offset, and is inaccessible to the BPF program. Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ` for the BPF program to use. Bing-Jhong and Muhammad reported that it is however possible to make a second allocated memory chunk overlapping with the first chunk and as a result, the BPF program is now able to edit first chunk's header. For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in [0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets allocate a chunk B with size 0x3000. This will succeed because consumer_pos was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask` check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This means that chunk B at [0x4000,0x4008] is chunk A's header. bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong page and could cause a crash. Fix it by calculating the oldest pending_pos and check whether the range from the oldest outstanding record to the newest would span beyond the ring buffer size. If that is the case, then reject the request. We've tested with the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh) before/after the fix and while it seems a bit slower on some benchmarks, it is still not significantly enough to matter. Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Co-developed-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Kuan-Wei Chiu	38ee79d89a	ACPI: processor_idle: Fix invalid comparison with insertion sort for latency commit 233323f9b9f828cd7cd5145ad811c1990b692542 upstream. The acpi_cst_latency_cmp() comparison function currently used for sorting C-state latencies does not satisfy transitivity, causing incorrect sorting results. Specifically, if there are two valid acpi_processor_cx elements A and B and one invalid element C, it may occur that A < B, A = C, and B = C. Sorting algorithms assume that if A < B and A = C, then C < B, leading to incorrect ordering. Given the small size of the array (<=8), we replace the library sort function with a simple insertion sort that properly ignores invalid elements and sorts valid ones based on latency. This change ensures correct ordering of the C-state latencies. Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered") Reported-by: Julian Sikorski <belegdol@gmail.com> Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:50 +01:00
Masahiro Yamada	009969cd50	ARM: 9324/1: fix get_user() broken with veneer commit 24d3ba0a7b44c1617c27f5045eecc4f34752ab03 upstream. The 32-bit ARM kernel stops working if the kernel grows to the point where veneers for __get_user_* are created. AAPCS32 [1] states, "Register r12 (IP) may be used by a linker as a scratch register between a routine and any subroutine it calls. It can also be used within a routine to hold intermediate values between subroutine calls." However, bl instructions buried within the inline asm are unpredictable for compilers; hence, "ip" must be added to the clobber list. This becomes critical when veneers for __get_user_* are created because veneers use the ip register since commit 02e541db0540 ("ARM: 8323/1: force linker to use PIC veneers"). [1]: https://github.com/ARM-software/abi-aa/blob/2023Q1/aapcs32/aapcs32.rst Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: John Stultz <jstultz@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:50 +01:00
David Lechner	395b1f694e	spi: mux: set ctlr->bits_per_word_mask [ Upstream commit c8bd922d924bb4ab6c6c488310157d1a27996f31 ] Like other SPI controller flags, bits_per_word_mask may be used by a peripheral driver, so it needs to reflect the capabilities of the underlying controller. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20240708-spi-mux-fix-v1-3-6c8845193128@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:50 +01:00
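A note on 33c88d138d ("bpf: Fix overrunning reservations in ringbuf"): the fix adds a second bound at reservation time. The C sketch below is a simplified user-space model of that check, not the kernel code. `struct ringbuf_model`, `round_up8()` and `model_reserve()` are hypothetical names; the 8-byte header size mirrors the `struct bpf_ringbuf_hdr { u32 len; u32 pg_off; }` described in the commit message, and `pending_pos` is simply assumed to hold the start of the oldest uncommitted record (how it is advanced on commit is left out of the model).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header size: struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } = 8 bytes. */
#define MODEL_HDR_SZ 8u
#define MODEL_RESERVE_FAILED UINT64_MAX

/* Hypothetical, simplified model of the ring buffer's logical counters. */
struct ringbuf_model {
	uint64_t mask;         /* data area size - 1; the size is a power of 2 */
	uint64_t consumer_pos; /* written by user space, so it cannot be trusted */
	uint64_t producer_pos; /* end of the newest reserved record */
	uint64_t pending_pos;  /* start of the oldest record not yet committed */
};

/* Round x up to the next multiple of 8 (records are assumed 8-byte aligned). */
static uint64_t round_up8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

/*
 * Reserve a record with `size` payload bytes. Returns the logical position of
 * the record header, or MODEL_RESERVE_FAILED if the request is rejected.
 */
static uint64_t model_reserve(struct ringbuf_model *rb, uint64_t size)
{
	uint64_t len, new_prod, start;

	assert(((rb->mask + 1) & rb->mask) == 0); /* power-of-2 size */

	len = round_up8(size + MODEL_HDR_SZ);
	new_prod = rb->producer_pos + len;

	/* Pre-fix check: the producer may not run more than mask ahead of the consumer. */
	if (new_prod - rb->consumer_pos > rb->mask)
		return MODEL_RESERVE_FAILED;

	/*
	 * Added check: the span from the oldest uncommitted record to the end of
	 * the new one must also fit, so a tampered consumer_pos cannot let a new
	 * record wrap around onto a pending record's header.
	 */
	if (new_prod - rb->pending_pos > rb->mask)
		return MODEL_RESERVE_FAILED;

	start = rb->producer_pos;
	rb->producer_pos = new_prod;
	return start;
}
```

Replaying the commit's scenario in this model: with a 0x4000-byte buffer (mask 0x3fff) and consumer_pos forced to 0x3000, reserving 0x3000 bytes places chunk A at 0x0 and moves producer_pos to 0x3008. A second 0x3000-byte request would end at 0x6010; 0x6010 - 0x3000 = 0x3010 still passes the old check, but 0x6010 - 0 exceeds the mask, so the added check rejects it.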
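A note on 38ee79d89a (ACPI C-state latency sort): the property that matters is that invalid entries are skipped entirely instead of being ranked against valid ones, which is what broke transitivity in the old comparison function. A sketch of such an insertion sort in C follows, assuming a hypothetical `struct cx_model` that keeps only the `valid` and `latency` fields of `struct acpi_processor_cx`; `cx_latency_sort()` is likewise a hypothetical name, and this version swaps only the latency values, leaving each slot's `valid` flag in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct acpi_processor_cx (only the fields used). */
struct cx_model {
	bool valid;
	unsigned int latency;
};

static void swap_latency(struct cx_model *a, struct cx_model *b)
{
	unsigned int tmp = a->latency;

	a->latency = b->latency;
	b->latency = tmp;
}

/*
 * Insertion sort over the valid entries only. Each valid entry is bubbled
 * toward the front past larger valid latencies; invalid slots are stepped
 * over and never compared or modified.
 */
static void cx_latency_sort(struct cx_model *states, size_t n)
{
	assert(states != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		if (!states[i].valid)
			continue;

		/* k is always the nearest valid slot after j. */
		size_t k = i;
		for (size_t j = i; j-- > 0;) {
			if (!states[j].valid)
				continue;
			/* Once the valid prefix is in order, no further swaps happen. */
			if (states[j].latency > states[k].latency)
				swap_latency(&states[j], &states[k]);
			k = j;
		}
	}
}
```

For example, latencies {300, (invalid), 100, 200} come out as {100, (invalid), 200, 300}: the invalid slot is neither compared nor moved, so no ordering has to be invented for it.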
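A note on fadb305e68 and 4ab91c0fbc (tap/tun short frames): both fixes drop any XDP frame shorter than the Ethernet header before code such as skb_set_network_header() or eth_type_trans() can touch that header. The guard can be sketched in user-space C as below; `MODEL_ETH_HLEN` and `model_frame_ethertype()` are hypothetical names, and returning `-EINVAL` for a short frame is an assumption of this sketch. The 14-byte length is the standard Ethernet header: two 6-byte MAC addresses plus a 2-byte EtherType.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Ethernet header: destination MAC (6) + source MAC (6) + EtherType (2). */
#define MODEL_ETH_HLEN 14

/*
 * Hypothetical helper: read the EtherType of a frame, refusing frames that are
 * too short to hold a full Ethernet header (the case the fixes now drop).
 */
static int model_frame_ethertype(const uint8_t *data, size_t len, uint16_t *ethertype)
{
	assert(ethertype != NULL);

	if (len < MODEL_ETH_HLEN)
		return -EINVAL; /* short frame: parsing the header would run past the data */

	/* The EtherType is stored big-endian at offset 12. */
	*ethertype = (uint16_t)((data[12] << 8) | data[13]);
	return 0;
}
```

The check is done once, up front, so every later consumer of the header can rely on at least 14 valid bytes.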


4502 commits