kernel_samsung_a53x

Author	SHA1	Message	Date
Jan Kara	fb8f56a2c4	udf: Limit file size to 4TB commit c2efd13a2ed4f29bf9ef14ac2fbb7474084655f8 upstream. UDF disk format supports in principle file sizes up to 1<<64-1. However the file space (including holes) is described by a linked list of extents, each of which can have at most 1GB. Thus the creation and handling of extents gets unusably slow beyond certain point. Limit the file size to 4TB to avoid locking up the kernel too easily. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:51:26 +01:00
Jan Kara	58889da59d	ext4: handle redirtying in ext4_bio_write_page() commit 04e568a3b31cfbd545c04c8bfc35c20e5ccfce0f upstream. Since we want to transition transaction commits to use ext4_writepages() for writing back ordered, add handling of page redirtying into ext4_bio_write_page(). Also move buffer dirty bit clearing into the same place other buffer state handling. Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:51:19 +01:00
Chuck Lever	263a00f371	NFSD: Refactor the duplicate reply cache shrinker [ Upstream commit c135e1269f34dfdea4bd94c11060c83a3c0b3c12 ] Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to reduce the shrinker's impact on the cache's effectiveness. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:49:36 +01:00
Chuck Lever	03eec713f3	NFSD: Replace nfsd_prune_bucket() [ Upstream commit a9507f6af1450ed26a4a36d979af518f5bb21e5d ] Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache. A few percpu operations are moved outside the lock since they temporarily disable local IRQs which is expensive and does not need to be done while the lock is held. Reviewed-by: Jeff Layton <jlayton@kernel.org> Stable-dep-of: c135e1269f34 ("NFSD: Refactor the duplicate reply cache shrinker") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:49:06 +01:00
Chuck Lever	620a4a5357	NFSD: Rename nfsd_reply_cache_alloc() [ Upstream commit ff0d169329768c1102b7b07eebe5a9839aa1c143 ] For readability, rename to match the other helpers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:49:01 +01:00
Chuck Lever	404f9c456d	NFSD: Refactor nfsd_reply_cache_free_locked() [ Upstream commit 35308e7f0fc3942edc87d9c6dc78c4a096428957 ] To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held. Start by refactoring nfsd_reply_cache_free_locked() into a helper that removes an entry from the bucket (and must therefore run under the lock) and a second helper that frees the entry (which does not need to hold the lock). For readability, rename the helpers nfsd_cacherep_<verb>. Reviewed-by: Jeff Layton <jlayton@kernel.org> Stable-dep-of: a9507f6af145 ("NFSD: Replace nfsd_prune_bucket()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:57 +01:00
NeilBrown	b5bcbb0d04	NFSD: simplify error paths in nfsd_svc() [ Upstream commit bf32075256e9dd9c6b736859e2c5813981339908 ] The error paths in nfsd_svc() are needlessly complex and can result in a final call to svc_put() without nfsd_last_thread() being called. This results in the listening sockets not being closed properly. The per-netns setup provided by nfsd_startup_new() and removed by nfsd_shutdown_net() is needed precisely when there are running threads. So we don't need nfsd_up_before. We don't need to know if it was up. We only need to know if any threads are left. If none are, then we must call nfsd_shutdown_net(). But we don't need to do that explicitly as nfsd_last_thread() does that for us. So simply call nfsd_last_thread() before the last svc_put() if there are no running threads. That will always do the right thing. Also discard: pr_info("nfsd: last server has exited, flushing export cache\n"); It may not be true if an attempt to start the first server failed, and it isn't particularly helpful and it simply reports normal behaviour. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:46 +01:00
Chuck Lever	bf0b09ddc3	NFSD: Rewrite synopsis of nfsd_percpu_counters_init() [ Upstream commit 5ec39944f874e1ecc09f624a70dfaa8ac3bf9d08 ] In function ‘export_stats_init’, inlined from ‘svc_export_alloc’ at fs/nfsd/export.c:866:6: fs/nfsd/export.c:337:16: warning: ‘nfsd_percpu_counters_init’ accessing 40 bytes in a region of size 0 [-Wstringop-overflow=] 337 \| return nfsd_percpu_counters_init(&stats->counter, EXP_STATS_COUNTERS_NUM); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/nfsd/export.c:337:16: note: referencing argument 1 of type ‘struct percpu_counter[0]’ fs/nfsd/stats.h: In function ‘svc_export_alloc’: fs/nfsd/stats.h:40:5: note: in a call to function ‘nfsd_percpu_counters_init’ 40 \| int nfsd_percpu_counters_init(struct percpu_counter counters[], int num); \| ^~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Stable-dep-of: 93483ac5fec6 ("nfsd: expose /proc/net/sunrpc/nfsd in net namespaces") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:42 +01:00
Chuck Lever	138f8b605e	NFSD: Fix frame size warning in svc_export_parse() [ Upstream commit 6939ace1f22681fface7841cdbf34d3204cc94b5 ] fs/nfsd/export.c: In function 'svc_export_parse': fs/nfsd/export.c:737:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] 737 \| } On my systems, svc_export_parse() has a stack frame of over 800 bytes, not 1040, but nonetheless, it could do with some reduction. When a struct svc_export is on the stack, it's a temporary structure used as an argument, and not visible as an actual exported FS. No need to reserve space for export_stats in such cases. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310012359.YEw5IrK6-lkp@intel.com/ Cc: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Stable-dep-of: 4b14885411f7 ("nfsd: make all of the nfsd stats per-network namespace") [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:42 +01:00
Josef Bacik	68ec312d1f	nfsd: stop setting ->pg_stats for unused stats [ Upstream commit a2214ed588fb3c5b9824a21cff870482510372bb ] A lot of places are setting a blank svc_stats in ->pg_stats and never utilizing these stats. Remove all of these extra structs as we're not reporting these stats anywhere. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:42 +01:00
Josef Bacik	f285f5881d	sunrpc: pass in the sv_stats struct through svc_create_pooled [ Upstream commit f094323867668d50124886ad884b665de7319537 ] Since only one service actually reports the rpc stats there's not much of a reason to have a pointer to it in the svc_program struct. Adjust the svc_create_pooled function to take the sv_stats as an argument and pass the struct through there as desired instead of getting it from the svc_program->pg_stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:42 +01:00
Josef Bacik	bd5e34855d	sunrpc: remove ->pg_stats from svc_program [ Upstream commit 3f6ef182f144dcc9a4d942f97b6a8ed969f13c95 ] Now that this isn't used anywhere, remove it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:42 +01:00
Josef Bacik	4cd8ae840e	nfsd: rename NFSD_NET_* to NFSD_STATS_* [ Upstream commit d98416cc2154053950610bb6880911e3dcbdf8c5 ] We're going to merge the stats all into per network namespace in subsequent patches, rename these nn counters to be consistent with the rest of the stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-19 14:48:42 +01:00
Josef Bacik	c287d9a8f8	btrfs: replace BUG_ON with ASSERT in walk_down_proc() [ Upstream commit 1f9d44c0a12730a24f8bb75c5e1102207413cc9b ] We have a couple of areas where we check to make sure the tree block is locked before looking up or messing with references. This is old code so it has this as BUG_ON(). Convert this to ASSERT() for developers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:10:00 +01:00
Josef Bacik	1648fc0d4d	btrfs: clean up our handling of refs == 0 in snapshot delete [ Upstream commit b8ccef048354074a548f108e51d0557d6adfd3a3 ] In reada we BUG_ON(refs == 0), which could be unkind since we aren't holding a lock on the extent leaf and thus could get a transient incorrect answer. In walk_down_proc we also BUG_ON(refs == 0), which could happen if we have extent tree corruption. Change that to return -EUCLEAN. In do_walk_down() we catch this case and handle it correctly, however we return -EIO, which -EUCLEAN is a more appropriate error code. Finally in walk_up_proc we have the same BUG_ON(refs == 0), so convert that to proper error handling. Also adjust the error message so we can actually do something with the information. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:10:00 +01:00
David Sterba	272d17c198	btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry() [ Upstream commit b8e947e9f64cac9df85a07672b658df5b2bcff07 ] Some arch + compiler combinations report a potentially unused variable location in btrfs_lookup_dentry(). This is a false alert as the variable is passed by value and always valid or there's an error. The compilers cannot probably reason about that although btrfs_inode_by_name() is in the same file. > + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used +uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9 > + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used +uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5 m68k-gcc8/m68k-allmodconfig mips-gcc8/mips-allmodconfig powerpc-gcc5/powerpc-all{mod,yes}config powerpc-gcc5/ppc64_defconfig Initialize it to zero, this should fix the warnings and won't change the behaviour as btrfs_inode_by_name() accepts only a root or inode item types, otherwise returns an error. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/linux-btrfs/bd4e9928-17b3-9257-8ba7-6b7f9bbb639a@linux-m68k.org/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:10:00 +01:00
Phillip Lougher	93ab48b465	Squashfs: sanity check symbolic link size [ Upstream commit 810ee43d9cd245d138a2733d87a24858a23f577d ] Syzkiller reports a "KMSAN: uninit-value in pick_link" bug. This is caused by an uninitialised page, which is ultimately caused by a corrupted symbolic link size read from disk. The reason why the corrupted symlink size causes an uninitialised page is due to the following sequence of events: 1. squashfs_read_inode() is called to read the symbolic link from disk. This assigns the corrupted value 3875536935 to inode->i_size. 2. Later squashfs_symlink_read_folio() is called, which assigns this corrupted value to the length variable, which being a signed int, overflows producing a negative number. 3. The following loop that fills in the page contents checks that the copied bytes is less than length, which being negative means the loop is skipped, producing an uninitialised page. This patch adds a sanity check which checks that the symbolic link size is not larger than expected. -- Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Link: https://lore.kernel.org/r/20240811232821.13903-1-phillip@squashfs.org.uk Reported-by: Lizhi Xu <lizhi.xu@windriver.com> Reported-by: syzbot+24ac24ff58dc5b0d26b9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/ V2: fix spelling mistake. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:09:59 +01:00
Trond Myklebust	4a80fcf1fa	NFSv4: Add missing rescheduling points in nfs_client_return_marked_delegations [ Upstream commit a017ad1313fc91bdf235097fd0a02f673fc7bb11 ] We're seeing reports of soft lockups when iterating through the loops, so let's add rescheduling points. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:09:59 +01:00
Ryusuke Konishi	937418aa91	nilfs2: protect references to superblock parameters exposed in sysfs [ Upstream commit 683408258917541bdb294cd717c210a04381931e ] The superblock buffers of nilfs2 can not only be overwritten at runtime for modifications/repairs, but they are also regularly swapped, replaced during resizing, and even abandoned when degrading to one side due to backing device issues. So, accessing them requires mutual exclusion using the reader/writer semaphore "nilfs->ns_sem". Some sysfs attribute show methods read this superblock buffer without the necessary mutual exclusion, which can cause problems with pointer dereferencing and memory access, so fix it. Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:09:58 +01:00
Qing Wang	5a064488ff	nilfs2: replace snprintf in show functions with sysfs_emit [ Upstream commit 3bcd6c5bd483287f4a09d3d59a012d47677b6edc ] Patch series "nilfs2 updates". This patch (of 2): coccicheck complains about the use of snprintf() in sysfs show functions. Fix the coccicheck warning: WARNING: use scnprintf or sprintf. Use sysfs_emit instead of scnprintf or sprintf makes more sense. Link: https://lkml.kernel.org/r/1635151862-11547-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/1634095759-4625-1-git-send-email-wangqing@vivo.com Link: https://lkml.kernel.org/r/1635151862-11547-2-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Qing Wang <wangqing@vivo.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 683408258917 ("nilfs2: protect references to superblock parameters exposed in sysfs") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-19 00:09:58 +01:00
Gabriel Krisman Bertazi	161cbfabfa	UPSTREAM: unicode: Don't special case ignorable code points We don't need to handle them separately. Instead, just let them decompose/casefold to themselves. Change-Id: I01c3f2c98ae4d84269586cec09f18239cbee0abb Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> (cherry picked from commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2025-01-19 00:09:58 +01:00
Ksawlii	1dd5acfeba	Import susfs4ksu	2025-01-18 21:48:58 +01:00
Arjan van de Ven	54bab59901	fs: ext4: fsync: optimize double-fsync() a bunch There are cases where EXT4 is a bit too conservative sending barriers down to the disk; there are cases where the transaction in progress is not the one that sent the barrier (in other words: the fsync is for a file for which the IO happened more time ago and all data was already sent to the disk). For that case, a more performing tradeoff can be made on SSD devices (which have the ability to flush their dram caches in a hurry on a power fail event) where the barrier gets sent to the disk, but we don't need to wait for the barrier to complete. Any consecutive IO will block on the barrier correctly. Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> (cherry picked from commit 74aa09a7751e438bd15b5cd73f611021b7239240) (cherry picked from commit fa3bdf1a32cac074ff52403cb9ce18eb18c7f7d1)	2025-01-16 22:01:03 +01:00
Ksawlii	46676a78ff	Revert "epoll: Add synchronous wakeup support for ep_poll_callback" This reverts commit b718903719ca67c98f72863873dbedef8900ec2a.	2025-01-15 16:38:29 +01:00
Zygo Blaxell	03b917a443	btrfs: don't set lock_owner when locking extent buffer for reading [ Upstream commit 97e86631bccddfbbe0c13f9a9605cdef11d31296 ] In 196d59ab9ccc "btrfs: switch extent buffer tree lock to rw_semaphore" the functions for tree read locking were rewritten, and in the process the read lock functions started setting eb->lock_owner = current->pid. Previously lock_owner was only set in tree write lock functions. Read locks are shared, so they don't have exclusive ownership of the underlying object, so setting lock_owner to any single value for a read lock makes no sense. It's mostly harmless because write locks and read locks are mutually exclusive, and none of the existing code in btrfs (btrfs_init_new_buffer and print_eb_refs_lock) cares what nonsense is written in lock_owner when no writer is holding the lock. KCSAN does care, and will complain about the data race incessantly. Remove the assignments in the read lock functions because they're useless noise. Fixes: 196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:56 +01:00
Josef Bacik	0c2a61d442	btrfs: locking: remove the recursion handling code [ Upstream commit 4048daedb910f83f080c6bb03c78af794aebdff5 ] Now that we're no longer using recursion, rip out all of the supporting code. Follow up patches will clean up the callers of these functions. The extent_buffer::lock_owner is still retained as it allows safety checks in btrfs_init_new_buffer for the case that the free space cache is corrupted and we try to allocate a block that we are currently using and have locked in the path. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 97e86631bccd ("btrfs: don't set lock_owner when locking extent buffer for reading") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:56 +01:00
Filipe Manana	3c072fc2c3	btrfs: fix use-after-free when COWing tree bock and tracing is enabled [ Upstream commit 44f52bbe96dfdbe4aca3818a2534520082a07040 ] When a COWing a tree block, at btrfs_cow_block(), and we have the tracepoint trace_btrfs_cow_block() enabled and preemption is also enabled (CONFIG_PREEMPT=y), we can trigger a use-after-free in the COWed extent buffer while inside the tracepoint code. This is because in some paths that call btrfs_cow_block(), such as btrfs_search_slot(), we are holding the last reference on the extent buffer @buf so btrfs_force_cow_block() drops the last reference on the @buf extent buffer when it calls free_extent_buffer_stale(buf), which schedules the release of the extent buffer with RCU. This means that if we are on a kernel with preemption, the current task may be preempted before calling trace_btrfs_cow_block() and the extent buffer already released by the time trace_btrfs_cow_block() is called, resulting in a use-after-free. Fix this by moving the trace_btrfs_cow_block() from btrfs_cow_block() to btrfs_force_cow_block() before the COWed extent buffer is freed. This also has a side effect of invoking the tracepoint in the tree defrag code, at defrag.c:btrfs_realloc_node(), since btrfs_force_cow_block() is called there, but this is fine and it was actually missing there. Reported-by: syzbot+8517da8635307182c8a5@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/6759a9b9.050a0220.1ac542.000d.GAE@google.com/ CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:54 +01:00
Filipe Manana	480a44135a	btrfs: rename and export __btrfs_cow_block() [ Upstream commit 95f93bc4cbcac6121a5ee85cd5019ee8e7447e0b ] Rename and export __btrfs_cow_block() as btrfs_force_cow_block(). This is to allow to move defrag specific code out of ctree.c and into defrag.c in one of the next patches. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 44f52bbe96df ("btrfs: fix use-after-free when COWing tree bock and tracing is enabled") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:54 +01:00
Josef Bacik	ed3940efb8	btrfs: locking: remove all the blocking helpers [ Upstream commit ac5887c8e013d6754d36e6d51dc03448ee0b0065 ] Now that we're using a rw_semaphore we no longer need to indicate if a lock is blocking or not, nor do we need to flip the entire path from blocking to spinning. Remove these helpers and all the places they are called. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 44f52bbe96df ("btrfs: fix use-after-free when COWing tree bock and tracing is enabled") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:54 +01:00
Josef Bacik	68c6a28426	btrfs: switch extent buffer tree lock to rw_semaphore [ Upstream commit 196d59ab9ccc975d8d29292845d227cdf4423ef8 ] Historically we've implemented our own locking because we wanted to be able to selectively spin or sleep based on what we were doing in the tree. For instance, if all of our nodes were in cache then there's rarely a reason to need to sleep waiting for node locks, as they'll likely become available soon. At the time this code was written the rw_semaphore didn't do adaptive spinning, and thus was orders of magnitude slower than our home grown locking. However now the opposite is the case. There are a few problems with how we implement blocking locks, namely that we use a normal waitqueue and simply wake everybody up in reverse sleep order. This leads to some suboptimal performance behavior, and a lot of context switches in highly contended cases. The rw_semaphores actually do this properly, and also have adaptive spinning that works relatively well. The locking code is also a bit of a bear to understand, and we lose the benefit of lockdep for the most part because the blocking states of the lock are simply ad-hoc and not mapped into lockdep. So rework the locking code to drop all of this custom locking stuff, and simply use a rw_semaphore for everything. This makes the locking much simpler for everything, as we can now drop a lot of cruft and blocking transitions. The performance numbers vary depending on the workload, because generally speaking there doesn't tend to be a lot of contention on the btree. However, on my test system which is an 80 core single socket system with 256GiB of RAM and a 2TiB NVMe drive I get the following results (with all debug options off): dbench 200 baseline Throughput 216.056 MB/sec 200 clients 200 procs max_latency=1471.197 ms dbench 200 with patch Throughput 737.188 MB/sec 200 clients 200 procs max_latency=714.346 ms Previously we also used fs_mark to test this sort of contention, and those results are far less impressive, mostly because there's not enough tasks to really stress the locking fs_mark -d /d[0-15] -S 0 -L 20 -n 100000 -s 0 -t 16 baseline Average Files/sec: 160166.7 p50 Files/sec: 165832 p90 Files/sec: 123886 p99 Files/sec: 123495 real 3m26.527s user 2m19.223s sys 48m21.856s patched Average Files/sec: 164135.7 p50 Files/sec: 171095 p90 Files/sec: 122889 p99 Files/sec: 113819 real 3m29.660s user 2m19.990s sys 44m12.259s Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 44f52bbe96df ("btrfs: fix use-after-free when COWing tree bock and tracing is enabled") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:54 +01:00
Yang Erkun	d4913fa414	nfsd: cancel nfsd_shrinker_work using sync mode in nfs4_state_shutdown_net commit d5ff2fb2e7167e9483846e34148e60c0c016a1f6 upstream. In the normal case, when we excute `echo 0 > /proc/fs/nfsd/threads`, the function `nfs4_state_destroy_net` in `nfs4_state_shutdown_net` will release all resources related to the hashed `nfs4_client`. If the `nfsd_client_shrinker` is running concurrently, the `expire_client` function will first unhash this client and then destroy it. This can lead to the following warning. Additionally, numerous use-after-free errors may occur as well. nfsd_client_shrinker echo 0 > /proc/fs/nfsd/threads expire_client nfsd_shutdown_net unhash_client ... nfs4_state_shutdown_net /* won't wait shrinker exit / / cancel_work(&nn->nfsd_shrinker_work) * nfsd_file for this /* won't destroy unhashed client1 / client1 still alive nfs4_state_destroy_net / nfsd_file_cache_shutdown / trigger warning / kmem_cache_destroy(nfsd_file_slab) kmem_cache_destroy(nfsd_file_mark_slab) / release nfsd_file and mark */ __destroy_client ==================================================================== BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on __kmem_cache_shutdown() -------------------------------------------------------------------- CPU: 4 UID: 0 PID: 764 Comm: sh Not tainted 6.12.0-rc3+ #1 dump_stack_lvl+0x53/0x70 slab_err+0xb0/0xf0 __kmem_cache_shutdown+0x15c/0x310 kmem_cache_destroy+0x66/0x160 nfsd_file_cache_shutdown+0xac/0x210 [nfsd] nfsd_destroy_serv+0x251/0x2a0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e ==================================================================== BUG nfsd_file_mark (Tainted: G B W ): Objects remaining nfsd_file_mark on __kmem_cache_shutdown() -------------------------------------------------------------------- dump_stack_lvl+0x53/0x70 slab_err+0xb0/0xf0 __kmem_cache_shutdown+0x15c/0x310 kmem_cache_destroy+0x66/0x160 nfsd_file_cache_shutdown+0xc8/0x210 [nfsd] nfsd_destroy_serv+0x251/0x2a0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e To resolve this issue, cancel `nfsd_shrinker_work` using synchronous mode in nfs4_state_shutdown_net. Fixes: 7c24fa225081 ("NFSD: replace delayed_work with work_struct for nfsd_client_shrinker") Signed-off-by: Yang Erkun <yangerkun@huaweicloud.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:52 +01:00
Filipe Manana	92f06e879a	btrfs: avoid monopolizing a core when activating a swap file commit 2c8507c63f5498d4ee4af404a8e44ceae4345056 upstream. During swap activation we iterate over the extents of a file and we can have many thousands of them, so we can end up in a busy loop monopolizing a core. Avoid this by doing a voluntary reschedule after processing each extent. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:51 +01:00
NeilBrown	5554be6263	nfsd: restore callback functionality for NFSv4.0 [ Upstream commit 7917f01a286ce01e9c085e24468421f596ee1a0c ] A recent patch inadvertently broke callbacks for NFSv4.0. In the 4.0 case we do not expect a session to be found but still need to call setup_callback_client() which will not try to dereference it. This patch moves the check for failure to find a session into the 4.1+ branch of setup_callback_client() Fixes: 1e02c641c3a4 ("NFSD: Prevent NULL dereference in nfsd4_process_cb_update()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:49 +01:00
Xuewen Yan	126a6a15df	epoll: Add synchronous wakeup support for ep_poll_callback commit 900bbaae67e980945dec74d36f8afe0de7556d5a upstream. Now, the epoll only use wake_up() interface to wake up task. However, sometimes, there are epoll users which want to use the synchronous wakeup flag to hint the scheduler, such as Android binder driver. So add a wake_up_sync() define, and use the wake_up_sync() when the sync is true in ep_poll_callback(). Co-developed-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Link: https://lore.kernel.org/r/20240426080548.8203-1-xuewen.yan@unisoc.com Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Brian Geffon <bgeffon@google.com> Reported-by: Benoit Lize <lizeb@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:47 +01:00
Ilya Dryomov	f698a9998c	ceph: validate snapdirname option length when mounting commit 12eb22a5a609421b380c3c6ca887474fb2089b2c upstream. It becomes a path component, so it shouldn't exceed NAME_MAX characters. This was hardened in commit c152737be22b ("ceph: Use strscpy() instead of strcpy() in __get_snap_name()"), but no actual check was put in place. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:47 +01:00
Trond Myklebust	40ec080a9e	NFS/pnfs: Fix a live lock between recalled layouts and layoutget commit 62e2a47ceab8f3f7d2e3f0e03fdd1c5e0059fd8b upstream. When the server is recalling a layout, we should ignore the count of outstanding layoutget calls, since the server is expected to return either NFS4ERR_RECALLCONFLICT or NFS4ERR_RETURNCONFLICT for as long as the recall is outstanding. Currently, we may end up livelocking, causing the layout to eventually be forcibly revoked. Fixes: bf0291dd2267 ("pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:46 +01:00
Qu Wenruo	8b2aa52cea	btrfs: tree-checker: reject inline extent items with 0 ref count commit dfb92681a19e1d5172420baa242806414b3eff6f upstream. [BUG] There is a bug report in the mailing list where btrfs_run_delayed_refs() failed to drop the ref count for logical 25870311358464 num_bytes 2113536. The involved leaf dump looks like this: item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50 extent refs 1 gen 84178 flags 1 ref#0: shared data backref parent 32399126528000 count 0 <<< ref#1: shared data backref parent 31808973717504 count 1 Notice the count number is 0. [CAUSE] There is no concrete evidence yet, but considering 0 -> 1 is also a single bit flipped, it's possible that hardware memory bitflip is involved, causing the on-disk extent tree to be corrupted. [FIX] To prevent us reading such corrupted extent item, or writing such damaged extent item back to disk, enhance the handling of BTRFS_EXTENT_DATA_REF_KEY and BTRFS_SHARED_DATA_REF_KEY keys for both inlined and key items, to detect such 0 ref count and reject them. CC: stable@vger.kernel.org # 5.4+ Link: https://lore.kernel.org/linux-btrfs/7c69dd49-c346-4806-86e7-e6f863a66f48@app.fastmail.com/ Reported-by: Frankie Fisher <frankie@terrorise.me.uk> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:45 +01:00
James Bottomley	923e817ed4	efivarfs: Fix error on non-existent file commit 2ab0837cb91b7de507daa145d17b3b6b2efb3abf upstream. When looking up a non-existent file, efivarfs returns -EINVAL if the file does not conform to the NAME-GUID format and -ENOENT if it does. This is caused by efivars_d_hash() returning -EINVAL if the name is not formatted correctly. This error is returned before simple_lookup() returns a negative dentry, and is the error value that the user sees. Fix by removing this check. If the file does not exist, simple_lookup() will return a negative dentry leading to -ENOENT and efivarfs_create() already has a validity check before it creates an entry (and will correctly return -EINVAL) Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: <stable@vger.kernel.org> [ardb: make efivarfs_valid_name() static] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-15 16:29:41 +01:00
Gao Xiang	08a4c231f1	erofs: fix incorrect symlink detection in fast symlink commit 9ed50b8231e37b1ae863f5dec8153b98d9f389b4 upstream. Fast symlink can be used if the on-disk symlink data is stored in the same block as the on-disk inode, so we don’t need to trigger another I/O for symlink data. However, currently fs correction could be reported _incorrectly_ if inode xattrs are too large. In fact, these should be valid images although they cannot be handled as fast symlinks. Many thanks to Colin for reporting this! Reported-by: Colin Walters <walters@verbum.org> Reported-by: https://honggfuzz.dev/ Link: https://lore.kernel.org/r/bb2dd430-7de0-47da-ae5b-82ab2dd4d945@app.fastmail.com Fixes: 431339ba9042 ("staging: erofs: add inode operations") [ Note that it's a runtime misbehavior instead of a security issue. ] Link: https://lore.kernel.org/r/20240909031911.1174718-1-hsiangkao@linux.alibaba.com [ Gao Xiang: fix 5.10.y build warning due to `check_add_overflow`. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:40 +01:00
Gao Xiang	9fc2be9da8	erofs: fix order >= MAX_ORDER warning due to crafted negative i_size commit 1dd73601a1cba37a0ed5f89a8662c90191df5873 upstream. As syzbot reported [1], the root cause is that i_size field is a signed type, and negative i_size is also less than EROFS_BLKSIZ. As a consequence, it's handled as fast symlink unexpectedly. Let's fall back to the generic path to deal with such unusual i_size. [1] https://lore.kernel.org/r/000000000000ac8efa05e7feaa1f@google.com Reported-by: syzbot+f966c13b1b4fc0403b19@syzkaller.appspotmail.com Fixes: 431339ba9042 ("staging: erofs: add inode operations") Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20220909023948.28925-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-15 16:29:40 +01:00
Ksawlii	5ab3b533e0	Revert "fs: import susfs 1.5.3" This reverts commit `3d8ac9044d`.	2025-01-06 22:38:26 +01:00
Ksawlii	9be2607d55	Revert "treewide: patch for susfs 1.5.3" This reverts commit `c7b62da18c`.	2025-01-06 22:38:18 +01:00
Nahuel Gómez	3d8ac9044d	fs: import susfs 1.5.3 * Revision: ec6d3ae385059409a58d67bc2a10d762e740efcb Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2025-01-04 19:39:03 +01:00
Nahuel Gómez	c7b62da18c	treewide: patch for susfs 1.5.3 * Patch: 50_add_susfs_in_gki-android12-5.10.patch * Revision: ec6d3ae385059409a58d67bc2a10d762e740efcb Signed-off-by: Nahuel Gómez <nahuelgomez329@gmail.com>	2025-01-04 19:38:56 +01:00
Sungjong Seo	cec3c7c380	exfat: fix potential deadlock on __exfat_get_dentry_set commit 89fc548767a2155231128cb98726d6d2ea1256c9 upstream. When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array is allocated in __exfat_get_entry_set. The problem is that the bh-array is allocated with GFP_KERNEL. It does not make sense. In the following cases, a deadlock for sbi->s_lock between the two processes may occur. CPU0 CPU1 ---- ---- kswapd balance_pgdat lock(fs_reclaim) exfat_iterate lock(&sbi->s_lock) exfat_readdir exfat_get_uniname_from_ext_entry exfat_get_dentry_set __exfat_get_dentry_set kmalloc_array ... lock(fs_reclaim) ... evict exfat_evict_inode lock(&sbi->s_lock) To fix this, let's allocate bh-array with GFP_NOFS. Fixes: a3ff29a95fde ("exfat: support dynamic allocate bh for exfat_entry_set_cache") Cc: stable@vger.kernel.org # v6.2+ Reported-by: syzbot+412a392a2cd4a65e71db@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000fef47e0618c0327f@google.com Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> [Sherry: The problematic commit was backported to 5.15.y and 5.10.y, thus backport this fix] Signed-off-by: Sherry Yang <sherry.yang@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-02 17:00:49 +01:00
Darrick J. Wong	571b0174d3	xfs: fix scrub tracepoints when inode-rooted btrees are involved commit ffc3ea4f3c1cc83a86b7497b0c4b0aee7de5480d upstream. Fix a minor mistakes in the scrub tracepoints that can manifest when inode-rooted btrees are enabled. The existing code worked fine for bmap btrees, but we should tighten the code up to be less sloppy. Cc: <stable@vger.kernel.org> # v5.7 Fixes: 92219c292af8dd ("xfs: convert btree cursor inode-private member names") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-02 17:00:49 +01:00
Darrick J. Wong	0c0c4c5cd7	xfs: don't drop errno values when we fail to ficlone the entire range commit 7ce31f20a0771d71779c3b0ec9cdf474cc3c8e9a upstream. Way back when we first implemented FICLONE for XFS, life was simple -- either the the entire remapping completed, or something happened and we had to return an errno explaining what happened. Neither of those ioctls support returning partial results, so it's all or nothing. Then things got complicated when copy_file_range came along, because it actually can return the number of bytes copied, so commit 3f68c1f562f1e4 tried to make it so that we could return a partial result if the REMAP_FILE_CAN_SHORTEN flag is set. This is also how FIDEDUPERANGE can indicate that the kernel performed a partial deduplication. Unfortunately, the logic is wrong if an error stops the remapping and CAN_SHORTEN is not set. Because those callers cannot return partial results, it is an error for ->remap_file_range to return a positive quantity that is less than the @len passed in. Implementations really should be returning a negative errno in this case, because that's what btrfs (which introduced FICLONE{,RANGE}) did. Therefore, ->remap_range implementations cannot silently drop an errno that they might have when the number of bytes remapped is less than the number of bytes requested and CAN_SHORTEN is not set. Found by running generic/562 on a 64k fsblock filesystem and wondering why it reported corrupt files. Cc: <stable@vger.kernel.org> # v4.20 Fixes: 3fc9f5e409319e ("xfs: remove xfs_reflink_remap_range") Really-Fixes: 3f68c1f562f1e4 ("xfs: support returning partial reflink results") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-02 17:00:49 +01:00
Ksawlii	98936adf87	fs: blkdev.c: Fix a typo	2024-12-18 18:20:04 +01:00
Ksawlii	3f33f5d093	fs: blkdev.c: Fix build without CONFIG_SDFAT_DEBUG	2024-12-18 18:14:48 +01:00
Al Viro	436eb37b50	d_add_ci(): make sure we don't miss d_lookup_done() All callers of d_alloc_parallel() must make sure that resulting in-lookup dentry (if any) will encounter __d_lookup_done() before the final dput(). d_add_ci() might end up creating in-lookup dentries; they are fed to d_splice_alias(), which will normally make sure they meet __d_lookup_done(). However, it is possible to end up with d_splice_alias() failing with ERR_PTR(-ELOOP) without having done so. It takes a corrupted ntfs or case-insensitive xfs image, but neither should end up with memory corruption... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-12-18 15:05:46 +01:00

1 2 3 4 5 ...

1395 commits