kernel_samsung_a53x

Author	SHA1	Message	Date
Sheng Yong	d2c4bce0bf	f2fs: fix start segno of large section [ Upstream commit 8c409989678e92e4a737e7cd2bb04f3efb81071a ] get_ckpt_valid_blocks() checks valid ckpt blocks in current section. It counts all vblocks from the first to the last segment in the large section. However, START_SEGNO() is used to get the first segno in an SIT block. This patch fixes that to get the correct start segno. Fixes: 61461fc921b7 ("f2fs: fix to avoid touching checkpointed data in get_victim()") Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:16 +01:00
Jeongjun Park	c48b0762c0	jfs: Fix array-index-out-of-bounds in diFree [ Upstream commit f73f969b2eb39ad8056f6c7f3a295fa2f85e313a ] Reported-by: syzbot+241c815bda521982cb49@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:16 +01:00
Ryusuke Konishi	61019dc2ee	nilfs2: handle inconsistent state in nilfs_btnode_create_block() commit 4811f7af6090e8f5a398fbdd766f903ef6c0d787 upstream. Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987d45d ("nilfs2: B-tree node cache") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:16 +01:00
Jan Kara	ecdf0fab70	jbd2: make jbd2_journal_get_max_txn_bufs() internal commit 4aa99c71e42ad60178c1154ec24e3df9c684fb67 upstream. There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:13 +01:00
Baokun Li	e8d12256ca	ext4: make sure the first directory block is not a hole commit f9ca51596bbfd0f9c386dd1c613c394c78d9e5e6 upstream. The syzbot constructs a directory that has no dirblock but is non-inline, i.e. the first directory block is a hole. And no errors are reported when creating files in this directory in the following flow. ext4_mknod ... ext4_add_entry // Read block 0 ext4_read_dirblock(dir, block, DIRENT) bh = ext4_bread(NULL, inode, block, 0) if (!bh && (type == INDEX \|\| type == DIRENT_HTREE)) // The first directory block is a hole // But type == DIRENT, so no error is reported. After that, we get a directory block without '.' and '..' but with a valid dentry. This may cause some code that relies on dot or dotdot (such as make_indexed_dir()) to crash. Therefore when ext4_read_dirblock() finds that the first directory block is a hole report that the filesystem is corrupted and return an error to avoid loading corrupted data from disk causing something bad. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: 4e19d6b65fb4 ("ext4: allow directory holes") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240702132349.2600605-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:13 +01:00
Baokun Li	dc0e632bc7	ext4: check dot and dotdot of dx_root before making dir indexed commit 50ea741def587a64e08879ce6c6a30131f7111e7 upstream. Syzbot reports a issue as follows: ============================================ BUG: unable to handle page fault for address: ffffed11022e24fe PGD 23ffee067 P4D 23ffee067 PUD 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted 6.10.0-rc5-g55027e689933 #0 Call Trace: <TASK> make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451 ext4_rename fs/ext4/namei.c:3936 [inline] ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214 [...] ============================================ The immediate cause of this problem is that there is only one valid dentry for the block to be split during do_split, so split==0 results in out of bounds accesses to the map triggering the issue. do_split unsigned split dx_make_map count = 1 split = count/2 = 0; continued = hash2 == map[split - 1].hash; ---> map[4294967295] The maximum length of a filename is 255 and the minimum block size is 1024, so it is always guaranteed that the number of entries is greater than or equal to 2 when do_split() is called. But syzbot's crafted image has no dot and dotdot in dir, and the dentry distribution in dirblock is as follows: bus dentry1 hole dentry2 free \|xx--\|xx-------------\|...............\|xx-------------\|...............\| 0 12 (8+248)=256 268 256 524 (8+256)=264 788 236 1024 So when renaming dentry1 increases its name_len length by 1, neither hole nor free is sufficient to hold the new dentry, and make_indexed_dir() is called. In make_indexed_dir() it is assumed that the first two entries of the dirblock must be dot and dotdot, so bus and dentry1 are left in dx_root because they are treated as dot and dotdot, and only dentry2 is moved to the new leaf block. That's why count is equal to 1. Therefore add the ext4_check_dx_root() helper function to add more sanity checks to dot and dotdot before starting the conversion to avoid the above issue. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240702132349.2600605-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:13 +01:00
Jan Kara	41c2d4b1fc	udf: Avoid using corrupted block bitmap buffer commit a90d4471146de21745980cba51ce88e7926bcc4f upstream. When the filesystem block bitmap is corrupted, we detect the corruption while loading the bitmap and fail the allocation with error. However the next allocation from the same bitmap will notice the bitmap buffer is already loaded and tries to allocate from the bitmap with mixed results (depending on the exact nature of the bitmap corruption). Fix the problem by using BH_verified bit to indicate whether the bitmap is valid or not. Reported-by: syzbot+5f682cd029581f9edfd1@syzkaller.appspotmail.com CC: stable@vger.kernel.org Link: https://patch.msgid.link/20240617154201.29512-2-jack@suse.cz Fixes: 1e0d4adf17e7 ("udf: Check consistency of Space Bitmap Descriptor") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:13 +01:00
Jan Kara	b63bddf2e8	ext2: Verify bitmap and itable block numbers before using them commit 322a6aff03937aa1ece33b4e46c298eafaf9ac41 upstream. Verify bitmap block numbers and inode table blocks are sane before using them for checking bits in the block bitmap. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:12 +01:00
Chao Yu	8ce994880f	hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode() commit 26a2ed107929a855155429b11e1293b83e6b2a8b upstream. Syzbot reports uninitialized value access issue as below: loop0: detected capacity change from 0 to 64 ===================================================== BUG: KMSAN: uninit-value in hfs_revalidate_dentry+0x307/0x3f0 fs/hfs/sysdep.c:30 hfs_revalidate_dentry+0x307/0x3f0 fs/hfs/sysdep.c:30 d_revalidate fs/namei.c:862 [inline] lookup_fast+0x89e/0x8e0 fs/namei.c:1649 walk_component fs/namei.c:2001 [inline] link_path_walk+0x817/0x1480 fs/namei.c:2332 path_lookupat+0xd9/0x6f0 fs/namei.c:2485 filename_lookup+0x22e/0x740 fs/namei.c:2515 user_path_at_empty+0x8b/0x390 fs/namei.c:2924 user_path_at include/linux/namei.h:57 [inline] do_mount fs/namespace.c:3689 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount+0x66b/0x810 fs/namespace.c:3875 __x64_sys_mount+0xe4/0x140 fs/namespace.c:3875 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b BUG: KMSAN: uninit-value in hfs_ext_read_extent fs/hfs/extent.c:196 [inline] BUG: KMSAN: uninit-value in hfs_get_block+0x92d/0x1620 fs/hfs/extent.c:366 hfs_ext_read_extent fs/hfs/extent.c:196 [inline] hfs_get_block+0x92d/0x1620 fs/hfs/extent.c:366 block_read_full_folio+0x4ff/0x11b0 fs/buffer.c:2271 hfs_read_folio+0x55/0x60 fs/hfs/inode.c:39 filemap_read_folio+0x148/0x4f0 mm/filemap.c:2426 do_read_cache_folio+0x7c8/0xd90 mm/filemap.c:3553 do_read_cache_page mm/filemap.c:3595 [inline] read_cache_page+0xfb/0x2f0 mm/filemap.c:3604 read_mapping_page include/linux/pagemap.h:755 [inline] hfs_btree_open+0x928/0x1ae0 fs/hfs/btree.c:78 hfs_mdb_get+0x260c/0x3000 fs/hfs/mdb.c:204 hfs_fill_super+0x1fb1/0x2790 fs/hfs/super.c:406 mount_bdev+0x628/0x920 fs/super.c:1359 hfs_mount+0xcd/0xe0 fs/hfs/super.c:456 legacy_get_tree+0x167/0x2e0 fs/fs_context.c:610 vfs_get_tree+0xdc/0x5d0 fs/super.c:1489 do_new_mount+0x7a9/0x16f0 fs/namespace.c:3145 path_mount+0xf98/0x26a0 fs/namespace.c:3475 do_mount fs/namespace.c:3488 [inline] __do_sys_mount fs/namespace.c:3697 [inline] __se_sys_mount+0x919/0x9e0 fs/namespace.c:3674 __ia32_sys_mount+0x15b/0x1b0 fs/namespace.c:3674 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Uninit was created at: __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2190 [inline] allocate_slab mm/slub.c:2354 [inline] new_slab+0x2d7/0x1400 mm/slub.c:2407 ___slab_alloc+0x16b5/0x3970 mm/slub.c:3540 __slab_alloc mm/slub.c:3625 [inline] __slab_alloc_node mm/slub.c:3678 [inline] slab_alloc_node mm/slub.c:3850 [inline] kmem_cache_alloc_lru+0x64d/0xb30 mm/slub.c:3879 alloc_inode_sb include/linux/fs.h:3018 [inline] hfs_alloc_inode+0x5a/0xc0 fs/hfs/super.c:165 alloc_inode+0x83/0x440 fs/inode.c:260 new_inode_pseudo fs/inode.c:1005 [inline] new_inode+0x38/0x4f0 fs/inode.c:1031 hfs_new_inode+0x61/0x1010 fs/hfs/inode.c:186 hfs_mkdir+0x54/0x250 fs/hfs/dir.c:228 vfs_mkdir+0x49a/0x700 fs/namei.c:4126 do_mkdirat+0x529/0x810 fs/namei.c:4149 __do_sys_mkdirat fs/namei.c:4164 [inline] __se_sys_mkdirat fs/namei.c:4162 [inline] __x64_sys_mkdirat+0xc8/0x120 fs/namei.c:4162 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b It missed to initialize .tz_secondswest, .cached_start and .cached_blocks fields in struct hfs_inode_info after hfs_alloc_inode(), fix it. Cc: stable@vger.kernel.org Reported-by: syzbot+3ae6be33a50b5aae4dab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-fsdevel/0000000000005ad04005ee48897f@google.com Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240616013841.2217-1-chao@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-23 23:20:12 +01:00
Ryusuke Konishi	5ff878bcc2	nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro [ Upstream commit 0f3819e8c483771a59cf9d3190cd68a7a990083c ] According to the C standard 3.4.3p3, the result of signed integer overflow is undefined. The macro nilfs_cnt32_ge(), which compares two sequence numbers, uses signed integer subtraction that can overflow, and therefore the result of the calculation may differ from what is expected due to undefined behavior in different environments. Similar to an earlier change to the jiffies-related comparison macros in commit 5a581b367b5d ("jiffies: Avoid undefined behavior from signed overflow"), avoid this potential issue by changing the definition of the macro to perform the subtraction as unsigned integers, then cast the result to a signed integer for comparison. Link: https://lkml.kernel.org/r/20130727225828.GA11864@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/20240702183512.6390-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:12 +01:00
Alex Shi	53beb2e33d	fs/nilfs2: remove some unused macros to tame gcc [ Upstream commit e7920b3e9d9f5470d5ff7d883e72a47addc0a137 ] There some macros are unused and cause gcc warning. Remove them. fs/nilfs2/segment.c:137:0: warning: macro "nilfs_cnt32_gt" is not used [-Wunused-macros] fs/nilfs2/segment.c:144:0: warning: macro "nilfs_cnt32_le" is not used [-Wunused-macros] fs/nilfs2/segment.c:143:0: warning: macro "nilfs_cnt32_lt" is not used [-Wunused-macros] Link: https://lkml.kernel.org/r/1607552733-24292-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 0f3819e8c483 ("nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:11 +01:00
David Hildenbrand	fc25cace5d	fs/proc/task_mmu: indicate PM_FILE for PMD-mapped file THP [ Upstream commit 3f9f022e975d930709848a86a1c79775b0585202 ] Patch series "fs/proc: move page_mapcount() to fs/proc/internal.h". With all other page_mapcount() users in the tree gone, move page_mapcount() to fs/proc/internal.h, rename it and extend the documentation to prevent future (ab)use. ... of course, I find some issues while working on that code that I sort first ;) We'll now only end up calling page_mapcount() [now folio_precise_page_mapcount()] on pages mapped via present page table entries. Except for /proc/kpagecount, that still does questionable things, but we'll leave that legacy interface as is for now. Did a quick sanity check. Likely we would want some better selfestest for /proc/$/pagemap + smaps. I'll see if I can find some time to write some more. This patch (of 6): Looks like we never taught pagemap_pmd_range() about the existence of PMD-mapped file THPs. Seems to date back to the times when we first added support for non-anon THPs in the form of shmem THP. Link: https://lkml.kernel.org/r/20240607122357.115423-1-david@redhat.com Link: https://lkml.kernel.org/r/20240607122357.115423-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:11 +01:00
Jan Kara	cb60b25ab6	ext4: avoid writing unitialized memory to disk in EA inodes [ Upstream commit 65121eff3e4c8c90f8126debf3c369228691c591 ] If the extended attribute size is not a multiple of block size, the last block in the EA inode will have uninitialized tail which will get written to disk. We will never expose the data to userspace but still this is not a good practice so just zero out the tail of the block as it isn't going to cause a noticeable performance overhead. Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Reported-by: syzbot+9c1fe13fcb51574b249b@syzkaller.appspotmail.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240613150234.25176-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:10 +01:00
Luis Henriques (SUSE)	63d07b26c5	ext4: fix infinite loop when replaying fast_commit [ Upstream commit 907c3fe532253a6ef4eb9c4d67efb71fab58c706 ] When doing fast_commit replay an infinite loop may occur due to an uninitialized extent_status struct. ext4_ext_determine_insert_hole() does not detect the replay and calls ext4_es_find_extent_range(), which will return immediately without initializing the 'es' variable. Because 'es' contains garbage, an integer overflow may happen causing an infinite loop in this function, easily reproducible using fstest generic/039. This commit fixes this issue by unconditionally initializing the structure in function ext4_es_find_extent_range(). Thanks to Zhang Yi, for figuring out the real problem! Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240515082857.32730-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:20:09 +01:00
Chao Yu	8ac36e7ed3	hfsplus: fix to avoid false alarm of circular locking [ Upstream commit be4edd1642ee205ed7bbf66edc0453b1be1fb8d7 ] Syzbot report potential ABBA deadlock as below: loop0: detected capacity change from 0 to 1024 ====================================================== WARNING: possible circular locking dependency detected 6.9.0-syzkaller-10323-g8f6a15f095a6 #0 Not tainted ------------------------------------------------------ syz-executor171/5344 is trying to acquire lock: ffff88807cb980b0 (&tree->tree_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595 but task is already holding lock: ffff88807a930108 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}, at: hfsplus_file_truncate+0x2da/0xb50 fs/hfsplus/extents.c:576 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&HFSPLUS_I(inode)->extents_lock){+.+.}-{3:3}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 hfsplus_file_extend+0x21b/0x1b70 fs/hfsplus/extents.c:457 hfsplus_bmap_reserve+0x105/0x4e0 fs/hfsplus/btree.c:358 hfsplus_rename_cat+0x1d0/0x1050 fs/hfsplus/catalog.c:456 hfsplus_rename+0x12e/0x1c0 fs/hfsplus/dir.c:552 vfs_rename+0xbdb/0xf00 fs/namei.c:4887 do_renameat2+0xd94/0x13f0 fs/namei.c:5044 __do_sys_rename fs/namei.c:5091 [inline] __se_sys_rename fs/namei.c:5089 [inline] __x64_sys_rename+0x86/0xa0 fs/namei.c:5089 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&tree->tree_lock){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 hfsplus_file_truncate+0x811/0xb50 fs/hfsplus/extents.c:595 hfsplus_setattr+0x1ce/0x280 fs/hfsplus/inode.c:265 notify_change+0xb9d/0xe70 fs/attr.c:497 do_truncate+0x220/0x310 fs/open.c:65 handle_truncate fs/namei.c:3308 [inline] do_open fs/namei.c:3654 [inline] path_openat+0x2a3d/0x3280 fs/namei.c:3807 do_filp_open+0x235/0x490 fs/namei.c:3834 do_sys_openat2+0x13e/0x1d0 fs/open.c:1406 do_sys_open fs/open.c:1421 [inline] __do_sys_creat fs/open.c:1497 [inline] __se_sys_creat fs/open.c:1491 [inline] __x64_sys_creat+0x123/0x170 fs/open.c:1491 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&HFSPLUS_I(inode)->extents_lock); lock(&tree->tree_lock); lock(&HFSPLUS_I(inode)->extents_lock); lock(&tree->tree_lock); This is a false alarm as tree_lock mutex are different, one is from sbi->cat_tree, and another is from sbi->ext_tree: Thread A Thread B - hfsplus_rename - hfsplus_rename_cat - hfs_find_init - mutext_lock(cat_tree->tree_lock) - hfsplus_setattr - hfsplus_file_truncate - mutex_lock(hip->extents_lock) - hfs_find_init - mutext_lock(ext_tree->tree_lock) - hfs_bmap_reserve - hfsplus_file_extend - mutex_lock(hip->extents_lock) So, let's call mutex_lock_nested for tree_lock mutex lock, and pass correct lock class for it. Fixes: 31651c607151 ("hfsplus: avoid deadlock on file truncation") Reported-by: syzbot+6030b3b1b9bf70e538c4@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-fsdevel/000000000000e37a4005ef129563@google.com Cc: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240607142304.455441-1-chao@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-23 23:19:56 +01:00
Sultan Alsawaf	debf1dee53	mbcache: Speed up cache entry creation In order to prevent redundant entry creation by racing against itself, mb_cache_entry_create scans through a large hash-list of all current entries in order to see if another allocation for the requested new entry has been made. Furthermore, it allocates memory for a new entry before scanning through this hash-list, which results in that allocated memory being discarded when the requested new entry is already present. This happens more than half the time. Speed up cache entry creation by keeping a small linked list of requested new entries in progress, and scanning through that first instead of the large hash-list. Additionally, don't bother allocating memory for a new entry until it's known that the allocated memory will be used. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>	2024-11-19 17:58:19 +01:00
Jesse Chan	1b01b9bda3	f2fs: Enlarge min_fsync_blocks to 20 In OPPO's kernel: enlarge min_fsync_blocks to optimize performance - yanwu@TECH.Storage.FS.oF2FS, 2019/08/12 Huawei is also doing this in their production kernel. If this optimization is good for them and shipped with their devices, it should be good for us. Signed-off-by: Jesse Chan <jc@linux.com> Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>	2024-11-19 17:55:45 +01:00
Park Ju Hyung	414da43f9f	f2fs: set ioprio of GC kthread to idle GC should run conservatively as possible to reduce latency spikes to the user. Setting ioprio to idle class will allow the kernel to schedule GC thread's I/O to not affect any other processes' I/O requests. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>	2024-11-19 17:55:41 +01:00
Park Ju Hyung	008f8557c0	f2fs: reduce timeout for uncongestion On high fs utilization, congestion is hit quite frequently and waiting for a whooping 20ms is too expensive, especially on critical paths. Reduce it to an amount that is unlikely to affect UI rendering paths. The new times are as follows: 100 Hz => 1 jiffy (effective: 10 ms) 250 Hz => 2 jiffies (effective: 8 ms) 300 Hz => 2 jiffies (effective: 6 ms) 1000 Hz => 6 jiffies (effective: 6 ms) Co-authored-by: Danny Lin <danny@kdrag0n.dev> Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>	2024-11-19 17:55:37 +01:00
Pavel Begunkov	f029d24207	splice: don't generate zero-len segement bvecs iter_file_splice_write() may spawn bvec segments with zero-length. In preparation for prohibiting them, filter out by hand at splice level. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 0f1d344feb534555a0dcd0beafb7211a37c5355e) (cherry picked from commit 4c72fdc13bd20d10f59b8145627312814583a945) (cherry picked from commit cba6a18da1cc8144a07ba6a4b03e8e8dc8d24428) (cherry picked from commit 54a17499483118cd3c92feb747c88207ce30e9ce) (cherry picked from commit 4dec661d05c16a8e62dd833262ff68ce3e466770) (cherry picked from commit fe99d86b681099f662b2b01155b02b8476ff428d) (cherry picked from commit aa033460cd26157fe81e829e4744b3396a09860b)	2024-11-19 17:42:24 +01:00
Jann Horn	dadacb32b4	filelock: Fix fcntl/close race recovery compat path commit f8138f2ad2f745b9a1c696a05b749eabe44337ea upstream. When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:53 +01:00
lei lu	fac2aa0b41	jfs: don't walk off the end of ealist commit d0fa70aca54c8643248e89061da23752506ec0d4 upstream. Add a check before visiting the members of ea to make sure each ea stays within the ealist. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
lei lu	1494ceb608	ocfs2: add bounds checking to ocfs2_check_dir_entry() commit 255547c6bb8940a97eea94ef9d464ea5967763fb upstream. This adds sanity checks for ocfs2_dir_entry to make sure all members of ocfs2_dir_entry don't stray beyond valid memory region. Link: https://lkml.kernel.org/r/20240626104433.163270-1-llfamsec@gmail.com Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Gabriel Krisman Bertazi	54d36cabde	ext4: fix error code saved on super block during file system abort commit 124e7c61deb27d758df5ec0521c36cf08d417f7a upstream. ext4_abort will eventually call ext4_errno_to_code, which translates the errno to an EXT4_ERR specific error. This means that ext4_abort expects an errno. By using EXT4_ERR_ here, it gets misinterpreted (as an errno), and ends up saving EXT4_ERR_EBUSY on the superblock during an abort, which makes no sense. ESHUTDOWN will get properly translated to EXT4_ERR_SHUTDOWN, so use that instead. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/20211026173302.84000-1-krisman@collabora.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:51 +01:00
Edward Adam Davis	7274ea7067	hfsplus: fix uninit-value in copy_name [ Upstream commit 0570730c16307a72f8241df12363f76600baf57d ] [syzbot reported] BUG: KMSAN: uninit-value in sized_strscpy+0xc4/0x160 sized_strscpy+0xc4/0x160 copy_name+0x2af/0x320 fs/hfsplus/xattr.c:411 hfsplus_listxattr+0x11e9/0x1a50 fs/hfsplus/xattr.c:750 vfs_listxattr fs/xattr.c:493 [inline] listxattr+0x1f3/0x6b0 fs/xattr.c:840 path_listxattr fs/xattr.c:864 [inline] __do_sys_listxattr fs/xattr.c:876 [inline] __se_sys_listxattr fs/xattr.c:873 [inline] __x64_sys_listxattr+0x16b/0x2f0 fs/xattr.c:873 x64_sys_call+0x2ba0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:3877 [inline] slab_alloc_node mm/slub.c:3918 [inline] kmalloc_trace+0x57b/0xbe0 mm/slub.c:4065 kmalloc include/linux/slab.h:628 [inline] hfsplus_listxattr+0x4cc/0x1a50 fs/hfsplus/xattr.c:699 vfs_listxattr fs/xattr.c:493 [inline] listxattr+0x1f3/0x6b0 fs/xattr.c:840 path_listxattr fs/xattr.c:864 [inline] __do_sys_listxattr fs/xattr.c:876 [inline] __se_sys_listxattr fs/xattr.c:873 [inline] __x64_sys_listxattr+0x16b/0x2f0 fs/xattr.c:873 x64_sys_call+0x2ba0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [Fix] When allocating memory to strbuf, initialize memory to 0. Reported-and-tested-by: syzbot+efde959319469ff8d4d7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_8BBB6433BC9E1C1B7B4BDF1BF52574BA8808@qq.com Reported-and-tested-by: syzbot+01ade747b16e9c8030e0@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:50 +01:00
Christian Brauner	0829cd1693	fs: better handle deep ancestor chains in is_subdir() [ Upstream commit 391b59b045004d5b985d033263ccba3e941a7740 ] Jan reported that 'cd ..' may take a long time in deep directory hierarchies under a bind-mount. If concurrent renames happen it is possible to livelock in is_subdir() because it will keep retrying. Change is_subdir() from simply retrying over and over to retry once and then acquire the rename lock to handle deep ancestor chains better. The list of alternatives to this approach were less then pleasant. Change the scope of rcu lock to cover the whole walk while at it. A big thanks to Jan and Linus. Both Jan and Linus had proposed effectively the same thing just that one version ended up being slightly more elegant. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:50 +01:00
Filipe Manana	101b87fc72	btrfs: qgroup: fix quota root leak after quota disable failure [ Upstream commit a7e4c6a3031c74078dba7fa36239d0f4fe476c53 ] If during the quota disable we fail when cleaning the quota tree or when deleting the root from the root tree, we jump to the 'out' label without ever dropping the reference on the quota root, resulting in a leak of the root since fs_info->quota_root is no longer pointing to the root (we have set it to NULL just before those steps). Fix this by always doing a btrfs_put_root() call under the 'out' label. This is a problem that exists since qgroups were first added in 2012 by commit bed92eae26cc ("Btrfs: qgroup implementation and prototypes"), but back then we missed a kfree on the quota root and free_extent_buffer() calls on its root and commit root nodes, since back then roots were not yet reference counted. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:48 +01:00
Yuntao Wang	6f0d62e180	fs/file: fix the check in find_next_fd() [ Upstream commit ed8c7fbdfe117abbef81f65428ba263118ef298a ] The maximum possible return value of find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) is maxbit. This return value, multiplied by BITS_PER_LONG, gives the value of bitbit, which can never be greater than maxfd, it can only be equal to maxfd at most, so the following check 'if (bitbit > maxfd)' will never be true. Moreover, when bitbit equals maxfd, it indicates that there are no unused fds, and the function can directly return. Fix this check. Signed-off-by: Yuntao Wang <yuntao.wang@linux.dev> Link: https://lore.kernel.org/r/20240529160656.209352-1-yuntao.wang@linux.dev Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:47 +01:00
Jann Horn	551a595fa8	filelock: Remove locks reliably when fcntl/close race is detected commit 3cad1bc010416c6dd780643476bc59ed742436b9 upstream. When fcntl_setlk() races with close(), it removes the created lock with do_lock_file_wait(). However, LSMs can allow the first do_lock_file_wait() that created the lock while denying the second do_lock_file_wait() that tries to remove the lock. In theory (but AFAIK not in practice), posix_lock_file() could also fail to remove a lock due to GFP_KERNEL allocation failure (when splitting a range in the middle). After the bug has been triggered, use-after-free reads will occur in lock_get_status() when userspace reads /proc/locks. This can likely be used to read arbitrary kernel memory, but can't corrupt kernel memory. This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in enforcing mode and only works from some security contexts. Fix it by calling locks_remove_posix() instead, which is designed to reliably get rid of POSIX locks associated with the given file and files_struct and is also used by filp_flush(). Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240702-fs-lock-recover-2-v1-1-edd456f63789@google.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> [stable fixup: ->c.flc_type was ->fl_type in older kernels] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:46 +01:00
Ryusuke Konishi	28da8d8372	nilfs2: fix kernel bug on rename operation of broken directory commit a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 upstream. Syzbot reported that in rename directory operation on broken directory on nilfs2, __block_write_begin_int() called to prepare block write may fail BUG_ON check for access exceeding the folio/page size. This is because nilfs_dotdot(), which gets parent directory reference entry ("..") of the directory to be moved or renamed, does not check consistency enough, and may return location exceeding folio/page size for broken directories. Fix this issue by checking required directory entries ("." and "..") in the first chunk of the directory in nilfs_dotdot(). Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+d3abed1ad3d367fa2627@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:46 +01:00
Brian Foster	f4bad69f25	vfs: don't mod negative dentry count when on shrinker list [ Upstream commit aabfe57ebaa75841db47ea59091ec3c5a06d2f52 ] The nr_dentry_negative counter is intended to only account negative dentries that are present on the superblock LRU. Therefore, the LRU add, remove and isolate helpers modify the counter based on whether the dentry is negative, but the shrinker list related helpers do not modify the counter, and the paths that change a dentry between positive and negative only do so if DCACHE_LRU_LIST is set. The problem with this is that a dentry on a shrinker list still has DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a shrink related list. Therefore if a relevant operation (i.e. unlink) occurs while a dentry is present on a shrinker list, and the associated codepath only checks for DCACHE_LRU_LIST, then it is technically possible to modify the negative dentry count for a dentry that is off the LRU. Since the shrinker list related helpers do not modify the negative dentry count (because non-LRU dentries should not be included in the count) when the dentry is ultimately removed from the shrinker list, this can cause the negative dentry count to become permanently inaccurate. This problem can be reproduced via a heavy file create/unlink vs. drop_caches workload. On an 80xcpu system, I start 80 tasks each running a 1k file create/delete loop, and one task spinning on drop_caches. After 10 minutes or so of runtime, the idle/clean cache negative dentry count increases from somewhere in the range of 5-10 entries to several hundred (and increasingly grows beyond nr_dentry_unused). Tweak the logic in the paths that turn a dentry negative or positive to filter out the case where the dentry is present on a shrink related list. This allows the above workload to maintain an accurate negative dentry count. Fixes: af0c9af1b3f6 ("fs/dcache: Track & report number of negative dentries") Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20240703121301.247680-1-bfoster@redhat.com Acked-by: Ian Kent <ikent@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:43 +01:00
linke li	f2b553daa2	fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading [ Upstream commit 8bfb40be31ddea0cb4664b352e1797cfe6c91976 ] Currently, the __d_clear_type_and_inode() writes the value flags to dentry->d_flags, then immediately re-reads it in order to use it in a if statement. This re-read is useless because no other update to dentry->d_flags can occur at this point. This commit therefore re-use flags in the if statement instead of re-reading dentry->d_flags. Signed-off-by: linke li <lilinke99@qq.com> Link: https://lore.kernel.org/r/tencent_5E187BD0A61BA28605E85405F15228254D0A@qq.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: aabfe57ebaa7 ("vfs: don't mod negative dentry count when on shrinker list") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:43 +01:00
Jeff Layton	a13746ed11	filelock: fix potential use-after-free in posix_lock_inode [ Upstream commit 1b3ec4f7c03d4b07bad70697d7e2f4088d2cfe92 ] Light Hsieh reported a KASAN UAF warning in trace_posix_lock_inode(). The request pointer had been changed earlier to point to a lock entry that was added to the inode's list. However, before the tracepoint could fire, another task raced in and freed that lock. Fix this by moving the tracepoint inside the spinlock, which should ensure that this doesn't happen. Fixes: 74f6f5912693 ("locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock") Link: https://lore.kernel.org/linux-fsdevel/724ffb0a2962e912ea62bb0515deadf39c325112.camel@kernel.org/ Reported-by: Light Hsieh (謝明燈) <Light.Hsieh@mediatek.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240702-filelock-6-10-v1-1-96e766aadc98@kernel.org Reviewed-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:43 +01:00
Ryusuke Konishi	5e3112026c	nilfs2: fix incorrect inode allocation from reserved inodes commit 93aef9eda1cea9e84ab2453fcceb8addad0e46f1 upstream. If the bitmap block that manages the inode allocation status is corrupted, nilfs_ifile_create_inode() may allocate a new inode from the reserved inode area where it should not be allocated. Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of struct nilfs_root"), fixed the problem that reserved inodes with inode numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to bitmap corruption, but since the start number of non-reserved inodes is read from the super block and may change, in which case inode allocation may occur from the extended reserved inode area. If that happens, access to that inode will cause an IO error, causing the file system to degrade to an error state. Fix this potential issue by adding a wraparound option to the common metadata object allocation routine and by modifying nilfs_ifile_create_inode() to disable the option so that it only allocates inodes with inode numbers greater than or equal to the inode number read in "nilfs->ns_first_ino", regardless of the bitmap status of reserved inodes. Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:43 +01:00
Ryusuke Konishi	37fc6d7785	nilfs2: add missing check for inode numbers on directory entries commit bb76c6c274683c8570ad788f79d4b875bde0e458 upstream. Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn(). As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case). The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking. Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages. Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis. Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 Reported-by: Jan Kara <jack@suse.cz> Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:41 +01:00
Ryusuke Konishi	757ac87caa	nilfs2: fix inode number range checks commit e2fec219a36e0993642844be0f345513507031f4 upstream. Patch series "nilfs2: fix potential issues related to reserved inodes". This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value. This patch (of 3): In the current implementation of nilfs2, "nilfs->ns_first_ino", which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked. As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly. In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if "ns_first_ino" is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment. Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros. Also, change the type of "ns_first_ino" from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time. Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:41 +01:00
Wang Yong	b131b91878	jffs2: Fix potential illegal address access in jffs2_free_inode [ Upstream commit af9a8730ddb6a4b2edd779ccc0aceb994d616830 ] During the stress testing of the jffs2 file system,the following abnormal printouts were found: [ 2430.649000] Unable to handle kernel paging request at virtual address 0069696969696948 [ 2430.649622] Mem abort info: [ 2430.649829] ESR = 0x96000004 [ 2430.650115] EC = 0x25: DABT (current EL), IL = 32 bits [ 2430.650564] SET = 0, FnV = 0 [ 2430.650795] EA = 0, S1PTW = 0 [ 2430.651032] FSC = 0x04: level 0 translation fault [ 2430.651446] Data abort info: [ 2430.651683] ISV = 0, ISS = 0x00000004 [ 2430.652001] CM = 0, WnR = 0 [ 2430.652558] [0069696969696948] address between user and kernel address ranges [ 2430.653265] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2430.654512] CPU: 2 PID: 20919 Comm: cat Not tainted 5.15.25-g512f31242bf6 #33 [ 2430.655008] Hardware name: linux,dummy-virt (DT) [ 2430.655517] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2430.656142] pc : kfree+0x78/0x348 [ 2430.656630] lr : jffs2_free_inode+0x24/0x48 [ 2430.657051] sp : ffff800009eebd10 [ 2430.657355] x29: ffff800009eebd10 x28: 0000000000000001 x27: 0000000000000000 [ 2430.658327] x26: ffff000038f09d80 x25: 0080000000000000 x24: ffff800009d38000 [ 2430.658919] x23: 5a5a5a5a5a5a5a5a x22: ffff000038f09d80 x21: ffff8000084f0d14 [ 2430.659434] x20: ffff0000bf9a6ac0 x19: 0169696969696940 x18: 0000000000000000 [ 2430.659969] x17: ffff8000b6506000 x16: ffff800009eec000 x15: 0000000000004000 [ 2430.660637] x14: 0000000000000000 x13: 00000001000820a1 x12: 00000000000d1b19 [ 2430.661345] x11: 0004000800000000 x10: 0000000000000001 x9 : ffff8000084f0d14 [ 2430.662025] x8 : ffff0000bf9a6b40 x7 : ffff0000bf9a6b48 x6 : 0000000003470302 [ 2430.662695] x5 : ffff00002e41dcc0 x4 : ffff0000bf9aa3b0 x3 : 0000000003470342 [ 2430.663486] x2 : 0000000000000000 x1 : ffff8000084f0d14 x0 : fffffc0000000000 [ 2430.664217] Call trace: [ 2430.664528] kfree+0x78/0x348 [ 2430.664855] jffs2_free_inode+0x24/0x48 [ 2430.665233] i_callback+0x24/0x50 [ 2430.665528] rcu_do_batch+0x1ac/0x448 [ 2430.665892] rcu_core+0x28c/0x3c8 [ 2430.666151] rcu_core_si+0x18/0x28 [ 2430.666473] __do_softirq+0x138/0x3cc [ 2430.666781] irq_exit+0xf0/0x110 [ 2430.667065] handle_domain_irq+0x6c/0x98 [ 2430.667447] gic_handle_irq+0xac/0xe8 [ 2430.667739] call_on_irq_stack+0x28/0x54 The parameter passed to kfree was 5a5a5a5a, which corresponds to the target field of the jffs_inode_info structure. It was found that all variables in the jffs_inode_info structure were 5a5a5a5a, except for the first member sem. It is suspected that these variables are not initialized because they were set to 5a5a5a5a during memory testing, which is meant to detect uninitialized memory.The sem variable is initialized in the function jffs2_i_init_once, while other members are initialized in the function jffs2_init_inode_info. The function jffs2_init_inode_info is called after iget_locked, but in the iget_locked function, the destroy_inode process is triggered, which releases the inode and consequently, the target member of the inode is not initialized.In concurrent high pressure scenarios, iget_locked may enter the destroy_inode branch as described in the code. Since the destroy_inode functionality of jffs2 only releases the target, the fix method is to set target to NULL in jffs2_i_init_once. Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn> Reviewed-by: Yang Tao <yang.tao172@zte.com.cn> Cc: Xu Xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:41 +01:00
Mike Marshall	241a348067	orangefs: fix out-of-bounds fsid access [ Upstream commit 53e4efa470d5fc6a96662d2d3322cfc925818517 ] Arnd Bergmann sent a patch to fsdevel, he says: "orangefs_statfs() copies two consecutive fields of the superblock into the statfs structure, which triggers a warning from the string fortification helpers" Jan Kara suggested an alternate way to do the patch to make it more readable. I ran both ideas through xfstests and both seem fine. This patch is based on Jan Kara's suggestion. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:41 +01:00
Matthew Wilcox (Oracle)	8374921131	nfs: Leave pages in the pagecache if readpage failed commit 0b768a9610c6de9811c6d33900bebfb665192ee1 upstream. The pagecache handles readpage failing by itself; it doesn't want filesystems to remove pages from under it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:35 +01:00
Arnd Bergmann	5509e7c8d5	ftruncate: pass a signed offset commit 4b8e88e563b5f666446d002ad0dc1e6e8e7102b0 upstream. The old ftruncate() syscall, using the 32-bit off_t misses a sign extension when called in compat mode on 64-bit architectures. As a result, passing a negative length accidentally succeeds in truncating to file size between 2GiB and 4GiB. Changing the type of the compat syscall to the signed compat_off_t changes the behavior so it instead returns -EINVAL. The native entry point, the truncate() syscall and the corresponding loff_t based variants are all correct already and do not suffer from this mistake. Fixes: 3f6d078d4acc ("fix compat truncate/ftruncate") Reviewed-by: Christian Brauner <brauner@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:34 +01:00
Jan Kara	d61ae2c2ba	ocfs2: fix DIO failure due to insufficient transaction credits commit be346c1a6eeb49d8fda827d2a9522124c2f72f36 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:33 +01:00
Jeff Layton	5f8cdcf85b	nfsd: hold a lighter-weight client reference over CB_RECALL_ANY [ Upstream commit 10396f4df8b75ff6ab0aa2cd74296565466f2c8d ] Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the client. While a callback job is technically an RPC that counter is really more for client-driven RPCs, and this has the effect of preventing the client from being unhashed until the callback completes. If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we can end up in a situation where the callback can't complete on the (now dead) callback channel, but the new client can't connect because the old client can't be unhashed. This usually manifests as a NFS4ERR_DELAY return on the CREATE_SESSION operation. The job is only holding a reference to the client so it can clear a flag after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of reference when dealing with the nfsdfs info files, but it should work appropriately here to ensure that the nfs4_client doesn't disappear. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:31 +01:00
Enzo Matsumiya	f06772241f	smb: client: fix deadlock in smb2_find_smb_tcon() [ Upstream commit 02c418774f76a0a36a6195c9dbf8971eb4130a15 ] Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such deadlock. Cc: stable@vger.kernel.org Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:30 +01:00
Shyam Prasad N	86dd4d181c	cifs: missed ref-counting smb session in find [ Upstream commit e695a9ad0305af6e8b0cbc24a54976ac2120cbb3 ] When we lookup an smb session based on session id, we did not up the ref-count for the session. This can potentially cause issues if the session is freed from under us. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Stable-dep-of: 02c418774f76 ("smb: client: fix deadlock in smb2_find_smb_tcon()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:30 +01:00
Roman Smirnov	a8e86c7d9c	udf: udftime: prevent overflow in udf_disk_stamp_to_time() [ Upstream commit 3b84adf460381169c085e4bc09e7b57e9e16db0a ] An overflow can occur in a situation where src.centiseconds takes the value of 255. This situation is unlikely, but there is no validation check anywere in the code. Found by Linux Verification Center (linuxtesting.org) with Svace. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240327132755.13945-1-r.smirnov@omp.ru> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:07 +01:00
Yunlei He	6f6c37f5f0	f2fs: remove clear SB_INLINECRYPT flag in default_options [ Upstream commit ac5eecf481c29942eb9a862e758c0c8b68090c33 ] In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set. If create new file or open file during this gap, these files will not use inlinecrypt. Worse case, it may lead to data corruption if wrappedkey_v0 is enable. Thread A: Thread B: -f2fs_remount -f2fs_file_open or f2fs_new_inode -default_options <- clear SB_INLINECRYPT flag -fscrypt_select_encryption_impl -parse_options <- set SB_INLINECRYPT again Signed-off-by: Yunlei He <heyunlei@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:07 +01:00
Ryusuke Konishi	dfe53a517f	nilfs2: fix potential kernel bug due to lack of writeback flag waiting commit a4ca369ca221bb7e06c725792ac107f0e48e82e7 upstream. Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: <TASK> nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:05 +01:00
Su Yue	ba49ec7160	ocfs2: fix races between hole punching and AIO+DIO commit 952b023f06a24b2ad6ba67304c4c84d45bea2f18 upstream. After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block", fstests/generic/300 become from always failed to sometimes failed: ======================================================================== [ 473.293420 ] run fstests generic/300 [ 475.296983 ] JBD2: Ignoring recovery information on journal [ 475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0) with ordered data mode. [ 494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag: Owner 5668 has an extent at cpos 78723 which can no longer be found [ 494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted. [ 494.292018 ] OCFS2: File system is now read-only. [ 494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272 ERROR: status = -30 [ 494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374 ERROR: status = -3 fio: io_u error on file /mnt/scratch/racer: Read-only file system: write offset=460849152, buflen=131072 ========================================================================= In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add unwritten extents to a list. extents are also inserted into extent tree in ocfs2_write_begin_nolock. Then another thread call fallocate to puch a hole at one of the unwritten extent. The extent at cpos was removed by ocfs2_remove_extent(). At end io worker thread, ocfs2_search_extent_list found there is no such extent at the cpos. T1 T2 T3 inode lock ... insert extents ... inode unlock ocfs2_fallocate __ocfs2_change_file_space inode lock lock ip_alloc_sem ocfs2_remove_inode_range inode ocfs2_remove_btree_range ocfs2_remove_extent ^---remove the extent at cpos 78723 ... unlock ip_alloc_sem inode unlock ocfs2_dio_end_io ocfs2_dio_end_io_write lock ip_alloc_sem ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_search_extent_list ^---failed to find extent ... unlock ip_alloc_sem In most filesystems, fallocate is not compatible with racing with AIO+DIO, so fix it by adding to wait for all dio before fallocate/punch_hole like ext4. Link: https://lkml.kernel.org/r/20240408082041.20925-3-glass.su@suse.com Fixes: b25801038da5 ("ocfs2: Support xfs style space reservation ioctls") Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:04 +01:00
Su Yue	32f6d6d21c	ocfs2: use coarse time for new created files commit b8cb324277ee16f3eca3055b96fce4735a5a41c6 upstream. The default atime related mount option is '-o realtime' which means file atime should be updated if atime <= ctime or atime <= mtime. atime should be updated in the following scenario, but it is not: ========================================================== $ rm /mnt/testfile; $ echo test > /mnt/testfile $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 $ sleep 5 $ cat /mnt/testfile > /dev/null $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 ========================================================== And the reason the atime in the test is not updated is that ocfs2 calls ktime_get_real_ts64() in __ocfs2_mknod_locked during file creation. Then inode_set_ctime_current() is called in inode_set_ctime_current() calls ktime_get_coarse_real_ts64() to get current time. ktime_get_real_ts64() is more accurate than ktime_get_coarse_real_ts64(). In my test box, I saw ctime set by ktime_get_coarse_real_ts64() is less than ktime_get_real_ts64() even ctime is set later. The ctime of the new inode is smaller than atime. The call trace is like: ocfs2_create ocfs2_mknod __ocfs2_mknod_locked .... ktime_get_real_ts64 <------- set atime,ctime,mtime, more accurate ocfs2_populate_inode ... ocfs2_init_acl ocfs2_acl_set_mode inode_set_ctime_current current_time ktime_get_coarse_real_ts64 <-------less accurate ocfs2_file_read_iter ocfs2_inode_lock_atime ocfs2_should_update_atime atime <= ctime ? <-------- false, ctime < atime due to accuracy So here call ktime_get_coarse_real_ts64 to set inode time coarser while creating new files. It may lower the accuracy of file times. But it's not a big deal since we already use coarse time in other places like ocfs2_update_inode_atime and inode_set_ctime_current. Link: https://lkml.kernel.org/r/20240408082041.20925-5-glass.su@suse.com Fixes: c62c38f6b91b ("ocfs2: replace CURRENT_TIME macro") Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:04 +01:00
Rik van Riel	c0411a8866	fs/proc: fix softlockup in __read_vmcore commit 5cbcb62dddf5346077feb82b7b0c9254222d3445 upstream. While taking a kernel core dump with makedumpfile on a larger system, softlockup messages often appear. While softlockup warnings can be harmless, they can also interfere with things like RCU freeing memory, which can be problematic when the kdump kexec image is configured with as little memory as possible. Avoid the softlockup, and give things like work items and RCU a chance to do their thing during __read_vmcore by adding a cond_resched. Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:04 +01:00
Greg Kroah-Hartman	699c4299d1	jfs: xattr: fix buffer overflow for invalid xattr commit 7c55b78818cfb732680c4a72ab270cc2d2ee3d0f upstream. When an xattr size is not what is expected, it is printed out to the kernel log in hex format as a form of debugging. But when that xattr size is bigger than the expected size, printing it out can cause an access off the end of the buffer. Fix this all up by properly restricting the size of the debug hex dump in the kernel log. Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com Cc: Dave Kleikamp <shaggy@kernel.org> Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-19 14:19:02 +01:00
Ryusuke Konishi	4ed0059779	nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors [ Upstream commit 7373a51e7998b508af7136530f3a997b286ce81c ] The error handling in nilfs_empty_dir() when a directory folio/page read fails is incorrect, as in the old ext2 implementation, and if the folio/page cannot be read or nilfs_check_folio() fails, it will falsely determine the directory as empty and corrupt the file system. In addition, since nilfs_empty_dir() does not immediately return on a failed folio/page read, but continues to loop, this can cause a long loop with I/O if i_size of the directory's inode is also corrupted, causing the log writer thread to wait and hang, as reported by syzbot. Fix these issues by making nilfs_empty_dir() immediately return a false value (0) if it fails to get a directory folio/page. Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:02 +01:00
Matthew Wilcox (Oracle)	4fbbbe0f3f	nilfs2: return the mapped address from nilfs_get_page() [ Upstream commit 09a46acb3697e50548bb265afa1d79163659dd85 ] In prepartion for switching from kmap() to kmap_local(), return the kmap address from nilfs_get_page() instead of having the caller look up page_address(). [konishi.ryusuke: fixed a missing blank line after declaration] Link: https://lkml.kernel.org/r/20231127143036.2425-7-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 7373a51e7998 ("nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:02 +01:00
Matthew Wilcox (Oracle)	a07d7a3a37	nilfs2: Remove check for PageError [ Upstream commit 79ea65563ad8aaab309d61eeb4d5019dd6cf5fa0 ] If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this test is not needed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Stable-dep-of: 7373a51e7998 ("nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:02 +01:00
Filipe Manana	cf6c837298	btrfs: fix leak of qgroup extent records after transaction abort [ Upstream commit fb33eb2ef0d88e75564983ef057b44c5b7e4fded ] Qgroup extent records are created when delayed ref heads are created and then released after accounting extents at btrfs_qgroup_account_extents(), called during the transaction commit path. If a transaction is aborted we free the qgroup records by calling btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(), unless we don't have delayed references. We are incorrectly assuming that no delayed references means we don't have qgroup extents records. We can currently have no delayed references because we ran them all during a transaction commit and the transaction was aborted after that due to some error in the commit path. So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs() even if we don't have any delayed references. Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/ Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 14:19:02 +01:00
Ksawlii	bf7fa5ebc4	Revert "namei: introduce struct renamedata" This reverts commit `01eed6c908`.	2024-11-19 13:45:08 +01:00
Ksawlii	8efaad0786	Revert "filelock: add a new locks_inode_context accessor function" This reverts commit `b680f89053`.	2024-11-19 13:30:52 +01:00
Ksawlii	bfc0293419	Revert "fs: add file and path permissions helpers" This reverts commit `ef696bd15f`.	2024-11-19 13:30:21 +01:00
Ksawlii	f7566e9585	Revert "file: Rename __close_fd to close_fd and remove the files parameter" This reverts commit `b0d8979a5d`.	2024-11-19 12:59:13 +01:00
Ksawlii	eef404784c	Revert "file: Replace ksys_close with close_fd" This reverts commit `d7b5b6740b`.	2024-11-19 12:59:08 +01:00
Trond Myklebust	fcdc6938e3	nfsd: Fix a regression in nfsd_setattr() [ Upstream commit 6412e44c40aaf8f1d7320b2099c5bdd6cb9126ac ] Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") broke the NFSv3 pre/post op attributes behaviour when doing a SETATTR rpc call by stripping out the calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Message-ID: <20240216012451.22725-1-trondmy@kernel.org> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
NeilBrown	a1bd05182c	nfsd: don't call locks_release_private() twice concurrently [ Upstream commit 05eda6e75773592760285e10ac86c56d683be17f ] It is possible for free_blocked_lock() to be called twice concurrently, once from nfsd4_lock() and once from nfsd4_release_lockowner() calling remove_blocked_locks(). This is why a kref was added. It is perfectly safe for locks_delete_block() and kref_put() to be called in parallel as they use locking or atomicity respectively as protection. However locks_release_private() has no locking. It is safe for it to be called twice sequentially, but not concurrently. This patch moves that call from free_blocked_lock() where it could race with itself, to free_nbl() where it cannot. This will slightly delay the freeing of private info or release of the owner - but not by much. It is arguably more natural for this freeing to happen in free_nbl() where the structure itself is freed. This bug was found by code inspection - it has not been seen in practice. Fixes: 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
NeilBrown	e054db9cd4	nfsd: don't take fi_lock in nfsd_break_deleg_cb() [ Upstream commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 ] A recent change to check_for_locks() changed it to take ->flc_lock while holding ->fi_lock. This creates a lock inversion (reported by lockdep) because there is a case where ->fi_lock is taken while holding ->flc_lock. ->flc_lock is held across ->fl_lmops callbacks, and nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However it doesn't need to. Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list and so needed the lock. Since then it doesn't walk the list and doesn't need the lock. Two actions are performed under the lock. One is to call nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on the nfs4_file at all, so don't need the lock. The other is to set ->fi_had_conflict which is in the nfs4_file. This field is only ever set here (except when initialised to false) so there is no possible problem will multiple threads racing when setting it. The field is tested twice in nfs4_set_delegation(). The first test does not hold a lock and is documented as an opportunistic optimisation, so it doesn't impose any need to hold ->fi_lock while setting ->fi_had_conflict. The second test in nfs4_set_delegation() is make under ->fi_lock, so removing the locking when ->fi_had_conflict is set could make a change. The change could only be interesting if ->fi_had_conflict tested as false even though nfsd_break_one_deleg() ran before ->fi_lock was unlocked. i.e. while hash_delegation_locked() was running. As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb() there can be no importance to this interaction. So this patch removes the locking from nfsd_break_one_deleg() and moves the final test on ->fi_had_conflict out of the locked region to make it clear that locking isn't important to the test. It is still tested after vfs_setlease() has succeeded. This might be significant and as vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called under ->flc_lock this "after" is a true ordering provided by a spinlock. Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
NeilBrown	0af81ba800	nfsd: fix RELEASE_LOCKOWNER [ Upstream commit edcf9725150e42beeca42d085149f4c88fa97afd ] The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep. First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour. If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error. The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request. When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID. So clearly it is harmful to allow a false positive, which testing so_count allows. The test is nonsense because ... well... it doesn't mean anything. so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks. When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be. In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail. In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success. The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe. So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check is it safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'. Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
Jeff Layton	51db7bc3ed	nfsd: drop the nfsd_put helper [ Upstream commit 64e6304169f1e1f078e7f0798033f80a7fb0ea46 ] It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer. Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error. Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li <yieli@redhat.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeffrey Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
NeilBrown	4eaf9eb590	nfsd: call nfsd_last_thread() before final nfsd_put() [ Upstream commit 2a501f55cd641eb4d3c16a2eab0d678693fac663 ] If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put() without calling nfsd_last_thread(). This leaves nn->nfsd_serv pointing to a structure that has been freed. So remove 'static' from nfsd_last_thread() and call it when the nfsd_serv is about to be destroyed. Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
NeilBrown	14bbc7e468	NFSD: fix possible oops when nfsd/pool_stats is closed. [ Upstream commit 88956eabfdea7d01d550535af120d4ef265b1d02 ] If /proc/fs/nfsd/pool_stats is open when the last nfsd thread exits, then when the file is closed a NULL pointer is dereferenced. This is because nfsd_pool_stats_release() assumes that the pointer to the svc_serv cannot become NULL while a reference is held. This used to be the case but a recent patch split nfsd_last_thread() out from nfsd_put(), and clearing the pointer is done in nfsd_last_thread(). This is easily reproduced by running rpc.nfsd 8 ; ( rpc.nfsd 0;true) < /proc/fs/nfsd/pool_stats Fortunately nfsd_pool_stats_release() has easy access to the svc_serv pointer, and so can call svc_put() on it directly. Fixes: 9f28a971ee9f ("nfsd: separate nfsd_last_thread() from nfsd_put()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:33 +01:00
NeilBrown	b0701f7372	nfsd: separate nfsd_last_thread() from nfsd_put() [ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ] Now that the last nfsd thread is stopped by an explicit act of calling svc_set_num_threads() with a count of zero, we only have a limited number of places that can happen, and don't need to call nfsd_last_thread() in nfsd_put() So separate that out and call it at the two places where the number of threads is set to zero. Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all() into nfsd_last_thread(), as they are really part of the same action. nfsd_put() is now a thin wrapper around svc_put(), so make it a static inline. nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of places we have to use svc_put() instead. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
NeilBrown	1ae4b794eb	nfsd: Simplify code around svc_exit_thread() call in nfsd() [ Upstream commit 18e4cf915543257eae2925671934937163f5639b ] Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed. Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Chuck Lever	cff4ebbbbf	nfsd: don't allow nfsd threads to be signalled. [ Upstream commit 3903902401451b1cd9d797a8c79769eb26ac7fe5 ] The original implementation of nfsd used signals to stop threads during shutdown. In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads internally it if was asked to run "0" threads. After this user-space transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to threads was no longer an important part of the API. In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the use of signals for stopping threads, using kthread_stop() instead. This patch makes the "obvious" next step and removes the ability to signal nfsd threads - or any svc threads. nfsd stops allowing signals and we don't check for their delivery any more. This will allow for some simplification in later patches. A change worth noting is in nfsd4_ssc_setup_dul(). There was previously a signal_pending() check which would only succeed when the thread was being shut down. It should really have tested kthread_should_stop() as well. Now it just does the latter, not the former. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Tavian Barnes	6b1bd4df50	nfsd: Fix creation time serialization order [ Upstream commit d7dbed457c2ef83709a2a2723a2d58de43623449 ] In nfsd4_encode_fattr(), TIME_CREATE was being written out after all other times. However, they should be written out in an order that matches the bit flags in bmval1, which in this case are #define FATTR4_WORD1_TIME_ACCESS (1UL << 15) #define FATTR4_WORD1_TIME_CREATE (1UL << 18) #define FATTR4_WORD1_TIME_DELTA (1UL << 19) #define FATTR4_WORD1_TIME_METADATA (1UL << 20) #define FATTR4_WORD1_TIME_MODIFY (1UL << 21) so TIME_CREATE should come second. I noticed this on a FreeBSD NFSv4.2 client, which supports creation times. On this client, file times were weirdly permuted. With this patch applied on the server, times looked normal on the client. Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Link: https://unix.stackexchange.com/q/749605/56202 Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Chuck Lever	d97d3191f2	NFSD: Add an nfsd4_encode_nfstime4() helper [ Upstream commit 262176798b18b12fd8ab84c94cfece0a6a652476 ] Clean up: de-duplicate some common code. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
NeilBrown	f99888960d	lockd: drop inappropriate svc_get() from locked_get() [ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ] The below-mentioned patch was intended to simplify refcounting on the svc_serv used by locked. The goal was to only ever have a single reference from the single thread. To that end we dropped a call to lockd_start_svc() (except when creating thread) which would take a reference, and dropped the svc_put(serv) that would drop that reference. Unfortunately we didn't also remove the svc_get() from lockd_create_svc() in the case where the svc_serv already existed. So after the patch: - on the first call the svc_serv was allocated and the one reference was given to the thread, so there are no extra references - on subsequent calls svc_get() was called so there is now an extra reference. This is clearly not consistent. The inconsistency is also clear in the current code in lockd_get() takes two references, one on nlmsvc_serv and one by incrementing nlmsvc_users. This clearly does not match lockd_put(). So: drop that svc_get() from lockd_get() (which used to be in lockd_create_svc(). Reported-by: Ido Schimmel <idosch@idosch.org> Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()") Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Dan Carpenter	b85d90d836	nfsd: fix double fget() bug in __write_ports_addfd() [ Upstream commit c034203b6a9dae6751ef4371c18cb77983e30c28 ] The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug. The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function. Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Jeff Layton	4125888f90	nfsd: make a copy of struct iattr before calling notify_change [ Upstream commit d53d70084d27f56bcdf5074328f2c9ec861be596 ] notify_change can modify the iattr structure. In particular it can end up setting ATTR_MODE when ATTR_KILL_SUID is already set, causing a BUG() if the same iattr is passed to notify_change more than once. Make a copy of the struct iattr before calling notify_change. Reported-by: Zhi Li <yieli@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2207969 Tested-by: Zhi Li <yieli@redhat.com> Fixes: 34b91dda7124 ("NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Dai Ngo	dd72e526a9	NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop [ Upstream commit 147abcacee33781e75588869e944ddb07528a897 ] The following request sequence to the same file causes the NFS client and server getting into an infinite loop with COMMIT and NFS4ERR_DELAY: OPEN REMOVE WRITE COMMIT Problem reported by recall11, recall12, recall14, recall20, recall22, recall40, recall42, recall48, recall50 of nfstest suite. This patch restores the handling of race condition in nfsd_file_do_acquire with unlink to that prior of the regression. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Jeff Layton	430ccfd32c	nfsd: simplify the delayed disposal list code [ Upstream commit 92e4a6733f922f0fef1d0995f7b2d0eaff86c7ea ] When queueing a dispose list to the appropriate "freeme" lists, it pointlessly queues the objects one at a time to an intermediate list. Remove a few helpers and just open code a list_move to make it more clear and efficient. Better document the resulting functions with kerneldoc comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Chuck Lever	fc693e3ffc	NFSD: Convert filecache to rhltable [ Upstream commit c4c649ab413ba6a785b25f0edbb12f617c87db2a ] While we were converting the nfs4_file hashtable to use the kernel's resizable hashtable data structure, Neil Brown observed that the list variant (rhltable) would be better for managing nfsd_file items as well. The nfsd_file hash table will contain multiple entries for the same inode -- these should be kept together on a list. And, it could be possible for exotic or malicious client behavior to cause the hash table to resize itself on every insertion. A nice simplification is that rhltable_lookup() can return a list that contains only nfsd_file items that match a given inode, which enables us to eliminate specialized hash table helper functions and use the default functions provided by the rhashtable implementation). Since we are now storing nfsd_file items for the same inode on a single list, that effectively reduces the number of hash entries that have to be tracked in the hash table. The mininum bucket count is therefore lowered. Light testing with fstests generic/531 show no regressions. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Jeff Layton	c28473c81d	nfsd: allow reaping files still under writeback [ Upstream commit dcb779fcd4ed5984ad15991d574943d12a8693d1 ] On most filesystems, there is no reason to delay reaping an nfsd_file just because its underlying inode is still under writeback. nfsd just relies on client activity or the local flusher threads to do writeback. The main exception is NFS, which flushes all of its dirty data on last close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to signal that they do this, and only skip closing files under writeback on such filesystems. Also, remove a redundant NULL file pointer check in nfsd_file_check_writeback, and clean up nfs's export op flag definitions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:32 +01:00
Jeff Layton	1dd7dbd153	nfsd: update comment over __nfsd_file_cache_purge [ Upstream commit 972cc0e0924598cb293b919d39c848dc038b2c28 ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	a7a5c3a7f9	nfsd: don't take/put an extra reference when putting a file [ Upstream commit b2ff1bd71db2a1b193a6dde0845adcd69cbcf75e ] The last thing that filp_close does is an fput, so don't bother taking and putting the extra reference. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	f8d546333b	nfsd: add some comments to nfsd_file_do_acquire [ Upstream commit b680cb9b737331aad271feebbedafb865504e234 ] David Howells mentioned that he found this bit of code confusing, so sprinkle in some comments to clarify. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	ccc3df7195	nfsd: don't kill nfsd_files because of lease break error [ Upstream commit c6593366c0bf222be9c7561354dfb921c611745e ] An error from break_lease is non-fatal, so we needn't destroy the nfsd_file in that case. Just put the reference like we normally would and return the error. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	743db303f1	nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator [ Upstream commit d69b8dbfd0866abc5ec84652cc1c10fc3d4d91ef ] test_bit returns bool, so we can just compare the result of that to the key->gc value without the "!!". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	c27712e985	nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries [ Upstream commit 6c31e4c98853a4ba47355ea151b36a77c42b7734 ] Since v4 files are expected to be long-lived, there's little value in closing them out of the cache when there is conflicting access. Change the comparator to also match the gc value in the key. Change both of the current users of that key to set the gc value in the key to "true". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	171671afc3	nfsd: don't open-code clear_and_wake_up_bit [ Upstream commit b8bea9f6cdd7236c7c2238d022145e9b2f8aac22 ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	617de1a236	nfsd: call op_release, even when op_func returns an error [ Upstream commit 15a8b55dbb1ba154d82627547c5761cac884d810 ] For ops with "trivial" replies, nfsd4_encode_operation will shortcut most of the encoding work and skip to just marshalling up the status. One of the things it skips is calling op_release. This could cause a memory leak in the layoutget codepath if there is an error at an inopportune time. Have the compound processing engine always call op_release, even when op_func sets an error in op->status. With this change, we also need nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL on error to avoid a double free. Reported-by: Zhi Li <yieli@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403 Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Chuck Lever	7f83716ae4	NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL [ Upstream commit 804d8e0a6e54427268790472781e03bc243f4ee3 ] OPDESC() simply indexes into nfsd4_ops[] by the op's operation number, without range checking that value. It assumes callers are careful to avoid calling it with an out-of-bounds opnum value. nfsd4_decode_compound() is not so careful, and can invoke OPDESC() with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end of nfsd4_ops[]. Reported-by: Jeff Layton <jlayton@kernel.org> Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	671dddd4e5	nfsd: don't replace page in rq_pages if it's a continuation of last page [ Upstream commit 27c934dd8832dd40fd34776f916dc201e18b319b ] The splice read calls nfsd_splice_actor to put the pages containing file data into the svc_rqst->rq_pages array. It's possible however to get a splice result that only has a partial page at the end, if (e.g.) the filesystem hands back a short read that doesn't cover the whole page. nfsd_splice_actor will plop the partial page into its rq_pages array and return. Then later, when nfsd_splice_actor is called again, the remainder of the page may end up being filled out. At this point, nfsd_splice_actor will put the page into the array _again_ corrupting the reply. If this is done enough times, rq_next_page will overrun the array and corrupt the trailing fields -- the rq_respages and rq_next_page pointers themselves. If we've already added the page to the array in the last pass, don't add it to the array a second time when dealing with a splice continuation. This was originally handled properly in nfsd_splice_actor, but commit 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") removed the check for it. Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dario Lesca <d.lesca@solinos.it> Tested-by: David Critch <dcritch@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Jeff Layton	b3c26d143d	lockd: set file_lock start and end when decoding nlm4 testargs [ Upstream commit 7ff84910c66c9144cc0de9d9deed9fb84c03aff0 ] Commit 6930bcbfb6ce dropped the setting of the file_lock range when decoding a nlm_lock off the wire. This causes the client side grant callback to miss matching blocks and reject the lock, only to rerequest it 30s later. Add a helper function to set the file_lock range from the start and end values that the protocol uses, and have the nlm_lock decoder call that to set up the file_lock args properly. Fixes: 6930bcbfb6ce ("lockd: detect and reject lock arguments that overflow") Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Amir Goldstein <amir73il@gmail.com> Cc: stable@vger.kernel.org #6.0 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Chuck Lever	c2e03eff68	NFSD: Protect against filesystem freezing [ Upstream commit fd9a2e1d513823e840960cb3bc26d8b7749d4ac2 ] Flole observes this WARNING on occasion: [1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0 Reported-by: <flole@flole.de> Suggested-by: Jan Kara <jack@suse.cz> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123 Fixes: 73da852e3831 ("nfsd: use vfs_iter_read/write") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:31 +01:00
Chuck Lever	ea6be92ff8	NFSD: copy the whole verifier in nfsd_copy_write_verifier [ Upstream commit 90d2175572470ba7f55da8447c72ddd4942923c4 ] Currently, we're only memcpy'ing the first __be32. Ensure we copy into both words. Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	c9cd722efa	nfsd: don't fsync nfsd_files on last close [ Upstream commit 4c475eee02375ade6e864f1db16976ba0d96a0a2 ] Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is usually a no-op and doesn't block. We have a customer running knfsd over very slow storage (XFS over Ceph RBD). They were using the "async" export option because performance was more important than data integrity for this application. That export option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this codepath however, their final CLOSE calls would still stall (since a CLOSE effectively became a COMMIT). I think this fsync is not strictly necessary. We only use that result to reset the write verifier. Instead of fsync'ing all of the data when we free an nfsd_file, we can just check for writeback errors when one is acquired and when it is freed. If the client never comes back, then it'll never see the error anyway and there is no point in resetting it. If an error occurs after the nfsd_file is removed from the cache but before the inode is evicted, then it will reset the write verifier on the next nfsd_file_acquire, (since there will be an unseen error). The only exception here is if something else opens and fsyncs the file during that window. Given that local applications work with this limitation today, I don't see that as an issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-and-tested-by: Pierguido Lambri <plambri@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	e58e5a9ed0	nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open [ Upstream commit dcd779dc46540e174a6ac8d52fbed23593407317 ] The nested if statements here make no sense, as you can never reach "else" branch in the nested statement. Fix the error handling for when there is a courtesy client that holds a conflicting deny mode. Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server") Reported-by: 張智諺 <cc85nod@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Dai Ngo	2e28687db0	NFSD: fix problems with cleanup on errors in nfsd4_copy [ Upstream commit 81e722978ad21072470b73d8f6a50ad62c7d5b7d ] When nfsd4_copy fails to allocate memory for async_copy->cp_src, or nfs4_init_copy_state fails, it calls cleanup_async_copy to do the cleanup for the async_copy which causes page fault since async_copy is not yet initialized. This patche rearranges the order of initializing the fields in async_copy and adds checks in cleanup_async_copy to skip un-initialized fields. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	d6d6dd9a01	nfsd: don't hand out delegation on setuid files being opened for write [ Upstream commit 826b67e6376c2a788e3a62c4860dcd79500a27d5 ] We had a bug report that xfstest generic/355 was failing on NFSv4.0. This test sets various combinations of setuid/setgid modes and tests whether DIO writes will cause them to be stripped. What I found was that the server did properly strip those bits, but the client didn't notice because it held a delegation that was not recalled. The recall didn't occur because the client itself was the one generating the activity and we avoid recalls in that case. Clearing setuid bits is an "implicit" activity. The client didn't specifically request that we do that, so we need the server to issue a CB_RECALL, or avoid the situation entirely by not issuing a delegation. The easiest fix here is to simply not give out a delegation if the file is being opened for write, and the mode has the setuid and/or setgid bit set. Note that there is a potential race between the mode and lease being set, so we test for this condition both before and after setting the lease. This patch fixes generic/355, generic/683 and generic/684 for me. (Note that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and fail). Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Dai Ngo	08def480f9	NFSD: fix leaked reference count of nfsd4_ssc_umount_item [ Upstream commit 34e8f9ec4c9ac235f917747b23a200a5e0ec857b ] The reference count of nfsd4_ssc_umount_item is not decremented on error conditions. This prevents the laundromat from unmounting the vfsmount of the source file. This patch decrements the reference count of nfsd4_ssc_umount_item on error. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	b9cd1e8628	nfsd: clean up potential nfsd_file refcount leaks in COPY codepath [ Upstream commit 6ba434cb1a8d403ea9aad1b667c3ea3ad8b3191f ] There are two different flavors of the nfsd4_copy struct. One is embedded in the compound and is used directly in synchronous copies. The other is dynamically allocated, refcounted and tracked in the client struture. For the embedded one, the cleanup just involves releasing any nfsd_files held on its behalf. For the async one, the cleanup is a bit more involved, and we need to dequeue it from lists, unhash it, etc. There is at least one potential refcount leak in this code now. If the kthread_create call fails, then both the src and dst nfsd_files in the original nfsd4_copy object are leaked. The cleanup in this codepath is also sort of weird. In the async copy case, we'll have up to four nfsd_file references (src and dst for both flavors of copy structure). They are both put at the end of nfsd4_do_async_copy, even though the ones held on behalf of the embedded one outlive that structure. Change it so that we always clean up the nfsd_file refs held by the embedded copy structure before nfsd4_copy returns. Rework cleanup_async_copy to handle both inter and intra copies. Eliminate nfsd4_cleanup_intra_ssc since it now becomes a no-op. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	d32a91f5cf	nfsd: allow nfsd_file_get to sanely handle a NULL pointer [ Upstream commit 70f62231cdfd52357836733dd31db787e0412ab2 ] ...and remove some now-useless NULL pointer checks in its callers. Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Dai Ngo	849af9fdc7	NFSD: enhance inter-server copy cleanup [ Upstream commit df24ac7a2e3a9d0bc68f1756a880e50bfe4b4522 ] Currently nfsd4_setup_inter_ssc returns the vfsmount of the source server's export when the mount completes. After the copy is done nfsd4_cleanup_inter_ssc is called with the vfsmount of the source server and it searches nfsd_ssc_mount_list for a matching entry to do the clean up. The problems with this approach are (1) the need to search the nfsd_ssc_mount_list and (2) the code has to handle the case where the matching entry is not found which looks ugly. The enhancement is instead of nfsd4_setup_inter_ssc returning the vfsmount, it returns the nfsd4_ssc_umount_item which has the vfsmount embedded in it. When nfsd4_cleanup_inter_ssc is called it's passed with the nfsd4_ssc_umount_item directly to do the clean up so no searching is needed and there is no need to handle the 'not found' case. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ cel: adjusted whitespace and variable/function names ] Reviewed-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	a7fb622c25	nfsd: don't destroy global nfs4_file table in per-net shutdown [ Upstream commit 4102db175b5d884d133270fdbd0e59111ce688fc ] The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net. Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin <jiyin@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Jeff Layton	7566172f2f	nfsd: don't free files unconditionally in __nfsd_file_cache_purge [ Upstream commit 4bdbba54e9b1c769da8ded9abd209d765715e1d6 ] nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed. Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-by: Torkil Svensgaard <torkil@drcmr.dk> Reported-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Dai Ngo	0c58dbb961	NFSD: replace delayed_work with work_struct for nfsd_client_shrinker [ Upstream commit 7c24fa225081f31bc6da6a355c1ba801889ab29a ] Since nfsd4_state_shrinker_count always calls mod_delayed_work with 0 delay, we can replace delayed_work with work_struct to save some space and overhead. Also add the call to cancel_work after unregister the shrinker in nfs4_state_shutdown_net. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:30 +01:00
Dai Ngo	3f14ff974c	NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time [ Upstream commit f385f7d244134246f984975ed34cd75f77de479f ] Currently the nfsd-client shrinker is registered and unregistered at the time the nfsd module is loaded and unloaded. The problem with this is the shrinker is being registered before all of the relevant fields in nfsd_net are initialized when nfsd is started. This can lead to an oops when memory is low and the shrinker is called while nfsd is not running. This patch moves the register/unregister of nfsd-client shrinker from module load/unload time to nfsd startup/shutdown time. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: adjusted to apply without e33c267ab70d ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Xingyuan Mo	677bb9b27e	NFSD: fix use-after-free in nfsd4_ssc_setup_dul() [ Upstream commit e6cf91b7b47ff82b624bdfe2fdcde32bb52e71dd ] If signal_pending() returns true, schedule_timeout() will not be executed, causing the waiting task to remain in the wait queue. Fixed by adding a call to finish_wait(), which ensures that the waiting task will always be removed from the wait queue. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Chuck Lever	f06c52431e	NFSD: Use set_bit(RQ_DROPME) [ Upstream commit 5304930dbae82d259bcf7e5611db7c81e7a42eff ] The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next. Fixes: 9315564747cb ("NFSD: Use only RQ_DROPME to signal the need to drop a reply") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Chuck Lever	5b133c1ad1	Revert "SUNRPC: Use RMW bitops in single-threaded hot paths" [ Upstream commit 7827c81f0248e3c2f40d438b020f3d222f002171 ] The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next. Found via KCSAN. Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Jeff Layton	9696c3cb78	nfsd: fix handling of cached open files in nfsd4_open codepath [ Upstream commit 0b3a551fa58b4da941efeb209b3770868e2eddd7 ] Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") added the ability to cache an open fd over a compound. There are a couple of problems with the way this currently works: It's racy, as a newly-created nfsd_file can end up with its PENDING bit cleared while the nf is hashed, and the nf_file pointer is still zeroed out. Other tasks can find it in this state and they expect to see a valid nf_file, and can oops if nf_file is NULL. Also, there is no guarantee that we'll end up creating a new nfsd_file if one is already in the hash. If an extant entry is in the hash with a valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with the value of op_file and the old nf_file will leak. Fix both issues by making a new nfsd_file_acquirei_opened variant that takes an optional file pointer. If one is present when this is called, we'll take a new reference to it instead of trying to open the file. If the nfsd_file already has a valid nf_file, we'll just ignore the optional file and pass the nfsd_file back as-is. Also rework the tracepoints a bit to allow for an "opened" variant and don't try to avoid counting acquisitions in the case where we already have a cached open file. Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") Cc: Trond Myklebust <trondmy@hammerspace.com> Reported-by: Stanislav Saner <ssaner@redhat.com> Reported-and-Tested-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-and-Tested-by: Torkil Svensgaard <torkil@drcmr.dk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Jeff Layton	853853fc1e	nfsd: rework refcounting in filecache [ Upstream commit ac3a2585f018f10039b4a856dcb122da88c1c1c9 ] The filecache refcounting is a bit non-standard for something searchable by RCU, in that we maintain a sentinel reference while it's hashed. This in turn requires that we have to do things differently in the "put" depending on whether its hashed, which we believe to have led to races. There are other problems in here too. nfsd_file_close_inode_sync can end up freeing an nfsd_file while there are still outstanding references to it, and there are a number of subtle ToC/ToU races. Rework the code so that the refcount is what drives the lifecycle. When the refcount goes to zero, then unhash and rcu free the object. A task searching for a nfsd_file is allowed to bump its refcount, but only if it's not already 0. Ensure that we don't make any other changes to it until a reference is held. With this change, the LRU carries a reference. Take special care to deal with it when removing an entry from the list, and ensure that we only repurpose the nf_lru list_head when the refcount is 0 to ensure exclusive access to it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Kees Cook	a4c2756d7b	NFSD: Avoid clashing function prototypes [ Upstream commit e78e274eb22d966258a3845acc71d3c5b8ee2ea8 ] When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1]. There were 97 warnings produced by NFS. For example: fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 ()(struct nfsd4_compoundargs , struct nfsd4_access )' (aka 'unsigned int ()(struct nfsd4_compoundargs , struct nfsd4_access )') to 'nfsd4_dec' (aka 'unsigned int ()(struct nfsd4_compoundargs , void )') converts to incompatible function type [-Wcast-function-type-strict] [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The enc/dec callbacks were defined as passing "void " as the second argument, but were being implicitly cast to a new type. Replace the argument with union nfsd4_op_u, and perform explicit member selection in the function body. There are no resulting binary differences. Changes were made mechanically using the following Coccinelle script, with minor by-hand fixes for members that didn't already match their existing argument name: @find@ identifier func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = (T) func, }; @already_void@ identifier find.func; identifier name; @@ func(..., -void +union nfsd4_op_u name) { ... } @proto depends on !already_void@ identifier find.func; type T; identifier name; position p; @@ func@p(..., T name ) { ... } @script:python get_member@ type_name << proto.T; member; @@ coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0]) @convert@ identifier find.func; type proto.T; identifier proto.name; position proto.p; identifier get_member.member; @@ func@p(..., - T name + union nfsd4_op_u u ) { + T name = &u->member; ... } @cast@ identifier find.func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = - (T) func, }; Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Chuck Lever	05afc461f1	NFSD: Use only RQ_DROPME to signal the need to drop a reply [ Upstream commit 9315564747cb6a570e99196b3a4880fb817635fd ] Clean up: NFSv2 has the only two usages of rpc_drop_reply in the NFSD code base. Since NFSv2 is going away at some point, replace these in order to simplify the "drop this reply?" check in nfsd_dispatch(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Dai Ngo	eb1f5350b0	NFSD: add delegation reaper to react to low memory condition [ Upstream commit 44df6f439a1790a5f602e3842879efa88f346672 ] The delegation reaper is called by nfsd memory shrinker's on the 'count' callback. It scans the client list and sends the courtesy CB_RECALL_ANY to the clients that hold delegations. To avoid flooding the clients with CB_RECALL_ANY requests, the delegation reaper sends only one CB_RECALL_ANY request to each client per 5 seconds. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Dai Ngo	d92173345d	NFSD: add support for sending CB_RECALL_ANY [ Upstream commit 3959066b697b5dfbb7141124ae9665337d4bc638 ] Add XDR encode and decode function for CB_RECALL_ANY. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:29 +01:00
Dai Ngo	1c7eecade4	NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker [ Upstream commit a1049eb47f20b9eabf9afb218578fff16b4baca6 ] Refactoring courtesy_client_reaper to generic low memory shrinker so it can be used for other purposes. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Brian Foster	f459774968	NFSD: pass range end to vfs_fsync_range() instead of count [ Upstream commit 79a1d88a36f77374c77fd41a4386d8c2736b8704 ] _nfsd_copy_file_range() calls vfs_fsync_range() with an offset and count (bytes written), but the former wants the start and end bytes of the range to sync. Fix it up. Fixes: eac0b17a77fb ("NFSD add vfs_fsync after async copy is done") Signed-off-by: Brian Foster <bfoster@redhat.com> Tested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	b4046b2336	lockd: fix file selection in nlmsvc_cancel_blocked [ Upstream commit 9f27783b4dd235ef3c8dbf69fc6322777450323c ] We currently do a lock_to_openmode call based on the arguments from the NLM_UNLOCK call, but that will always set the fl_type of the lock to F_UNLCK, and the O_RDONLY descriptor is always chosen. Fix it to use the file_lock from the block instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	a0ebbd2486	lockd: ensure we use the correct file descriptor when unlocking [ Upstream commit 69efce009f7df888e1fede3cb2913690eb829f52 ] Shared locks are set on O_RDONLY descriptors and exclusive locks are set on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once for each descriptor, but it doesn't reset fl_file. Ensure that it does. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	559e7dd79f	lockd: set missing fl_flags field when retrieving args [ Upstream commit 75c7940d2a86d3f1b60a0a265478cb8fc887b970 ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Xiu Jianfeng	dd09c15f31	NFSD: Use struct_size() helper in alloc_session() [ Upstream commit 85a0d0c9a58002ef7d1bf5e3ea630f4fbd42a4f0 ] Use struct_size() helper to simplify the code, no functional changes. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	35346e6ecf	nfsd: return error if nfs4_setacl fails [ Upstream commit 01d53a88c08951f88f2a42f1f1e6568928e0590e ] With the addition of POSIX ACLs to struct nfsd_attrs, we no longer return an error if setting the ACL fails. Ensure we return the na_aclerr error on SETATTR if there is one. Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Neil Brown <neilb@suse.de> Reported-by: Yongcheng Yang <yoyang@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Trond Myklebust	d9c678470a	lockd: set other missing fields when unlocking files [ Upstream commit 18ebd35b61b4693a0ddc270b6d4f18def232e770 ] vfs_lock_file() expects the struct file_lock to be fully initialised by the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field is NULL. Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216582 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Chuck Lever	60297b262f	NFSD: Add an nfsd_file_fsync tracepoint [ Upstream commit d7064eaf688cfe454c50db9f59298463d80d403c ] Add a tracepoint to capture the number of filecache-triggered fsync calls and which files needed it. Also, record when an fsync triggers a write verifier reset. Examples: <...>-97 [007] 262.505611: nfsd_file_free: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 <...>-97 [007] 262.505612: nfsd_file_fsync: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0 <...>-97 [007] 262.505623: nfsd_file_free: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 <...>-97 [007] 262.505624: nfsd_file_fsync: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	0bb0fff135	nfsd: fix up the filecache laundrette scheduling [ Upstream commit 22ae4c114f77b55a4c5036e8f70409a0799a08f8 ] We don't really care whether there are hashed entries when it comes to scheduling the laundrette. They might all be non-gc entries, after all. We only want to schedule it if there are entries on the LRU. Switch to using list_lru_count, and move the check into nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to count entries, since it only schedules the laundrette after adding an entry to the LRU. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	2cb80c1130	nfsd: reorganize filecache.c [ Upstream commit 8214118589881b2d390284410c5ff275e7a5e03c ] In a coming patch, we're going to rework how the filecache refcounting works. Move some code around in the function to reduce the churn in the later patches, and rename some of the functions with (hopefully) clearer names: nfsd_file_flush becomes nfsd_file_fsync, and nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue. Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free, to better match the name of the function from which it's called. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Jeff Layton	09ea1b1def	nfsd: remove the pages_flushed statistic from filecache [ Upstream commit 1f696e230ea5198e393368b319eb55651828d687 ] We're counting mapping->nrpages, but not all of those are necessarily dirty. We don't really have a simple way to count just the dirty pages, so just remove this stat since it's not accurate. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Chuck Lever	b4bd3bc3d3	NFSD: Fix licensing header in filecache.c [ Upstream commit 3f054211b29c0fa06dfdcab402c795fd7e906be1 ] Add a missing SPDX header. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:28 +01:00
Chuck Lever	63cf07cbfe	NFSD: Use rhashtable for managing nfs4_file objects [ Upstream commit d47b295e8d76a4d69f0e2ea0cd8a79c9d3488280 ] fh_match() is costly, especially when filehandles are large (as is the case for NFSv4). It needs to be used sparingly when searching data structures. Unfortunately, with common workloads, I see multiple thousands of objects stored in file_hashtbl[], which has just 256 buckets, making its bucket hash chains quite lengthy. Walking long hash chains with the state_lock held blocks other activity that needs that lock. Sizable hash chains are a common occurrance once the server has handed out some delegations, for example -- IIUC, each delegated file is held open on the server by an nfs4_file object. To help mitigate the cost of searching with fh_match(), replace the nfs4_file hash table with an rhashtable, which can dynamically resize its bucket array to minimize hash chain length. The result of this modification is an improvement in the latency of NFSv4 operations, and the reduction of nfsd CPU utilization due to eliminating the cost of multiple calls to fh_match() and reducing the CPU cache misses incurred while walking long hash chains in the nfs4_file hash table. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	a4db255604	NFSD: Refactor find_file() [ Upstream commit 15424748001a9b5ea62b3e6ad45f0a8b27f01df9 ] find_file() is now the only caller of find_file_locked(), so just fold these two together. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	6b9ec66eaa	NFSD: Clean up find_or_add_file() [ Upstream commit 9270fc514ba7d415636b23bcb937573a1ce54f6a ] Remove the call to find_file_locked() in insert_nfs4_file(). Tracing shows that over 99% of these calls return NULL. Thus it is not worth the expense of the extra bucket list traversal. insert_file() already deals correctly with the case where the item is already in the hash bucket. Since nfsd4_file_hash_insert() is now just a wrapper around insert_file(), move the meat of insert_file() into nfsd4_file_hash_insert() and get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	da08c0004c	NFSD: Add a nfsd4_file_hash_remove() helper [ Upstream commit 3341678f2fd6106055cead09e513fad6950a0d19 ] Refactor to relocate hash deletion operation to a helper function that is close to most other nfs4_file data structure operations. The "noinline" annotation will become useful in a moment when the hlist_del_rcu() is replaced with a more complex rhash remove operation. It also guarantees that hash remove operations can be traced with "-p function -l remove_nfs4_file_locked". This also simplifies the organization of forward declarations: the to-be-added rhashtable and its param structure will be defined /after/ put_nfs4_file(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	47e26f845b	NFSD: Clean up nfsd4_init_file() [ Upstream commit 81a21fa3e7fdecb3c5b97014f0fc5a17d5806cae ] Name this function more consistently. I'm going to use nfsd4_file_ and nfsd4_file_hash_ for these helpers. Change the @fh parameter to be const pointer for better type safety. Finally, move the hash insertion operation to the caller. This is typical for most other "init_object" type helpers, and it is where most of the other nfs4_file hash table operations are located. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	959ce47191	NFSD: Update file_hashtbl() helpers [ Upstream commit 3fe828caddd81e68e9d29353c6e9285a658ca056 ] Enable callers to use const pointers for type safety. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	8e8cbb20aa	NFSD: Use const pointers as parameters to fh_ helpers [ Upstream commit b48f8056c034f28dd54668399f1d22be421b0bef ] Enable callers to use const pointers where they are able to. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	a311157fbd	NFSD: Trace delegation revocations [ Upstream commit a1c74569bbde91299f24535abf711be5c84df9de ] Delegation revocation is an exceptional event that is not otherwise visible externally (eg, no network traffic is emitted). Generate a trace record when it occurs so that revocation can be observed or other activity can be triggered. Example: nfsd-1104 [005] 1912.002544: nfsd_stid_revoke: client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG Trace infrastructure is provided for subsequent additional tracing related to nfs4_stid activity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	1f3a273889	NFSD: Trace stateids returned via DELEGRETURN [ Upstream commit 20eee313ff4b8a7e71ae9560f5c4ba27cd763005 ] Handing out a delegation stateid is recorded with the nfsd_deleg_read tracepoint, but there isn't a matching tracepoint for recording when the stateid is returned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	9e94840263	NFSD: Clean up nfs4_preprocess_stateid_op() call sites [ Upstream commit eeff73f7c1c583f79a401284f46c619294859310 ] Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op() call sites. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:27 +01:00
Chuck Lever	747260445d	NFSD: Flesh out a documenting comment for filecache.c [ Upstream commit b3276c1f5b268ff56622e9e125b792b4c3dc03ac ] Record what we've learned recently about the NFSD filecache in a documenting comment so our future selves don't forget what all this is for. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Chuck Lever	e4ab944aae	NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection [ Upstream commit 4d1ea8455716ca070e3cd85767e6f6a562a58b1b ] NFSv4 operations manage the lifetime of nfsd_file items they use by means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be garbage collected. Introduce a mechanism to enable garbage collection for nfsd_file items used only by NFSv2/3 callers. Note that the change in nfsd_file_put() ensures that both CLOSE and DELEGRETURN will actually close out and free an nfsd_file on last reference of a non-garbage-collected file. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394 Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Chuck Lever	bfcd8d8cf5	NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" [ Upstream commit dcf3f80965ca787c70def402cdf1553c93c75529 ] This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2. That commit attempted to make files available to other users as soon as all NFSv4 clients were done with them, rather than waiting until the filecache LRU had garbage collected them. It gets the reference counting wrong, for one thing. But it also misses that DELEGRETURN should release a file in the same fashion. In fact, any nfsd_file_put() on an file held open by an NFSv4 client needs potentially to release the file immediately... Clear the way for implementing that idea. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Chuck Lever	4324671684	NFSD: Pass the target nfsd_file to nfsd_commit() [ Upstream commit c252849082ff525af18b4f253b3c9ece94e951ed ] In a moment I'm going to introduce separate nfsd_file types, one of which is garbage-collected; the other, not. The garbage-collected variety is to be used by NFSv2 and v3, and the non-garbage-collected variety is to be used by NFSv4. nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want nfsd_commit() to find and use the correct variety of cached nfsd_file object for the NFS version that is in use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
David Disseldorp	6fae1ce94d	exportfs: use pr_debug for unreachable debug statements [ Upstream commit 427505ffeaa464f683faba945a88d3e3248f6979 ] expfs.c has a bunch of dprintk statements which are unusable due to: #define dprintk(fmt, args...) do{}while(0) Use pr_debug so that they can be enabled dynamically. Also make some minor changes to the debug statements to fix some incorrect types, and remove __func__ which can be handled by dynamic debug separately. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Jeff Layton	08474128b1	nfsd: allow disabling NFSv2 at compile time [ Upstream commit 2f3a4b2ac2f28b9be78ad21f401f31e263845214 ] rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical step toward deprecating it and allow NFSv2 support to be compiled out. Add a new CONFIG_NFSD_V2 option that can be turned off and rework the CONFIG_NFSD_V?_ACL option dependencies. Add a description that discourages enabling it. Also, change the description of CONFIG_NFSD to state that the always-on version is now 3 instead of 2. Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2 is disabled at compile time, this should make the kernel ignore attempts to disable it at runtime, but still error out when trying to enable it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Jeff Layton	088f060dea	nfsd: move nfserrno() to vfs.c [ Upstream commit cb12fae1c34b1fa7eaae92c5aadc72d86d7fae19 ] nfserrno() is common to all nfs versions, but nfsproc.c is specifically for NFSv2. Move it to vfs.c, and the prototype to vfs.h. While we're in here, remove the #ifdef EDQUOT check in this function. It's apparently a holdover from the initial merge of the nfsd code in 1997. No other place in the kernel checks that that symbol is defined before using it, so I think we can dispense with it here. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Jeff Layton	dce08a01f8	nfsd: ignore requests to disable unsupported versions [ Upstream commit 8e823bafff2308753d430566256c83d8085952da ] The kernel currently errors out if you attempt to enable or disable a version that it doesn't recognize. Change it to ignore attempts to disable an unrecognized version. If we don't support it, then there is no harm in doing so. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Chuck Lever	570eaff034	NFSD: Finish converting the NFSv3 GETACL result encoder [ Upstream commit 841fd0a3cb490eae5dfd262eccb8c8b11d57f8b8 ] For some reason, the NFSv2 GETACL result encoder was fully converted to use the new nfs_stream_encode_acl(), but the NFSv3 equivalent was not similarly converted. Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Chuck Lever	b099398d3f	NFSD: Finish converting the NFSv2 GETACL result encoder [ Upstream commit ea5021e911d3479346a75ac9b7d9dcd751b0fb99 ] The xdr_stream conversion inadvertently left some code that set the page_len of the send buffer. The XDR stream encoders should handle this automatically now. This oversight adds garbage past the end of the Reply message. Clients typically ignore the garbage, but NFSD does not need to send it, as it leaks stale memory contents onto the wire. Fixes: f8cba47344f7 ("NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Colin Ian King	42712fbd4e	NFSD: Remove redundant assignment to variable host_err [ Upstream commit 69eed23baf877bbb1f14d7f4df54f89807c9ee2a ] Variable host_err is assigned a value that is never read, it is being re-assigned a value in every different execution path in the following switch statement. The assignment is redundant and can be removed. Cleans up clang-scan warning: warning: Value stored to 'host_err' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Anna Schumaker	4ddad8c2f3	NFSD: Simplify READ_PLUS [ Upstream commit eeadcb75794516839078c28b3730132aeb700ce6 ] Chuck had suggested reverting READ_PLUS so it returns a single DATA segment covering the requested read range. This prepares the server for a future "sparse read" function so support can easily be added without needing to rip out the old READ_PLUS code at the same time. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:26 +01:00
Jeff Layton	d4d1b030f5	nfsd: use locks_inode_context helper [ Upstream commit 77c67530e1f95ac25c7075635f32f04367380894 ] nfsd currently doesn't access i_flctx safely everywhere. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	a94f9c633d	lockd: use locks_inode_context helper [ Upstream commit 98b41ffe0afdfeaa1439a5d6bd2db4a94277e31b ] lockd currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	b680f89053	filelock: add a new locks_inode_context accessor function [ Upstream commit 401a8b8fd5acd51582b15238d72a8d0edd580e9f ] There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure. Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Chuck Lever	80ea82674d	NFSD: Fix reads with a non-zero offset that don't end on a page boundary [ Upstream commit ac8db824ead0de2e9111337c401409d010fba2f0 ] This was found when virtual machines with nfs-mounted qcow2 disks failed to boot properly. Reported-by: Anders Blomdell <anders.blomdell@control.lth.se> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132 Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages") [ cel: "‘for’ loop initial declarations are only allowed in C99 or C11 mode" ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	597f386f19	nfsd: put the export reference in nfsd4_verify_deleg_dentry [ Upstream commit 50256e4793a5e5ab77703c82a47344ad2e774a59 ] nfsd_lookup_dentry returns an export reference in addition to the dentry ref. Ensure that we put it too. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138866 Fixes: 876c553cb410 ("NFSD: verify the opened dentry after setting a delegation") Reported-by: Yongcheng Yang <yoyang@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	e234918ef2	nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint [ Upstream commit bdd6b5624c62d0acd350d07564f1c82fe649235f ] When we fail to insert into the hashtable with a non-retryable error, we'll free the object and then goto out_status. If the tracepoint is enabled, it'll end up accessing the freed object when it tries to grab the fields out of it. Set nf to NULL after freeing it to avoid the issue. Fixes: 243a5263014a ("nfsd: rework hashtable handling in nfsd_do_file_acquire") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	cd363ec97d	nfsd: fix net-namespace logic in __nfsd_file_cache_purge [ Upstream commit d3aefd2b29ff5ffdeb5c06a7d3191a027a18cdb8 ] If the namespace doesn't match the one in "net", then we'll continue, but that doesn't cause another rhashtable_walk_next call, so it will loop infinitely. Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable") Reported-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/ Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Tetsuo Handa	85fe1743c9	NFSD: unregister shrinker when nfsd_init_net() fails [ Upstream commit bd86c69dae65de30f6d47249418ba7889809e31a ] syzbot is reporting UAF read at register_shrinker_prepared() [1], for commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") missed that nfsd4_leases_net_shutdown() from nfsd_exit_net() is called only when nfsd_init_net() succeeded. If nfsd_init_net() fails due to nfsd_reply_cache_init() failure, register_shrinker() from nfsd4_init_leases_net() has to be undone before nfsd_init_net() returns. Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89 [1] Reported-by: syzbot <syzbot+ff796f04613b4c84ad89@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	b32940c723	nfsd: rework hashtable handling in nfsd_do_file_acquire [ Upstream commit 243a5263014a30436c93ed3f1f864c1da845455e ] nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough to get a reference after finding it in the hash. Take the rcu_read_lock() and call rhashtable_lookup directly. Switch to using rhashtable_lookup_insert_key as well, and use the usual retry mechanism if we hit an -EEXIST. Rename the "retry" bool to open_retry, and eliminiate the insert_err goto target. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Jeff Layton	750b504bd3	nfsd: fix nfsd_file_unhash_and_dispose [ Upstream commit 8d0d254b15cc5b7d46d85fb7ab8ecede9575e672 ] nfsd_file_unhash_and_dispose() is called for two reasons: We're either shutting down and purging the filecache, or we've gotten a notification about a file delete, so we want to go ahead and unhash it so that it'll get cleaned up when we close. We're either walking the hashtable or doing a lookup in it and we don't take a reference in either case. What we want to do in both cases is to try and unhash the object and put it on the dispose list if that was successful. If it's no longer hashed, then we don't want to touch it, with the assumption being that something else is already cleaning up the sentinel reference. Instead of trying to selectively decrement the refcount in this function, just unhash it, and if that was successful, move it to the dispose list. Then, the disposal routine will just clean that up as usual. Also, just make this a void function, drop the WARN_ON_ONCE, and the comments about deadlocking since the nature of the purported deadlock is no longer clear. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Gaosheng Cui	f92da4eeb7	fanotify: Remove obsoleted fanotify_event_has_path() [ Upstream commit 7a80bf902d2bc722b4477442ee772e8574603185 ] All uses of fanotify_event_has_path() have been removed since commit 9c61f3b560f5 ("fanotify: break up fanotify_alloc_event()"), now it is useless, so remove it. Link: https://lore.kernel.org/r/20220926023018.1505270-1-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> [ cel: resolved merge conflict ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Gaosheng Cui	2bccd4ca2b	fsnotify: remove unused declaration [ Upstream commit f847c74d6e89f10926db58649a05b99237258691 ] fsnotify_alloc_event_holder() and fsnotify_destroy_event_holder() has been removed since commit 7053aee26a35 ("fsnotify: do not share events between notification groups"), so remove it. Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:25 +01:00
Al Viro	4019788d30	fs/notify: constify path [ Upstream commit d5bf88895f24686641c39420ee6df716dc1d95d8 ] Reviewed-by: Matthew Bobrowski <repnop@google.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Jeff Layton	cee9958b56	nfsd: extra checks when freeing delegation stateids [ Upstream commit 895ddf5ed4c54ea9e3533606d7a8b4e4f27f95ef ] We've had some reports of problems in the refcounting for delegation stateids that we've yet to track down. Add some extra checks to ensure that we've removed the object from various lists before freeing it. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2127067 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Jeff Layton	e449198bdf	nfsd: make nfsd4_run_cb a bool return function [ Upstream commit b95239ca4954a0d48b19c09ce7e8f31b453b4216 ] queue_work can return false and not queue anything, if the work is already queued. If that happens in the case of a CB_RECALL, we'll have taken an extra reference to the stid that will never be put. Ensure we throw a warning in that case. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Jeff Layton	97136d46f3	nfsd: fix comments about spinlock handling with delegations [ Upstream commit 25fbe1fca14142beae6c882f7906510363d42bff ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Jeff Layton	3ebcecc70b	nfsd: only fill out return pointer on success in nfsd4_lookup_stateid [ Upstream commit 4d01416ab41540bb13ec4a39ac4e6c4aa5934bc9 ] In the case of a revoked delegation, we still fill out the pointer even when returning an error, which is bad form. Only overwrite the pointer on success. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Chuck Lever	49512f390c	NFSD: Cap rsize_bop result based on send buffer size [ Upstream commit 76ce4dcec0dc08a032db916841ddc4e3998be317 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. Add an NFSv4 helper that computes the size of the send buffer. It replaces svc_max_payload() in spots where svc_max_payload() returns a value that might be larger than the remaining send buffer space. Callers who need to know the transport's actual maximum payload size will continue to use svc_max_payload(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Chuck Lever	1f2e1bc5f1	NFSD: Rename the fields in copy_stateid_t [ Upstream commit 781fde1a2ba2391f31142f46f964cf1148ca1791 ] Code maintenance: The name of the copy_stateid_t::sc_count field collides with the sc_count field in struct nfs4_stid, making the latter difficult to grep for when auditing stateid reference counting. No behavior change expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
ChenXiaoSong	dd6e1f79a4	nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops [ Upstream commit 1342f9dd3fc219089deeb2620f6790f19b4129b1 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
ChenXiaoSong	9281c30c7b	nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops [ Upstream commit 64776611a06322b99386f8dfe3b3ba1aa0347a38 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. nfsd_net is converted from seq_file->file instead of seq_file->private in nfsd_reply_cache_stats_show(). Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> [ cel: reduce line length ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
ChenXiaoSong	e238dc11a8	nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops [ Upstream commit 1d7f6b302b75ff7acb9eb3cab0c631b10cfa7542 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. inode is converted from seq_file->file instead of seq_file->private in client_info_show(). Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
ChenXiaoSong	f59427520b	nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops [ Upstream commit 9beeaab8e05d353d709103cafa1941714b4d5d94 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> [ cel: reduce line length ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
ChenXiaoSong	0e8c37ea28	nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops [ Upstream commit 0cfb0c4228a5c8e2ed2b58f8309b660b187cef02 ] Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:24 +01:00
Chuck Lever	d437b8382a	NFSD: Pack struct nfsd4_compoundres [ Upstream commit 9f553e61bd36c1048543ac2f6945103dd2f742be ] Remove a couple of 4-byte holes on platforms with 64-bit pointers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	7679d5f949	NFSD: Remove unused nfsd4_compoundargs::cachetype field [ Upstream commit 77e378cf2a595d8e39cddf28a31efe6afd9394a0 ] This field was added by commit 1091006c5eb1 ("nfsd: turn on reply cache for NFSv4") but was never put to use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	b9f25e7749	NFSD: Remove "inline" directives on op_rsize_bop helpers [ Upstream commit 6604148cf961b57fc735e4204f8996536da9253c ] These helpers are always invoked indirectly, so the compiler can't inline these anyway. While we're updating the synopses of these helpers, defensively convert their parameters to const pointers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	ad36491e05	NFSD: Clean up nfs4svc_encode_compoundres() [ Upstream commit 9993a66317fc9951322483a9edbfae95a640b210 ] In today's Linux NFS server implementation, the NFS dispatcher initializes each XDR result stream, and the NFSv4 .pc_func and .pc_encode methods all use xdr_stream-based encoding. This keeps rq_res.len automatically updated. There is no longer a need for the WARN_ON_ONCE() check in nfs4svc_encode_compoundres(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	8e9e8ab202	NFSD: Clean up WRITE arg decoders [ Upstream commit d4da5baa533215b14625458e645056baf646bb2e ] xdr_stream_subsegment() already returns a boolean value. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	8e65b00166	NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks [ Upstream commit c3d2a04f05c590303c125a176e6e43df4a436fdb ] Replace the check for buffer over/underflow with a helper that is commonly used for this purpose. The helper also sets xdr->nwords correctly after successfully linearizing the symlink argument into the stream's scratch buffer. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	f6d3097d21	NFSD: Refactor common code out of dirlist helpers [ Upstream commit 98124f5bd6c76699d514fbe491dd95265369cc99 ] The dust has settled a bit and it's become obvious what code is totally common between nfsd_init_dirlist_pages() and nfsd3_init_dirlist_pages(). Move that common code to SUNRPC. The new helper brackets the existing xdr_init_decode_pages() API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	1b7f6bc7bb	NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing [ Upstream commit 3fdc546462348b8a497c72bc894e0cde9f10fc40 ] Have SunRPC clear everything except for the iops array. Then have each NFSv4 XDR decoder clear it's own argument before decoding. Now individual operations may have a large argument struct while not penalizing the vast majority of operations with a small struct. And, clearing the argument structure occurs as the argument fields are initialized, enabling the CPU to do write combining on that memory. In some cases, clearing is not even necessary because all of the fields in the argument structure are initialized by the decoder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	f98b7203d3	SUNRPC: Parametrize how much of argsize should be zeroed [ Upstream commit 103cc1fafee48adb91fca0e19deb869fd23e46ab ] Currently, SUNRPC clears the whole of .pc_argsize before processing each incoming RPC transaction. Add an extra parameter to struct svc_procedure to enable upper layers to reduce the amount of each operation's argument structure that is zeroed by SUNRPC. The size of struct nfsd4_compoundargs, in particular, is a lot to clear on each incoming RPC Call. A subsequent patch will cut this down to something closer to what NFSv2 and NFSv3 uses. This patch should cause no behavior changes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Dai Ngo	69510ff32c	NFSD: add shrinker to reap courtesy clients on low memory condition [ Upstream commit 7746b32f467b3813fb61faaab3258de35806a7ac ] Add courtesy_client_reaper to react to low memory condition triggered by the system memory shrinker. The delayed_work for the courtesy_client_reaper is scheduled on the shrinker's count callback using the laundry_wq. The shrinker's scan callback is not used for expiring the courtesy clients due to potential deadlocks. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: adjusted to apply without e33c267ab70d ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Dai Ngo	6a3b8b1a29	NFSD: keep track of the number of courtesy clients in the system [ Upstream commit 3a4ea23d86a317c4b68b9a69d51f7e84e1e04357 ] Add counter nfs4_courtesy_client_count to nfsd_net to keep track of the number of courtesy clients in the system. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:23 +01:00
Chuck Lever	9f62fbb137	NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY [ Upstream commit 5f5f8b6d655fd947e899b1771c2f7cb581a06764 ] nfsd_unlink() can kick off a CB_RECALL (via vfs_unlink() -> leases_conflict()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_unlink() again, once. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	83968fd144	NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY [ Upstream commit 68c522afd0b1936b48a03a4c8b81261e7597c62d ] nfsd_rename() can kick off a CB_RECALL (via vfs_rename() -> leases_conflict()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_rename() again, once. This version of the patch handles renaming an existing file, but does not deal with renaming onto an existing file. That case will still always trigger an NFS4ERR_DELAY. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	247f336ab3	NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY [ Upstream commit 34b91dda7124fc3259e4b2ae53e0c933dedfec01 ] nfsd_setattr() can kick off a CB_RECALL (via notify_change() -> break_lease()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_setattr() again, once. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	f6546852a6	NFSD: Refactor nfsd_setattr() [ Upstream commit c0aa1913db57219e91a0a8832363cbafb3a9cf8f ] Move code that will be retried (in a subsequent patch) into a helper function. Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	e54d48549d	NFSD: Add a mechanism to wait for a DELEGRETURN [ Upstream commit c035362eb935fe9381d9d1cc453bc2a37460e24c ] Subsequent patches will use this mechanism to wake up an operation that is waiting for a client to return a delegation. The new tracepoint records whether the wait timed out or was properly awoken by the expected DELEGRETURN: nfsd-1155 [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out) Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	b7367d586d	NFSD: Add tracepoints to report NFSv4 callback completions [ Upstream commit 1035d65446a018ca2dd179e29a2fcd6d29057781 ] Wireshark has always been lousy about dissecting NFSv4 callbacks, especially NFSv4.0 backchannel requests. Add tracepoints so we can surgically capture these events in the trace log. Tracepoints are time-stamped and ordered so that we can now observe the timing relationship between a CB_RECALL Reply and the client's DELEGRETURN Call. Example: nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001 nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2 nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0 nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED\|REFERENCED may=READ file=0xffff929103b3ea00 nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0 kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Gaosheng Cui	5ea6d5466c	nfsd: remove nfsd4_prepare_cb_recall() declaration [ Upstream commit 18224dc58d960c65446971930d0487fc72d00598 ] nfsd4_prepare_cb_recall() has been removed since commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Jeff Layton	010163df19	nfsd: clean up mounted_on_fileid handling [ Upstream commit 6106d9119b6599fa23dc556b429d887b4c2d9f62 ] We only need the inode number for this, not a full rack of attributes. Rename this function make it take a pointer to a u64 instead of struct kstat, and change it to just request STATX_INO. Signed-off-by: Jeff Layton <jlayton@kernel.org> [ cel: renamed get_mounted_on_ino() ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	d5de26ae3e	NFSD: Fix handling of oversized NFSv4 COMPOUND requests [ Upstream commit 7518a3dc5ea249d4112156ce71b8b184eb786151 ] If an NFS server returns NFS4ERR_RESOURCE on the first operation in an NFSv4 COMPOUND, there's no way for a client to know where the problem is and then simplify the compound to make forward progress. So instead, make NFSD process as many operations in an oversized COMPOUND as it can and then return NFS4ERR_RESOURCE on the first operation it did not process. pynfs NFSv4.0 COMP6 exercises this case, but checks only for the COMPOUND status code, not whether the server has processed any of the operations. pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects too many operations per COMPOUND by checking against the limits negotiated when the session was created. Suggested-by: Bruce Fields <bfields@fieldses.org> Fixes: 0078117c6d91 ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
NeilBrown	3568fa858e	NFSD: drop fname and flen args from nfsd_create_locked() [ Upstream commit 9558f9304ca1903090fa5d995a3269a8e82804b4 ] nfsd_create_locked() does not use the "fname" and "flen" arguments, so drop them from declaration and all callers. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	0fe5d64021	NFSD: Protect against send buffer overflow in NFSv3 READ [ Upstream commit fa6be9cc6e80ec79892ddf08a8c10cabab9baf38 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:22 +01:00
Chuck Lever	c15bc8a9c4	NFSD: Protect against send buffer overflow in NFSv2 READ [ Upstream commit 401bc1f90874280a80b93f23be33a0e7e2d1f912 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Chuck Lever	956015a28a	NFSD: Protect against send buffer overflow in NFSv3 READDIR [ Upstream commit 640f87c190e0d1b2a0fcb2ecf6d2cd53b1c41991 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply message at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Thanks to Aleksi Illikainen and Kari Hulkko for uncovering this issue. Reported-by: Ben Ronallo <Benjamin.Ronallo@synopsys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Chuck Lever	abcc21fbd6	NFSD: Protect against send buffer overflow in NFSv2 READDIR [ Upstream commit 00b4492686e0497fdb924a9d4c8f6f99377e176c ] Restore the previous limit on the @count argument to prevent a buffer overflow attack. Fixes: 53b1119a6e50 ("NFSD: Fix READDIR buffer overflow") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Chuck Lever	9cbaef3ecb	NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND [ Upstream commit 80e591ce636f3ae6855a0ca26963da1fdd6d4508 ] When attempting an NFSv4 mount, a Solaris NFSv4 client builds a single large COMPOUND that chains a series of LOOKUPs to get to the pseudo filesystem root directory that is to be mounted. The Linux NFS server's current maximum of 16 operations per NFSv4 COMPOUND is not large enough to ensure that this works for paths that are more than a few components deep. Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not trigger any re-allocation of the operation array on the server), increasing this maximum should result in little to no impact. The ops array can get large now, so allocate it via vmalloc() to help ensure memory fragmentation won't cause an allocation failure. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Christophe JAILLET	99d370f0ed	nfsd: Propagate some error code returned by memdup_user() [ Upstream commit 30a30fcc3fc1ad4c5d017c9fcb75dc8f59e7bdad ] Propagate the error code returned by memdup_user() instead of a hard coded -EFAULT. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Christophe JAILLET	f3221f6ee2	nfsd: Avoid some useless tests [ Upstream commit d44899b8bb0b919f923186c616a84f0e70e04772 ] memdup_user() can't return NULL, so there is no point for checking for it. Simplify some tests accordingly. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Jinpeng Cui	0af51701b1	NFSD: remove redundant variable status [ Upstream commit 4ab3442ca384a02abf8b1f2b3449a6c547851873 ] Return value directly from fh_verify() do_open_permission() exp_pseudoroot() instead of getting value from redundant variable status. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Olga Kornievskaia	1304ec2084	NFSD enforce filehandle check for source file in COPY [ Upstream commit 754035ff79a14886e68c0c9f6fa80adb21f12b53 ] If the passed in filehandle for the source file in the COPY operation is not a regular file, the server MUST return NFS4ERR_WRONG_TYPE. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Wolfram Sang	3f584d9d17	lockd: move from strlcpy with unused retval to strscpy [ Upstream commit 97f8e62572555f8ad578d7b1739ba64d5d2cac0f ] Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Wolfram Sang	56409d62dc	NFSD: move from strlcpy with unused retval to strscpy [ Upstream commit 72f78ae00a8e5d7abe13abac8305a300f6afd74b ] Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Al Viro	bf60e51731	nfsd_splice_actor(): handle compound pages [ Upstream commit bfbfb6182ad1d7d184b16f25165faad879147f79 ] pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE worth of data). Theoretically it had been possible since way back, but nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change. Fortunately, the only thing that changes for compound pages is that we need to stuff each relevant subpage in and convert the offset into offset in the first subpage. Acked-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ cel: "‘for’ loop initial declarations are only allowed in C99 or C11 mode" ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
NeilBrown	a7f502860d	NFSD: fix regression with setting ACLs. [ Upstream commit 00801cd92d91e94aa04d687f9bb9a9104e7c3d46 ] A recent patch moved ACL setting into nfsd_setattr(). Unfortunately it didn't work as nfsd_setattr() aborts early if iap->ia_valid is 0. Remove this test, and instead avoid calling notify_change() when ia_valid is 0. This means that nfsd_setattr() will now always lock the inode. Previously it didn't if only a ATTR_MODE change was requested on a symlink (see Commit 15b7a1b86d66 ("[PATCH] knfsd: fix setattr-on-symlink error return")). I don't think this change really matters. Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:21 +01:00
Jeff Layton	ecc7638919	lockd: detect and reject lock arguments that overflow [ Upstream commit 6930bcbfb6ceda63e298c6af6d733ecdf6bd4cde ] lockd doesn't currently vet the start and length in nlm4 requests like it should, and can end up generating lock requests with arguments that overflow when passed to the filesystem. The NLM4 protocol uses unsigned 64-bit arguments for both start and length, whereas struct file_lock tracks the start and end as loff_t values. By the time we get around to calling nlm4svc_retrieve_args, we've lost the information that would allow us to determine if there was an overflow. Start tracking the actual start and len for NLM4 requests in the nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they won't cause an overflow, and return NLM4_FBIG if they do. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak <j.kasiak@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> # 5.14+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:20 +01:00
NeilBrown	a81e7962f6	NFSD: use (un)lock_inode instead of fh_(un)lock for file operations [ Upstream commit bb4d53d66e4b8c8b8e5634802262e53851a2d2db ] When locking a file to access ACLs and xattrs etc, use explicit locking with inode_lock() instead of fh_lock(). This means that the calls to fh_fill_pre/post_attr() are also explicit which improves readability and allows us to place them only where they are needed. Only the xattr calls need pre/post information. When locking a file we don't need I_MUTEX_PARENT as the file is not a parent of anything, so we can use inode_lock() directly rather than the inode_lock_nested() call that fh_lock() uses. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	574c240441	NFSD: use explicit lock/unlock for directory ops [ Upstream commit debf16f0c671cb8db154a9ebcd6014cfff683b80 ] When creating or unlinking a name in a directory use explicit inode_lock_nested() instead of fh_lock(), and explicit calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done for renames, with lock_rename() as the explicit locking. Also move the 'fill' calls closer to the operation that might change the attributes. This way they are avoided on some error paths. For the v2-only code in nfsproc.c, the fill calls are not replaced as they aren't needed. Making the locking explicit will simplify proposed future changes to locking for directories. It also makes it easily visible exactly where pre/post attributes are used - not all callers of fh_lock() actually need the pre/post attributes. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	c5547a3fc7	NFSD: only call fh_unlock() once in nfsd_link() [ Upstream commit e18bcb33bc5b69bccc2b532075aa00bb49cc01c5 ] On non-error paths, nfsd_link() calls fh_unlock() twice. This is safe because fh_unlock() records that the unlock has been done and doesn't repeat it. However it makes the code a little confusing and interferes with changes that are planned for directory locking. So rearrange the code to ensure fh_unlock() is called exactly once if fh_lock() was called. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	1c0a2ac5b9	NFSD: always drop directory lock in nfsd_unlink() [ Upstream commit b677c0c63a135a916493c064906582e9f3ed4802 ] Some error paths in nfsd_unlink() allow it to exit without unlocking the directory. This is not a problem in practice as the directory will be locked with an fh_put(), but it is untidy and potentially confusing. This allows us to remove all the fh_unlock() calls that are immediately after nfsd_unlink() calls. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	8b67124316	NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. [ Upstream commit 927bfc5600cd6333c9ef9f090f19e66b7d4c8ee1 ] nfsd_create() usually returns with the directory still locked. nfsd_symlink() usually returns with it unlocked. This is clumsy. Until recently nfsd_create() needed to keep the directory locked until ACLs and security label had been set. These are now set inside nfsd_create() (in nfsd_setattr()) so this need is gone. So change nfsd_create() and nfsd_symlink() to always unlock, and remove any fh_unlock() calls that follow calls to these functions. Signed-off-by: NeilBrown <neilb@suse.de> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	21505959e5	NFSD: add posix ACLs to struct nfsd_attrs [ Upstream commit c0cbe70742f4a70893cd6e5f6b10b6e89b6db95b ] pacl and dpacl pointers are added to struct nfsd_attrs, which requires that we have an nfsd_attrs_free() function to free them. Those nfsv4 functions that can set ACLs now set up these pointers based on the passed in NFSv4 ACL. nfsd_setattr() sets the acls as appropriate. Errors are handled as with security labels. Signed-off-by: NeilBrown <neilb@suse.de> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	6b797de75d	NFSD: add security label to struct nfsd_attrs [ Upstream commit d6a97d3f589a3a46a16183e03f3774daee251317 ] nfsd_setattr() now sets a security label if provided, and nfsv4 provides it in the 'open' and 'create' paths and the 'setattr' path. If setting the label failed (including because the kernel doesn't support labels), an error field in 'struct nfsd_attrs' is set, and the caller can respond. The open/create callers clear FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case. The setattr caller returns the error. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	e738779ebd	NFSD: set attributes when creating symlinks [ Upstream commit 93adc1e391a761441d783828b93979b38093d011 ] The NFS protocol includes attributes when creating symlinks. Linux does store attributes for symlinks and allows them to be set, though they are not used for permission checking. NFSD currently doesn't set standard (struct iattr) attributes when creating symlinks, but for NFSv4 it does set ACLs and security labels. This is inconsistent. To improve consistency, pass the provided attributes into nfsd_symlink() and call nfsd_create_setattr() to set them. NOTE: this results in a behaviour change for all NFS versions when the client sends non-default attributes with a SYMLINK request. With the Linux client, the only attributes are: attr.ia_mode = S_IFLNK \| S_IRWXUGO; attr.ia_valid = ATTR_MODE; so the final outcome will be unchanged. Other clients might sent different attributes, and if they did they probably expect them to be honoured. We ignore any error from nfsd_create_setattr(). It isn't really clear what should be done if a file is successfully created, but the attributes cannot be set. NFS doesn't allow partial success to be reported. Reporting failure is probably more misleading than reporting success, so the status is ignored. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:04 +01:00
NeilBrown	3b8c5ad99e	NFSD: introduce struct nfsd_attrs [ Upstream commit 7fe2a71dda349a1afa75781f0cc7975be9784d15 ] The attributes that nfsd might want to set on a file include 'struct iattr' as well as an ACL and security label. The latter two are passed around quite separately from the first, in part because they are only needed for NFSv4. This leads to some clumsiness in the code, such as the attributes NOT being set in nfsd_create_setattr(). We need to keep the directory locked until all attributes are set to ensure the file is never visibile without all its attributes. This need combined with the inconsistent handling of attributes leads to more clumsiness. As a first step towards tidying this up, introduce 'struct nfsd_attrs'. This is passed (by reference) to vfs.c functions that work with attributes, and is assembled by the various nfs*proc functions which call them. As yet only iattr is included, but future patches will expand this. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Jeff Layton	8b69402310	NFSD: verify the opened dentry after setting a delegation [ Upstream commit 876c553cb41026cb6ad3cef970a35e5f69c42a25 ] Between opening a file and setting a delegation on it, someone could rename or unlink the dentry. If this happens, we do not want to grant a delegation on the open. On a CLAIM_NULL open, we're opening by filename, and we may (in the non-create case) or may not (in the create case) be holding i_rwsem when attempting to set a delegation. The latter case allows a race. After getting a lease, redo the lookup of the file being opened and validate that the resulting dentry matches the one in the open file description. To properly redo the lookup we need an rqst pointer to pass to nfsd_lookup_dentry(), so make sure that is available. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Jeff Layton	a6651c65bb	NFSD: drop fh argument from alloc_init_deleg [ Upstream commit bbf936edd543e7220f60f9cbd6933b916550396d ] Currently, we pass the fh of the opened file down through several functions so that alloc_init_deleg can pass it to delegation_blocked. The filehandle of the open file is available in the nfs4_file however, so there's no need to pass it in a separate argument. Drop the argument from alloc_init_deleg, nfs4_open_delegation and nfs4_set_delegation. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	a483435728	NFSD: Move copy offload callback arguments into a separate structure [ Upstream commit a11ada99ce93a79393dc6683d22f7915748c8f6b ] Refactor so that CB_OFFLOAD arguments can be passed without allocating a whole struct nfsd4_copy object. On my system (x86_64) this removes another 96 bytes from struct nfsd4_copy. [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	fb3a5a8f17	NFSD: Add nfsd4_send_cb_offload() [ Upstream commit e72f9bc006c08841c46d27747a4debc747a8fe13 ] Refactor for legibility. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	dd21bfb08b	NFSD: Remove kmalloc from nfsd4_do_async_copy() [ Upstream commit ad1e46c9b07b13659635ee5405f83ad0df143116 ] Instead of manufacturing a phony struct nfsd_file, pass the struct file returned by nfs42_ssc_open() directly to nfsd4_do_copy(). [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	5cccd2a70a	NFSD: Refactor nfsd4_do_copy() [ Upstream commit 3b7bf5933cada732783554edf0dc61283551c6cf ] Refactor: Now that nfsd4_do_copy() no longer calls the cleanup helpers, plumb the use of struct file pointers all the way down to _nfsd_copy_file_range(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	0faacbdb09	NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) [ Upstream commit 478ed7b10d875da2743d1a22822b9f8a82df8f12 ] Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A subsequent patch will modify one of the new call sites to avoid the need to manufacture the phony struct nfsd_file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	428836a908	NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) [ Upstream commit 24d796ea383b8a4c8234e06d1b14bbcd371192ea ] The @src parameter is sometimes a pointer to a struct nfsd_file and sometimes a pointer to struct file hiding in a phony struct nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter is always an explicit struct file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	71edd77802	NFSD: Replace boolean fields in struct nfsd4_copy [ Upstream commit 1913cdf56cb5bfbc8170873728d13598cbecda23 ] Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy() with an atomic bitop. [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	97b56f1995	NFSD: Make nfs4_put_copy() static [ Upstream commit 8ea6e2c90bb0eb74a595a12e23a1dff9abbc760a ] Clean up: All call sites are in fs/nfsd/nfs4proc.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	47dc48b1b3	NFSD: Reorder the fields in struct nfsd4_op [ Upstream commit d314309425ad5dc1b6facdb2d456580fb5fa5e3a ] Pack the fields to reduce the size of struct nfsd4_op, which is used an array in struct nfsd4_compoundargs. sizeof(struct nfsd4_op): Before: /* size: 672, cachelines: 11, members: 5 / After: / size: 640, cachelines: 10, members: 5 */ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	976e4c5d40	NFSD: Shrink size of struct nfsd4_copy [ Upstream commit 87689df694916c40e8e6c179ab1c8710f65cb6c6 ] struct nfsd4_copy is part of struct nfsd4_op, which resides in an 8-element array. sizeof(struct nfsd4_op): Before: /* size: 1696, cachelines: 27, members: 5 / After: / size: 672, cachelines: 11, members: 5 */ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:03 +01:00
Chuck Lever	61c143ed41	NFSD: Shrink size of struct nfsd4_copy_notify [ Upstream commit 09426ef2a64ee189ca1e3298f1e874842dbf35ea ] struct nfsd4_copy_notify is part of struct nfsd4_op, which resides in an 8-element array. sizeof(struct nfsd4_op): Before: /* size: 2208, cachelines: 35, members: 5 / After: / size: 1696, cachelines: 27, members: 5 */ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	82a8990f91	NFSD: nfserrno(-ENOMEM) is nfserr_jukebox [ Upstream commit bb4d842722b84a2731257054b6405f2d866fc5f3 ] Suggested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	22ed210e34	NFSD: Fix strncpy() fortify warning [ Upstream commit 5304877936c0a67e1a01464d113bae4c81eacdb6 ] In function ‘strncpy’, inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3, inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11: /home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation] 52 \| #define __underlying_strncpy __builtin_strncpy \| ^ /home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’ 89 \| return __underlying_strncpy(p, q, size); \| ^~~~~~~~~~~~~~~~~~~~ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	7b1c5f6a61	NFSD: Clean up nfsd4_encode_readlink() [ Upstream commit 99b002a1fa00d90e66357315757e7277447ce973 ] Similar changes to nfsd4_encode_readv(), all bundled into a single patch. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	1684ec2426	NFSD: Use xdr_pad_size() [ Upstream commit 5e64d85c7d0c59cfcd61d899720b8ccfe895d743 ] Clean up: Use a helper instead of open-coding the calculation of the XDR pad size. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	31930f0bd1	NFSD: Simplify starting_len [ Upstream commit 071ae99feadfc55979f89287d6ad2c6a315cb46d ] Clean-up: Now that nfsd4_encode_readv() does not have to encode the EOF or rd_length values, it no longer needs to subtract 8 from @starting_len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	bb201cc062	NFSD: Optimize nfsd4_encode_readv() [ Upstream commit 28d5bc468efe74b790e052f758ce083a5015c665 ] write_bytes_to_xdr_buf() is pretty expensive to use for inserting an XDR data item that is always 1 XDR_UNIT at an address that is always XDR word-aligned. Since both the readv and splice read paths encode EOF and maxcount values, move both to a common code path. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	11f94493ca	NFSD: Add an nfsd4_read::rd_eof field [ Upstream commit 24c7fb85498eda1d4c6b42cc4886328429814990 ] Refactor: Make the EOF result available in the entire NFSv4 READ path. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	abc7f2871b	NFSD: Clean up SPLICE_OK in nfsd4_encode_read() [ Upstream commit c738b218a2e5a753a336b4b7fee6720b902c7ace ] Do the test_bit() once -- this reduces the number of locked-bus operations and makes the function a little easier to read. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	b91534de99	NFSD: Optimize nfsd4_encode_fattr() [ Upstream commit ab04de60ae1cc64ae16b77feae795311b97720c7 ] write_bytes_to_xdr_buf() is a generic way to place a variable-length data item in an already-reserved spot in the encoding buffer. However, it is costly. In nfsd4_encode_fattr(), it is unnecessary because the data item is fixed in size and the buffer destination address is always word-aligned. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Chuck Lever	aa5514527c	NFSD: Optimize nfsd4_encode_operation() [ Upstream commit 095a764b7afb06c9499b798c04eaa3cbf70ebe2d ] write_bytes_to_xdr_buf() is a generic way to place a variable-length data item in an already-reserved spot in the encoding buffer. However, it is costly, and here, it is unnecessary because the data item is fixed in size, the buffer destination address is always word-aligned, and the destination location is already in @p. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Jeff Layton	aedaa70d28	nfsd: silence extraneous printk on nfsd.ko insertion [ Upstream commit 3a5940bfa17fb9964bf9688b4356ca643a8f5e2d ] This printk pops every time nfsd.ko gets plugged in. Most kmods don't do that and this one is not very informative. Olaf's email address seems to be defunct at this point anyway. Just drop it. Cc: Olaf Kirch <okir@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Dai Ngo	5e0ff4dbe3	NFSD: limit the number of v4 clients to 1024 per 1GB of system memory [ Upstream commit 4271c2c0887562318a0afef97d32d8a71cbe0743 ] Currently there is no limit on how many v4 clients are supported by the system. This can be a problem in systems with small memory configuration to function properly when a very large number of clients exist that creates memory shortage conditions. This patch enforces a limit of 1024 NFSv4 clients, including courtesy clients, per 1GB of system memory. When the number of the clients reaches the limit, requests that create new clients are returned with NFS4ERR_DELAY and the laundromat is kicked start to trim old clients. Due to the overhead of the upcall to remove the client record, the maximun number of clients the laundromat removes on each run is limited to 128. This is done to ensure the laundromat can still process the other tasks in a timely manner. Since there is now a limit of the number of clients, the 24-hr idle time limit of courtesy client is no longer needed and was removed. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Dai Ngo	65c0a8306a	NFSD: keep track of the number of v4 clients in the system [ Upstream commit 0926c39515aa065a296e97dfc8790026f1e53f86 ] Add counter nfs4_client_count to keep track of the total number of v4 clients, including courtesy clients, in the system. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:02 +01:00
Dai Ngo	90f61df124	NFSD: refactoring v4 specific code to a helper in nfs4state.c [ Upstream commit 6867137ebcf4155fe25f2ecf7c29b9fb90a76d1d ] This patch moves the v4 specific code from nfsd_init_net() to nfsd4_init_leases_net() helper in nfs4state.c Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	c035874cc9	NFSD: Ensure nf_inode is never dereferenced [ Upstream commit 427f5f83a3191cbf024c5aea6e5b601cdf88d895 ] The documenting comment for struct nf_file states: /* * A representation of a file that has been opened by knfsd. These are hashed * in the hashtable by inode pointer value. Note that this object doesn't * hold a reference to the inode by itself, so the nf_inode pointer should * never be dereferenced, only used for comparison. */ Replace the two existing dereferences to make the comment always true. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	276969802c	NFSD: NFSv4 CLOSE should release an nfsd_file immediately [ Upstream commit 5e138c4a750dc140d881dab4a8804b094bbc08d2 ] The last close of a file should enable other accessors to open and use that file immediately. Leaving the file open in the filecache prevents other users from accessing that file until the filecache garbage-collects the file -- sometimes that takes several seconds. Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387 Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	450cd975fe	NFSD: Move nfsd_file_trace_alloc() tracepoint [ Upstream commit b40a2839470cd62ed68c4a32d72a18ee8975b1ac ] Avoid recording the allocation of an nfsd_file item that is immediately released because a matching item was already inserted in the hash. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	96ab7b7d71	NFSD: Separate tracepoints for acquire and create [ Upstream commit be0230069fcbf7d332d010b57c1d0cfd623a84d6 ] These tracepoints collect different information: the create case does not open a file, so there's no nf_file available. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	8135682ff8	NFSD: Clean up unused code after rhashtable conversion [ Upstream commit 0ec8e9d1539a7b8109a554028bbce441052f847e ] Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	fb4d9b9e07	NFSD: Convert the filecache to use rhashtable [ Upstream commit ce502f81ba884c1fe45dc0ebddbcaaa4ec0fc5fb ] Enable the filecache hash table to start small, then grow with the workload. Smaller server deployments benefit because there should be lower memory utilization. Larger server deployments should see improved scaling with the number of open files. Suggested-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	304260c184	NFSD: Set up an rhashtable for the filecache [ Upstream commit fc22945ecc2a0a028f3683115f98a922d506c284 ] Add code to initialize and tear down an rhashtable. The rhashtable is not used yet. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	5d5d11d1ab	NFSD: Replace the "init once" mechanism [ Upstream commit c7b824c3d06c85e054caf86e227255112c5e3c38 ] In a moment, the nfsd_file_hashtbl global will be replaced with an rhashtable. Replace the one or two spots that need to check if the hash table is available. We can easily reuse the SHUTDOWN flag for this purpose. Document that this mechanism relies on callers to hold the nfsd_mutex to prevent init, shutdown, and purging to run concurrently. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	c5c427f95c	NFSD: Remove nfsd_file::nf_hashval [ Upstream commit f0743c2b25c65debd4f599a7c861428cd9de5906 ] The value in this field can always be computed from nf_inode, thus it is no longer used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	76fe92d538	NFSD: nfsd_file_hash_remove can compute hashval [ Upstream commit cb7ec76e73ff6640241c8f1f2f35c81d4005a2d6 ] Remove an unnecessary use of nf_hashval. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	d365398d77	NFSD: Refactor __nfsd_file_close_inode() [ Upstream commit a845511007a63467fee575353c706806c21218b1 ] The code that computes the hashval is the same in both callers. To prevent them from going stale, reframe the documenting comments to remove descriptions of the underlying hash table structure, which is about to be replaced. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	adfe497c23	NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode [ Upstream commit 8755326399f471ec3b31e2ab8c5074c0d28a0fb5 ] Remove an unnecessary usage of nf_hashval. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	be3cb94280	NFSD: Remove lockdep assertion from unhash_and_release_locked() [ Upstream commit f53cef15dddec7203df702cdc62e554190385450 ] IIUC, holding the hash bucket lock is needed only in nfsd_file_unhash, and there is already a lockdep assertion there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:01 +01:00
Chuck Lever	4f7dbf0537	NFSD: No longer record nf_hashval in the trace log [ Upstream commit 54f7df7094b329ca35d9f9808692bb16c48b13e9 ] I'm about to replace nfsd_file_hashtbl with an rhashtable. The individual hash values will no longer be visible or relevant, so remove them from the tracepoints. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	1df5db6e78	NFSD: Never call nfsd_file_gc() in foreground paths [ Upstream commit 6df19411367a5fb4ef61854cbd1af269c077f917 ] The checks in nfsd_file_acquire() and nfsd_file_put() that directly invoke filecache garbage collection are intended to keep cache occupancy between a low- and high-watermark. The reason to limit the capacity of the filecache is to keep filecache lookups reasonably fast. However, invoking garbage collection at those points has some undesirable negative impacts. Files that are held open by NFSv4 clients often push the occupancy of the filecache over these watermarks. At that point: - Every call to nfsd_file_acquire() and nfsd_file_put() results in an LRU walk. This has the same effect on lookup latency as long chains in the hash table. - Garbage collection will then run on every nfsd thread, causing a lot of unnecessary lock contention. - Limiting cache capacity pushes out files used only by NFSv3 clients, which are the type of files the filecache is supposed to help. To address those negative impacts, remove the direct calls to the garbage collector. Subsequent patches will address maintaining lookup efficiency as cache capacity increases. Suggested-by: Wang Yugui <wangyugui@e16-tech.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	1d683366d4	NFSD: Fix the filecache LRU shrinker [ Upstream commit edead3a55804739b2e4af0f35e9c7326264e7b22 ] Without LRU item rotation, the shrinker visits only a few items on the end of the LRU list, and those would always be long-term OPEN files for NFSv4 workloads. That makes the filecache shrinker completely ineffective. Adopt the same strategy as the inode LRU by using LRU_ROTATE. Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	f65a30361a	NFSD: Leave open files out of the filecache LRU [ Upstream commit 4a0e73e635e3f36b616ad5c943e3d23debe4632f ] There have been reports of problems when running fstests generic/531 against Linux NFS servers with NFSv4. The NFS server that hosts the test's SCRATCH_DEV suffers from CPU soft lock-ups during the test. Analysis shows that: fs/nfsd/filecache.c 482 ret = list_lru_walk(&nfsd_file_lru, 483 nfsd_file_lru_cb, 484 &head, LONG_MAX); causes nfsd_file_gc() to walk the entire length of the filecache LRU list every time it is called (which is quite frequently). The walk holds a spinlock the entire time that prevents other nfsd threads from accessing the filecache. What's more, for NFSv4 workloads, none of the items that are visited during this walk may be evicted, since they are all files that are held OPEN by NFS clients. Address this by ensuring that open files are not kept on the LRU list. Reported-by: Frank van der Linden <fllinden@amazon.com> Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386 Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	4155bf65c7	NFSD: Trace filecache LRU activity [ Upstream commit c46203acddd9b9200dbc53d0603c97355fd3a03b ] Observe the operation of garbage collection and the lifetime of filecache items. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	2e3445246e	NFSD: WARN when freeing an item still linked via nf_lru [ Upstream commit 668ed92e651d3c25f9b6e8cb7ceca54d00daa96d ] Add a guardrail to prevent freeing memory that is still on a list. This includes either a dispose list or the LRU list. This is the sign of a bug, but this class of bugs can be detected so that they don't endanger system stability, especially while debugging. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	2310e47e13	NFSD: Hook up the filecache stat file [ Upstream commit 2e6c6e4c4375bfd3defa5b1ff3604d9f33d1c936 ] There has always been the capability of exporting filecache metrics via /proc, but it was never hooked up. Let's surface these metrics to enable better observability of the filecache. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	08cf06f7ea	NFSD: Zero counters when the filecache is re-initialized [ Upstream commit 8b330f78040cbe16cf8029df70391b2a491f17e2 ] If nfsd_file_cache_init() is called after a shutdown, be sure the stat counters are reset. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	a679a24e03	NFSD: Record number of flush calls [ Upstream commit df2aff524faceaf743b7c5ab0f4fb86cb511f782 ] Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	3e67c0dce1	NFSD: Report the number of items evicted by the LRU walk [ Upstream commit 94660cc19c75083af046b0f8362e3d3bc2eba21d ] Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	d2190956b0	NFSD: Refactor nfsd_file_lru_scan() [ Upstream commit 39f1d1ff8148902c5692ffb0e1c4479416ab44a7 ] Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	f2f80d62e1	NFSD: Refactor nfsd_file_gc() [ Upstream commit 3bc6d3470fe412f818f9bff6b71d1be3a76af8f3 ] Refactor nfsd_file_gc() to use the new list_lru helper. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	2103e0252c	NFSD: Add nfsd_file_lru_dispose_list() helper [ Upstream commit 0bac5a264d9a923f5b01f3521e1519a8d0358342 ] Refactor the invariant part of nfsd_file_lru_walk_list() into a separate helper function. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	8a559507dc	NFSD: Report average age of filecache items [ Upstream commit 904940e94a887701db24401e3ed6928a1d4e329f ] This is a measure of how long items stay in the filecache, to help assess how efficient the cache is. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	5164b98baf	NFSD: Report count of freed filecache items [ Upstream commit d63293272abb51c02457f1017dfd61c3270d9ae3 ] Surface the count of freed nfsd_file items. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:28:00 +01:00
Chuck Lever	359bbdb6f0	NFSD: Report count of calls to nfsd_file_acquire() [ Upstream commit 29d4bdbbb910f33d6058d2c51278f00f656df325 ] Count the number of successful acquisitions that did not create a file (ie, acquisitions that do not result in a compulsory cache miss). This count can be compared directly with the reported hit count to compute a hit ratio. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Chuck Lever	6eaf4cc583	NFSD: Report filecache LRU size [ Upstream commit 0fd244c115f0321fc5e34ad2291f2a572508e3f7 ] Surface the NFSD filecache's LRU list length to help field troubleshooters monitor filecache issues. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Chuck Lever	0c9dcf0d22	NFSD: Demote a WARN to a pr_warn() [ Upstream commit ca3f9acb6d3faf78da2b63324f7c737dbddf7f69 ] The call trace doesn't add much value, but it sure is noisy. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Colin Ian King	3d521baa0c	nfsd: remove redundant assignment to variable len [ Upstream commit 842e00ac3aa3b4a4f7f750c8ab54f8578fc875d3 ] Variable len is being assigned a value zero and this is never read, it is being re-assigned later. The assignment is redundant and can be removed. Cleans up clang scan-build warning: fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Zhang Jiaming	f739c31d1d	NFSD: Fix space and spelling mistake [ Upstream commit f532c9ff103897be0e2a787c0876683c3dc39ed3 ] Add a blank space after ','. Change 'succesful' to 'successful'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Benjamin Coddington	e2a434f8c8	NLM: Defend against file_lock changes after vfs_test_lock() [ Upstream commit 184cefbe62627730c30282df12bcff9aae4816ea ] Instead of trusting that struct file_lock returns completely unchanged after vfs_test_lock() when there's no conflicting lock, stash away our nlm_lockowner reference so we can properly release it for all cases. This defends against another file_lock implementation overwriting fl_owner when the return type is F_UNLCK. Reported-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Tested-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Jeff Layton	92c9b806c7	nfsd: eliminate the NFSD_FILE_BREAK_* flags [ Upstream commit 23ba98de6dcec665e15c0ca19244379bb0d30932 ] We had a report from the spring Bake-a-thon of data corruption in some nfstest_interop tests. Looking at the traces showed the NFS server allowing a v3 WRITE to proceed while a read delegation was still outstanding. Currently, we only set NFSD_FILE_BREAK_* flags if NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for COMMIT ops, where we need a writeable filehandle but don't need to break read leases. It doesn't make any sense to consult that flag when allocating a file since the file may be used on subsequent calls where we do want to break the lease (and the usage of it here seems to be reverse from what it should be anyway). Also, after calling nfsd_open_break_lease, we don't want to clear the BREAK_* bits. A lease could end up being set on it later (more than once) and we need to be able to break those leases as well. This means that the NFSD_FILE_BREAK_* flags now just mirror NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just drop those flags and unconditionally call nfsd_open_break_lease every time. Reported-by: Olga Kornieskaia <kolga@netapp.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) Cc: <stable@vger.kernel.org> # 5.4.x : bb283ca18d1e NFSD: Clean up the show_nf_flags() macro Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Xin Gao	eadde0fa3e	fsnotify: Fix comment typo [ Upstream commit feee1ce45a5666bbdb08c5bb2f5f394047b1915b ] The double `if' is duplicated in line 104, remove one. Signed-off-by: Xin Gao <gaoxin@cdjrlc.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220722194639.18545-1-gaoxin@cdjrlc.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Amir Goldstein	059dc1dee5	fanotify: introduce FAN_MARK_IGNORE [ Upstream commit e252f2ed1c8c6c3884ab5dd34e003ed21f1fe6e0 ] This flag is a new way to configure ignore mask which allows adding and removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask. The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on directories and would ignore events on children depending on whether the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask. FAN_MARK_IGNORE can be used to ignore events on children without setting FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on directories unconditionally, only when FAN_ONDIR is set in ignore mask. The new behavior is non-downgradable. After calling fanotify_mark() with FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK on the same object will return EEXIST error. Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark has no meaning and will return ENOTDIR error. The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new FAN_MARK_IGNORE flag, but with a few semantic differences: 1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount marks and on an inode mark on a directory. Omitting this flag will return EINVAL or EISDIR error. 2. An ignore mask on a non-directory inode that survives modify could never be downgraded to an ignore mask that does not survive modify. With new FAN_MARK_IGNORE semantics we make that rule explicit - trying to update a surviving ignore mask without the flag FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error. The conveniene macro FAN_MARK_IGNORE_SURV is added for (FAN_MARK_IGNORE \| FAN_MARK_IGNORED_SURV_MODIFY), because the common case should use short constant names. Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Amir Goldstein	3d09e15ac0	fanotify: cleanups for fanotify_mark() input validations [ Upstream commit 8afd7215aa97f8868d033f6e1d01a276ab2d29c0 ] Create helper fanotify_may_update_existing_mark() for checking for conflicts between existing mark flags and fanotify_mark() flags. Use variable mark_cmd to make the checks for mark command bits cleaner. Link: https://lore.kernel.org/r/20220629144210.2983229-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Amir Goldstein	0f17199cd8	fanotify: prepare for setting event flags in ignore mask [ Upstream commit 31a371e419c885e0f137ce70395356ba8639dc52 ] Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories. Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask. Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories. To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags. This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments. Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Oliver Ford	33ebb84b9a	fs: inotify: Fix typo in inotify comment [ Upstream commit c05787b4c2f80a3bebcb9cdbf255d4fa5c1e24e1 ] Correct spelling in comment. Signed-off-by: Oliver Ford <ojford@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220518145959.41-1-ojford@gmail.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Jeff Layton	f45ed58c81	lockd: fix nlm_close_files [ Upstream commit 1197eb5906a5464dbaea24cac296dfc38499cc00 ] This loop condition tries a bit too hard to be clever. Just test for the two indices we care about explicitly. Cc: J. Bruce Fields <bfields@fieldses.org> Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Jeff Layton	5f03738282	lockd: set fl_owner when unlocking files [ Upstream commit aec158242b87a43d83322e99bc71ab4428e5ab79 ] Unlocking a POSIX lock on an inode with vfs_lock_file only works if the owner matches. Ensure we set it in the request. Cc: J. Bruce Fields <bfields@fieldses.org> Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
Chuck Lever	38aea79732	NFSD: Decode NFSv4 birth time attribute [ Upstream commit 5b2f3e0777da2a5dd62824bbe2fdab1d12caaf8f ] NFSD has advertised support for the NFSv4 time_create attribute since commit e377a3e698fb ("nfsd: Add support for the birth time attribute"). Igor Mammedov reports that Mac OS clients attempt to set the NFSv4 birth time attribute via OPEN(CREATE) and SETATTR if the server indicates that it supports it, but since the above commit was merged, those attempts now fail. Table 5 in RFC 8881 lists the time_create attribute as one that can be both set and retrieved, but the above commit did not add server support for clients to provide a time_create attribute. IMO that's a bug in our implementation of the NFSv4 protocol, which this commit addresses. Whether NFSD silently ignores the new birth time or actually sets it is another matter. I haven't found another filesystem service in the Linux kernel that enables users or clients to modify a file's birth time attribute. This commit reflects my (perhaps incorrect) understanding of whether Linux users can set a file's birth time. NFSD will now recognize a time_create attribute but it ignores its value. It clears the time_create bit in the returned attribute bitmask to indicate that the value was not used. Reported-by: Igor Mammedov <imammedo@redhat.com> Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Tested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:59 +01:00
NeilBrown	b108aeaf56	NFS: restore module put when manager exits. [ Upstream commit 080abad71e99d2becf38c978572982130b927a28 ] Commit f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed calls to module_put_and_kthread_exit() from threads that acted as SUNRPC servers and had a related svc_serv_ops structure. This was correct. It ALSO removed the module_put_and_kthread_exit() call from nfs4_run_state_manager() which is NOT a SUNRPC service. Consequently every time the NFSv4 state manager runs the module count increments and won't be decremented. So the nfsv4 module cannot be unloaded. So restore the module_put_and_kthread_exit() call. Fixes: f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Amir Goldstein	60cb3eaddd	fanotify: refine the validation checks on non-dir inode mask [ Upstream commit 8698e3bab4dd7968666e84e111d0bfd17c040e77 ] Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	812ad3cdc4	NFSD: Fix potential use-after-free in nfsd_file_put() [ Upstream commit b6c71c66b0ad8f2b59d9bc08c7a5079b110bec01 ] nfsd_file_put_noref() can free @nf, so don't dereference @nf immediately upon return from nfsd_file_put_noref(). Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Fixes: 999397926ab3 ("nfsd: Clean up nfsd_file_put()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	b09116ac30	NFSD: nfsd_file_put() can sleep [ Upstream commit 08af54b3e5729bc1d56ad3190af811301bdc37a1 ] Now that there are no more callers of nfsd_file_put() that might hold a spin lock, ensure the lockdep infrastructure can catch newly introduced calls to nfsd_file_put() made while a spinlock is held. Link: https://lore.kernel.org/linux-nfs/ece7fd1d-5fb3-5155-54ba-347cfc19bd9a@oracle.com/T/#mf1855552570cf9a9c80d1e49d91438cd9085aada Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	bb58cbbf0a	NFSD: Add documenting comment for nfsd4_release_lockowner() [ Upstream commit 043862b09cc00273e35e6c3a6389957953a34207 ] And return explicit nfserr values that match what is documented in the new comment / API contract. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	8b8f536a81	NFSD: Modernize nfsd4_release_lockowner() [ Upstream commit bd8fdb6e545f950f4654a9a10d7e819ad48146e5 ] Refactor: Use existing helpers that other lock operations use. This change removes several automatic variables, so re-organize the variable declarations for readability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Julian Schroeder	9ca8d2154e	nfsd: destroy percpu stats counters after reply cache shutdown [ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ] Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed. Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder <jumaco@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Zhang Xiaoxu	6c1ecdcf60	nfsd: Fix null-ptr-deref in nfsd_fill_super() [ Upstream commit 6f6f84aa215f7b6665ccbb937db50860f9ec2989 ] KASAN report null-ptr-deref as follows: BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd] Write of size 8 at addr 000000000000005d by task a.out/852 CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xab/0x120 ? nfsd_mkdir+0x71/0x1c0 [nfsd] ? nfsd_fill_super+0xc6/0xe0 [nfsd] nfsd_fill_super+0xc6/0xe0 [nfsd] ? nfsd_mkdir+0x1c0/0x1c0 [nfsd] get_tree_keyed+0x8e/0x100 vfs_get_tree+0x41/0xf0 __do_sys_fsconfig+0x590/0x670 ? fscontext_read+0x180/0x180 ? anon_inode_getfd+0x4f/0x70 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduce by concurrent operations: 1. fsopen(nfsd)/fsconfig 2. insmod/rmmod nfsd Since the nfsd file system is registered before than nfsd_net allocated, the caller may get the file_system_type and use the nfsd_net before it allocated, then null-ptr-deref occurred. So init_nfsd() should call register_filesystem() last. Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Zhang Xiaoxu	39414a484a	nfsd: Unregister the cld notifier when laundry_wq create failed [ Upstream commit 62fdb65edb6c43306c774939001f3a00974832aa ] If laundry_wq create failed, the cld notifier should be unregistered. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	7da9080d87	SUNRPC: Use RMW bitops in single-threaded hot paths [ Upstream commit 28df0988815f63e2af5e6718193c9f68681ad7ff ] I noticed CPU pipeline stalls while using perf. Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are not needed outside the svc thread scheduler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	4338883018	NFSD: Clean up the show_nf_flags() macro [ Upstream commit bb283ca18d1e67c82d22a329c96c9d6036a74790 ] The flags are defined using C macros, so TRACE_DEFINE_ENUM is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	47ef8b0821	NFSD: Trace filecache opens [ Upstream commit 0122e882119ddbd9efa6edfeeac3f5c704a7aeea ] Instrument calls to nfsd_open_verified() to get a sense of the filecache hit rate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	09e30c0ad5	NFSD: Move documenting comment for nfsd4_process_open2() [ Upstream commit 7e2ce0cc15a509b859199235a2bad9cece00f67a ] Clean up nfsd4_open() by converting a large comment at the only call site for nfsd4_process_open2() to a kerneldoc comment in front of that function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:58 +01:00
Chuck Lever	e139ce38c4	NFSD: Fix whitespace [ Upstream commit 26320d7e317c37404c811603d50d811132aef78c ] Clean up: Pull case arms back one tab stop to conform every other switch statement in fs/nfsd/nfs4proc.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	1b66509ece	NFSD: Remove dprintk call sites from tail of nfsd4_open() [ Upstream commit f67a16b147045815b6aaafeef8663e5faeb6d569 ] Clean up: These relics are not likely to benefit server administrators. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	96f1516e3e	NFSD: Instantiate a struct file when creating a regular NFSv4 file [ Upstream commit fb70bf124b051d4ded4ce57511dfec6d3ebf2b43 ] There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case. To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked. Two new APIs are added: + Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file. + Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: JianHong Yin <jiyin@redhat.com> [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	52b01f1e64	NFSD: Clean up nfsd_open_verified() [ Upstream commit f4d84c52643ae1d63a8e73e2585464470e7944d1 ] Its only caller always passes S_IFREG as the @type parameter. As an additional clean-up, add a kerneldoc comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	0d45ecb602	NFSD: Remove do_nfsd_create() [ Upstream commit 1c388f27759c5d9271d4fca081f7ee138986eb7d ] Now that its two callers have their own version-specific instance of this function, do_nfsd_create() is no longer used. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	381d307366	NFSD: Refactor NFSv4 OPEN(CREATE) [ Upstream commit 254454a5aa4a9f696d6bae080c08d5863e650f49 ] Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	68549f7a67	NFSD: Refactor NFSv3 CREATE [ Upstream commit df9606abddfb01090d5ece7dcc2441d848f690f0 ] The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to diverge such that it makes sense to split do_nfsd_create() into one version for NFSv3 and one for NFSv4. As a first step, copy do_nfsd_create() to nfs3proc.c and remove NFSv4-specific logic. One immediate legibility benefit is that the logic for handling NFSv3 createhow is now quite straightforward. NFSv4 createhow has some subtleties that IMO do not belong in generic code. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	e47a03cb8c	NFSD: Refactor nfsd_create_setattr() [ Upstream commit 5f46e950c395b9c14c282b53ba78c5fd46d6c256 ] I'd like to move do_nfsd_create() out of vfs.c. Therefore nfsd_create_setattr() needs to be made publicly visible. Note that both call sites in vfs.c commit both the new object and its parent directory, so just combine those common metadata commits into nfsd_create_setattr(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	ae6e6cbebe	NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() [ Upstream commit 14ee45b70dd0d9ae76fb066cd8c0652d657353f6 ] Clean up: The "out" label already invokes fh_drop_write(). Note that fh_drop_write() is already careful not to invoke mnt_drop_write() if either it has already been done or there is nothing to drop. Therefore no change in behavior is expected. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Chuck Lever	ffa8158613	NFSD: Clean up nfsd3_proc_create() [ Upstream commit e61568599c9ad638fdaba150fee07d7065e31851 ] As near as I can tell, mode bit masking and setting S_IFREG is already done by do_nfsd_create() and vfs_create(). The NFSv4 path (do_open_lookup), for example, does not bother with this special processing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Dai Ngo	deb077af3e	NFSD: Show state of courtesy client in client info [ Upstream commit e9488d5ae13c0a72223c507e2508dc2ac66cad4f ] Update client_info_show to show state of courtesy client and seconds since last renew. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Dai Ngo	d7dd55b38e	NFSD: add support for lock conflict to courteous server [ Upstream commit 27431affb0dbc259ac6ffe6071243a576c8f38f1 ] This patch allows expired client with lock state to be in COURTESY state. Lock conflict with COURTESY client is resolved by the fs/lock code using the lm_lock_expirable and lm_expire_lock callback in the struct lock_manager_operations. If conflict client is in COURTESY state, set it to EXPIRABLE and schedule the laundromat to run immediately to expire the client. The callback lm_expire_lock waits for the laundromat to flush its work queue before returning to caller. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:57 +01:00
Dai Ngo	0935eeceb4	fs/lock: add helper locks_owner_has_blockers to check for blockers [ Upstream commit 591502c5cb325b1c6ec59ab161927d606b918aa0 ] Add helper locks_owner_has_blockers to check if there is any blockers for a given lockowner. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Dai Ngo	08d438e1aa	NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd [ Upstream commit d76cc46b37e123e8d245cc3490978dbda56f979d ] This patch moves create/destroy of laundry_wq from nfs4_state_start and nfs4_state_shutdown_net to init_nfsd and exit_nfsd to prevent the laundromat from being freed while a thread is processing a conflicting lock. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Dai Ngo	1657b2fb98	NFSD: add support for share reservation conflict to courteous server [ Upstream commit 3d69427151806656abf129342028f3f4e5e1fee0 ] This patch allows expired client with open state to be in COURTESY state. Share/access conflict with COURTESY client is resolved by setting COURTESY client to EXPIRABLE state, schedule laundromat to run and returning nfserr_jukebox to the request client. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Dai Ngo	eb360efd38	NFSD: add courteous server support for thread with only delegation [ Upstream commit 66af25799940b26efd41ea6e648f75c41a48a2c2 ] This patch provides courteous server support for delegation only. Only expired client with delegation but no conflict and no open or lock state is allowed to be in COURTESY state. Delegation conflict with COURTESY/EXPIRABLE client is resolved by setting it to EXPIRABLE, queue work for the laundromat and return delay to the caller. Conflict is resolved when the laudromat runs and expires the EXIRABLE client while the NFS client retries the OPEN request. Local thread request that gets conflict is doing the retry in _break_lease. Client in COURTESY or EXPIRABLE state is allowed to reconnect and continues to have access to its state. Access to the nfs4_client by the reconnecting thread and the laundromat is serialized via the client_lock. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Chuck Lever	d98ea952eb	NFSD: Clean up nfsd_splice_actor() [ Upstream commit 91e23b1c39820bfed642119ff6b6ef9f43cf09ce ] nfsd_splice_actor() checks that the page being spliced does not match the previous element in the svc_rqst::rq_pages array. We believe this is to prevent a double put_page() in cases where the READ payload is partially contained in the xdr_buf's head buffer. However, the NFSD READ proc functions no longer place any part of the READ payload in the head buffer, in order to properly support NFS/RDMA READ with Write chunks. Therefore, simplify the logic in nfsd_splice_actor() to remove this unnecessary check. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Vasily Averin	b1713b4bc8	fanotify: fix incorrect fmode_t casts [ Upstream commit dccd855771b37820b6d976a99729c88259549f85 ] Fixes sparce warnings: fs/notify/fanotify/fanotify_user.c:267:63: sparse: warning: restricted fmode_t degrades to integer fs/notify/fanotify/fanotify_user.c:1351:28: sparse: warning: restricted fmode_t degrades to integer FMODE_NONTIFY have bitwise fmode_t type and requires __force attribute for any casts. Signed-off-by: Vasily Averin <vvs@openvz.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/9adfd6ac-1b89-791e-796b-49ada3293985@openvz.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Amir Goldstein	848958a605	fsnotify: consistent behavior for parent not watching children [ Upstream commit e730558adffb88a52e562db089e969ee9510184a ] The logic for handling events on child in groups that have a mark on the parent inode, but without FS_EVENT_ON_CHILD flag in the mask is duplicated in several places and inconsistent. Move the logic into the preparation of mark type iterator, so that the parent mark type will be excluded from all mark type iterations in that case. This results in several subtle changes of behavior, hopefully all desired changes of behavior, for example: - Group A has a mount mark with FS_MODIFY in mask - Group A has a mark with ignore mask that does not survive FS_MODIFY and does not watch children on directory D. - Group B has a mark with FS_MODIFY in mask that does watch children on directory D. - FS_MODIFY event on file D/foo should not clear the ignore mask of group A, but before this change it does And if group A ignore mask was set to survive FS_MODIFY: - FS_MODIFY event on file D/foo should be reported to group A on account of the mount mark, but before this change it is wrongly ignored Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara <jack@suse.com> Link: https://lore.kernel.org/linux-fsdevel/20220314113337.j7slrb5srxukztje@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220511190213.831646-3-amir73il@gmail.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Amir Goldstein	2de91bbaaf	fsnotify: introduce mark type iterator [ Upstream commit 14362a2541797cf9df0e86fb12dcd7950baf566e ] fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code of iterating all marks of a specific group interested in an event by consulting the iterator report_mask. Use an open coded version of that iterator in fsnotify_iter_next() that collects all marks of the current iteration group without consulting the iterator report_mask. At the moment, the two iterator variants are the same, but this decoupling will allow us to exclude some of the group's marks from reporting the event, for example for event on child and inode marks on parent did not request to watch events on children. Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara <jack@suse.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220511190213.831646-2-amir73il@gmail.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Amir Goldstein	195391a650	fanotify: enable "evictable" inode marks [ Upstream commit 5f9d3bd520261fd7a850818c71809fd580e0f30c ] Now that the direct reclaim path is handled we can enable evictable inode marks. Link: https://lore.kernel.org/r/20220422120327.3459282-17-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Amir Goldstein	30505ccce4	fanotify: use fsnotify group lock helpers [ Upstream commit e79719a2ca5c61912c0493bc1367db52759cf6fd ] Direct reclaim from fanotify mark allocation context may try to evict inodes with evictable marks of the same group and hit this deadlock: [<0>] fsnotify_destroy_mark+0x1f/0x3a [<0>] fsnotify_destroy_marks+0x71/0xd9 [<0>] __destroy_inode+0x24/0x7e [<0>] destroy_inode+0x2c/0x67 [<0>] dispose_list+0x49/0x68 [<0>] prune_icache_sb+0x5b/0x79 [<0>] super_cache_scan+0x11c/0x16f [<0>] shrink_slab.constprop.0+0x23e/0x40f [<0>] shrink_node+0x218/0x3e7 [<0>] do_try_to_free_pages+0x12a/0x2d2 [<0>] try_to_free_pages+0x166/0x242 [<0>] __alloc_pages_slowpath.constprop.0+0x30c/0x903 [<0>] __alloc_pages+0xeb/0x1c7 [<0>] cache_grow_begin+0x6f/0x31e [<0>] fallback_alloc+0xe0/0x12d [<0>] ____cache_alloc_node+0x15a/0x17e [<0>] kmem_cache_alloc_trace+0xa1/0x143 [<0>] fanotify_add_mark+0xd5/0x2b2 [<0>] do_fanotify_mark+0x566/0x5eb [<0>] __x64_sys_fanotify_mark+0x21/0x24 [<0>] do_syscall_64+0x6d/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under fanotify group lock and use the safe group lock helpers. Link: https://lore.kernel.org/r/20220422120327.3459282-16-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:56 +01:00
Amir Goldstein	350eb32a22	fanotify: implement "evictable" inode marks [ Upstream commit 7d5e005d982527e4029b0139823d179986e34cdc ] When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not pin the marked inode to inode cache, so when inode is evicted from cache due to memory pressure, the mark will be lost. When an inode mark with flag FAN_MARK_EVICATBLE is updated without using this flag, the marked inode is pinned to inode cache. When an inode mark is updated with flag FAN_MARK_EVICTABLE but an existing mark already has the inode pinned, the mark update fails with error EEXIST. Evictable inode marks can be used to setup inode marks with ignored mask to suppress events from uninteresting files or directories in a lazy manner, upon receiving the first event, without having to iterate all the uninteresting files or directories before hand. The evictbale inode mark feature allows performing this lazy marks setup without exhausting the system memory with pinned inodes. This change does not enable the feature yet. Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/ Link: https://lore.kernel.org/r/20220422120327.3459282-15-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	3eb8ebb0d1	fanotify: factor out helper fanotify_mark_update_flags() [ Upstream commit 8998d110835e3781ccd3f1ae061a590b4aaba911 ] Handle FAN_MARK_IGNORED_SURV_MODIFY flag change in a helper that is called after updating the mark mask. Replace the added and removed return values and help variables with bool recalc return values and help variable, which makes the code a bit easier to follow. Rename flags argument to fan_flags to emphasize the difference from mark->flags. Link: https://lore.kernel.org/r/20220422120327.3459282-14-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	c0b56e0b9c	fanotify: create helper fanotify_mark_user_flags() [ Upstream commit 4adce25ccfff215939ee465b8c0aa70526d5c352 ] To translate from fsnotify mark flags to user visible flags. Link: https://lore.kernel.org/r/20220422120327.3459282-13-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	a835cba552	fsnotify: allow adding an inode mark without pinning inode [ Upstream commit c3638b5b13740fa31762d414bbce8b7a694e582a ] fsnotify_add_mark() and variants implicitly take a reference on inode when attaching a mark to an inode. Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF. Instead of taking the inode reference when attaching connector to inode and dropping the inode reference when detaching connector from inode, take the inode reference on attach of the first mark that wants to hold an inode reference and drop the inode reference on detach of the last mark that wants to hold an inode reference. Backends can "upgrade" an existing mark to take an inode reference, but cannot "downgrade" a mark with inode reference to release the refernce. This leaves the choice to the backend whether or not to pin the inode when adding an inode mark. This is intended to be used when adding a mark with ignored mask that is used for optimization in cases where group can afford getting unneeded events and reinstate the mark with ignored mask when inode is accessed again after being evicted. Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	584b88a1c8	dnotify: use fsnotify group lock helpers [ Upstream commit aabb45fdcb31f00f1e7cae2bce83e83474a87c03 ] Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context. There is no guarantee that others memory shrinkers don't do the same and no guarantee that future shrinkers won't do that. For example, if overlayfs implements inode cache of fscache would keep open files to cached objects, inode shrinkers could end up closing open files to underlying fs. Direct reclaim from dnotify mark allocation context may try to close open files that have dnotify marks of the same group and hit a deadlock on mark_mutex. Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under dnotify group lock and use the safe group lock helpers. Link: https://lore.kernel.org/r/20220422120327.3459282-11-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	924ad2359b	nfsd: use fsnotify group lock helpers [ Upstream commit b8962a9d8cc2d8c93362e2f684091c79f702f6f3 ] Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context and that could cause a deadlock when fsnotify mark allocation went into direct reclaim and nfsd shrinker tried to free existing fsnotify marks. To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS flag on nfsd fsnotify group to prevent going into direct reclaim from fsnotify_add_inode_mark(). Link: https://lore.kernel.org/r/20220422120327.3459282-10-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	e26b208539	inotify: use fsnotify group lock helpers [ Upstream commit 642054b87058019be36033f73c3e48ffff1915aa ] inotify inode marks pin the inode so there is no need to set the FSNOTIFY_GROUP_NOFS flag. Link: https://lore.kernel.org/r/20220422120327.3459282-8-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	30578afe61	fsnotify: create helpers for group mark_mutex lock [ Upstream commit 43b245a788e2d8f1bb742668a9bdace02fcb3e96 ] Create helpers to take and release the group mark_mutex lock. Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines if the mark_mutex lock is fs reclaim safe or not. If not safe, the lock helpers take the lock and disable direct fs reclaim. In that case we annotate the mutex with a different lockdep class to express to lockdep that an allocation of mark of an fs reclaim safe group may take the group lock of another "NOFS" group to evict inodes. For now, converted only the callers in common code and no backend defines the NOFS flag. It is intended to be set by fanotify for evictable marks support. Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	c1c38650f3	fsnotify: make allow_dups a property of the group [ Upstream commit f3010343d9e119da35ee864b3a28993bb5c78ed7 ] Instead of passing the allow_dups argument to fsnotify_add_mark() as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups behavior and set this behavior at group creation time for all calls of fsnotify_add_mark(). Rename the allow_dups argument to generic add_flags argument for future use. Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	ef9c24cf07	fsnotify: pass flags argument to fsnotify_alloc_group() [ Upstream commit 867a448d587e7fa845bceaf4ee1c632448f2a9fa ] Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation. Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags. Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	3a34a387fc	inotify: move control flags from mask to mark flags [ Upstream commit 38035c04f5865c4ef9597d6beed6a7178f90f64a ] The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not relevant to object interest mask, so move them to the mark flags. This frees up some bits in the object interest mask. Link: https://lore.kernel.org/r/20220422120327.3459282-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Amir Goldstein	9ebbf28184	fanotify: do not allow setting dirent events in mask of non-dir [ Upstream commit ceaf69f8eadcafb323392be88e7a5248c415d423 ] Dirent events (create/delete/move) are only reported on watched directory inodes, but in fanotify as well as in legacy inotify, it was always allowed to set them on non-dir inode, which does not result in any meaningful outcome. Until kernel v5.17, dirent events in fanotify also differed from events "on child" (e.g. FAN_OPEN) in the information provided in the event. For example, FAN_OPEN could be set in the mask of a non-dir or the mask of its parent and event would report the fid of the child regardless of the marked object. By contrast, FAN_DELETE is not reported if the child is marked and the child fid was not reported in the events. Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the fid of the child is reported with dirent events, like events "on child", which may create confusion for users expecting the same behavior as events "on child" when setting events in the mask on a child. The desired semantics of setting dirent events in the mask of a child are not clear, so for now, deny this action for a group initialized with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME. We may relax this restriction in the future if we decide on the semantics and implement them. Fixes: d61fd650e9d2 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID") Fixes: 8cc3b1ccd930 ("fanotify: wire up FAN_RENAME event") Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Trond Myklebust	12c4345fdb	nfsd: Clean up nfsd_file_put() [ Upstream commit 999397926ab3f78c7d1235cc4ca6e3c89d2769bf ] Make it a little less racy, by removing the refcount_read() test. Then remove the redundant 'is_hashed' variable. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Trond Myklebust	3850692c41	nfsd: Fix a write performance regression [ Upstream commit 6b8a94332ee4f7d9a8ae0cbac7609f79c212f06c ] The call to filemap_flush() in nfsd_file_put() is there to ensure that we clear out any writes belonging to a NFSv3 client relatively quickly and avoid situations where the file can't be evicted by the garbage collector. It also ensures that we detect write errors quickly. The problem is this causes a regression in performance for some workloads. So try to improve matters by deferring writeback until we're ready to close the file, and need to detect errors so that we can force the client to resend. Tested-by: Jan Kara <jack@suse.cz> Fixes: b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:55 +01:00
Haowen Bai	097f4985e1	SUNRPC: Return true/false (not 1/0) from bool functions [ Upstream commit 5f7b839d47dbc74cf4a07beeab5191f93678673e ] Return boolean values ("true" or "false") instead of 1 or 0 from bool functions. This fixes the following warnings from coccicheck: ./fs/nfsd/nfs2acl.c:289:9-10: WARNING: return of 0/1 in function 'nfsaclsvc_encode_accessres' with return type bool ./fs/nfsd/nfs2acl.c:252:9-10: WARNING: return of 0/1 in function 'nfsaclsvc_encode_getaclres' with return type bool Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Bang Li	867ffd6f75	fsnotify: remove redundant parameter judgment [ Upstream commit f92ca72b0263d601807bbd23ed25cbe6f4da89f4 ] iput() has already judged the incoming parameter, so there is no need to repeat the judgment here. Link: https://lore.kernel.org/r/20220311151240.62045-1-libang.linuxer@gmail.com Signed-off-by: Bang Li <libang.linuxer@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Amir Goldstein	8d128b31ef	fsnotify: optimize FS_MODIFY events with no ignored masks [ Upstream commit 04e317ba72d07901b03399b3d1525e83424df5b3 ] fsnotify() treats FS_MODIFY events specially - it does not skip them even if the FS_MODIFY event does not apear in the object's fsnotify mask. This is because send_to_group() checks if FS_MODIFY needs to clear ignored mask of marks. The common case is that an object does not have any mark with ignored mask and in particular, that it does not have a mark with ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag. Set FS_MODIFY in object's fsnotify mask during fsnotify_recalc_mask() if object has a mark with an ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag and remove the special treatment of FS_MODIFY in fsnotify(), so that FS_MODIFY events could be optimized in the common case. Call fsnotify_recalc_mask() from fanotify after adding or removing an ignored mask from a mark without FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY or when adding the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag to a mark with ignored mask (the flag cannot be removed by fanotify uapi). Performance results for doing 10000000 write(2)s to tmpfs: vanilla patched without notification mark 25.486+-1.054 24.965+-0.244 with notification mark 30.111+-0.139 26.891+-1.355 So we can see the overhead of notification subsystem has been drastically reduced. Link: https://lore.kernel.org/r/20220223151438.790268-3-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Amir Goldstein	362f2ebfa5	fsnotify: fix merge with parent's ignored mask [ Upstream commit 4f0b903ded728c505850daf2914bfc08841f0ae6 ] fsnotify_parent() does not consider the parent's mark at all unless the parent inode shows interest in events on children and in the specific event. So unless parent added an event to both its mark mask and ignored mask, the event will not be ignored. Fix this by declaring the interest of an object in an event when the event is in either a mark mask or ignored mask. Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Jakob Koschel	58137c0ba5	nfsd: fix using the correct variable for sizeof() [ Upstream commit 4fc5f5346592cdc91689455d83885b0af65d71b8 ] While the original code is valid, it is not the obvious choice for the sizeof() call and in preparation to limit the scope of the list iterator variable the sizeof should be changed to the size of the destination. Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	c4ed3e405e	NFSD: Clean up _lm_ operation names [ Upstream commit 35aff0678f99b0623bb72d50112de9e163a19559 ] The common practice is to name function instances the same as the method names, but with a uniquifying prefix. Commit aef9583b234a ("NFSD: Get reference of lockowner when coping file_lock") missed this -- the new function names should both have been of the form "nfsd4_lm_*". Before more lock manager operations are added in NFSD, rename these two functions for consistency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	3cc88ae66e	NFSD: Remove CONFIG_NFSD_V3 [ Upstream commit 5f9a62ff7d2808c7b56c0ec90f3b7eae5872afe6 ] Eventually support for NFSv2 in the Linux NFS server is to be deprecated and then removed. However, NFSv2 is the "always supported" version that is available as soon as CONFIG_NFSD is set. Before NFSv2 support can be removed, we need to choose a different "always supported" version. This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported, as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become the only "always supported" NFS version. The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	2c57725d69	NFSD: Move svc_serv_ops::svo_function into struct svc_serv [ Upstream commit 37902c6313090235c847af89c5515591261ee338 ] Hoist svo_function back into svc_serv and remove struct svc_serv_ops, since the struct is now devoid of fields. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	2b70332cac	NFSD: Remove svc_serv_ops::svo_module [ Upstream commit f49169c97fceb21ad6a0aaf671c50b0f520f15a5 ] struct svc_serv_ops is about to be removed. Neil Brown says: > I suspect svo_module can go as well - I don't think the thread is > ever the thing that primarily keeps a module active. A random sample of kthread_create() callers shows sunrpc is the only one that manages module reference count in this way. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	e3a0f44371	SUNRPC: Remove svc_shutdown_net() [ Upstream commit c7d7ec8f043e53ad16e30f5ebb8b9df415ec0f2b ] Clean up: svc_shutdown_net() now does nothing but call svc_close_net(). Replace all external call sites. svc_close_net() is renamed to be the inverse of svc_xprt_create(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	758fcc041e	SUNRPC: Rename svc_close_xprt() [ Upstream commit 4355d767a21b9445958fc11bce9a9701f76529d3 ] Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	bee0c024d6	SUNRPC: Rename svc_create_xprt() [ Upstream commit 352ad31448fecc78a2e9b78da64eea5d63b8d0ce ] Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	9b54c62e38	SUNRPC: Remove svo_shutdown method [ Upstream commit 87cdd8641c8a1ec6afd2468265e20840a57fd888 ] Clean up. Neil observed that "any code that calls svc_shutdown_net() knows what the shutdown function should be, and so can call it directly." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:54 +01:00
Chuck Lever	de06412bad	SUNRPC: Remove the .svo_enqueue_xprt method [ Upstream commit a9ff2e99e9fa501ec965da03c18a5422b37a2f44 ] We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support. As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly. This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:53 +01:00
Chuck Lever	265c1446e6	NFSD: Streamline the rare "found" case [ Upstream commit add1511c38166cf1036765f8c4aa939f0275a799 ] Move a rarely called function call site out of the hot path. This is an exceptionally small improvement because the compiler inlines most of the functions that nfsd_cache_lookup() calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-19 12:27:53 +01:00

... 5 6 7 8 9 ...

1345 commits