[ Upstream commit b840be2f00c0bc00d993f8f76e251052b83e4382 ]
As in the v4 case, it doesn't work well to block waiting for a lock on
an nfs filesystem.
As in the v4 case, that means we're depending on the client to poll.
It's probably incorrect to depend on that, but I *think* clients do poll
in practice. In any case, it's an improvement over hanging the lockd
thread indefinitely as we currently are.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f657f8eef3ff870552c9fd2839e0061046f44618 ]
NFS implements blocking locks by blocking inside its lock method. In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock. It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.
Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks. The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f024fcd5c97dc70bb9121c80407cf3cf9be7159 ]
We shouldn't really be using a read-only file descriptor to take a write
lock.
Most filesystems will put up with it. But NFS, for example, won't.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b661601a9fdf1af8516e1100de8bba84bd41cca4 ]
Update comment to reflect that we *do* allow reexport, whether it's a
good idea or not....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a81041b7d8f08c4e1014173c5483a0f18724a576 ]
Make this lookup slightly more concise, and prepare for changing how we
look this up in a following patch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2dc6f19e4f438d4c14987cb17aee38aaf7304e7f ]
It'll come in handy to get the whole nlm_lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d02a3a2cb25d384005a6e3446a445013342024b7 ]
nsm_use_hostnames is a module parameter and it will be exported to sysctl
procfs. This is to let user sometimes change it from userspace. But the
minimal unit for sysctl procfs read/write it sizeof(int).
In big endian system, the converting from/to bool to/from int will cause
error for proc items.
This patch use a new proc_handler proc_dobool to fix it.
Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
[thuth: Fix typo in commit message]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea49dc79002c416a9003f3204bc14f846a0dbcae ]
Including one's name in copyright claims is appropriate. Including it
in random comments is just vanity. After 2 decades, it is time for
these to be gone.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 496d83cf0f2fa70cfe256c2499e2d3523d3868f3 ]
Large splice reads call put_page() repeatedly. put_page() is
relatively expensive to call, so replace it with the new
svc_rqst_replace_page() helper to help amortize that cost.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7e0b781b73c2e26e442ed71397cc2bc5945a732 ]
A few useful observations:
- The value in @size is never modified.
- splice_desc.len is an unsigned int, and so is xdr_buf.page_len.
An implicit cast to size_t is unnecessary.
- The computation of .page_len is the same in all three arms
of the "if" statement, so hoist it out to make it clear that
the operation is an unconditional invariant.
The resulting function is 18 bytes shorter on my system (-Os).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec44610fe2b86daef70f3f53f47d2a2542d7094f ]
Rename s_fsnotify_inode_refs to s_fsnotify_connectors and count all
objects with attached connectors, not only inodes with attached
connectors.
This will be used to optimize fsnotify() calls on sb without any
type of marks.
Link: https://lore.kernel.org/r/20210810151220.285179-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11fa333b58ba1518e7c69fafb6513a0117f8fe33 ]
Instead of incrementing s_fsnotify_inode_refs when detaching connector
from inode, increment it earlier when attaching connector to inode.
Next patch is going to use s_fsnotify_inode_refs to count all objects
with attached connectors.
Link: https://lore.kernel.org/r/20210810151220.285179-3-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09ddbe69c9925b42cb9529f60678c25b241d8b18 ]
We must have a reference on inode, so ihold is cheaper.
Link: https://lore.kernel.org/r/20210810151220.285179-2-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af579beb666aefb17e9a335c12c788c92932baf1 ]
Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
allows userspace applications to control whether a pidfd information
record containing a pidfd is to be returned alongside the generic
event metadata for each event.
If FAN_REPORT_PIDFD is enabled for a notification group, an additional
struct fanotify_event_info_pidfd object type will be supplied
alongside the generic struct fanotify_event_metadata for a single
event. This functionality is analogous to that of FAN_REPORT_FID in
terms of how the event structure is supplied to a userspace
application. Usage of FAN_REPORT_PIDFD with
FAN_REPORT_FID/FAN_REPORT_DFID_NAME is permitted, and in this case a
struct fanotify_event_info_pidfd object will likely follow any struct
fanotify_event_info_fid object.
Currently, the usage of the FAN_REPORT_TID flag is not permitted along
with FAN_REPORT_PIDFD as the pidfd API currently only supports the
creation of pidfds for thread-group leaders. Additionally, usage of
the FAN_REPORT_PIDFD flag is limited to privileged processes only
i.e. event listeners that are running with the CAP_SYS_ADMIN
capability. Attempting to supply the FAN_REPORT_TID initialization
flags with FAN_REPORT_PIDFD or creating a notification group without
CAP_SYS_ADMIN will result with -EINVAL being returned to the caller.
In the event of a pidfd creation error, there are two types of error
values that can be reported back to the listener. There is
FAN_NOPIDFD, which will be reported in cases where the process
responsible for generating the event has terminated prior to the event
listener being able to read the event. Then there is FAN_EPIDFD, which
will be reported when a more generic pidfd creation error has occurred
when fanotify calls pidfd_create().
Link: https://lore.kernel.org/r/5f9e09cff7ed62bfaa51c1369e0f7ea5f16a91aa.1628398044.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0aca67bb7f0d8c997dfef8ff0bfeb0afb361f0e6 ]
The copy_info_records_to_user() helper allows for the separation of
info record copying routines/conditionals from copy_event_to_user(),
which reduces the overall clutter within this function. This becomes
especially true as we start introducing additional info records in the
future i.e. struct fanotify_event_info_pidfd. On success, this helper
returns the total amount of bytes that have been copied into the user
supplied buffer and on error, a negative value is returned to the
caller.
The newly defined macro FANOTIFY_INFO_MODES can be used to obtain info
record types that have been enabled for a specific notification
group. This macro becomes useful in the subsequent patch when the
FAN_REPORT_PIDFD initialization flag is introduced.
Link: https://lore.kernel.org/r/8872947dfe12ce8ae6e9a7f2d49ea29bc8006af0.1628398044.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[ cel: adjusted to apply to v5.10.y ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3424c9bac893bd06f38a20474cd622881d384ca ]
With the idea to support additional info record types in the future
i.e. fanotify_event_info_pidfd, it's a good idea to rename some of the
labels assigned to some of the existing fid related functions,
parameters, etc which more accurately represent the intent behind
their usage.
For example, copy_info_to_user() was defined with a generic function
label, which arguably reads as being supportive of different info
record types, however the parameter list for this function is
explicitly tailored towards the creation and copying of the
fanotify_event_info_fid records. This same point applies to the macro
defined as FANOTIFY_INFO_HDR_LEN.
With fanotify_event_info_len(), we change the parameter label so that
the function implies that it can be extended to calculate the length
for additional info record types.
Link: https://lore.kernel.org/r/7c3ec33f3c718dac40764305d4d494d858f59c51.1628398044.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ab1016d39cc052064e32f25ad18ef8767a0ee3b8 ]
In error cases the dentry may be NULL.
Before 20798dfe249a, the encoder also checked dentry and
d_really_is_positive(dentry), but that looks like overkill to me--zero
status should be enough to guarantee a positive dentry.
This isn't the first time we've seen an error-case NULL dereference
hidden in the initialization of a local variable in an xdr encoder. But
I went back through the other recent rewrites and didn't spot any
similar bugs.
Reported-by: JianHong Yin <jiyin@redhat.com>
Reviewed-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder...")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b08cf62b1239a4322427d677ea9363f0ab677c6 ]
The double copy of the string is a mistake, plus __assign_str()
uses strlen(), which is wrong to do on a string that isn't
guaranteed to be NUL-terminated.
Fixes: 6019ce0742ca ("NFSD: Add a tracepoint to record directory entry encoding")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e34c0ce9136a0fe96f0f547898d14c44f3c9f147 ]
The pointer 'this' is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6a63ca5652ea05637ecfe349f9e895031529556 ]
Add a .h file containing xdr_stream-based XDR helpers common to both
NLMv3 and NLMv4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9ad1a8090f58b2ed1774dd0f4c7cdb8210a3793 ]
To enable xdr_stream-based encoding and decoding, create a bespoke
RPC dispatch function for the lockd service.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05570a2b01117209b500e1989ce8f1b0524c489f ]
I'm not even sure cl_xprt can change here, but we're getting "suspicious
RCU usage" warnings, and other rpc_peeraddr2str callers are taking the
rcu lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54185267e1fe476875e649bb18e1c4254c123305 ]
'status' has been overwritten to 0 after nfsd4_ssc_setup_dul(), this
cause 0 will be return in vfs_kern_mount() error case. Fix to return
nfserr_nodev in this error.
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f47dc2d3013c65631bf8903becc7d88dc9d9966e ]
Fix by initializing pointer nfsd4_ssc_umount_item with NULL instead of 0.
Replace return value of nfsd4_ssc_setup_dul with __be32 instead of int.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3518c8666f15cdd5d38878005dab1d589add1c19 ]
In addition to the client's address, display the callback channel
state and address in the 'info' file.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 934bd07fae7e55232845f909f78873ab8678ca74 ]
This was causing a "sleeping function called from invalid context"
warning.
I don't think we need the set_and_test_bit() here; clients move from
unconfirmed to confirmed only once, under the client_lock.
The (conf == unconf) is a way to check whether we're in that confirming
case, hopefully that's not too obscure.
Fixes: 472d155a0631 "nfsd: report client confirmation status in "info" file"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4e44b393389c77958f7c58bf4415032b4cda15b ]
Currently the source's export is mounted and unmounted on every
inter-server copy operation. This patch is an enhancement to delay
the unmount of the source export for a certain period of time to
eliminate the mount and unmount overhead on subsequent copy operations.
After a copy operation completes, a work entry is added to the
delayed unmount list with an expiration time. This list is serviced
by the laundromat thread to unmount the export of the expired entries.
Each time the export is being used again, its expiration time is
extended and the entry is re-inserted to the tail of the list.
The unmount task and the mount operation of the copy request are
synced to make sure the export is not unmounted while it's being
used.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eac0b17a77fbd763d305a5eaa4fd1119e5a0fe0d ]
Currently, the server does all copies as NFS_UNSTABLE. For synchronous
copies linux client will append a COMMIT to the COPY compound but for
async copies it does not (because COMMIT needs to be done after all
bytes are copied and not as a reply to the COPY operation).
However, in order to save the client doing a COMMIT as a separate
rpc, the server can reply back with NFS_FILE_SYNC copy. This patch
proposed to add vfs_fsync() call at the end of the async copy.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: adjusted to apply to v5.10.y ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eeeadbb9bd5652c47bb9b31aa9ad8b4f1b4aa8b3 ]
The commit may be time-consuming and there's no need to hold the lock
for it.
More of these are possible, these were just some easy ones.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e5d74a2d0ee67ae00edad43c3d7811016e4d2e21 ]
Truncation of an unlinked inode may take a long time for I/O waiting, and
it doesn't have to prevent access to the directory. Thus, let truncation
occur outside the directory's mutex, just like do_unlinkat() does.
Signed-off-by: Yu Hsiang Huang <nickhuang@synology.com>
Signed-off-by: Bing Jing Chang <bingjingc@synology.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6cbe98ff32aef795462a309ef048cfb89d1a11d ]
Clean-up: Re-order the display of IP address and client ID to be
consistent with other _cb_ tracepoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d2bf65983a137121c165a7e69b2885572954915 ]
Clean up: These are noise in properly working systems. If you really
need to observe the operation of the callback mechanism, use the
sunrpc:rpc\* tracepoints along with the workqueue tracepoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ade892ae1c35527584decb7fa026553d53cd03f ]
Record a tracepoint event when the server performs a callback
probe. This event can be enabled as a group with other nfsd_cb
tracepoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17d76ddf76e4972411402743eea7243d9a46f4f9 ]
Renamed so it can be enabled as a set with the other nfsd_cb_
tracepoints. And, consistent with those tracepoints, report the
address of the client, the client ID the server has given it, and
the state ID being recalled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87512386e951ee28ba2e7ef32b843ac97621d371 ]
Record the arguments of CB_OFFLOAD callbacks so we can better
observe asynchronous copy-offload behavior. For example:
nfsd-995 [008] 7721.934222: nfsd_cb_offload:
addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8739113a
count=116528 status=0
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cde7f8118f0fea29ad73ddcf28817f95adeffd5 ]
When the server kicks off a CB_LM_NOTIFY callback, record its
arguments so we can better observe asynchronous locking behavior.
For example:
nfsd-998 [002] 1471.705873: nfsd_cb_notify_lock: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8950b23a
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f57c6062bf3ce2c6ab9ba60040b34e8134ef259 ]
Display the transport protocol and authentication flavor so admins
can see what they might be getting wrong.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b200f0e35338b052976b6c5759e4f77a3013e6f6 ]
Show when the upper layer requested a shutdown. RPC tracepoints can
already show when rpc_shutdown_client() is called.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 806d65b617d89be887fe68bfa051f78143669cd7 ]
Provide more clarity about when the callback channel is in trouble.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 167145cc64ce4b4b177e636829909a6b14004f9e ]
TRACE_DEFINE_ENUM() is necessary for enum {} but not for C macros.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8476c69a7fa0f1f9705ec0caa4e97c08b5045779 ]
We were missing one.
As a clean-up, add a helper that sets the new CB state and fires
a tracepoint. The tracepoint fires only when the state changes, to
help reduce trace log noise.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1736aec82a15cb5d4b3bbe0b2fbae0ede66b1a1a ]
Enable knfsd_fh_hash() to be invoked in functions where the
filehandle pointer is a const.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8f80c5545ec5794644b48537449e48b009d608d ]
Some of the most common cases are traced. Enough infrastructure is
now in place that more can be added later, as needed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c41a9b7a906fb872f8b2b1a34d2a1d5ef7f94adb ]
Record client-requested termination of client IDs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0bfaacac57e64aa342f865b8ddcab06ca59a6f83 ]
This tracepoint has been replaced by nfsd_clid_cred_mismatch and
nfsd_clid_verf_mismatch, and can simply be removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 744ea54c869cebe41fbad5f53f8a8ca5d93a5c97 ]
Record when a client presents a different boot verifier than the
one we know about. Typically this is a sign the client has
rebooted, but sometimes it signals a conflicting client ID, which
the client's administrator will need to address.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 27787733ef44332fce749aa853f2749d141982b0 ]
Record when a client tries to establish a lease record but uses an
unexpected credential. This is often a sign of a configuration
problem.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87b2394d60c32c158ebb96ace4abee883baf1239 ]
To be used in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a8b98c808eab3ec8f1b5a64be967b0f4af4cae43 ]
Reporting event->pid should depend on the privileges of the user that
initialized the group, not the privileges of the user reading the
events.
Use an internal group flag FANOTIFY_UNPRIV to record the fact that the
group was initialized by an unprivileged user.
To be on the safe side, the premissions to setup filesystem and mount
marks now require that both the user that initialized the group and
the user setting up the mark have CAP_SYS_ADMIN.
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiA77_P5vtv7e83g0+9d7B5W9ZTE4GfQEYbWmfT1rA=VA@mail.gmail.com/
Fixes: 7cea2a3c505e ("fanotify: support limited functionality for unprivileged users")
Cc: <Stable@vger.kernel.org> # v5.12+
Link: https://lore.kernel.org/r/20210524135321.2190062-1-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b876d708316bf9b6b9678eb2beb289b93cfe6369 ]
The change attribute is always set by all NFS client versions so get rid
of the open-coded version.
Fixes: 3cc55f4434b4 ("nfs: use change attribute for NFS re-exports")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9092b4bb2109502eb8972021a3f74febc931a63 ]
The client SSC code should not depend on any of the CONFIG_NFSD config.
This patch removes all CONFIG_NFSD from NFSv4.2 client SSC code and
simplifies the config of CONFIG_NFS_V4_2_SSC_HELPER, NFSD_V4_2_INTER_SSC.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76c50eb70d8e1133eaada0013845619c36345fbc ]
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding a couple of break statements instead of
just letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aba2072f452346d56a462718bcde93d697383148 ]
It's OK to grant a read delegation to a client that holds a write,
as long as it's the only client holding the write.
We originally tried to do this in commit 94415b06eb8a ("nfsd4: a
client's own opens needn't prevent delegations"), which had to be
reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own
opens needn't prevent delegations"").
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebd9d2c2f5a7ebaaed2d7bb4dee148755f46033d ]
No change in behavior, I'm just moving some code around to avoid forward
references in a following patch.
(To do someday: figure out how to split up nfs4state.c. It's big and
disorganized.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0ce48375a367222989c2618fe68bf34db8c7bb7 ]
It's unusual but possible for multiple filehandles to point to the same
file. In that case, we may end up with multiple nfs4_files referencing
the same inode.
For delegation purposes it will turn out to be useful to flag those
cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9b60e2209213fdfcc504ba25a404977c5d08b77 ]
The nfs4_file structure is per-filehandle, not per-inode, because the
spec requires open and other state to be per filehandle.
But it will turn out to be convenient for nfs4_files associated with the
same inode to be hashed to the same bucket, so let's hash on the inode
instead of the filehandle.
Filehandle aliasing is rare, so that shouldn't have much performance
impact.
(If you have a ton of exported filesystems, though, and all of them have
a root with inode number 2, could that get you an overlong hash chain?
Perhaps this (and the v4 open file cache) should be hashed on the inode
pointer instead.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70c5307564035c160078401f541c397d77b95415 ]
Since commit 501cb1849f86 ("nfsd: rip out the raparms cache")
nrservs is not used in nfsd_startup_generic()
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22d483b99863202e3631ff66fa0f3c2302c0f96f ]
I don't see an obvious reason why the upper 32 bit check needs to be
open-coded this way. Switch to upper_32_bits() which is more idiomatic and
should conceptually be the same check.
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210325083742.2334933-1-brauner@kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7cea2a3c505e87a9d6afc78be4a7f7be636a73a7 ]
Add limited support for unprivileged fanotify groups.
An unprivileged users is not allowed to get an open file descriptor in
the event nor the process pid of another process. An unprivileged user
cannot request permission events, cannot set mount/filesystem marks and
cannot request unlimited queue/marks.
This enables the limited functionality similar to inotify when watching a
set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without
requiring SYS_CAP_ADMIN privileges.
The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged
listener watching a set of directories (with FAN_EVENT_ON_CHILD) to monitor
all changes inside those directories.
This typically requires that the listener keeps a map of watched directory
fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at()
before starting to watch for changes.
When getting an event, the reported fid of the parent should be resolved
to dirfd and fstatsat(2) with dirfd and name should be used to query the
state of the filesystem entry.
Link: https://lore.kernel.org/r/20210304112921.3996419-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5b8fea65d197f408bb00b251c70d842826d6b70b ]
fanotify has some hardcoded limits. The only APIs to escape those limits
are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.
Allow finer grained tuning of the system limits via sysfs tunables under
/proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify,
with some minor differences.
- max_queued_events - global system tunable for group queue size limit.
Like the inotify tunable with the same name, it defaults to 16384 and
applies on initialization of a new group.
- max_user_marks - user ns tunable for marks limit per user.
Like the inotify tunable named max_user_watches, on a machine with
sufficient RAM and it defaults to 1048576 in init userns and can be
further limited per containing user ns.
- max_user_groups - user ns tunable for number of groups per user.
Like the inotify tunable named max_user_instances, it defaults to 128
in init userns and can be further limited per containing user ns.
The slightly different tunable names used for fanotify are derived from
the "group" and "mark" terminology used in the fanotify man pages and
throughout the code.
Considering the fact that the default value for max_user_instances was
increased in kernel v5.10 from 8192 to 1048576, leaving the legacy
fanotify limit of 8192 marks per group in addition to the max_user_marks
limit makes little sense, so the per group marks limit has been removed.
Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own
marks are not accounted in the per user marks account, so in effect the
limit of max_user_marks is only for the collection of groups that are
not initialized with FAN_UNLIMITED_MARKS.
Link: https://lore.kernel.org/r/20210304112921.3996419-2-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8cd0ee8cda68a888a317991c1e918a8cba1a568 ]
Event merges are expensive when event queue size is large, so limit the
linear search to 128 merge tests.
In combination with 128 size hash table, there is a potential to merge
with up to 16K events in the hashed queue.
Link: https://lore.kernel.org/r/20210304104826.3993892-6-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94e00d28a680dff18805ca472b191364347d2234 ]
In order to improve event merge performance, hash events in a 128 size
hash table by the event merge key.
The fanotify_event size grows by two pointers, but we just reduced its
size by removing the objectid member, so overall its size is increased
by one pointer.
Permission events and overflow event are not merged so they are also
not hashed.
Link: https://lore.kernel.org/r/20210304104826.3993892-5-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e3e5c6943994943eb76cab2d3a1806bc10b9045 ]
Improve the merge key hash by mixing more values relevant for merge.
For example, all FAN_CREATE name events in the same dir used to have the
same merge key based on the dir inode. With this change the created
file name is mixed into the merge key.
The object id that was used as merge key is redundant to the event info
so it is no longer mixed into the hash.
Permission events are not hashed, so no need to hash their info.
Link: https://lore.kernel.org/r/20210304104826.3993892-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8988f11abb820bacfcc53d498370bfb30f792ec4 ]
objectid is only used by fanotify backend and it is just an optimization
for event merge before comparing all fields in event.
Move the objectid member from common struct fsnotify_event into struct
fanotify_event and reduce it to 29-bit hash to cram it together with the
3-bit event type.
Events of different types are never merged, so the combination of event
type and hash form a 32-bit key for fast compare of events.
This reduces the size of events by one pointer and paves the way for
adding hashed queue support for fanotify.
Link: https://lore.kernel.org/r/20210304104826.3993892-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f73171e192366ff7c98af9fb50615ef9615f8a7 ]
Current code has an assumtion that fsnotify_notify_queue_is_empty() is
called to verify that queue is not empty before trying to peek or remove
an event from queue.
Remove this assumption by moving the fsnotify_notify_queue_is_empty()
into the functions, allow them to return NULL value and check return
value by all callers.
This is a prep patch for multi event queues.
Link: https://lore.kernel.org/r/20210304104826.3993892-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b73ac6808b0f7994a05ebc38571e2e9eaf98a0f4 ]
spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 472d155a0631bd1a09b5c0c275a254e65605d683 ]
mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in liu of the logging
of mount/unmount events for NFSv3.
Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.
So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.
This requires a bit of infrastructure to keep the dentry for the "info"
file. There is no need to take a counted reference as the dentry must
remain around until the client is removed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7a833e9cc6c3b58fe94f049d2b40943cba07086 ]
Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined
by the protocol to be 64 bit, so we could be turning a large copy into a
0-length copy here.
Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 792a5112aa90e59c048b601c6382fe3498d75db7 ]
>From https://tools.ietf.org/html/rfc7862#page-65
A count of 0 (zero) requests that all bytes from ca_src_offset
through EOF be copied to the destination.
Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f988a7b71d1e66e63f79cd59c763875347943a7a ]
`printk()`, by default, uses the log level warning, which leaves the
user reading
NFSD: Using UMH upcall client tracking operations.
wondering what to do about it (`dmesg --level=warn`).
Several client tracking methods are tried, and expected to fail. That’s
why a message is printed only on success. It might be interesting for
users to know the chosen method, so use info-level instead of
debug-level.
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f7e7a4006f74b031718055a0751c70c2e3d5e7e ]
We do this same logic repeatedly, and it's easy to get the sense of the
comparison wrong.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 219a170502b3d597c52eeec088aee8fbf7b90da5 ]
These are no longer needed because there are no dprintk() call sites
in these files.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 778f068fa0c0846b650ebdb8795fd51b5badc332 ]
The SETACL result encoder is exactly the same as the NFSv2
attrstatres decoder.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8141d6a2bb6c655ff0c0b81ced80d9025f03e926 ]
Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d52532002ffa217ad3fa4c3ba86c95203d21dd21 ]
Refactor: Add helper function similar to nfs3svc_encode_cookie3().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 76ed0dd96eeb2771b21bf5dcbd88326ef89ee0ed ]
During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances
rq_next_page to the full size of the client-requested buffer, then
releases all those pages at the end of the request. The next request
to use that nfsd thread has to refill the pages.
NFSD does this even when the dirlist in the reply is small. With
NFSv3 clients that send READDIR operations with large buffer sizes,
that can be 256 put_page/alloc_page pairs per READDIR request, even
though those pages often remain unused.
We can save some work by not releasing dirlist buffer pages that
were not used to form the READDIR Reply. I've left the NFSv2 code
alone since there are never more than three pages involved in an
NFSv2 READDIR Reply.
Eventually we should nail down why these pages need to be released
at all in order to avoid allocating and releasing pages
unnecessarily.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f87fc2d34d475225e78b7f5c4eabb121f4282b2 ]
The benefit of the xdr_stream helpers is that they transparently
handle encoding an XDR data item that crosses page boundaries.
Most of the open-coded logic to do that here can be eliminated.
A sub-buffer and sub-stream are set up as a sink buffer for the
directory entry encoder. As an entry is encoded, it is added to
the end of the content in this buffer/stream. The total length of
the directory list is tracked in the buffer's @len field.
When it comes time to encode the Reply, the sub-buffer is merged
into rq_res's page array at the correct place using
xdr_write_pages().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1409e2de4f11034c8eb30775cc3e37039a4ef13 ]
Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ef2826c761079e27904c85034df34e601b82d94 ]
As an additional clean up, encode_wcc_data() is removed because it
is now no longer used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5cf353354af1a385f29dec4609a1532d32c83a25 ]
Also, clean up: Rename the encoder function to match the name of
the result structure in RFC 1813, consistent with other encoder
function names in nfs3xdr.c. "diropres" is an NFSv2 thingie.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c42f804d30f6a8d86665eca84071b316821ea08 ]
As an additional clean up, some renaming is done to more closely
reflect the data type and variable names used in the NFSv3 XDR
definition provided in RFC 1813. "attrstat" is an NFSv2 thingie.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bddfdbcddbe267519cd36aeb115fdf8620980111 ]
NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.
nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fe61450972d3900bffb1dc26a17ebb9cdd92db2 ]
In order to handle idmapped mounts we will extend the vfs rename helper
to take two new arguments in follow up patches. Since this operations
already takes a bunch of arguments add a simple struct renamedata and
make the current helper use it before we extend it.
Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
[ cel: backported to 5.10.y, prior to idmapped mounts ]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02f92b3868a1b34ab98464e76b0e4e060474ba10 ]
Add two simple helpers to check permissions on a file and path
respectively and convert over some callers. It simplifies quite a few
codepaths and also reduces the churn in later patches quite a bit.
Christoph also correctly points out that this makes codepaths (e.g.
ioctls) way easier to follow that would otherwise have to do more
complex argument passing than necessary.
Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac7b79fd190b02e7151bc7d2b9da692f537657f3 ]
Currently the fs sysctl inotify/max_user_instances is used to limit the
number of inotify instances on the system. For systems running multiple
workloads, the per-user namespace sysctl max_inotify_instances can be
used to further partition inotify instances. However there is no easy
way to set a sensible system level max limit on inotify instances and
further partition it between the workloads. It is much easier to charge
the underlying resource (i.e. memory) behind the inotify instances to
the memcg of the workload and let their memory limits limit the number
of inotify instances they can create.
With inotify instances charged to memcg, the admin can simply set
max_user_instances to INT_MAX and let the memcg limits of the jobs limit
their inotify instances.
Link: https://lore.kernel.org/r/20201220044608.1258123-1-shakeelb@google.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 428a23d2bf0ca8fd4d364a464c3e468f0e81671e ]
In the typical case of v4 and an i_version-supporting filesystem, we can
skip a stat which is only required to fake up a change attribute from
ctime.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cc55f4434b421d37300aa9a167ace7d60b45ccf ]
When exporting NFS, we may as well use the real change attribute
returned by the original server instead of faking up a change attribute
from the ctime.
Note we can't do that by setting I_VERSION--that would also turn on the
logic in iversion.h which treats the lower bit specially, and that
doesn't make sense for NFS.
So instead we define a new export operation for filesystems like NFS
that want to manage the change attribute themselves.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02591f9febd5f69bb4c266a4abf899c4cf21964f ]
Currently NFSv4_2 SSC helper, nfs_ssc, incorrectly uses GRACE_PERIOD
as its config. Fix by adding new config NFS_V4_2_SSC_HELPER which
depends on NFS_V4_2 and is automatically selected when NFSD_V4 is
enabled. Also removed the file name from a comment in nfs_ssc.c.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ec59659b4972ec25851aa03b4b5baba6764a62e4 ]
I'm not sure why we're writing this out the hard way in so many places.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1722b04624806ced51693f546edb83e8b2297a77 ]
The set_client() was already taken care of by process_open1().
The comments here are mostly redundant with the code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f71475ba8c2a77fff8051903cf4b7d826c3d1693 ]
Every caller is setting this argument to false, so we don't need it.
Also cut this comment a bit and remove an unnecessary warning.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47fdb22dacae78f37701d82a94c16a014186d34e ]
I think this unusual use of struct compound_state could cause confusion.
It's not that much more complicated just to open-code this stateid
lookup.
The only change in behavior should be a different error return in the
case the copy is using a source stateid that is a revoked delegation,
but I doubt that matters.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: squashed in fix reported by Coverity ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 460d27091ae2c23e7ac959a61cd481c58832db58 ]
I think this is a better name, and I'm going to reuse elsewhere the code
that does the lookup itself.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4587eb2cf4b6271f67fb93b75f7de2a2026e853 ]
You can take the single-exit thing too far, I think.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9d53a75cf574d6aa41f3cb4968fffe4f64e0fad ]
Similarly, this STALE_CLIENTID check is already handled by:
nfs4_preprocess_confirmed_seqid_op()->
nfs4_preprocess_seqid_op()->
nfsd4_lookup_stateid()->
set_client()->
STALE_CLIENTID()
(This may cause it to return a different error in some cases where
there are multiple things wrong; pynfs test SEQ10 regressed on this
commit because of that, but I think that's the test's fault, and I've
fixed it separately.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33311873adb0d55c287b164117b5b4bb7b1bdc40 ]
This STALE_CLIENTID check is redundant with the one in
lookup_clientid().
There's a difference in behavior is in case of memory allocation
failure, which I think isn't a big deal.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20ad856e47323e208ae8d6a9ecfe5bf0be6f505e ]
Collect some nfsd stats per export in addition to the global stats.
A new nfsdfs export_stats file is created. It uses the same ops as the
exports file to iterate the export entries and we use the file's name to
determine the reported info per export. For example:
$ cat /proc/fs/nfsd/export_stats
# Version 1.1
# Path Client Start-time
# Stats
/test localhost 92
fh_stale: 0
io_read: 9
io_write: 1
Every export entry reports the start time when stats collection
started, so stats collecting scripts can know if stats where reset
between samples.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e567b98ce9a4b35b63c364d24828a9e5cd7a8179 ]
nfsd stats counters can be updated by concurrent nfsd threads without any
protection.
Convert some nfsd_stats and nfsd_net struct members to use percpu counters.
The longest_chain* members of struct nfsd_net remain unprotected.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b76d1df1a3683b6b23cd1c813d13c5e6a9d35e5 ]
Commit 501cb1849f86 ("nfsd: rip out the raparms cache") removed the
code that updates read-ahead cache stats counters,
commit 8bbfa9f3889b ("knfsd: remove the nfsd thread busy histogram")
removed code that updates the thread busy stats counters back in 2009
and code that updated filehandle cache stats was removed back in 2002.
Remove the unused stats counters from nfsd_stats struct and print
hardcoded zeros in /proc/net/rpc/nfsd.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 571d31f37a57729c9d3463b5a692a84e619b408a ]
Since the ACL GETATTR procedure is the same as the normal GETATTR
procedure, simply re-use nfssvc_decode_fhandleargs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5650682e16f41722f735b7beeb2dbc3411dfbeb6 ]
Now that the argument decoders for NFSv2 and NFSv3 use the
xdr_stream mechanism, the version-specific length checking logic in
nfsd_dispatch() is no longer necessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8688361ae2edb8f7e61d926dc5000c9a44f29370 ]
As an additional clean up, move code not related to XDR decoding
into readdir's .pc_func call out.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 788cd46ecf83ee2d561cb4e754e276dc8089b787 ]
Add a helper similar to nfsd3_init_dirlist_pages().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1fcbd1c9456ba129d38420e345e91c4b6363db47 ]
If the code that sets up the sink buffer for nfsd_readlink() is
moved adjacent to the nfsd_readlink() call site that uses it, then
the only argument is a file handle, and the fhandle decoder can be
used instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c293ef993c8df0b1bea9ecb0de6eb96dec3ac9d ]
The code that sets up rq_vec is refactored so that it is now
adjacent to the nfsd_read() call site where it is used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8a38e2d6c885f9d7cd03febc515d36293de4a5b ]
This commit removes the last usage of the original decode_sattr3(),
so it is removed as a clean-up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da39201637297460c13134c29286a00f3a1c92fe ]
Similar to the WRITE decoder, code that checks the sanity of the
payload size is re-wired to work with xdr_stream infrastructure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9cedc2e64c296efb3bebe93a0ceeb5e71e8d722d ]
As an additional clean up, neither nfsd3_proc_readdir() nor
nfsd3_proc_readdirplus() make use of the dircount argument, so
remove it from struct nfsd3_readdirargs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 40116ebd0934cca7e46423bdb3397d3d27eb9fb9 ]
De-duplicate some code that is used by both READDIR and READDIRPLUS
to build the dirlist in the Reply. Because this code is not related
to decoding READ arguments, it is moved to a more appropriate spot.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a8f37fb34a96267c656f7254e69bb9a2fc89fe4 ]
Code inspection shows that the server's NFSv3 READDIR implementation
handles offset cookies slightly differently than the NFSv2 READDIR,
NFSv3 READDIRPLUS, and NFSv4 READDIR implementations,
and there doesn't seem to be any need for this difference.
As a clean up, I copied the logic from nfsd3_proc_readdirplus().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 224c1c894e48cd72e4dd9fb6311be80cbe1369b0 ]
The NFSv3 READLINK request takes a single filehandle, so it can
re-use GETATTR's decoder.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>