We've implemented two new NFSD procfs files:

 o /proc/fs/nfsd/unlock_ip
 o /proc/fs/nfsd/unlock_filesystem

They are intended to allow an admin or a user-mode script to release NLM locks based on either a path name or a server in-bound IP address (IPv4 for now), as in:

 shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
 shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem

The expected usage is in High Availability (HA) environments where NFS servers are clustered together to provide either load balancing or take-over upon server failure. The task normally starts by transferring a floating IP address from ServerA to ServerB with the following sequence:

ServerA:
 1. Tear down the IP address.
 2. Unexport the path.
 3. Write the IP address to /proc/fs/nfsd/unlock_ip to unlock files.
 4. If an unmount is required, write the path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
 5. Signal the peer to begin take-over.

Another patch (005-nlm_set_grace.patch) that allows the peer server to resume file serving is described in 005.txt.

How does it work
-----------------------

Assume ServerA serves two virtual IP addresses (10.1.1.1 and 10.1.1.2), each with its own set of exported filesystem(s). If the root user decides to move 10.1.1.2 from ServerA to ServerB for whatever reason, the associated NLM locks must be dropped from ServerA (and re-acquired from ServerB). Triggered by "echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip", the logic walks through lockd's global "nlm_files" list and checks each lock entry. If the lock was originally requested via in-bound 10.1.1.2, a match is found and the file is subsequently unlocked by the existing nlm_traverse_locks() logic.

What are the issues
-------------------------

A root user of ServerA can advertise to NFS clients: "if you want to access my filesystemA, please connect to 10.1.1.2". In reality, there is no easy way to disallow a client from mounting filesystemA through another interface (10.1.1.1). When this happens, the unlocking logic is not able to identify the match. This is normally not a problem for cluster filesystems, since locks are recognized across the cluster. However, for local filesystems such as ext3, filesystemA must be unmounted from ServerA (before ServerB mounts the filesystem) for the IP take-over to work. With locks staying in the nlm_files list, the umount would fail, which ends up aborting the lock migration process.

To remedy the issue, we've added the additional /proc/fs/nfsd/unlock_filesystem interface, which takes the exported path name and converts it into a filesystem identifier (in our case, we use the vfsmount structure to identify the filesystem). The unlocking logic then walks through the nlm_files list searching for a match. If the vfsmount matches, the file is subsequently unlocked.

Implementation Notes
----------------------------

We had discussed the possibility of using export path names as the main interface, as in [3]. Unfortunately, the prototype code showed issues. One major problem is that the current nlm_files list doesn't know about NFSD export entries; to do the matching for the lock purging logic, a non-trivial amount of export information and logic would have to be added to the lockd code. Besides muddying the interface between nfsd and lockd, the added lock latency is also a concern. As a result, we bring back the IP interface but leave the path name approach as a secondary interface to remedy the local filesystem umount issue.
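To make the matching behaviour of the two interfaces concrete, below is a minimal userspace model of the logic described above. It is not the kernel code: the structure fields, function names, and the integer fs_id (standing in for the vfsmount pointer) are illustrative only. The /mnt/sfs1/b entry models the "wrong interface" mount discussed under "What are the issues": its lock arrived via 10.1.1.1, so only the filesystem-based pass catches it.

/*
 * Userspace model of the two unlock passes -- NOT the lockd/nfsd code.
 * Each entry stands in for an nlm_file: it records the in-bound
 * (virtual) server address the lock arrived on and an opaque
 * filesystem identifier (the kernel uses the vfsmount; a plain
 * integer is used here).
 */
#include <stdio.h>
#include <string.h>

struct mock_nlm_file {
	const char *path;       /* file holding the lock              */
	const char *server_ip;  /* in-bound server address            */
	int         fs_id;      /* stand-in for the vfsmount pointer  */
	int         locked;
};

static struct mock_nlm_file files[] = {
	{ "/mnt/sfs1/a", "10.1.1.2", 1, 1 },
	{ "/mnt/sfs1/b", "10.1.1.1", 1, 1 },  /* mounted via the "wrong" IP */
	{ "/mnt/sfs2/c", "10.1.1.1", 2, 1 },
};

#define NFILES (sizeof(files) / sizeof(files[0]))

/* What a write to /proc/fs/nfsd/unlock_ip triggers: match by in-bound IP. */
static void unlock_by_ip(const char *ip)
{
	for (size_t i = 0; i < NFILES; i++)
		if (files[i].locked && strcmp(files[i].server_ip, ip) == 0)
			files[i].locked = 0;
}

/* What a write to /proc/fs/nfsd/unlock_filesystem triggers: match by filesystem. */
static void unlock_by_fs(int fs_id)
{
	for (size_t i = 0; i < NFILES; i++)
		if (files[i].locked && files[i].fs_id == fs_id)
			files[i].locked = 0;
}

int main(void)
{
	/* "echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip": misses /mnt/sfs1/b */
	unlock_by_ip("10.1.1.2");

	/* "echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem": catches it */
	unlock_by_fs(1);

	for (size_t i = 0; i < NFILES; i++)
		printf("%-12s %s\n", files[i].path,
		       files[i].locked ? "still locked" : "unlocked");
	return 0;
}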
Users should not expect the path name approach to work if they only want to remove the locks associated with one particular exported subdirectory, since we drop *all* locks associated with the subject vfsmount.

Acknowledgment goes to the current Linux nfsd maintainer, Neil Brown, who has offered support and guidance during our prototype efforts.