Bug 8037 - NFS directory checking broken
Summary: NFS directory checking broken
Status: RESOLVED FIXED
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-24 12:18 UTC by Jonathan Buzzard
Modified: 2016-08-10 06:28 UTC

See Also:


Attachments

Description Jonathan Buzzard 2011-03-24 12:18:37 UTC
I have just deployed a new CTDB cluster based on RHEL 5.6 (CTDB version 1.0.112) using the vendor-provided packages, but with the Samba RPMs recompiled from source to add the GPFS and TSM VFS modules, with a couple of patches to the GPFS module (ftruncate and ntimes).

However I have hit a bug in the CTDB NFS monitoring. Basically I was getting lots of 

   ERROR: nfs directory "" not available

errors in the CTDB log. This causes the node to go unhealthy, and then we get thrashing of the IPs between the nodes, making the cluster unusable.

After some tracking down I determined these were being generated by ctdb_check_directories in the functions file. Some more checking determined that this was being called by the 60.nfs event script.

The problem, as I see it, is in the following section of code:

	# and that its directories are available
	[ "$CTDB_NFS_SKIP_SHARE_CHECK" = "yes" ] || {
	    exportfs | grep -v '^#' | grep '^/' |
	    sed -e 's/[[:space:]]*[^[:space:]]*$//' |
	    ctdb_check_directories
	}

The problem with this is that the sed command breaks when the output from exportfs is wrapped. For example, the following is a snippet from my problem cluster:

/gpfs/csb_lab   10.32.0.0/22
/cluster/gjb_lab
                10.0.3.6/31
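The breakage can be seen in isolation by feeding a captured copy of that wrapped output through the same pipeline; the snippet below just simulates exportfs with the two entries above, so no live NFS server is needed:

```shell
#!/bin/sh
# Simulated wrapped exportfs output, as captured above.
wrapped='/gpfs/csb_lab   10.32.0.0/22
/cluster/gjb_lab
                10.0.3.6/31'

# The continuation line ("                10.0.3.6/31") is dropped by
# grep '^/', leaving the bare path "/cluster/gjb_lab" on its own line.
# The sed then strips the last whitespace-delimited field from each
# line -- which for a bare-path line is the path itself -- producing an
# empty line, hence the 'nfs directory "" not available' error.
printf '%s\n' "$wrapped" |
    grep -v '^#' | grep '^/' |
    sed -e 's/[[:space:]]*[^[:space:]]*$//'
```

Running this prints `/gpfs/csb_lab` followed by a blank line; it is that blank line that ctdb_check_directories reports as `nfs directory "" not available`.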

So, for example, the first entry is fine, but for the second entry the greps drop the wrapped client line, leaving just the exported directory, and the sed then turns that into a blank line. Checking the older version of the ctdb package from RHEL 5.5 (1.0.82), what I see is:

        # and that its directories are available
        [ "$CTDB_NFS_SKIP_SHARE_CHECK" = "yes" ] || {
            nfs_dirs=$(exportfs | grep -v '^#' | grep '^/' | awk {'print $1;'})
            ctdb_check_directories "nfs" $nfs_dirs
        }

The use of awk in the older version does not cause the problems that the use of sed does. I have modified my 60.nfs to revert to using awk and it all works just fine now. Some digging around in gitweb suggests that the change was introduced by Martin Schwenke as a fix for directory names containing spaces. While laudable, it has introduced a regression.
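Running the awk variant over the same simulated wrapped exportfs output (the two entries quoted earlier) shows why reverting works:

```shell
#!/bin/sh
# Same simulated wrapped exportfs output as in the report above.
wrapped='/gpfs/csb_lab   10.32.0.0/22
/cluster/gjb_lab
                10.0.3.6/31'

# awk prints the first field of each surviving line, so the bare-path
# line "/cluster/gjb_lab" passes through intact instead of being
# reduced to an empty line.
printf '%s\n' "$wrapped" |
    grep -v '^#' | grep '^/' |
    awk '{print $1}'
```

Note the trade-off: `awk '{print $1}'` truncates a directory name at its first space, which is exactly the case the sed change was trying to handle, so reverting trades one failure mode for the (rarer) other.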

As an additional point, I have also chosen to feed the list of directories to check through uniq, as we have the same directory exported several times in /etc/exports, for example:

/gpfs/rb_lab 10.41.0.0/22(rw,no_subtree_check,insecure,sync,fsid=745)
/gpfs/rb_lab 10.31.0.0/22(rw,no_subtree_check,insecure,sync,fsid=745)

This results in the same directory being checked more than once, which is pointless.
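A minimal sketch of that deduplication, using simulated exportfs output for the doubly-exported directory above (uniq only merges adjacent lines, so sort -u is used here as the safer equivalent when duplicates might not be adjacent):

```shell
#!/bin/sh
# Simulated exportfs output with the same directory exported to two
# different networks, as in the /etc/exports snippet above.
dup='/gpfs/rb_lab    10.41.0.0/22
/gpfs/rb_lab    10.31.0.0/22'

# Extract the directory field and collapse duplicates before the list
# reaches ctdb_check_directories, so each directory is checked once.
printf '%s\n' "$dup" |
    grep -v '^#' | grep '^/' |
    awk '{print $1}' | sort -u
```

This prints `/gpfs/rb_lab` once instead of twice.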
Comment 1 Luk Claes (dead mail address) 2011-05-13 21:53:53 UTC
This was already fixed in commit da5fc07b (the fix for bug #7152), which was included in 1.0.114, so I guess this can be closed?
Comment 2 Martin Schwenke 2016-08-10 06:28:06 UTC
I think I caused a regression here and fixed it many years ago in commit bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6.

Fixed in all currently supported versions.