I have just deployed a new CTDB cluster based on RHEL 5.6 (CTDB version 1.0.112) using the vendor-provided packages, but with the Samba RPMs recompiled from source to add the GPFS and TSM VFS modules, with a couple of patches to the GPFS module (ftruncate and ntimes).

However, I have hit a bug in the CTDB NFS monitoring. I was getting lots of

    ERROR: nfs directory "" not available

errors in the CTDB log. This causes the node to go unhealthy, and then the IPs thrash between the nodes, making the cluster unusable.

After some tracking down I determined these were being generated by ctdb_check_directories in the functions file. Some more checking determined that this was being called by the 60.nfs event script. The problem as I see it is the following section of code:

    # and that its directories are available
    [ "$CTDB_NFS_SKIP_SHARE_CHECK" = "yes" ] || {
        exportfs | grep -v '^#' | grep '^/' |
        sed -e 's/[[:space:]]*[^[:space:]]*$//' |
        ctdb_check_directories

The sed command is broken if the output from exportfs is wrapped. For example, the following is a snippet from my problem cluster:

    /gpfs/csb_lab   10.32.0.0/22
    /cluster/gjb_lab
                    10.0.3.6/31

The first entry is fine. For the second entry, the greps keep only the line holding the exported directory (the continuation line with the client does not start with "/"), and the sed then strips the last whitespace-delimited field of that line, which is the directory itself, leaving a blank line.

Checking the older version of the ctdb packages from RHEL 5.5 (1.0.82), what I see is:

    # and that its directories are available
    [ "$CTDB_NFS_SKIP_SHARE_CHECK" = "yes" ] || {
        nfs_dirs=$(exportfs | grep -v '^#' | grep '^/' | awk {'print $1;'})
        ctdb_check_directories "nfs" $nfs_dirs
    }

The use of awk in the older version does not cause the problems that the use of sed does. I have modified my 60.nfs to revert to using awk and it all works fine now.

Some digging around in gitweb suggests that the change was introduced by Martin Schwenke as a fix for directory names containing spaces. While laudable, it has introduced a regression.
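The difference between the two filters can be reproduced without a cluster. The following is a minimal sketch, assuming exportfs wraps a long path onto its own line with the client on an indented continuation line, as in the snippet above. The sample_exportfs, dirs_with_sed and dirs_with_awk functions are hypothetical stand-ins written for this illustration and are not part of CTDB.

```shell
#!/bin/sh
# Hypothetical stand-in for `exportfs` output. The second path is long
# enough that the client is wrapped onto an indented continuation line.
sample_exportfs() {
    printf '/gpfs/csb_lab  \t10.32.0.0/22\n'
    printf '/cluster/gjb_lab\n'
    printf '\t\t10.0.3.6/31\n'
}

# Filter as quoted from the 1.0.112 60.nfs script: strip the last
# whitespace-delimited field from every line that starts with "/".
dirs_with_sed() {
    sample_exportfs | grep -v '^#' | grep '^/' |
    sed -e 's/[[:space:]]*[^[:space:]]*$//'
}

# Filter as quoted from the 1.0.82 script: keep only the first field.
dirs_with_awk() {
    sample_exportfs | grep -v '^#' | grep '^/' | awk '{print $1}'
}

# Bracket each output line so that an empty line is visible.
echo "sed filter:"
dirs_with_sed | sed -e 's/^/  [/' -e 's/$/]/'
echo "awk filter:"
dirs_with_awk | sed -e 's/^/  [/' -e 's/$/]/'
```

With the sed filter the wrapped entry comes out as an empty line, which is what ends up being reported as nfs directory "". The awk filter keeps both paths, but it truncates any path that contains a space, which appears to be the case the sed change was meant to handle.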
As an additional point, I have also chosen to feed the list of directories to check through uniq, because we have the same directory exported several times in /etc/exports, for example:

    /gpfs/rb_lab 10.41.0.0/22(rw,no_subtree_check,insecure,sync,fsid=745)
    /gpfs/rb_lab 10.31.0.0/22(rw,no_subtree_check,insecure,sync,fsid=745)

Without this, the same directory is checked more than once, which is pointless.
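A small sketch of the de-duplication step follows. It assumes the directory list has already been reduced to one path per line. The sample_exportfs function and the extra 10.51.0.0/22 client are hypothetical, added only to show that uniq collapses adjacent duplicates and nothing else. Where exports of the same directory may not be adjacent in the output, sort -u is one alternative.

```shell
#!/bin/sh
# Hypothetical stand-in for `exportfs` output: /gpfs/rb_lab is exported
# to three clients, and the third export is not adjacent to the others.
sample_exportfs() {
    printf '/gpfs/rb_lab\t10.41.0.0/22\n'
    printf '/gpfs/rb_lab\t10.31.0.0/22\n'
    printf '/gpfs/csb_lab\t10.32.0.0/22\n'
    printf '/gpfs/rb_lab\t10.51.0.0/22\n'
}

# uniq removes only adjacent repeats, so the last /gpfs/rb_lab survives.
dirs_uniq() {
    sample_exportfs | grep '^/' | awk '{print $1}' | uniq
}

# sort -u removes every repeat, at the cost of reordering the list.
dirs_sort_u() {
    sample_exportfs | grep '^/' | awk '{print $1}' | sort -u
}

echo "uniq:";    dirs_uniq
echo "sort -u:"; dirs_sort_u
```

Order does not matter for a set of existence checks, so the reordering done by sort -u should be harmless here.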
This was already fixed in commit da5fc07b fixing bug #7152 which was included in 1.0.114, so I guess this can be closed?
I think I caused a regression here and fixed it many years ago in commit bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6. Fixed in all currently supported versions.