Issue: Permission denied when trying to open/write the recovery lock file when CTDB is running as a service. Running ctdbd interactively ('ctdbd -i') works fine, and the recovery lock is written.

Background: Testing two VM nodes running Fedora 30. The CTDB recovery lock file is stored on a Gluster volume, mounted via FUSE with the following mount options:

defaults,direct-io-mode=enable,_netdev

We're running:
Gluster 6.3
Samba 4.10.5-1 (testing for another bug)

We have the same setup on CentOS 7.6, which doesn't have this issue, running:
Sernet-samba 4.9.10
Gluster 6.1

Service log:

2019/07/03 14:54:03.436139 ctdbd[2510]: CTDB starting on node
2019/07/03 14:54:03.441948 ctdbd[2511]: Starting CTDBD (Version 4.10.5) as PID: 2511
2019/07/03 14:54:03.442042 ctdbd[2511]: Created PID file /var/run/ctdb/ctdbd.pid
2019/07/03 14:54:03.442072 ctdbd[2511]: Removed stale socket /var/run/ctdb/ctdbd.socket
2019/07/03 14:54:03.442101 ctdbd[2511]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
2019/07/03 14:54:03.442113 ctdbd[2511]: Set real-time scheduler priority
2019/07/03 14:54:03.442233 ctdbd[2511]: Starting event daemon /usr/libexec/ctdb/ctdb-eventd -P 2511 -S 15
2019/07/03 14:54:03.448517 ctdbd[2511]: Set runstate to INIT (1)
2019/07/03 14:54:03.449861 ctdbd[2511]: PNN is 0
2019/07/03 14:54:03.458497 ctdbd[2511]: Loaded public addresses from /etc/ctdb/public_addresses
2019/07/03 14:54:03.458617 ctdbd[2511]: Set runstate to SETUP (2)
2019/07/03 14:54:03.461121 ctdbd[2511]: Keepalive monitoring has been started
2019/07/03 14:54:03.461164 ctdbd[2511]: Set runstate to FIRST_RECOVERY (3)
2019/07/03 14:54:03.461281 ctdb-recoverd[2519]: monitor_cluster starting
2019/07/03 14:54:03.461996 ctdb-recoverd[2519]: Initial recovery master set - forcing election
2019/07/03 14:54:03.462081 ctdbd[2511]: This node (0) is now the recovery master
2019/07/03 14:54:04.461270 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:05.461550 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:06.462077 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:06.465169 ctdb-recoverd[2519]: Election period ended
2019/07/03 14:54:06.465601 ctdb-recoverd[2519]: Node:0 was in recovery mode. Start recovery process
2019/07/03 14:54:06.465623 ctdb-recoverd[2519]: ../../ctdb/server/ctdb_recoverd.c:1342 Starting do_recovery
2019/07/03 14:54:06.465630 ctdb-recoverd[2519]: Attempting to take recovery lock (/mnt/ctdbv01/ctdb/lockfile)
2019/07/03 14:54:06.465696 ctdb-recoverd[2519]: Set cluster mutex helper to "/usr/libexec/ctdb/ctdb_mutex_fcntl_helper"
2019/07/03 14:54:06.473953 ctdbd[2511]: /usr/libexec/ctdb/ctdb_mutex_fcntl_helper: Unable to open /mnt/ctdbv01/ctdb/lockfile - (Permission denied)
2019/07/03 14:54:06.474281 ctdb-recoverd[2519]: Unable to take recover lock - unknown error
2019/07/03 14:54:06.474304 ctdb-recoverd[2519]: Banning this node
2019/07/03 14:54:06.474311 ctdb-recoverd[2519]: Banning node 0 for 300 seconds
2019/07/03 14:54:06.474350 ctdbd[2511]: Banning this node for 300 seconds
2019/07/03 14:54:06.474366 ctdbd[2511]: This node has been banned - releasing all public IPs and setting the generation to INVALID.
2019/07/03 14:54:06.474411 ctdbd[2511]: ../../ctdb/server/ctdb_takeover.c:1651 Released 0 public IPs
2019/07/03 14:54:06.474498 ctdb-recoverd[2519]: Unable to take recovery lock
2019/07/03 14:54:06.474517 ctdb-recoverd[2519]: Retrying recovery
2019/07/03 14:54:07.462571 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:07.467017 ctdbd[2511]: Freeze all
2019/07/03 14:54:07.467041 ctdbd[2511]: Freeze all: frozen
2019/07/03 14:54:08.463171 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:09.463641 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED

Interactive stdout:

ctdbd[2592]: CTDB starting on node
ctdbd[2592]: Starting CTDBD (Version 4.10.5) as PID: 2592
ctdbd[2592]: Created PID file /var/run/ctdb/ctdbd.pid
ctdbd[2592]: Removed stale socket /var/run/ctdb/ctdbd.socket
ctdbd[2592]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
ctdbd[2592]: Set real-time scheduler priority
ctdbd[2592]: Starting event daemon /usr/libexec/ctdb/ctdb-eventd -P 2592 -S 12
ctdb-eventd[2593]: daemon started, pid=2593
ctdb-eventd[2593]: startup completed successfully
ctdb-eventd[2593]: listening on /var/run/ctdb/eventd.socket
ctdbd[2592]: Set runstate to INIT (1)
ctdbd[2592]: PNN is 0
ctdbd[2592]: Loaded public addresses from /etc/ctdb/public_addresses
ctdbd[2592]: Set runstate to SETUP (2)
ctdbd[2592]: Keepalive monitoring has been started
ctdbd[2592]: Set runstate to FIRST_RECOVERY (3)
ctdb-recoverd[2599]: monitor_cluster starting
ctdb-recoverd[2599]: Initial recovery master set - forcing election
ctdbd[2592]: This node (0) is now the recovery master
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdb-recoverd[2599]: Election period ended
ctdb-recoverd[2599]: Node:0 was in recovery mode. Start recovery process
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1342 Starting do_recovery
ctdb-recoverd[2599]: Attempting to take recovery lock (/mnt/ctdbv01/ctdb/lockfile)
ctdb-recoverd[2599]: Set cluster mutex helper to "/usr/libexec/ctdb/ctdb_mutex_fcntl_helper"
ctdb-recoverd[2599]: Recovery lock taken successfully
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1417 Recovery initiated due to problem with node 0
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1442 Recovery - created remote databases
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1471 Recovery - updated flags
ctdb-recoverd[2599]: Set recovery_helper to "/usr/libexec/ctdb/ctdb_recovery_helper"
ctdb-recovery[2602]: Set recovery mode to ACTIVE
ctdbd[2592]: Recovery has started
ctdb-recovery[2602]: start_recovery event finished
ctdb-recovery[2602]: updated VNNMAP
ctdb-recovery[2602]: 0 of 0 databases recovered
ctdbd[2592]: Recovery mode set to NORMAL
ctdbd[2592]: Set cluster mutex helper to "/usr/libexec/ctdb/ctdb_mutex_fcntl_helper"
ctdb-recovery[2602]: Set recovery mode to NORMAL
ctdbd[2592]: Recovery has finished
ctdbd[2592]: Set runstate to STARTUP (4)
ctdb-recovery[2602]: recovered event finished
ctdb-recoverd[2599]: Takeover run starting
ctdb-recoverd[2599]: Set takeover_helper to "/usr/libexec/ctdb/ctdb_takeover_helper"
ctdb-takeover[2604]: No nodes available to host public IPs yet
ctdb-recoverd[2599]: Takeover run completed successfully
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1489 Recovery complete
ctdb-recoverd[2599]: Resetting ban count to 0 for all nodes
ctdb-recoverd[2599]: Just finished a recovery. New recoveries will now be suppressed for the rerecovery timeout (10 seconds)
ctdb-recoverd[2599]: Disabling recoveries for 10 seconds
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdbd[2592]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
ctdb-recoverd[2599]: Initial interface fetched
ctdb-recoverd[2599]: Trigger takeoverrun
ctdb-recoverd[2599]: Takeover run starting
ctdb-takeover[2605]: No nodes available to host public IPs yet
ctdb-recoverd[2599]: Takeover run completed successfully
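For anyone debugging the same symptom: the failing step is just the mutex helper opening the lock file and taking an fcntl write lock on it, so it can be probed outside CTDB. Below is a rough Python sketch of that open-and-lock sequence (our lock file path is the default; the function name is ours, and this only approximates what ctdb_mutex_fcntl_helper does). Running it both interactively and from the service's context helps tell a filesystem problem apart from a context-specific one.

```python
import errno
import fcntl
import os
import sys

def try_fcntl_lock(path):
    """Approximate the helper's behaviour: open the lock file
    read-write (creating it if needed) and take a non-blocking
    write lock.  Returns None on success, else the errno name."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        os.close(fd)  # closing the fd also releases the lock

if __name__ == "__main__":
    # Default to the lock file path from our setup; override on the CLI.
    path = sys.argv[1] if len(sys.argv) > 1 else "/mnt/ctdbv01/ctdb/lockfile"
    result = try_fcntl_lock(path)
    print("lock OK" if result is None else "lock failed: " + result)
```

In our case this probe succeeds from a root shell but the same open fails as EACCES from the service, which already points away from plain POSIX permissions.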
Apologies, we found this to be SELinux preventing access to the filesystem.
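For anyone else landing here: this pattern (works with 'ctdbd -i' as root, fails as a service) is typical of SELinux, since FUSE mounts are labelled fusefs_t by default while the service runs in a confined domain. A rough diagnostic sketch, assuming root and a running auditd; the module name is ours and the AVC record below is hypothetical, made up to show the shape:

```shell
# 1. Confirm it really is SELinux: look for AVC denials from the helper.
#      ausearch -m AVC -ts recent | grep ctdb_mutex_fcntl_helper
# 2. Re-test with SELinux permissive (temporary, diagnostic only):
#      setenforce 0 && systemctl restart ctdb
# 3. If the lock now works, build a local policy module from the denials:
#      ausearch -m AVC -ts recent | audit2allow -M ctdb-gluster
#      semodule -i ctdb-gluster.pp
#      setenforce 1

# A denial for this report would look roughly like this hypothetical record:
avc='avc:  denied  { open } for pid=2519 comm="ctdb_mutex_fcnt" path="/mnt/ctdbv01/ctdb/lockfile" scontext=system_u:system_r:ctdbd_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=file'
# Pull the source and target SELinux types out of the record:
echo "$avc" | grep -oE '(scontext|tcontext)=[^ ]+' | cut -d: -f3
```

The scontext/tcontext pair is what audit2allow turns into an allow rule; a mismatch like a confined daemon type versus fusefs_t is the thing to look for.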