Issue: Permission denied when trying to open/write the recovery lock file when CTDB is running as a service. Running ctdbd interactively ('ctdbd -i') works fine, and the recovery lock is written.

Background: Testing two VM nodes running Fedora 30. The CTDB recovery lock file is stored on a Gluster volume, mounted via FUSE with the following mount options:

defaults,direct-io-mode=enable,_netdev

We're running:
Gluster 6.3
Samba 4.10.5-1 (testing for another bug)

We have the same setup on CentOS 7.6, which doesn't have this issue, running:
Sernet-samba 4.9.10
Gluster 6.1

Service log:

2019/07/03 14:54:03.436139 ctdbd[2510]: CTDB starting on node
2019/07/03 14:54:03.441948 ctdbd[2511]: Starting CTDBD (Version 4.10.5) as PID: 2511
2019/07/03 14:54:03.442042 ctdbd[2511]: Created PID file /var/run/ctdb/ctdbd.pid
2019/07/03 14:54:03.442072 ctdbd[2511]: Removed stale socket /var/run/ctdb/ctdbd.socket
2019/07/03 14:54:03.442101 ctdbd[2511]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
2019/07/03 14:54:03.442113 ctdbd[2511]: Set real-time scheduler priority
2019/07/03 14:54:03.442233 ctdbd[2511]: Starting event daemon /usr/libexec/ctdb/ctdb-eventd -P 2511 -S 15
2019/07/03 14:54:03.448517 ctdbd[2511]: Set runstate to INIT (1)
2019/07/03 14:54:03.449861 ctdbd[2511]: PNN is 0
2019/07/03 14:54:03.458497 ctdbd[2511]: Loaded public addresses from /etc/ctdb/public_addresses
2019/07/03 14:54:03.458617 ctdbd[2511]: Set runstate to SETUP (2)
2019/07/03 14:54:03.461121 ctdbd[2511]: Keepalive monitoring has been started
2019/07/03 14:54:03.461164 ctdbd[2511]: Set runstate to FIRST_RECOVERY (3)
2019/07/03 14:54:03.461281 ctdb-recoverd[2519]: monitor_cluster starting
2019/07/03 14:54:03.461996 ctdb-recoverd[2519]: Initial recovery master set - forcing election
2019/07/03 14:54:03.462081 ctdbd[2511]: This node (0) is now the recovery master
2019/07/03 14:54:04.461270 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:05.461550 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:06.462077 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:06.465169 ctdb-recoverd[2519]: Election period ended
2019/07/03 14:54:06.465601 ctdb-recoverd[2519]: Node:0 was in recovery mode. Start recovery process
2019/07/03 14:54:06.465623 ctdb-recoverd[2519]: ../../ctdb/server/ctdb_recoverd.c:1342 Starting do_recovery
2019/07/03 14:54:06.465630 ctdb-recoverd[2519]: Attempting to take recovery lock (/mnt/ctdbv01/ctdb/lockfile)
2019/07/03 14:54:06.465696 ctdb-recoverd[2519]: Set cluster mutex helper to "/usr/libexec/ctdb/ctdb_mutex_fcntl_helper"
2019/07/03 14:54:06.473953 ctdbd[2511]: /usr/libexec/ctdb/ctdb_mutex_fcntl_helper: Unable to open /mnt/ctdbv01/ctdb/lockfile - (Permission denied)
2019/07/03 14:54:06.474281 ctdb-recoverd[2519]: Unable to take recover lock - unknown error
2019/07/03 14:54:06.474304 ctdb-recoverd[2519]: Banning this node
2019/07/03 14:54:06.474311 ctdb-recoverd[2519]: Banning node 0 for 300 seconds
2019/07/03 14:54:06.474350 ctdbd[2511]: Banning this node for 300 seconds
2019/07/03 14:54:06.474366 ctdbd[2511]: This node has been banned - releasing all public IPs and setting the generation to INVALID.
2019/07/03 14:54:06.474411 ctdbd[2511]: ../../ctdb/server/ctdb_takeover.c:1651 Released 0 public IPs
2019/07/03 14:54:06.474498 ctdb-recoverd[2519]: Unable to take recovery lock
2019/07/03 14:54:06.474517 ctdb-recoverd[2519]: Retrying recovery
2019/07/03 14:54:07.462571 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:07.467017 ctdbd[2511]: Freeze all
2019/07/03 14:54:07.467041 ctdbd[2511]: Freeze all: frozen
2019/07/03 14:54:08.463171 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED
2019/07/03 14:54:09.463641 ctdbd[2511]: CTDB_WAIT_UNTIL_RECOVERED

Interactive stdout:

ctdbd[2592]: CTDB starting on node
ctdbd[2592]: Starting CTDBD (Version 4.10.5) as PID: 2592
ctdbd[2592]: Created PID file /var/run/ctdb/ctdbd.pid
ctdbd[2592]: Removed stale socket /var/run/ctdb/ctdbd.socket
ctdbd[2592]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
ctdbd[2592]: Set real-time scheduler priority
ctdbd[2592]: Starting event daemon /usr/libexec/ctdb/ctdb-eventd -P 2592 -S 12
ctdb-eventd[2593]: daemon started, pid=2593
ctdb-eventd[2593]: startup completed successfully
ctdb-eventd[2593]: listening on /var/run/ctdb/eventd.socket
ctdbd[2592]: Set runstate to INIT (1)
ctdbd[2592]: PNN is 0
ctdbd[2592]: Loaded public addresses from /etc/ctdb/public_addresses
ctdbd[2592]: Set runstate to SETUP (2)
ctdbd[2592]: Keepalive monitoring has been started
ctdbd[2592]: Set runstate to FIRST_RECOVERY (3)
ctdb-recoverd[2599]: monitor_cluster starting
ctdb-recoverd[2599]: Initial recovery master set - forcing election
ctdbd[2592]: This node (0) is now the recovery master
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdb-recoverd[2599]: Election period ended
ctdb-recoverd[2599]: Node:0 was in recovery mode. Start recovery process
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1342 Starting do_recovery
ctdb-recoverd[2599]: Attempting to take recovery lock (/mnt/ctdbv01/ctdb/lockfile)
ctdb-recoverd[2599]: Set cluster mutex helper to "/usr/libexec/ctdb/ctdb_mutex_fcntl_helper"
ctdb-recoverd[2599]: Recovery lock taken successfully
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1417 Recovery initiated due to problem with node 0
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1442 Recovery - created remote databases
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1471 Recovery - updated flags
ctdb-recoverd[2599]: Set recovery_helper to "/usr/libexec/ctdb/ctdb_recovery_helper"
ctdb-recovery[2602]: Set recovery mode to ACTIVE
ctdbd[2592]: Recovery has started
ctdb-recovery[2602]: start_recovery event finished
ctdb-recovery[2602]: updated VNNMAP
ctdb-recovery[2602]: 0 of 0 databases recovered
ctdbd[2592]: Recovery mode set to NORMAL
ctdbd[2592]: Set cluster mutex helper to "/usr/libexec/ctdb/ctdb_mutex_fcntl_helper"
ctdb-recovery[2602]: Set recovery mode to NORMAL
ctdbd[2592]: Recovery has finished
ctdbd[2592]: Set runstate to STARTUP (4)
ctdb-recovery[2602]: recovered event finished
ctdb-recoverd[2599]: Takeover run starting
ctdb-recoverd[2599]: Set takeover_helper to "/usr/libexec/ctdb/ctdb_takeover_helper"
ctdb-takeover[2604]: No nodes available to host public IPs yet
ctdb-recoverd[2599]: Takeover run completed successfully
ctdb-recoverd[2599]: ../../ctdb/server/ctdb_recoverd.c:1489 Recovery complete
ctdb-recoverd[2599]: Resetting ban count to 0 for all nodes
ctdb-recoverd[2599]: Just finished a recovery. New recoveries will now be suppressed for the rerecovery timeout (10 seconds)
ctdb-recoverd[2599]: Disabling recoveries for 10 seconds
ctdbd[2592]: CTDB_WAIT_UNTIL_RECOVERED
ctdbd[2592]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
ctdb-recoverd[2599]: Initial interface fetched
ctdb-recoverd[2599]: Trigger takeoverrun
ctdb-recoverd[2599]: Takeover run starting
ctdb-takeover[2605]: No nodes available to host public IPs yet
ctdb-recoverd[2599]: Takeover run completed successfully
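For anyone debugging the same symptom: the failing step is just the mutex helper opening the lock file and taking an fcntl write lock on it, so it can be probed outside CTDB. Below is a rough Python sketch of that open-and-lock sequence (our lock file path is the default; the function name is ours, and this only approximates what ctdb_mutex_fcntl_helper does). Running it both interactively and from the service's context helps tell a filesystem problem apart from a context-specific one.

```python
import errno
import fcntl
import os
import sys

def try_fcntl_lock(path):
    """Approximate the helper's behaviour: open the lock file
    read-write (creating it if needed) and take a non-blocking
    write lock.  Returns None on success, else the errno name."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        os.close(fd)  # closing the fd also releases the lock

if __name__ == "__main__":
    # Default to the lock file path from our setup; override on the CLI.
    path = sys.argv[1] if len(sys.argv) > 1 else "/mnt/ctdbv01/ctdb/lockfile"
    result = try_fcntl_lock(path)
    print("lock OK" if result is None else "lock failed: " + result)
```

In our case this probe succeeds from a root shell but the same open fails as EACCES from the service, which already points away from plain POSIX permissions.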
Apologies, we found this to be SELinux preventing access to the filesystem.
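For anyone else landing here: this pattern (works with 'ctdbd -i' as root, fails as a service) is typical of SELinux, since FUSE mounts are labelled fusefs_t by default while the service runs in a confined domain. A rough diagnostic sketch, assuming root and a running auditd; the module name is ours and the AVC record below is hypothetical, made up to show the shape:

```shell
# 1. Confirm it really is SELinux: look for AVC denials from the helper.
#      ausearch -m AVC -ts recent | grep ctdb_mutex_fcntl_helper
# 2. Re-test with SELinux permissive (temporary, diagnostic only):
#      setenforce 0 && systemctl restart ctdb
# 3. If the lock now works, build a local policy module from the denials:
#      ausearch -m AVC -ts recent | audit2allow -M ctdb-gluster
#      semodule -i ctdb-gluster.pp
#      setenforce 1

# A denial for this report would look roughly like this hypothetical record:
avc='avc:  denied  { open } for pid=2519 comm="ctdb_mutex_fcnt" path="/mnt/ctdbv01/ctdb/lockfile" scontext=system_u:system_r:ctdbd_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=file'
# Pull the source and target SELinux types out of the record:
echo "$avc" | grep -oE '(scontext|tcontext)=[^ ]+' | cut -d: -f3
```

The scontext/tcontext pair is what audit2allow turns into an allow rule; a mismatch like a confined daemon type versus fusefs_t is the thing to look for.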