ctdb_mutex_ceph_rados_helper currently requests a lock with indefinite duration. If the recovery master dies unexpectedly, the lock is never released, so subsequently elected recovery master nodes can't obtain the recovery lock and CTDB deadlocks during failover. Ceph's rados_lock_exclusive() API supports lock expiry and renewal, which can be used to ensure that the lock is released after a hard failure. Patch to follow...
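To illustrate the failure mode and the proposed fix, here is a minimal sketch in C. It is a toy in-memory model and not the helper's code or librados itself: struct lease_lock, lease_acquire(), lease_renew() and demo_failover() are hypothetical names made up for this example. The real rados_lock_exclusive() takes a struct timeval *duration argument and a flags argument (LIBRADOS_LOCK_FLAG_RENEW); how the actual patch uses them is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical toy model of an exclusive lock with optional expiry. */
struct lease_lock {
    char holder[32];    /* empty string: nobody holds the lock */
    time_t expires_at;  /* 0: the lock never expires */
};

/* A lock counts as held if it has a holder and has not yet expired. */
static bool lease_is_held(const struct lease_lock *l, time_t now)
{
    if (l->holder[0] == '\0') {
        return false;
    }
    if (l->expires_at != 0 && now >= l->expires_at) {
        return false;
    }
    return true;
}

/* Take the lock; duration_s == 0 models an indefinite lock.
 * Returns 0 on success, -1 if the lock is currently held. */
static int lease_acquire(struct lease_lock *l, const char *who,
                         time_t now, unsigned duration_s)
{
    if (lease_is_held(l, now)) {
        return -1;
    }
    snprintf(l->holder, sizeof(l->holder), "%s", who);
    l->expires_at = (duration_s == 0) ? 0 : now + (time_t)duration_s;
    return 0;
}

/* Extend the expiry; only succeeds while the caller still holds the lock. */
static int lease_renew(struct lease_lock *l, const char *who,
                       time_t now, unsigned duration_s)
{
    if (!lease_is_held(l, now)) {
        return -1;
    }
    if (snprintf(NULL, 0, "%s", who) >= (int)sizeof(l->holder)) {
        return -1;
    }
    for (size_t i = 0; i < sizeof(l->holder); i++) {
        if (l->holder[i] != who[i]) {
            return -1;  /* held by somebody else */
        }
        if (who[i] == '\0') {
            break;
        }
    }
    l->expires_at = (duration_s == 0) ? 0 : now + (time_t)duration_s;
    return 0;
}

/* node-a takes the lock, renews once, then dies without unlocking.
 * Returns the result of node-b's acquire attempt 60 seconds later. */
static int demo_failover(unsigned duration_s)
{
    struct lease_lock l = { .holder = "", .expires_at = 0 };
    time_t t = 1000;  /* arbitrary start time for the model */

    assert(lease_acquire(&l, "node-a", t, duration_s) == 0);
    assert(lease_renew(&l, "node-a", t + 5, duration_s) == 0);
    /* While node-a is alive and renewing, node-b cannot take the lock. */
    assert(lease_acquire(&l, "node-b", t + 6, duration_s) == -1);
    /* node-a hard-fails here: no unlock, no further renewals. */
    return lease_acquire(&l, "node-b", t + 60, duration_s);
}
```

In this model, demo_failover(0) returns -1 (the indefinite lock is stuck with the dead node), while demo_failover(10) returns 0, because the last renewal expired long before node-b retried. The renewal interval has to be comfortably shorter than the lock duration so that a live holder never loses the lock by accident.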
Samuel has reviewed the patchset (which resulted in v3), so I'm now just waiting on a second team review prior to pushing: https://lists.samba.org/archive/samba-technical/2018-July/129337.html
Created attachment 14413: clean cherry-pick for 4.7.next
Created attachment 14414: clean cherry-pick for 4.8.next
Created attachment 14415: clean cherry-pick for 4.9.next
Hi Karolin, This is ready for v4-7, v4-8 and v4-9.
(In reply to Amitay Isaacs from comment #5) Pushed to autobuild-v4-{9,8,7}-test.
(In reply to Karolin Seeger from comment #6) Pushed to all branches. Closing out bug report. Thanks!