Bug 13540 - deadlock with ctdb_mutex_ceph_rados_helper
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Reported: 2018-07-20 11:03 UTC by David Disseldorp
Modified: 2018-08-24 09:54 UTC

clean cherry-pick for 4.7.next (23.57 KB, patch)
2018-08-13 21:36 UTC, David Disseldorp
amitay: review+
clean cherry pick for 4.8.next (23.57 KB, patch)
2018-08-13 21:37 UTC, David Disseldorp
amitay: review+
clean cherry-pick for 4.9.next (23.57 KB, patch)
2018-08-13 21:37 UTC, David Disseldorp
amitay: review+

Description David Disseldorp 2018-07-20 11:03:29 UTC
ctdb_mutex_ceph_rados_helper currently requests a lock with indefinite duration. This results in a CTDB deadlock during failover if the recovery master dies unexpectedly, because subsequently elected recovery master nodes can't obtain the recovery lock.

Ceph's rados_lock_exclusive() API supports expiry and renewal, which can be used to ensure that the lock is released on hard-failure. Patch to follow...
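
To illustrate the failure mode and why an expiring, renewable lock avoids it, here is a minimal in-memory sketch. It does not call librados and is not the patch itself: `struct lease`, `lease_lock()` and `failover_succeeds()` are hypothetical names invented for this example, and time is passed in as plain seconds. A duration of 0 models the current "indefinite" behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of an exclusive lock with optional expiry. */
struct lease {
    char owner[32];   /* empty string means unlocked */
    long expires_at;  /* absolute time in seconds; 0 means never expires */
};

/* A lease counts as held only if it has an owner and has not expired. */
static bool lease_held(const struct lease *l, long now)
{
    if (l->owner[0] == '\0') {
        return false;
    }
    if (l->expires_at != 0 && now >= l->expires_at) {
        return false;
    }
    return true;
}

/*
 * Take the lease, or renew it if the caller already owns it.
 * Returns 0 on success, -1 if another owner still holds it.
 */
static int lease_lock(struct lease *l, const char *owner, long now,
                      long duration)
{
    assert(l != NULL && owner != NULL);
    if (lease_held(l, now) && strcmp(l->owner, owner) != 0) {
        return -1;
    }
    snprintf(l->owner, sizeof(l->owner), "%s", owner);
    l->expires_at = (duration > 0) ? now + duration : 0;
    return 0;
}

/*
 * node-a takes the lock at t=0 and then dies without unlocking.
 * Returns 1 if node-b can take over at t=60, 0 otherwise.
 */
static int failover_succeeds(long duration)
{
    struct lease l = {0};

    if (lease_lock(&l, "node-a", 0, duration) != 0) {
        return 0;
    }
    /* node-a crashes here: no renewal, no unlock. */
    return lease_lock(&l, "node-b", 60, duration) == 0 ? 1 : 0;
}
```

In this model `failover_succeeds(0)` fails because the dead owner's lock never lapses, while `failover_succeeds(10)` succeeds because the lease ran out once renewals stopped. A live holder would keep calling `lease_lock()` with its own name before the expiry time to retain the lock.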
Comment 1 David Disseldorp 2018-07-31 10:53:27 UTC
Samuel has reviewed the patchset (which resulted in v3), so I'm now just waiting on second team review prior to pushing:
Comment 2 David Disseldorp 2018-08-13 21:36:45 UTC
Created attachment 14413 [details]
clean cherry-pick for 4.7.next
Comment 3 David Disseldorp 2018-08-13 21:37:24 UTC
Created attachment 14414 [details]
clean cherry pick for 4.8.next
Comment 4 David Disseldorp 2018-08-13 21:37:52 UTC
Created attachment 14415 [details]
clean cherry-pick for 4.9.next
Comment 5 Amitay Isaacs 2018-08-14 03:49:25 UTC
Hi Karolin,

This is ready for v4-7, v4-8 and v4-9.
Comment 6 Karolin Seeger 2018-08-14 11:14:45 UTC
(In reply to Amitay Isaacs from comment #5)
Pushed to autobuild-v4-{9,8,7}-test.
Comment 7 Karolin Seeger 2018-08-24 09:54:39 UTC
(In reply to Karolin Seeger from comment #6)
Pushed to all branches.
Closing out bug report.