Bug 13021 - GET_DB_SEQNUM control can cause ctdb to deadlock when databases are frozen
Status: RESOLVED FIXED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.6.7
Hardware: All All
Importance: P5 major
Target Milestone: ---
Assigned To: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2017-09-08 01:04 UTC by Amitay Isaacs
Modified: 2017-09-14 14:12 UTC (History)

See Also:


Attachments
Patches for v4-6 (3.38 KB, patch)
2017-09-13 01:04 UTC, Amitay Isaacs
martins: review+
Patches for v4-7 (3.38 KB, patch)
2017-09-13 01:04 UTC, Amitay Isaacs
martins: review+

Description Amitay Isaacs 2017-09-08 01:04:03 UTC
CTDB recovers persistent databases using the copy from the node with the highest sequence number.  The recovery helper sends the GET_DB_SEQNUM control during recovery to read the sequence number.  At that time the databases are frozen and the CTDB daemon has started a tdb transaction on the database.  The CTDB daemon can read a record from tdb using tdb_fetch() only if the tdb transaction has started.  Otherwise, the daemon blocks in tdb_fetch() waiting for a lock on the record, causing a deadlock.
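
The blocking condition described above can be modelled with a toy lock.  This is only a sketch in Python, not CTDB or tdb code: ToyDatabase, freeze(), start_transaction(), fetch_seqnum() and pick_recovery_source() are hypothetical names, and the rule that a read inside the transaction needs no further lock is an assumption taken from the description, not from the tdb sources.  A non-blocking acquire stands in for the point where a real tdb_fetch() would wait for the record lock.

```python
import threading


class ToyDatabase:
    """Toy stand-in for one database on one node (hypothetical, not the tdb API)."""

    def __init__(self, seqnum):
        self._seqnum = seqnum
        # Stands in for the lock a read has to take on the record.
        self._lock = threading.Lock()
        self.transaction_started = False

    def freeze(self):
        # Model of a frozen database: the lock is taken and kept.
        self._lock.acquire()

    def start_transaction(self):
        # Model of the transaction the daemon starts during recovery.
        self.transaction_started = True

    def fetch_seqnum(self):
        """Return the sequence number, or None where a blocking read would hang."""
        if self.transaction_started:
            # Assumption: inside the transaction the read needs no further lock.
            return self._seqnum
        if not self._lock.acquire(blocking=False):
            # A blocking read would wait here for a lock that is never released.
            return None
        try:
            return self._seqnum
        finally:
            self._lock.release()


def pick_recovery_source(seqnums):
    """Return the node id with the highest sequence number.

    seqnums maps node id -> sequence number.  How ties are broken is not
    stated in the report; max() simply keeps the first highest entry.
    """
    return max(seqnums, key=seqnums.get)
```

In this model, a database that has been frozen but has no transaction started returns None from fetch_seqnum(), which corresponds to the case where the daemon would block.  Calling start_transaction() first makes the same read succeed.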

This problem can also be reproduced by stopping a node and then trying to get the sequence number for any database.
Comment 1 Amitay Isaacs 2017-09-13 01:04:03 UTC
Created attachment 13587
Patches for v4-6
Comment 2 Amitay Isaacs 2017-09-13 01:04:24 UTC
Created attachment 13588
Patches for v4-7
Comment 3 Martin Schwenke 2017-09-13 01:22:24 UTC
Hi Karolin,

This is ready for 4.6 and 4.7.

Thanks...
Comment 4 Karolin Seeger 2017-09-13 12:57:20 UTC
(In reply to Martin Schwenke from comment #3)
Pushed to autobuild-v4-{7,6}-test.
Comment 5 Karolin Seeger 2017-09-14 14:12:34 UTC
Pushed to both branches.
Closing out bug report.

Thanks!