There are 2 similar but opposite issues. The first scenario involves stopping a node (i.e. use of "ctdb stop"). If the recovery daemon causes a recovery before the local daemon has been updated to mark the remote node as inactive, then the inactive node can be included in the recovery, which clears the invalid records flag. When the node is continued, deleted records may be resurrected. This can be fixed by confirming the flags of remote nodes and dropping inactive nodes from recovery. The second scenario involves a node becoming active between the recovery daemon creating missing databases and the recovery helper freezing databases (after it has independently fetched the state of nodes). In this case the recovery helper may attempt to freeze a database that was not created. The solution is to create missing databases from the recovery helper, using the same set of active nodes that are used for the rest of recovery.
Created attachment 15873 [details] Patch for 4.12
Fix cherry-picks cleanly into v4-12-test
Created attachment 15874 [details] Patch for 4.11 Fix cherry-picks cleanly into v4-11-test but does not build because the fix removes old client functions that are still used in v4-11-test. Additionally, the fix changes the behaviour of one of the client functions ctdb_ctrl_createdb() in a way that is incompatible with the usage in v4-11-test. The incompatibility exists because master (and v4-12-test) has some vacuuming improvements and some additional changes made to facilitate testing. Due to the presence of the tests, the vacuuming code is finally tested. So, rather than backporting this fix and potentially making a mistake, will apply it on top of the vacuuming changes (and one other cosmetic typo fix). This gives us a well tested and reviewed fix on top of some other previously well tested and reviewed changes. Unfortunately the vacuuming tests themselves can not be pulled into v4-11-test because there were too many test infrastructure changes made before the vacuuming tests were written.
(In reply to Martin Schwenke from comment #3) Are all the patches clean cherry-picks? I thought you had to add a patch for compatibility. Or was that the other approach?
(In reply to Amitay Isaacs from comment #4) Completely clean. Adding a patch for compatibility (and resolving the resulting conflicts and dropping the patches that remove the client functions) is only needed if we decide not to do this on top of the vacuuming patches.
Hi Karolin, This is ready for v4-11 and v4-12. Thanks.
(In reply to Amitay Isaacs from comment #6) Pushed to autobuild-v4-{12,11}-test.
(In reply to Karolin Seeger from comment #7) Pushed to both branches. Closing out bug report. Thanks!