In the following scenario, you can end up with a ctdb_freeze_lock child process hanging around and holding the databases locked:

- ctdb starts up and does a ctdb_blocking_freeze(), which spawns a freeze lock child
- ctdbd crashes (for whatever reason; in my case it was a bug...)
- the freeze lock child goes into a loop waiting for the parent to exit

The problem is that this happens so early that ctdb->ctdbd_pid is not set yet. This causes the child to just sit in the wait loop forever (ctdb_freeze.c:216):

    while (1) {
        sleep(1);
        if (kill(ctdb->ctdbd_pid, 0) != 0) {
            DEBUG(DEBUG_ERR,("Parent died. Exiting lock wait child\n"));
            _exit(0);
        }
    }

The fix is just to initialize the ctdbd_pid variable earlier in daemon startup (patch to be attached...).

This issue is rare in that it only happens in a small startup window, but when it does happen, manual intervention is required to kill off the child procs holding the locks.
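For anyone wondering why the liveness check never fires, here is a minimal standalone sketch. It assumes the unset ctdb->ctdbd_pid is zero, which the report does not state explicitly. With pid 0, kill() targets the caller's own process group, so kill(0, 0) succeeds for as long as the child itself is alive. The helper names below are hypothetical and not part of ctdb, and the guarded variant is only an illustration of a defensive check, not the attached fix (which sets the pid earlier).

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the check in the wait loop:
 * returns 1 if the parent appears to be gone, 0 otherwise. */
static int parent_gone(pid_t parent_pid)
{
    /* Signal 0 sends nothing; kill() only performs error checking. */
    return kill(parent_pid, 0) != 0;
}

/* Hypothetical defensive variant (an assumption, not the actual fix):
 * a pid <= 0 never names a single process, so treat it as "no parent". */
static int parent_gone_guarded(pid_t parent_pid)
{
    if (parent_pid <= 0) {
        return 1;
    }
    return kill(parent_pid, 0) != 0;
}
```

With these definitions, parent_gone(0) keeps returning 0 because the call is checking the caller's own process group, which matches the child looping forever. parent_gone_guarded(0) returns 1 straight away.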
Created attachment 5899
Proposed one-line fix
Fixed in commit 3da1e2 on Tue Dec 28 13:14:23 2010 +0100
As per previous comment, fixed a long time ago...