Bug 7616 - ctdb_freeze_lock references uninitialized parent pid
Summary: ctdb_freeze_lock references uninitialized parent pid
Status: RESOLVED FIXED
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: 1.0.71
Hardware: Other All
Importance: P3 normal
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-12 14:29 UTC by Jeff Butler
Modified: 2016-08-13 10:01 UTC

See Also:


Attachments
Proposed one-line fix (394 bytes, patch)
2010-08-12 14:30 UTC, Jeff Butler

Description Jeff Butler 2010-08-12 14:29:13 UTC
In the following scenario, you can end up with a ctdb_freeze_lock child process hanging out and holding the databases locked.

- ctdb starts up and does a ctdb_blocking_freeze() which spawns a freeze lock child
- ctdbd crashes (for whatever reason, in my case it was a bug...)
- the freeze lock child goes into a loop waiting for the parent to exit

The problem is that this happens so early in startup that ctdb->ctdbd_pid is not yet set. As a result, the child just sits in the wait loop forever (ctdb_freeze.c:216):


	while (1) {
		sleep(1);
		if (kill(ctdb->ctdbd_pid, 0) != 0) {
			DEBUG(DEBUG_ERR,("Parent died. Exiting lock wait child\n"));

			_exit(0);
		}
	}

The fix is simply to initialize the ctdbd_pid variable earlier in daemon startup (patch to be attached...)

This issue is rare in that it only happens in a small startup window, but when it does happen, manual intervention is required to kill off the child processes holding the locks.
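
For illustration, here is a minimal C sketch of why the wait loop can spin forever and what the fix amounts to. It assumes the unset ctdbd_pid is 0, as it would be in a zero-initialized context. POSIX kill() with pid 0 targets the caller's own process group, so the check succeeds for as long as the child itself is alive. struct daemon_ctx, parent_alive() and record_daemon_pid() are hypothetical names and are not taken from the ctdb source or the attached patch.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the one field of the ctdb context that matters here. */
struct daemon_ctx {
	pid_t ctdbd_pid;	/* assumed to be 0 until the daemon records its pid */
};

/*
 * The liveness test used by the wait loop: signal 0 delivers nothing and
 * only checks whether the target exists and may be signalled.
 * With ctdbd_pid == 0 the target is the caller's own process group,
 * so this returns true regardless of what happened to the daemon.
 */
static bool parent_alive(const struct daemon_ctx *ctx)
{
	return kill(ctx->ctdbd_pid, 0) == 0;
}

/*
 * Sketch of the fix: record the daemon's pid before any child is forked,
 * so that every child inherits a valid value to probe.
 */
static void record_daemon_pid(struct daemon_ctx *ctx)
{
	ctx->ctdbd_pid = getpid();
	assert(ctx->ctdbd_pid > 0);
}
```

With ctdbd_pid left at 0, parent_alive() keeps reporting the parent as alive even after ctdbd has crashed. Calling something like record_daemon_pid() before the first fork gives the child a real pid to check.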
Comment 1 Jeff Butler 2010-08-12 14:30:18 UTC
Created attachment 5899
Proposed one-line fix
Comment 2 Luk Claes (dead mail address) 2011-05-13 17:43:54 UTC
Fixed in commit 3da1e2 on Tue Dec 28 13:14:23 2010 +0100
Comment 3 Martin Schwenke 2016-08-13 10:01:44 UTC
As per previous comment, fixed a long time ago...