9268 – Make tdb robust against improper CLEAR_IF_FIRST restart

Summary: Make tdb robust against improper CLEAR_IF_FIRST restart

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 4.0
Classification:	Unclassified
Component:	Clustering
Version:	unspecified
Hardware:	All
OS:	All

Importance:	P5 normal
Target Milestone:	---
Assignee:	Karolin Seeger
QA Contact:	Samba QA Contact

Reported:	2012-10-08 18:38 UTC by Jeremy Allison
Modified:	2012-10-09 07:24 UTC
CC List:	0 users

Attachments
git-am fix for 4.0.0rc3. (10.20 KB, patch) 2012-10-08 18:39 UTC, Jeremy Allison	no flags
git-am for 3.6.next. (8.26 KB, patch) 2012-10-08 19:26 UTC, Jeremy Allison	vl: review+
Correct fix for 4.0.0rc3. (9.28 KB, patch) 2012-10-08 20:28 UTC, Jeremy Allison	vl: review+

Description Jeremy Allison 2012-10-08 18:38:47 UTC

From Volker's patch:

When winbind is restarted, tdb can crash. The situation is as follows:
we are in a cluster with ctdb, and a winbind child hangs in a request
to the DC. Cluster monitoring decides the node has a problem and kills
ctdbd, while the winbind child still hangs in an RPC request. The
winbind parent notices that ctdb is dead and immediately exits. Cluster
management restarts the winbind parent, which reinitializes
gencache.tdb via CLEAR_IF_FIRST.

The CLEAR_IF_FIRST logic as currently implemented does not see that a
child still has the tdb open: for performance reasons, only the parent
holds the ACTIVE_LOCK. While the CLEAR_IF_FIRST logic runs, there is a
very small window in which we ftruncate(tfd, 0) the file and rewrite a
proper header without holding a lock. If the winbind child comes back
during this window and tries to store something in gencache.tdb, it
crashes with a SIGBUS.
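
Why the restarted parent cannot see the child can be modelled with plain POSIX fcntl() byte-range locks, which are owned per process and are not inherited across fork(). The sketch below is a simplified model, not tdb code: it assumes the CLEAR_IF_FIRST check is a non-blocking write-lock probe on the ACTIVE_LOCK byte, and DEMO_ACTIVE_LOCK, lock_active(), restart_would_clear() and run_scenario() are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for tdb's ACTIVE_LOCK byte offset. */
#define DEMO_ACTIVE_LOCK 4

/* Non-blocking fcntl() lock or unlock of the single ACTIVE_LOCK byte. */
static int lock_active(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = DEMO_ACTIVE_LOCK;
    fl.l_len = 1;
    return fcntl(fd, F_SETLK, &fl);
}

/* A freshly started daemon probes ACTIVE_LOCK in its own process:
 * 1 = write lock granted (it would wipe the file),
 * 0 = another process holds the lock, -1 = error. */
static int restart_would_clear(const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd == -1)
            _exit(2);
        _exit(lock_active(fd, F_WRLCK) == 0 ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* out[0]: probe result while the parent holds ACTIVE_LOCK.
 * out[1]: probe result after the parent lost it, child still has the fd. */
static int run_scenario(int out[2])
{
    char path[] = "/tmp/active_lock_demoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    int pipefd[2];
    if (lock_active(fd, F_RDLCK) == -1 || pipe(pipefd) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }

    pid_t child = fork();
    if (child == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        close(fd);
        unlink(path);
        return -1;
    }
    if (child == 0) {
        /* "winbind child": keeps the inherited fd open but holds no
         * lock, because fcntl() locks are not inherited across fork(). */
        char c;
        close(pipefd[1]);
        ssize_t n = read(pipefd[0], &c, 1); /* "hangs" until the pipe closes */
        (void)n;
        _exit(0);
    }
    close(pipefd[0]);

    out[0] = restart_would_clear(path);
    lock_active(fd, F_UNLCK);           /* the parent "dies", its lock is gone */
    out[1] = restart_would_clear(path);

    close(pipefd[1]);                   /* let the child finish */
    waitpid(child, NULL, 0);
    close(fd);
    unlink(path);
    return 0;
}
```

Calling run_scenario() should yield out[0] == 0 and out[1] == 1: once the parent's lock is gone, the probe succeeds even though the child still has the file open, so nothing stops the restarted instance from clearing it.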
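
The SIGBUS itself follows from how shared file mappings interact with truncation: tdb maps the file by default, and on POSIX systems touching a page of a MAP_SHARED mapping that lies entirely beyond end-of-file delivers SIGBUS. The sketch below collapses both processes into one for brevity, assuming Linux/POSIX mmap semantics; the mapping stands in for the winbind child's view of gencache.tdb and the ftruncate(fd, 0) for the restarted parent's CLEAR_IF_FIRST step. The function names are hypothetical and this is not tdb code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* Jump out of the faulting store instead of letting SIGBUS kill us. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* 1 = the store after the truncate raised SIGBUS, 0 = it did not,
 * -1 = setup error. */
static int sigbus_after_truncate(void)
{
    char path[] = "/tmp/clear_if_first_demoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (ftruncate(fd, page) == -1) {
        close(fd);
        return -1;
    }

    /* The "child" has the database mapped, one page long. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile char *map = p;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) == -1) {
        munmap(p, (size_t)page);
        close(fd);
        return -1;
    }

    /* The restarted "parent" runs CLEAR_IF_FIRST: truncate to zero. */
    volatile int got_sigbus = -1;
    if (ftruncate(fd, 0) == 0) {
        if (sigsetjmp(bus_env, 1) == 0) {
            map[0] = 'x'; /* the child stores into a page now past EOF */
            got_sigbus = 0;
        } else {
            got_sigbus = 1;
        }
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(p, (size_t)page);
    close(fd);
    return got_sigbus;
}
```

sigsetjmp() is called with a non-zero savemask so that jumping out of the handler also restores the signal mask and SIGBUS is unblocked again afterwards.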

Comment 1 Jeremy Allison 2012-10-08 18:39:24 UTC

Created attachment 8009
git-am fix for 4.0.0rc3.

Same fix that went into master.
Jeremy.

Comment 2 Jeremy Allison 2012-10-08 19:26:07 UTC

Created attachment 8010
git-am for 3.6.next.

Back-port from master.
Jeremy.

Comment 3 Volker Lendecke 2012-10-08 20:11:57 UTC

Comment on attachment 8009
git-am fix for 4.0.0rc3.

Are you sure you want the last patch in the patchset under this bug report?

Comment 4 Jeremy Allison 2012-10-08 20:27:05 UTC

Comment on attachment 8009
git-am fix for 4.0.0rc3.

Oh, I missed that. Will resubmit.

Comment 5 Jeremy Allison 2012-10-08 20:28:24 UTC

Created attachment 8011
Correct fix for 4.0.0rc3.

Now without extraneous patch :-).

Comment 6 Karolin Seeger 2012-10-09 07:24:40 UTC

Pushed to v3-6-test and autobuild-v4-0-test.
Closing out bug report.

Thanks!