Bug 12005 - parse_share_modes() chokes on ctdb tombstone record from ltdb
Summary: parse_share_modes() chokes on ctdb tombstone record from ltdb
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Clustering (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 10008
  Show dependency treegraph
 
Reported: 2016-07-02 07:25 UTC by Ralph Böhme
Modified: 2021-01-18 11:24 UTC (History)
2 users (show)

See Also:


Attachments
Level 10 Log (601.10 KB, text/plain)
2016-07-02 07:25 UTC, Ralph Böhme
no flags Details
Possible patch for master (4.30 KB, patch)
2016-07-02 07:27 UTC, Ralph Böhme
no flags Details
Full SBT (17.06 KB, text/plain)
2016-07-02 07:58 UTC, Ralph Böhme
no flags Details
Patch for 4.5 cherry-picked from master (8.28 KB, patch)
2016-08-09 08:00 UTC, Ralph Böhme
obnox: review+
Details
Patch for 4.3 and 4.4 backported from master (8.23 KB, patch)
2016-08-09 08:02 UTC, Ralph Böhme
obnox: review+
slow: review? (metze)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ralph Böhme 2016-07-02 07:25:20 UTC
Created attachment 12253 [details]
Level 10 Log

On a Samba 4.4.4 ctdb cluster the following messages where (repeatedly) found in the log:

[2016/06/30 08:45:02.645180,  1] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 8 (../librpc/ndr/ndr_basic.c:236)
[2016/06/30 08:45:02.645213,  1] ../source3/locking/share_mode_lock.c:312(parse_share_modes)
  ndr_pull_share_mode_lock failed: Buffer Size Error

Looking closer this is coming from ctdb tombstone records in the node local ltdb.

I was able to reproduce this yesterday on my lxc based devel cluster, for some reason however since doing a git bisect I can't anymore.

I see two ways to address this:
o add yet another check for tombstone records to a dbwrap caller, or
o fix dbwrap

In dbwrap we already check for tombstone records in at least two places:
o when fetching records from remote ctdb nodes via ctdbd_parse()
o in db_ctdb_traverse()

Just db_ctdb_ltdb_parser() lacks a check for tombstone records.
Comment 1 Ralph Böhme 2016-07-02 07:27:38 UTC
Created attachment 12254 [details]
Possible patch for master
Comment 2 Ralph Böhme 2016-07-02 07:57:52 UTC
One note: the crash at the end of the log stems from adding a smb_panic() for debugging.
Comment 3 Ralph Böhme 2016-07-02 07:58:46 UTC
Created attachment 12255 [details]
Full SBT
Comment 4 Ralph Böhme 2016-07-02 17:42:20 UTC
Oh, and I can reproduce it, I was just looking at the wrong place. Looks like this is present since a looooong time, still bisecting. Wonder why this wasn't noticed before...
Comment 5 Ralph Böhme 2016-07-04 11:36:23 UTC
Oh, I missed this: the patch from bug 10008 already had this fix in, 1cae59ce112ccb51b45357a52b902f80fce1eef1, but it had to be reverted in 925625b52886d40b50fc631bad8bdc81970f7598.

I *think* my patch might be a proper fix without the risk of a deadlock, because it *won't* call out to ctdb but return ENOENT (im terms of NTSTATUS).
Comment 6 Ralph Böhme 2016-08-09 08:00:36 UTC
Created attachment 12345 [details]
Patch for 4.5 cherry-picked from master
Comment 7 Ralph Böhme 2016-08-09 08:02:14 UTC
Created attachment 12346 [details]
Patch for 4.3 and 4.4 backported from master
Comment 8 Ralph Böhme 2016-08-09 08:03:28 UTC
Comment on attachment 12254 [details]
Possible patch for master

>From c26c3c354db42a688551d46d8583352a5c7c9ad7 Mon Sep 17 00:00:00 2001
>From: Ralph Boehme <slow@samba.org>
>Date: Fri, 1 Jul 2016 17:45:53 +0200
>Subject: [PATCH 1/2] WIP: dbwrap_ctdb: move struct db_ctdb_parse_record_state
>
>Will be needed in the next commit. No change in behavour.
>
>Signed-off-by: Ralph Boehme <slow@samba.org>
>---
> source3/lib/dbwrap/dbwrap_ctdb.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
>diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
>index df5a34f..c557902 100644
>--- a/source3/lib/dbwrap/dbwrap_ctdb.c
>+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
>@@ -99,6 +99,14 @@ static int db_ctdb_ltdb_parser(TDB_DATA key, TDB_DATA data,
> 	return 0;
> }
> 
>+struct db_ctdb_parse_record_state {
>+	void (*parser)(TDB_DATA key, TDB_DATA data, void *private_data);
>+	void *private_data;
>+	uint32_t my_vnn;
>+	bool ask_for_readonly_copy;
>+	bool done;
>+};
>+
> static NTSTATUS db_ctdb_ltdb_parse(
> 	struct db_ctdb_ctx *db, TDB_DATA key,
> 	void (*parser)(TDB_DATA key, struct ctdb_ltdb_header *header,
>@@ -1215,14 +1223,6 @@ static struct db_record *db_ctdb_try_fetch_locked(struct db_context *db,
> 	return fetch_locked_internal(ctx, mem_ctx, key, true);
> }
> 
>-struct db_ctdb_parse_record_state {
>-	void (*parser)(TDB_DATA key, TDB_DATA data, void *private_data);
>-	void *private_data;
>-	uint32_t my_vnn;
>-	bool ask_for_readonly_copy;
>-	bool done;
>-};
>-
> static void db_ctdb_parse_record_parser(
> 	TDB_DATA key, struct ctdb_ltdb_header *header,
> 	TDB_DATA data, void *private_data)
>-- 
>2.5.0
>
>
>From 5ddd6e537beffe3fef855d81a0c342a8feda235b Mon Sep 17 00:00:00 2001
>From: Ralph Boehme <slow@samba.org>
>Date: Fri, 1 Jul 2016 17:21:04 +0200
>Subject: [PATCH 2/2] WIP: dbwrap_ctdb: check for ltdb tombstone records
>
>When fetching records from remote ctdb nodes via ctdbd_parse() or in
>db_ctdb_traverse(), we already checks for tombstone records and skip
>them. Add the same check to db_ctdb_ltdb_parser() when fetching records
>from the ltbd.
>
>Signed-off-by: Ralph Boehme <slow@samba.org>
>---
> source3/lib/dbwrap/dbwrap_ctdb.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
>diff --git a/source3/lib/dbwrap/dbwrap_ctdb.c b/source3/lib/dbwrap/dbwrap_ctdb.c
>index c557902..b3097b8 100644
>--- a/source3/lib/dbwrap/dbwrap_ctdb.c
>+++ b/source3/lib/dbwrap/dbwrap_ctdb.c
>@@ -79,6 +79,7 @@ struct db_ctdb_ltdb_parse_state {
> 	void (*parser)(TDB_DATA key, struct ctdb_ltdb_header *header,
> 		       TDB_DATA data, void *private_data);
> 	void *private_data;
>+	bool tombstone;
> };
> 
> static int db_ctdb_ltdb_parser(TDB_DATA key, TDB_DATA data,
>@@ -91,6 +92,14 @@ static int db_ctdb_ltdb_parser(TDB_DATA key, TDB_DATA data,
> 		return -1;
> 	}
> 
>+	if (data.dsize == sizeof(struct ctdb_ltdb_header)) {
>+		/*
>+		 * record tombstone
>+		 */
>+		state->tombstone = true;
>+		return -1;
>+	}
>+
> 	state->parser(
> 		key, (struct ctdb_ltdb_header *)data.dptr,
> 		make_tdb_data(data.dptr + sizeof(struct ctdb_ltdb_header),
>@@ -105,6 +114,7 @@ struct db_ctdb_parse_record_state {
> 	uint32_t my_vnn;
> 	bool ask_for_readonly_copy;
> 	bool done;
>+	bool tombstone;
> };
> 
> static NTSTATUS db_ctdb_ltdb_parse(
>@@ -113,6 +123,8 @@ static NTSTATUS db_ctdb_ltdb_parse(
> 		       TDB_DATA data, void *private_data),
> 	void *private_data)
> {
>+	struct db_ctdb_parse_record_state *rec_state =
>+		(struct db_ctdb_parse_record_state *)private_data;
> 	struct db_ctdb_ltdb_parse_state state;
> 	int ret;
> 
>@@ -122,6 +134,9 @@ static NTSTATUS db_ctdb_ltdb_parse(
> 	ret = tdb_parse_record(db->wtdb->tdb, key, db_ctdb_ltdb_parser,
> 			       &state);
> 	if (ret == -1) {
>+		if (state.tombstone) {
>+			rec_state->tombstone = true;
>+		}
> 		return NT_STATUS_NOT_FOUND;
> 	}
> 	return NT_STATUS_OK;
>@@ -1267,6 +1282,7 @@ static NTSTATUS db_ctdb_parse_record(struct db_context *db, TDB_DATA key,
> 	state.parser = parser;
> 	state.private_data = private_data;
> 	state.my_vnn = ctdbd_vnn(ctx->conn);
>+	state.tombstone = false;
> 
> 	if (ctx->transaction != NULL) {
> 		struct db_ctdb_transaction_handle *h = ctx->transaction;
>@@ -1301,6 +1317,12 @@ static NTSTATUS db_ctdb_parse_record(struct db_context *db, TDB_DATA key,
> 		return NT_STATUS_OK;
> 	}
> 
>+	if (NT_STATUS_EQUAL(status, NT_STATUS_NOT_FOUND)
>+	    && state.tombstone)
>+	{
>+		return NT_STATUS_NOT_FOUND;
>+	}
>+
> 	ret = ctdbd_parse(ctx->conn, ctx->db_id, key,
> 			  state.ask_for_readonly_copy, parser, private_data);
> 	if (ret != 0) {
>-- 
>2.5.0
>
Comment 9 Ralph Böhme 2016-08-09 08:04:06 UTC
Damn! :)
Comment 10 Ralph Böhme 2016-08-09 17:44:06 UTC
Reassigning to Karolin for inclusion in 4.3, 4.4 and 4.5. Thanks!
Comment 11 Karolin Seeger 2016-08-10 07:03:43 UTC
(In reply to Ralph Böhme from comment #10)
Pushed to autobuild-v4-[5|4|3]-test.
Comment 12 Karolin Seeger 2016-08-11 08:23:55 UTC
(In reply to Karolin Seeger from comment #11)
Pushed to all branches.
Closing out bug report.

Thanks!