Bug 10008 - dbwrap_ctdb broken for deleted records in 4.1
Summary: dbwrap_ctdb broken for deleted records in 4.1
Status: RESOLVED LATER
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Clustering (show other bugs)
Version: unspecified
Hardware: All All
: P5 critical (vote)
Target Milestone: ---
Assignee: Volker Lendecke
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on: 12005
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-10 15:30 UTC by Christian Ambach
Modified: 2016-07-03 06:50 UTC (History)
4 users (show)

See Also:


Attachments
Patch (1.99 KB, patch)
2013-09-09 08:29 UTC, Volker Lendecke
ambi: review+
obnox: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Christian Ambach 2013-07-10 15:30:08 UTC
One thing that is currently severly broken in master and in v4-1-test is running with CTDB.
dbwrap_ctdb and dbwrap_tdb have different semantics when it comes 
to deleted records that break code at multiple places and lead not only 
to ugly messages spammed into the logs but also to functional breakage.

e.g. dbwrap_exists() will report that a record that was deleted briefly before still exists. It will currently only go away once fast vacuuming has kicked in.

In the dbwrap_ctdb case, there is currently no way to distinguish between an empty and a deleted record as records to be deleted will be reduced to the CTDB header and then added to the fast deletion queue (and left for fast vacuuming to delete them finally).
dbwrap_tdb will delete the record immediately

We'll probably need a DELETED or INACTIVE flag in the CTDB ltdb header that dbwrap_ctdb can look at. This then also needs to be respected in the CTDB recovery, R/O copy and vacuuming code.
Comment 1 Karolin Seeger 2013-08-19 08:23:55 UTC
Any news on this one?
It's a blocker for 4.1.0...
Comment 2 Amitay Isaacs 2013-08-19 08:36:21 UTC
This requires significant development effort in CTDB and then corresponding changes in Samba.  I am not sure if I can find any time to fix this in next few weeks.

Since CTDB is not yet part of Samba tree, can this be pushed to 4.2?
Comment 3 Michael Adam 2013-08-19 09:16:59 UTC
Well, this behaviour has been around in CTDB since the beginning.
The design flaw is that that there is no distinction between
deleted and validly stored (active) empty records.
So this is not new. And it can (as Amitay has commented) not
be solved with a trivial patch: If we treat all empty records
either as existing or as deletetd, dbwrap_exists() either
procudes false positives or false negatives.

It can only be solved with introducing a new flag in the
CTDB record header (as Christian has proposed) that signals
that a record has been deleted (or is active). This is
actually the right thing to do (tm), and I have proposed such
changes 3 or 4 years ago already. But it was always rejected
as too invasive and not upgrade-compatible.
Indeed, the use of such a flag needs changes in both
ctdb code and samba code (dbwrap_ctdb) to be aligned to
work.

Up to Samba 3.6 this issue was not bad enough to trigger
a discussion at this level. But it seems that (with 4.0 already)
the usage of the dbwrap_exists() from Samba code has changed,
so that the known issue pops up much more prominently in the
logs.

So what to do?

- We can either fix it, which is as Amitay says something
  that needs some time, since it has to be done very
  carefully (even if the resulting patches would not be huge).
  (Applicable if this is mostly cosmetic.)
- We can raise the debug level of the nagging messages
  and fix this properly for 4.2.

Opinions? I'm undecided..
Comment 4 Karolin Seeger 2013-08-21 06:57:46 UTC
From my point of view, raising the debug level for 4.1.0 and fixing it properly for 4.2 sounds reasonable in this case.
Comment 5 Christian Ambach 2013-08-21 12:44:46 UTC
I do not think this is the correct way to go forward with for 4.1.
It will silently break code that uses dbwrap_exists().

How about treating empty records as deleted? I have some incomplete patches for that. We probably have less code that expects records to be possibly empty than we have code that can deal with deleted records to be reported as empty and therefore trying to parse their contents.
Comment 6 Karolin Seeger 2013-08-28 07:32:15 UTC
(In reply to comment #5)
> I do not think this is the correct way to go forward with for 4.1.
> It will silently break code that uses dbwrap_exists().
> 
> How about treating empty records as deleted? I have some incomplete patches for
> that. We probably have less code that expects records to be possibly empty than
> we have code that can deal with deleted records to be reported as empty and
> therefore trying to parse their contents.

We need decisions/patches soon.
The release branch will be frozen in one week.
It's still a blocker.
Comment 7 Amitay Isaacs 2013-08-29 05:00:36 UTC
Christian,  would Volker's patch to treat all empty records as non-existing records do for now?
Comment 8 Christian Ambach 2013-08-30 09:56:43 UTC
(In reply to comment #7)
> Christian,  would Volker's patch to treat all empty records as non-existing
> records do for now?

For 4.1, that should do it. But in the long run, we'll need to be able to distinguish properly between empty and deleted records.
Comment 9 Volker Lendecke 2013-09-09 08:29:18 UTC
Created attachment 9200 [details]
Patch
Comment 10 Michael Adam 2013-09-10 21:59:51 UTC
Comment on attachment 9200 [details]
Patch

ACK
Comment 11 Michael Adam 2013-09-10 22:00:43 UTC
==> Karolin for inclusion into 4.1
Comment 12 Karolin Seeger 2013-09-12 07:04:49 UTC
Pushed to autobuild-v4-1-test.
Comment 13 Karolin Seeger 2013-09-16 07:05:12 UTC
(In reply to comment #12)
> Pushed to autobuild-v4-1-test.

Pushed to v4-1-test.
Re-assigning to Volker.
Volker, can we close this one or should we leave it as a reminder for the further changes proposed by ambi?
Comment 14 Volker Lendecke 2013-09-16 12:07:08 UTC
(In reply to comment #13)
> Volker, can we close this one or should we leave it as a reminder for the
> further changes proposed by ambi?

Your choice. It's on the long, long list of possible enhancements now.

Volker
Comment 15 Christian Ambach 2014-01-02 18:47:47 UTC
Let's set this one to fixed