Bug 10293 - brunlock fails due to an incorrect length.
Summary: brunlock fails due to an incorrect length.
Status: RESOLVED WONTFIX
Alias: None
Product: TDB
Classification: Unclassified
Component: libtdb
Version: unspecified
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-26 18:50 UTC by Alan Hourihane
Modified: 2013-11-28 13:10 UTC
CC List: 1 user

See Also:


Attachments
Fix brunlock length (400 bytes, patch)
2013-11-26 18:50 UTC, Alan Hourihane
no flags
New unlock fix. (758 bytes, patch)
2013-11-26 20:19 UTC, Alan Hourihane
no flags
New lockfix (3.52 KB, patch)
2013-11-28 12:08 UTC, Alan Hourihane
no flags

Description Alan Hourihane 2013-11-26 18:50:17 UTC
Created attachment 9479 [details]
Fix brunlock length

I've been having trouble with gencache.tdb: it's failing to unlock. It turns out that an unmatched length on brunlock is the problem.

Patch attached.
Comment 1 Volker Lendecke 2013-11-26 19:46:04 UTC
What platform is this?
Comment 2 Volker Lendecke 2013-11-26 19:49:05 UTC
By the way, I believe the patch is incorrect. We do lock the whole tdb with lock length 0 (which means up to the end of the 64-bit offset space).
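
For reference, a minimal sketch of what lock length 0 means with POSIX record locks, written as plain fcntl rather than tdb code: an l_len of 0 covers everything from l_start to the end of the file, however large it grows.

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch only: take a write lock on the whole file by passing
 * l_len == 0, which POSIX defines as "from l_start to EOF". */
static int lock_whole_file(int fd)
{
        struct flock fl;

        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;        /* 0 means: to the end of the file */

        return fcntl(fd, F_SETLKW, &fl);
}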
Comment 3 Alan Hourihane 2013-11-26 20:19:27 UTC
Created attachment 9481 [details]
New unlock fix.

Updated patch.

This matches what tdb_allrecord_lock() does on unlock.
Comment 4 Volker Lendecke 2013-11-26 20:23:53 UTC
Since my last post here I've tried to read SUSv4 to see whether we're allowed to unlock the whole range when we've locked it in two pieces. Everything I have seen so far on the platforms I've worked on indicates that this is allowed. Can you enlighten me as to why the unlock with two adjacent, and thus merged, locks should not be allowed?

Still, I would like to know which platform you are using.
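
To make the question concrete, here is a minimal sketch of the pattern in question, again plain fcntl with made-up ranges rather than tdb code: two adjacent ranges are locked with separate calls and then released with a single unlock over the combined span.

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch only: lock or unlock a byte range. */
static int range_op(int fd, short type, off_t start, off_t len)
{
        struct flock fl;

        fl.l_type   = type;
        fl.l_whence = SEEK_SET;
        fl.l_start  = start;
        fl.l_len    = len;

        return fcntl(fd, F_SETLKW, &fl);
}

static int merged_unlock_demo(int fd)
{
        /* Lock two adjacent pieces with separate calls... */
        if (range_op(fd, F_WRLCK, 0, 100) == -1)
                return -1;
        if (range_op(fd, F_WRLCK, 100, 100) == -1)
                return -1;

        /* ...and drop both with one unlock over the combined range.
         * On a POSIX-conforming kernel the adjacent locks held by the
         * same process are coalesced, so this releases everything. */
        return range_op(fd, F_UNLCK, 0, 200);
}

On a kernel that does not coalesce the two locks, that final unlock no longer matches what was locked, which appears to be the behaviour being reported here.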
Comment 5 Volker Lendecke 2013-11-26 20:26:48 UTC
Next question: What is the exact error message you are seeing? Can you get us an strace of the failing unlock?
Comment 6 Alan Hourihane 2013-11-26 20:27:21 UTC
It's an m68k Atari platform. I could probably fix the platform if this is what SUSv4 allows.

But currently the unlocks need to follow the same semantics as the locks that came in.
Comment 7 Volker Lendecke 2013-11-26 20:36:29 UTC
Do you have a link to that platform? I'm just curious
Comment 8 Volker Lendecke 2013-11-26 20:42:30 UTC
I think I found it:

http://sparemint.atariforge.net/cgi-bin/cvsweb/freemint/README?rev=1.2&content-type=text/x-cvsweb-markup

mentions you :-)
Comment 9 Alan Hourihane 2013-11-26 20:44:17 UTC
Yep. That's it.
Comment 10 Volker Lendecke 2013-11-26 20:48:17 UTC
Hmmm. I can't find your kernel sources to look at your fcntl implementation...
Comment 11 Volker Lendecke 2013-11-26 21:11:48 UTC
(In reply to comment #6)
> It's an m68k Atari platform. I could probably fix the platform if this is what
> is allowed from susv4.

Well, tdb in general is pretty portable in its use of fcntl. fcntl on many platforms does not scale well at all, but the semantics are pretty uniform and after all those years you're the first one to have problems with it.

Your updated patch -- does it fully help you?
Comment 12 Alan Hourihane 2013-11-26 21:38:03 UTC
Yes, the patch fixes my troubles, and it still works on my Linux x86_64 box.
Comment 13 Volker Lendecke 2013-11-28 07:00:29 UTC
(In reply to comment #12)
> Yes, the patch fixes my troubles, and it still works on my Linux x86_64 box.

Sorry for not finding this earlier, but the second patch is also incomplete: if I'm understanding you correctly, you can't unlock a lock range that was locked in pieces and implicitly merged. Assuming that is correct, tdb gives you a larger challenge: look at tdb_allrecord_lock and the call to tdb_chainlock_gradual. Under load, the allrecord lock takes the hash chain range in arbitrary, non-predictable pieces. We do this because the hash chains are potentially heavily contended, and one large lock request might starve.
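
A rough sketch of the divide-and-conquer idea described here, with made-up names and threshold rather than the real tdb_chainlock_gradual: try the whole range without blocking, and if it is contended, split it and take the halves, blocking only once the pieces are small.

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define MIN_PIECE 16    /* made-up threshold for this sketch */

static int piece_lock(int fd, off_t start, off_t len, int wait)
{
        struct flock fl;

        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = start;
        fl.l_len    = len;

        return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Try the whole range non-blocking; if that fails, split in half and
 * recurse, blocking only on small pieces.  The range therefore ends
 * up locked as an unpredictable set of adjacent fragments. */
static int gradual_lock(int fd, off_t start, off_t len)
{
        if (len <= MIN_PIECE)
                return piece_lock(fd, start, len, 1);

        if (piece_lock(fd, start, len, 0) == 0)
                return 0;

        if (gradual_lock(fd, start, len / 2) == -1)
                return -1;
        return gradual_lock(fd, start + len / 2, len - len / 2);
}

A single unlock over the whole range then only works if the kernel has merged those fragments.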
Comment 14 Alan Hourihane 2013-11-28 08:17:20 UTC
Yes, I noticed this as well. I'll get this fixed up and submit a new patch.
Comment 15 Alan Hourihane 2013-11-28 12:08:51 UTC
Created attachment 9490 [details]
New lockfix

A new allrecord unlock fix.

This caches a list of allrecord locks for unlock.

I'm not sure if there are wider repercussions of this, but let me know and I'll continue with additional code.
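
Roughly, the idea is to record every (offset, length) piece as it is taken and have the unlock replay exactly those pieces, rather than issuing one unlock over the merged range. A minimal sketch with made-up names follows; this is not the attached patch.

#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch only: a cached list of the lock pieces actually taken. */
struct lock_piece {
        off_t start;
        off_t len;
        struct lock_piece *next;
};

static int remember_piece(struct lock_piece **list, off_t start, off_t len)
{
        struct lock_piece *p = malloc(sizeof(*p));

        if (p == NULL)
                return -1;
        p->start = start;
        p->len   = len;
        p->next  = *list;
        *list    = p;
        return 0;
}

/* Unlock exactly the pieces that were locked, one fcntl call each, so
 * nothing relies on the kernel having merged adjacent locks. */
static void unlock_pieces(int fd, struct lock_piece **list)
{
        while (*list != NULL) {
                struct lock_piece *p = *list;
                struct flock fl;

                fl.l_type   = F_UNLCK;
                fl.l_whence = SEEK_SET;
                fl.l_start  = p->start;
                fl.l_len    = p->len;
                fcntl(fd, F_SETLKW, &fl);

                *list = p->next;
                free(p);
        }
}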
Comment 16 Volker Lendecke 2013-11-28 12:44:41 UTC
Can you give me a hint where you implement the fcntl locks?

Maybe in the end it's easier to fix the kernel. Everybody else does what Samba expects, so I bet you will hit similar problems with other applications in the future. I'd like to understand your fcntl lock implementation and see whether it's possible to change it.
Comment 17 Alan Hourihane 2013-11-28 13:06:27 UTC
Don't worry. I'll go fix it then.
Comment 18 Alan Hourihane 2013-11-28 13:09:03 UTC
Oh, just to add: there's a bug on unlock when the initial chained lock succeeds but the second lock fails and it then tries to unlock the chained lock. That essentially hits the same problem....

        /* Grab individual record locks. */
        if (tdb_brlock(tdb, ltype, lock_offset(tdb->hash_size), 0,
                       flags) == -1) {
                tdb_brunlock(tdb, ltype, FREELIST_TOP,
                             tdb->hash_size * 4);
                return -1;
        }

The tdb_brunlock() here is wrong as it's doing what I was doing in the previous patch.
Comment 19 Alan Hourihane 2013-11-28 13:10:08 UTC
Unless, that is, it's assuming merged locks, which I guess is the root of all of this.