Created attachment 9479 [details] FIx brunlock length I've been having trouble with gencache.tdb and failing to unlock. It turns out that an unmatched length on brunlock is the problem. Patch attached.
What platform is this?
By the way, I believe the patch is incorrect. We do lock the whole tdb with lock length 0 (which means up to the end of the 2^64 bit address space).
Created attachment 9481 [details] New unlock fix. Update patch. This matches what tdb_allrecord_lock() does on unlock.
Since my last post here I tried to read from susv4 whether we're allowed to unlock the whole range when we've locked in two pieces. Everything I have seen so far on the platforms I've worked indicates that this is allowed. Can you enlighten me why the unlock with two adjacent and thus merged locks should not be allowed? Still, I would like to know which platform you are using.
Next question: What is the exact error message you are seeing? Can you get us an strace of the failing unlock?
It's an m68k Atari platform. I could probably fix the platform if this is what is allowed from susv4. But currently I need to follow the same semantics that came in.
Do you have a link to that platform? I'm just curious
I think I found it: http://sparemint.atariforge.net/cgi-bin/cvsweb/freemint/README?rev=1.2&content-type=text/x-cvsweb-markup mentions you :-)
Yep. That's it.
Hmmm. Don't find your kernel sources to look at your fcntl implementation...
(In reply to comment #6) > It's an m68k Atari platform. I could probably fix the platform if this is what > is allowed from susv4. Well, tdb in general is pretty portable in its use of fcntl. fcntl on many platforms does not scale well at all, but the semantics are pretty uniform and after all those years you're the first one to have problems with it. Your updated patch -- does it fully help you?
Yes, the patch fixes my troubles, and it still works on my Linux x86_64 box.
(In reply to comment #12) > Yes, the patch fixes my troubles, and it still works on my Linux x86_64 box. Sorry, but the second patch is also incomplete, sorry for not finding this earlier: If I'm getting you right then you can't unlock a lock range that was locked in pieces and implicitly merged. Assuming that is correct, tdb gives you a larger challenge: Look at tdb_allrecord_lock and the call to tdb_chainlock_gradual: Under load, the allrecord lock takes the hash chain range in arbitrary, non-predictable pieces. We do this because the hash chains are potentially heavily contended, and one large lock request might starve.
Yes, I noticed this as well. I'll get this fixed up and submit a new patch.
Created attachment 9490 [details] New lockfix A new allrecord unlock fix. This caches a list of allrecord locks for unlock. I'm not sure if there a wider repercussions of this, but let me know and I'll continue with additional code.
Can you give me a hint where you implement the fcntl locks? Maybe it's finally easier to fix the kernel. Everybody else does what Samba expects, so I bet you will hit similar problems with other applications in the future. I'd like to understand your fcntl lock implementation and see whether it's possible to change it.
Don't worry. I'll go fix it then.
Oh, just to add there's a bug on unlock when the initial chained lock succeeds and the second lock fails, when it tries to unlock the chained lock. That essentially is hitting the same problem.... /* Grab individual record locks. */ if (tdb_brlock(tdb, ltype, lock_offset(tdb->hash_size), 0, flags) == -1) { tdb_brunlock(tdb, ltype, FREELIST_TOP, tdb->hash_size * 4); return -1; } The tdb_brunlock() here is wrong as it's doing what I was doing in the previous patch.
Unless that is, assuming merged locks which I guess is the root to all of this.