Bug 658 - smbd cannot (apparently) obtain file locks
smbd cannot (apparently) obtain file locks
Status: CLOSED FIXED
Product: Samba 3.0
Classification: Unclassified
Component: File Services
3.0.0
All HP-UX
: P3 major
: none
Assigned To: Tim Potter
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2003-10-20 21:04 UTC by Tony Butt
Modified: 2005-11-14 09:29 UTC (History)
2 users (show)

See Also:


Attachments
smbd log, log level 1 (32.39 KB, text/plain)
2003-10-20 21:07 UTC, Tony Butt
no flags Details
clinet smbd log level 1 (12.54 KB, text/plain)
2003-10-20 21:09 UTC, Tony Butt
no flags Details
Some level 10 logs. (63.20 KB, application/octet-stream)
2003-10-21 20:10 UTC, Tony Butt
no flags Details
Display more debugs when tdb_brlock() fails (970 bytes, patch)
2003-10-21 23:17 UTC, Tim Potter
no flags Details
More level 10 logs, with tdb.c patch applied (298.69 KB, application/octet-stream)
2003-10-22 17:14 UTC, Tony Butt
no flags Details
config.h, config.log, config.status (59.32 KB, application/octet-stream)
2003-10-22 20:59 UTC, Tony Butt
no flags Details
Define MAX_POSITIVE_LOCK_OFFSET to be a 32-bit value (879 bytes, patch)
2003-10-22 23:27 UTC, Tim Potter
no flags Details
Samba logs after the max_offset patch (182.31 KB, application/octet-stream)
2003-10-23 18:39 UTC, Tony Butt
no flags Details
Samba logs with tdb_brlock failed messages (210.76 KB, application/octet-stream)
2003-10-26 17:12 UTC, Tony Butt
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tony Butt 2003-10-20 21:04:04 UTC
Periodically, smbd is unable to obtain file locks.

The most serious manifestation is that some of the tdb databases cannot be
opened/reopened, reporting messages similar to this:

tdb(/var/opt/samba/locks/connections.tdb): tdb_reopen: failed to obtain active lock

This prevents a client from connecting to ANY samba shares on the server.
Restarting samba resolves the problem.

Numerous messages are also being reported about posix_fcntl_lock, eg

[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(657)
  posix_fcntl_lock: WARNING: lock request at offset 4096 length -4097 returned
[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(658)
  an Invalid argument error. This can happen when using 64 bit lock offsets
[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(659)
  on 32 bit NFS mounted file systems.
[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(657)
  Count greater than 31 bits - retrying with 31 bit truncated length.


The samba host computer is a HP C160L (32 bit arch.)
The OS is HPUX 10.20 ACE
Comment 1 Tony Butt 2003-10-20 21:07:59 UTC
Created attachment 209 [details]
smbd log, log level 1

Will send more detailed logs when available.
Comment 2 Tony Butt 2003-10-20 21:09:13 UTC
Created attachment 210 [details]
clinet smbd log level 1

Will send more detailed logs when available.
Comment 3 Tim Potter 2003-10-20 21:59:12 UTC
Tony, can you send as log level 10 instead of log level 1?
Comment 4 Tony Butt 2003-10-21 20:10:34 UTC
Created attachment 212 [details]
Some level 10 logs.

Here are some logs collected at level 10 over the last 6 hours, or so.

The lockout problem manifested itself when I attempted a smbclient connection
from lion to itself, there should be some traces in log.lion and log.smbd
There may be other occurrences present, I was away form work this morning.

We are also seeing numerous log files logged against IP addresses, I assume
this  is (now) normal.
Comment 5 Tim Potter 2003-10-21 22:56:32 UTC
Interesting enough, the new logs don't contain any of the posix_fcntl_lock
warnings but rather a whole bunch of these:

log.172.16.2.46:  tdb(/var/opt/samba/locks/messages.tdb): tdb_reopen: failed to
obtain active lock

Having the tdb_reopen() fail could have strange consequences and may be the
reason why you are getting lockouts.

Jeremy, you might find this interesting.
Comment 6 Tim Potter 2003-10-21 23:17:29 UTC
Created attachment 213 [details]
Display more debugs when tdb_brlock() fails

This patch logs some information in the cases where tdb_brlock() fails.  This
should help track down why the active lock cannot be taken when calling
tdb_reopen_all()
Comment 7 Tony Butt 2003-10-22 17:14:56 UTC
Created attachment 215 [details]
More level 10 logs, with tdb.c patch applied

Here are this mornings level 10 logs, with the tdb_brlock patch applied.
Comment 8 Tim Potter 2003-10-22 18:09:58 UTC
It looks like the posix_fcntl_lock() problem is present in the set of logs in
comment 7 but not the tdb_brlock() problem but that's still OK.

I'm wondering whether this is a problem with HPUX and negative lock offsets? 
I'll have to work out what a negative lock offset actually means.  (-:
Comment 9 Tim Potter 2003-10-22 18:37:36 UTC
Tony, I think I need a copy of the generated include/config.h file.  I'm deep
inside the locking code and there are whole chunks of code that are dependent on
various conditions detected during the configure process.

What I think is happening is Samba is generating an invalid call to flock() by
specifying an offset before the start of a file.  HPUX is quite rightly
returning an invalid argument error for this case.

http://www.opengroup.org/onlinepubs/007908799/xsh/fcntl.html
Comment 10 Tony Butt 2003-10-22 20:59:09 UTC
Created attachment 216 [details]
config.h, config.log, config.status

I added config.log and config.status in case these might help.
Tony
Comment 11 Tim Potter 2003-10-22 22:27:53 UTC
Thanks Tony - those files are very useful.

My latest theory is that the locking code is getting confused because on your
system off_t is only 32-bits wide, but we have a hardcoded define of
MAX_POSITIVE_LOCK_OFFSET=0x1ffffffffffLL.  It looks like there is some
truncation strangeness going on in some of the if statements in
posix.c:posix_lock_in_range() but I haven't figured out the exact mechanism yet.
Comment 12 Tim Potter 2003-10-22 23:27:27 UTC
Created attachment 217 [details]
Define MAX_POSITIVE_LOCK_OFFSET to be a 32-bit value

This patch should be a quick-fix.  We should really detect whether
MAX_POSITIVE_LOCK_OFFSET is set to an appropriate value at configure time, as
well as have an assert in the fcntl() wrapper to abort when a negative value is
passed as an offset.

It looks like HPUX is the only platform that defines MAX_POSITIVE_LOCK_OFFSET
though.  I might have to track down whoever put it in and ask them why they did
it.
Comment 13 Tony Butt 2003-10-23 18:39:09 UTC
Created attachment 218 [details]
Samba logs after the max_offset patch

Here are the current level 10 logs produced after the max_offset patch.
Things seem better, although the 'tdb_brlock failed' messages are still
occurring.
I have seen none of the posix_fcntl_lock messages.

I will quiz the users later to see if they are still having trouble seeing
shares.

Tony
Comment 14 Tim Potter 2003-10-23 20:48:20 UTC
Excellent.  The tdb_brlock() must be a separate problem then.
Comment 15 Tim Potter 2003-10-23 22:46:49 UTC
The fcntl debugs look a lot more reasonable now, the lack of negative numbers is
definitely a bonus.  (-:

I've checked the patch for tdb.c in to CVS as I think it's an appropriate change
to have in the code anyway.

Reassigning to me.
Comment 16 Tony Butt 2003-10-26 17:12:28 UTC
Created attachment 225 [details]
Samba logs with tdb_brlock failed messages

Here are some logs captured this morning, after I saw several 'tdb_brlock
failed'
messages.
I switched the loglevel to 10, and captured these -
particularly see log.smbd at 2003/10/27 11:47:14.

I have not had any complaints from the users about unavailable shares.
Tony
Comment 17 Tim Potter 2003-11-04 13:44:55 UTC
According to the manpage, EACCES is returned when the "operation is prohibited
by locks held by other processes".

Tony, have you seen any more troubles in the office?
Comment 18 Tony Butt 2003-11-04 15:41:53 UTC
I have not heard any complaints since the MAX_POSITIVE_LOCK_OFFSET patch.
Whatever is going on with the tdb_brlock failed messages, it seems to have no
impact on our users.
Comment 19 Tim Potter 2004-02-10 14:38:22 UTC
Marking as fixed.
Comment 20 Gerald (Jerry) Carter 2005-08-24 10:25:56 UTC
sorry for the same, cleaning up the database to prevent unecessary reopens of bugs.
Comment 21 Gerald (Jerry) Carter 2005-11-14 09:29:30 UTC
database cleanup