Periodically, smbd is unable to obtain file locks. The most serious manifestation is that some of the tdb databases cannot be opened or reopened, reporting messages similar to this:

tdb(/var/opt/samba/locks/connections.tdb): tdb_reopen: failed to obtain active lock

This prevents a client from connecting to ANY samba shares on the server. Restarting samba resolves the problem.

Numerous messages are also being reported about posix_fcntl_lock, e.g.

[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(657)
  posix_fcntl_lock: WARNING: lock request at offset 4096 length -4097 returned
[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(658)
  an Invalid argument error. This can happen when using 64 bit lock offsets
[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(659)
  on 32 bit NFS mounted file systems.
[2003/10/21 12:11:14, 0] locking/posix.c:posix_fcntl_lock(657)
  Count greater than 31 bits - retrying with 31 bit truncated length.

The samba host computer is an HP C160L (32 bit arch.). The OS is HPUX 10.20 ACE.
Created attachment 209 [details] smbd log, log level 1 Will send more detailed logs when available.
Created attachment 210 [details] client smbd log level 1 Will send more detailed logs when available.
Tony, can you send as log level 10 instead of log level 1?
Created attachment 212 [details] Some level 10 logs. Here are some logs collected at level 10 over the last 6 hours or so. The lockout problem manifested itself when I attempted an smbclient connection from lion to itself; there should be some traces in log.lion and log.smbd. There may be other occurrences present, as I was away from work this morning. We are also seeing numerous log files logged against IP addresses; I assume this is (now) normal.
Interestingly enough, the new logs don't contain any of the posix_fcntl_lock warnings but rather a whole bunch of these: log.172.16.2.46: tdb(/var/opt/samba/locks/messages.tdb): tdb_reopen: failed to obtain active lock Having the tdb_reopen() fail could have strange consequences and may be the reason why you are getting lockouts. Jeremy, you might find this interesting.
Created attachment 213 [details] Display more debugs when tdb_brlock() fails This patch logs some information in the cases where tdb_brlock() fails. This should help track down why the active lock cannot be taken when calling tdb_reopen_all()
Created attachment 215 [details] More level 10 logs, with tdb.c patch applied Here are this morning's level 10 logs, with the tdb_brlock patch applied.
It looks like the posix_fcntl_lock() problem is present in the set of logs in comment 7, but the tdb_brlock() problem is not; that's still OK. I'm wondering whether this is a problem with HPUX and negative lock offsets. I'll have to work out what a negative lock offset actually means. (-:
Tony, I think I need a copy of the generated include/config.h file. I'm deep inside the locking code and there are whole chunks of code that are dependent on various conditions detected during the configure process. What I think is happening is that Samba is generating an invalid call to fcntl() by specifying an offset before the start of a file. HPUX is quite rightly returning an invalid argument error for this case. http://www.opengroup.org/onlinepubs/007908799/xsh/fcntl.html
Created attachment 216 [details] config.h, config.log, config.status I added config.log and config.status in case these might help. Tony
Thanks Tony - those files are very useful. My latest theory is that the locking code is getting confused because on your system off_t is only 32 bits wide, but we have a hardcoded define of MAX_POSITIVE_LOCK_OFFSET=0x1ffffffffffLL. It looks like there is some truncation strangeness going on in some of the if statements in posix.c:posix_lock_in_range() but I haven't figured out the exact mechanism yet.
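The truncation theory above can be illustrated with a little arithmetic. The sketch below is not taken from posix.c; the helper names are hypothetical, and a 32-bit off_t is modelled with int32_t. It only shows that keeping the low 32 bits of 0x1ffffffffffLL gives -1, and that "max minus offset" for offset 4096 then gives -4097, which happens to match the length in the logged warning. Whether posix_lock_in_range() actually follows this path is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in this thread for MAX_POSITIVE_LOCK_OFFSET. */
#define THREAD_MAX_POSITIVE_LOCK_OFFSET 0x1ffffffffffLL

/* Hypothetical helper: keep the low 32 bits of v and read them as a
 * two's-complement signed value, modelling storage in a 32-bit off_t.
 * Written so it does not rely on implementation-defined narrowing. */
int32_t wrap_to_int32(int64_t v)
{
    uint32_t low = (uint32_t)((uint64_t)v & 0xffffffffu);

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}

/* Hypothetical helper: distance from offset up to max_offset, computed in
 * 32-bit arithmetic. Callers must keep the result within int32_t range. */
int32_t length_up_to_max(int32_t offset, int32_t max_offset)
{
    return max_offset - offset;
}

/* Walk through the numbers seen in the log. */
void demo_truncation(void)
{
    int32_t max32 = wrap_to_int32(THREAD_MAX_POSITIVE_LOCK_OFFSET);

    assert(max32 == -1);                            /* low 32 bits are all ones */
    assert(length_up_to_max(4096, max32) == -4097); /* matches the logged length */
}
```

Calling demo_truncation() from a test program runs both checks. A maximum that is negative after truncation would make any "clamp to maximum" logic produce negative lengths, which is at least consistent with the Invalid argument errors.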
Created attachment 217 [details] Define MAX_POSITIVE_LOCK_OFFSET to be a 32-bit value This patch should be a quick-fix. We should really detect whether MAX_POSITIVE_LOCK_OFFSET is set to an appropriate value at configure time, as well as have an assert in the fcntl() wrapper to abort when a negative value is passed as an offset. It looks like HPUX is the only platform that defines MAX_POSITIVE_LOCK_OFFSET though. I might have to track down whoever put it in and ask them why they did it.
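The assert suggested above for the fcntl() wrapper could look roughly like the sketch below. This is not the Samba wrapper: the function names are hypothetical, and rejecting negative lengths as well as negative offsets is an extra assumption based on the "length -4097" seen in the log.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: a lock range is treated as sane only if neither the
 * offset nor the length is negative (the length rule is an assumption). */
int lock_range_is_sane(off_t offset, off_t length)
{
    return offset >= 0 && length >= 0;
}

/* Hypothetical wrapper: abort on a nonsensical range instead of passing it
 * to the kernel, then issue a non-blocking F_SETLK request.
 * type is F_RDLCK, F_WRLCK or F_UNLCK. Returns the fcntl() result. */
int checked_fcntl_lock(int fd, short type, off_t offset, off_t length)
{
    struct flock fl;

    assert(lock_range_is_sane(offset, length));

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = length;

    return fcntl(fd, F_SETLK, &fl);
}
```

With a check like this, the request from the original report (offset 4096, length -4097) would trip the assert at the call site rather than surfacing later as EINVAL from the OS.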
Created attachment 218 [details] Samba logs after the max_offset patch Here are the current level 10 logs produced after the max_offset patch. Things seem better, although the 'tdb_brlock failed' messages are still occurring. I have seen none of the posix_fcntl_lock messages. I will quiz the users later to see if they are still having trouble seeing shares. Tony
Excellent. The tdb_brlock() must be a separate problem then.
The fcntl debugs look a lot more reasonable now; the lack of negative numbers is definitely a bonus. (-: I've checked the patch for tdb.c into CVS as I think it's an appropriate change to have in the code anyway. Reassigning to me.
Created attachment 225 [details] Samba logs with tdb_brlock failed messages Here are some logs captured this morning, after I saw several 'tdb_brlock failed' messages. I switched the loglevel to 10, and captured these - particularly see log.smbd at 2003/10/27 11:47:14. I have not had any complaints from the users about unavailable shares. Tony
According to the manpage, EACCES is returned when the "operation is prohibited by locks held by other processes". Tony, have you seen any more troubles in the office?
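As a rough illustration of the EACCES case: POSIX allows F_SETLK to report a conflicting lock held by another process as either EACCES or EAGAIN, so a caller can separate ordinary contention from real errors such as EINVAL. The sketch below uses hypothetical helper names and is not how tdb_brlock() is written.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* True if err is one of the values F_SETLK may use to say that another
 * process holds a conflicting lock. */
int lock_errno_is_contention(int err)
{
    return err == EACCES || err == EAGAIN;
}

/* Hypothetical helper: try to take a write lock without blocking.
 * Returns 0 on success, 1 if another process holds a conflicting lock,
 * and -1 for any other failure (errno is left set by fcntl()). */
int try_write_lock(int fd, off_t offset, off_t length)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = length;

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;
    return lock_errno_is_contention(errno) ? 1 : -1;
}
```

A return value of 1 would correspond to the benign "someone else has it locked" situation, which may be why the remaining tdb_brlock failed messages do not translate into user-visible problems.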
I have not heard any complaints since the MAX_POSITIVE_LOCK_OFFSET patch. Whatever is going on with the tdb_brlock failed messages, it seems to have no impact on our users.
Marking as fixed.
Sorry for the spam; cleaning up the database to prevent unnecessary reopens of bugs.
database cleanup