This is somewhat easy to reproduce using the NetApp tool opl1.exe. Sorry if this is too much information; although I think more is better than less. Note that I built Samba 3.0.11 with "-g" option so I could get around easily in gdb. Here is my environment setup: Samba 3.0.11 serving a single share "pub" Two Windows machines, mapped to "pub", each running the opl1 tool: winclient1: opl1 -t1 z:\m.bat winclient2: opl1 -t1 f:\m.bat smb.conf: [global] workgroup = TJAPAN realm = TJAPAN.ACME.NET netbios name = iqtest security = ADS password server = japan1,japan2 encrypt passwords = Yes winbind uid = 10000-65534 winbind gid = 10000-65534 debug pid = yes [pub] path = /shares/pub read only = yes valid users = "TJAPAN\Domain Users", write list = "TJAPAN\Domain Users", Here is an overview of the opl1 tool, straight from NetApp Test Suite readme file: --------------- opl1 is a command line test to test file sharing. It is designed to cause oplock breaks. N copies of opl1 are run against the same file either on the same machine or using different machines. usage: G:\OPL1.EXE [filename][-?hvt] filename - test file (default: opl1.dat) -?|h - display help -t# - set wait time in ms (default 5000ms) -v - toggle messages (default: off) Program flow: while (true) { open file wait specified time perform single operation wait specified time close file wait specified time } The usual test will be for two machines to open the same file. We generally run with both the default time (which increase the probability that two system will collide) and with -t = 1 (which increases the number of test cases). Locking problems show themselves as pauses in the test or stats within the filer. The test does not explicitly detect or flag errors. ----------- While the test is running if you tail the smb.log on the server, after a while one of the Samba processes will panic with something like: release_level_2_oplocks_on_change(1287) release_level_2_oplocks_on_change: PANIC. File m.bat still has a level II oplock. A common precursor to this is a locking error reported by the other samba client daemon: [2005/04/15 16:42:07, 0, pid=20423] tdb/tdbutil.c:tdb_log(725) tdb(/var/tmp/iqtest/locking.tdb): tdb_lock failed on list 57 ltype=1 (Resource deadlock avoided) [2005/04/15 16:42:07, 0, pid=20423] smbd/oplock.c:oplock_break(708) oplock_break: unable to lock share entry for file m.bat I added an F_GETLK fcntl() call to tdb_brlock() on a failure to verify it was the other Samba daemon holding the lock. I attached to both daemons from gdb, set a breakpoint on the F_SETLKW failure. Here are the relevant gdb backtraces and frame information for both: PID 20423: Breakpoint 1, tdb_brlock (tdb=0x8348770, offset=396, rw_type=1, lck_type=14, probe=0) at tdb/tdb.c:288 288 errno = saved_errno; (gdb) bt #0 tdb_brlock (tdb=0x8348770, offset=396, rw_type=1, lck_type=14, probe=0) at tdb/tdb.c:288 #1 0x081b9aaa in tdb_lock (tdb=0x8348770, list=57, ltype=1) at tdb/tdb.c:315 #2 0x081bc7bf in tdb_chainlock (tdb=0x8348770, key= {dptr = 0x82a2f20 "\001▒~V~R", dsize = 16}) at tdb/tdb.c:1974 #3 0x08172246 in lock_share_entry_fsp (fsp=0x83539f8) at locking/locking.c:392 #4 0x081d4a2a in oplock_break (dev=65025, inode=2582110, file_id=1026, local_request=0) at smbd/oplock.c:707 #5 0x081d42a4 in process_local_message (buffer=0x159f9008 "\026", buf_size=131137) at smbd/oplock.c:450 #6 0x080c8107 in async_processing (buffer=0x159f9008 "\026", buffer_len=131137) at smbd/process.c:285 #7 0x080c8624 in receive_message_or_smb (buffer=0x159f9008 "\026", buffer_len=131137, timeout=60000) at smbd/process.c:469 #8 0x080c9bc5 in smbd_process () at smbd/process.c:1543 #9 0x08211ef8 in main (argc=4, argv=0x3ffff7f4) at smbd/server.c:951 #10 0x1577d1c4 in __libc_start_main () from /lib/libc.so.6 (gdb) l 283 284 fcntl(tdb->fd, F_GETLK, &fl); 285 TDB_LOG((tdb, 0, "F_GETLCK:l_type:%d, l_whence:%d, l_pid:%d" 286 " l_start:%Ld, l_len:%Ld\n", fl.l_type, fl.l_whence, 287 fl.l_pid, fl.l_start, fl.l_len)); 288 errno = saved_errno; 289 } 290 return TDB_ERRCODE(TDB_ERR_LOCK, -1); 291 } 292 return 0; (gdb) p fl $1 = {l_type = 1, l_whence = 0, l_start = 396, l_len = 1, l_pid = 20487} (gdb) p errno $2 = 35 PID 20487: (hit ctrl-c to see where it's blocked) Program received signal SIGINT, Interrupt. 0x15835674 in open () from /lib/libc.so.6 (gdb) bt #0 0x15835674 in open () from /lib/libc.so.6 #1 0x1588830c in __DTOR_END__ () from /lib/libc.so.6 #2 0x0819a1b0 in sys_open (path=0x3ffff120 "m.bat", oflag=66, mode=496) at lib/system.c:345 #3 0x080bebf4 in vfswrap_open (handle=0x0, conn=0x834fb38, fname=0x3ffff120 "m.bat", flags=66, mode=496) at smbd/vfs-wrap.c:190 #4 0x080b7206 in fd_open (conn=0x834fb38, fname=0x3ffff120 "m.bat", flags=66, mode=496) at smbd/open.c:47 #5 0x080b74ef in open_file (fsp=0x834c788, conn=0x834fb38, fname=0x3ffff120 "m.bat", psbuf=0x3ffff0c0, flags=578, mode=496, desired_access=131487) at smbd/open.c:175 #6 0x080b9487 in open_file_shared1 (conn=0x834fb38, fname=0x3ffff120 "m.bat", psbuf=0x3ffff0c0, desired_access=131487, share_mode=66, ofun=18, new_dos_mode=0, oplock_request=3, Access=0x3fffecac, paction=0x3fffecb0) at smbd/open.c:1250 #7 0x08096010 in reply_ntcreate_and_X (conn=0x834fb38, inbuf=0x159f9008 "", outbuf=0x15a1a008 "", length=102, bufsize=131072) at smbd/nttrans.c:823 #8 0x080c8d4a in switch_message (type=162, inbuf=0x159f9008 "", outbuf=0x15a1a008 "", size=102, bufsize=131072) at smbd/process.c:968 #9 0x080c8dd9 in construct_reply (inbuf=0x159f9008 "", outbuf=0x15a1a008 "", size=102, bufsize=131072) at smbd/process.c:998 #10 0x080c90e9 in process_smb (inbuf=0x159f9008 "", outbuf=0x15a1a008 "") at smbd/process.c:1098 #11 0x080c9bfd in smbd_process () at smbd/process.c:1558 #12 0x08211ef8 in main (argc=4, argv=0x3ffff7f4) at smbd/server.c:951 #13 0x1577d1c4 in __libc_start_main () from /lib/libc.so.6 Someone who understands the oplock logic can maybe figure out how this can occur. To me it looks as if the open() (pid 20487) call triggered a kernel oplock break to the other samba daemon (pid 20423). Granted pid 20487 has the locking.tdb list locked for this dev/inode so I'm not sure how pid 20423 placed an oplock since it seems you need to aquire the tdb lock prior to placing the file oplock. But I'm rambling. If you need more info, please email me. I'm going to continue to look at this as well.
Do you have a copy of this tool I can download and use ? That would be helpful I think. Thanks, Jeremy.
Sorry, having the tool would help I guess. I believe opl1.exe depends on a few dll's in the netapp-testtools.exe package. But you can find this package at the Connectathon 2005 site, http://www. connectathon.org/, under the Test Suites downloads. TomL
Ok, I've been trying to reproduce this with the latest SVN build of Samba 3.0.x (revision 7198) and can't. I'm using a 2.6.10 kernel... Can you try and reproduce this with our latest SVN code please ? Jeremy.
closing. no response.