The Samba-Bugzilla – Bug 3912
Kernel Oplocks problem
Last modified: 2006-08-09 03:16:25 UTC
We have a problem with an application that clears the archive bit before
writing and sets it after writing.
The latter fails if the writer is not the file owner.
"dos filemode" is enabled, and setting the bit manually works fine.
Samba is 3.0.23RC3.
Client is XP/SP2.
It seems to lose the connection before setting the bit and a new smbd is
Created attachment 2017 [details]
Level 10 log
Created attachment 2018 [details]
With kernel oplocks = no the bug doesn't show up at all!
Could something really be broken in Samba/Linux?
Created attachment 2037 [details]
Level 10 log with kernel oplocks off
now it works
Jeremy, please close if this is fixed in 3.0.23a
I'm closing this as I fixed a race condition bug with 3.0.23a. Please retest with this release and reopen if you can reproduce.
Unfortunately not fixed with 3.0.23a.
BTW, as soon as I activated kernel oplocks for testing with 3.0.23a I saw again "lost delayed write data" on saving.
Linux/x86 is 2.6.17.
I looked at your log files/sniff a couple of days ago, but I could not detect any failure. In particular, in the sniff there is only one SET_FILE_INFO call that appears to reset the archive bit (frame 320), none that sets it afterwards. Can you point me at the specific line/frame number that fails?
Alternatively, please upload a full smbd debug level 10 log and a full sniff from starting the smbd connection to the failure, please in both cases with and without kernel oplocks.
Ok, here are 2 complete tcpdumps, with and without kernel oplocks, from loading to saving (it's an InstallShield project).
There are also 2 corresponding level 10 logs; the no-kernel-oplocks log is missing its first part due to log rotation, hopefully that's not important.
With kernel oplocks the client complains "no connection to share" on save, the last line in the log shows a new connection.
user is cad, client DELLDEV4, file is Facton5.x.ism.
Created attachment 2056 [details]
tcpdump no kernel oplocks
Created attachment 2057 [details]
level 10 log w/o oplocks
Created attachment 2058 [details]
tcpdump kernel oplocks
Created attachment 2059 [details]
level 10 log w oplocks
Ok, thanks. These logs are more helpful. Although now I see something very strange: in the "oplock" case, at the very end of the sniff after the write call in packet 4206, you can see the server end the connection and the client restart it. In the corresponding logfile you can see the write attempt but *nothing* after that. If we got a signal that somehow gave us a chance to panic reasonably, then I would expect a panic message in the log, but there is really only the reconnect by the client.
This makes me assume that we get a KILL signal from the kernel for some reason.
To verify this, the next step is an strace of smbd. Can you start your smbd with
strace -f -ttT -o /tmp/strace.smbd /usr/sbin/smbd -D
and re-run the test with kernel oplocks on? Can you also upload the logfile and sniff for that?
BTW, what exact system and kernel version are you using?
Created attachment 2060 [details]
I quickly took a strace log only of the affected smbd.
Kernel is 2.6.17, SuSE 9.1
Quick shot: Is it possible that you have async I/O enabled? Can you try disabling it?
I don't think so.
Built by: root@server
Built on: Sun Jul 23 22:50:05 CEST 2006
Built using: gcc
Build host: Linux server 2.6.17 #27 Mon Jun 19 17:49:15 CEST 2006 i686 i686 i386 GNU/Linux
sizeof(long long): 8
pdb_ldap pdb_smbpasswd pdb_tdbsam rpc_lsa rpc_reg rpc_lsa_ds rpc_wks rpc_svcctl rpc_ntsvcs rpc_net rpc_netdfs rpc_srv rpc_spoolss rpc_eventlog rpc_samr idmap_ldap idmap_tdb auth_sam auth_unix auth_winbind auth_server auth_domain auth_builtin
Checked it against another server (Kernel 2.6.17/x64, SuSE 9.3), problem is reproducible.
Can you check this on an older kernel? Is it possible this is an issue with 2.6.17? Running here on 126.96.36.199-21.12-smp on SuSE 9.3 I don't see this problem.
You've got it :-)
2.6.16 works like a charm.
Now I finally know why the save problems started at the end of June without any Samba change.
Maybe kernel oplocks and 2.6.17 are a dangerous combination.
Thanks Volker & Jeremy!
Oh that's bad :-(. Now we need to get a kernel bug fixed, and that's much harder than fixing Samba....
Has the kernel bug (fcntl(F_SETSIG) no longer working) been reported? I've got a simple test program that demonstrates the problem.
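For readers who want to check their own kernel, a reproducer along these lines may help. This is only a minimal sketch and not the test program mentioned above: the function name lease_break_signal(), the scratch file path and the choice of SIGRTMIN + 1 are assumptions made for illustration. It requests a real-time signal with fcntl(F_SETSIG), takes a write lease, lets a child process break the lease with a conflicting open, and reports which signal actually arrived. If the kernel drops the F_SETSIG setting, the notification would presumably arrive as plain SIGIO, whose default action is to terminate the process; that would be consistent with an smbd disappearing without a panic message.

```c
#define _GNU_SOURCE            /* needed for F_SETSIG and F_SETLEASE */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns the number of the signal delivered when the lease is broken,
 * 0 if no signal arrived within 5 seconds, or -1 if the setup failed
 * (for example on a filesystem that does not support leases).
 */
int lease_break_signal(void)
{
    char path[] = "/tmp/lease-test-XXXXXX";   /* hypothetical scratch file */
    int lease_sig = SIGRTMIN + 1;             /* arbitrary real-time signal */
    int result = -1;
    sigset_t mask, old;
    siginfo_t info;
    struct timespec timeout = { 5, 0 };
    pid_t child;
    int fd;

    assert(lease_sig <= SIGRTMAX);

    /* Block both candidates so whichever one is sent stays pending. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGIO);
    sigaddset(&mask, lease_sig);
    if (sigprocmask(SIG_BLOCK, &mask, &old) < 0)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        goto restore;

    /* Ask for lease_sig instead of SIGIO, then take a write lease. */
    if (fcntl(fd, F_SETSIG, lease_sig) < 0 ||
        fcntl(fd, F_SETLEASE, F_WRLCK) < 0)
        goto cleanup;

    child = fork();
    if (child < 0)
        goto cleanup;
    if (child == 0) {
        /* A conflicting open breaks the lease; O_NONBLOCK avoids waiting. */
        int fd2 = open(path, O_RDONLY | O_NONBLOCK);
        if (fd2 >= 0)
            close(fd2);
        _exit(0);
    }

    /* Collect whichever of the two blocked signals was delivered. */
    if (sigtimedwait(&mask, &info, &timeout) > 0)
        result = info.si_signo;
    else
        result = 0;                           /* no signal at all */
    waitpid(child, NULL, 0);

cleanup:
    fcntl(fd, F_SETLEASE, F_UNLCK);
    close(fd);
    unlink(path);
restore:
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

Calling lease_break_signal() from a small main() and comparing the result with SIGRTMIN + 1 and SIGIO shows which signal the kernel used. A result of SIGRTMIN + 1 means F_SETSIG was honoured, while SIGIO would suggest the setting was lost.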
*** Bug 3970 has been marked as a duplicate of this bug. ***
No, it hasn't been reported. If you post your test case here I'll attach it to the SuSE bugzilla databases. I don't know where to post this for the kernel.org kernels (kernel mailing list?).
I've reported it to the Fedora bugzilla. I'll post to the linux kernel list as well.
I've checked it again with this patch from the kernel mailing list and the problems are gone.
Thanks to Orion for bringing this up on the list!
fcntl(F_SETSIG) no longer works on leases because
lease_release_private_callback() gets called as the lease is copied in
order to initialise it.
The problem is that lease_alloc() performs an unnecessary initialisation,
which sets the lease_manager_ops. Avoid the problem by allocating the
target lease structure using locks_alloc_lock().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fs/locks.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index b0b41a6..d7c5339 100644
@@ -1421,8 +1421,9 @@ static int __setlease(struct file *filp,
- error = lease_alloc(filp, arg, &fl);
- if (error)
+ error = -ENOMEM;
+ fl = locks_alloc_lock();
+ if (fl == NULL)
@@ -1430,6 +1431,7 @@ static int __setlease(struct file *filp,
*flp = fl;
+ error = 0;
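The mechanism described in the patch description can be illustrated with a small user-space model. This is a simplified sketch based only on the commit message, not kernel code: all names (model_file, model_lock, copy_lock, signum_after_setlease) are hypothetical, and the value 35 merely stands in for a signal number chosen with F_SETSIG. It shows how copying into a target that was already initialised with the lease operations runs the release callback against the live file and clears its signal, whereas copying into a blank target leaves the setting alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's file and lock structures. */
struct model_file { int signum; };
struct model_lock;
struct model_ops { void (*release_private)(struct model_lock *); };
struct model_lock {
    struct model_file *file;
    const struct model_ops *ops;
};

/* Models the release callback: it resets the owning file's signal. */
static void lease_release_private(struct model_lock *fl)
{
    if (fl->file != NULL)
        fl->file->signum = 0;
}

static const struct model_ops lease_ops = { lease_release_private };

/* Models a copy that first releases whatever the target already holds. */
static void copy_lock(struct model_lock *dst, const struct model_lock *src)
{
    assert(dst != NULL && src != NULL);
    if (dst->ops != NULL && dst->ops->release_private != NULL)
        dst->ops->release_private(dst);
    *dst = *src;
}

/*
 * preinit_target != 0 models the old behaviour (target initialised as a
 * lease before the copy); 0 models the patched behaviour (blank target).
 * Returns the file's signal number after the copy.
 */
int signum_after_setlease(int preinit_target)
{
    struct model_file filp = { .signum = 35 };  /* as if set via F_SETSIG */
    struct model_lock request = { &filp, &lease_ops };
    struct model_lock target = { NULL, NULL };

    if (preinit_target) {
        target.file = &filp;
        target.ops = &lease_ops;
    }
    copy_lock(&target, &request);
    return filp.signum;
}
```

In this model, signum_after_setlease(1) loses the configured signal and signum_after_setlease(0) keeps it, which mirrors the reasoning for allocating the target with locks_alloc_lock() instead of lease_alloc().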