Bug 10564 - Lock order violation and file lost
Summary: Lock order violation and file lost
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.1.6
Hardware: All All
: P5 major (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
: 10537 10591 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-04-24 16:52 UTC by Tom Schulz
Modified: 2016-09-28 20:41 UTC (History)
4 users (show)

See Also:


Attachments
log.smbd (240.22 KB, text/plain)
2014-04-25 14:12 UTC, Tom Schulz
no flags Details
log.fangtooth (340.95 KB, text/plain)
2014-04-25 14:14 UTC, Tom Schulz
no flags Details
log.192.168.2.226 (169.41 KB, text/plain)
2014-04-25 14:18 UTC, Tom Schulz
no flags Details
The wireshark file (23 bytes, text/plain)
2014-04-25 19:08 UTC, Tom Schulz
no flags Details
Patch that should get you up and running again.. (5.01 KB, patch)
2014-04-25 23:40 UTC, Jeremy Allison
no flags Details
git-am fix for 4.1.next. (16.12 KB, patch)
2014-05-01 18:56 UTC, Jeremy Allison
no flags Details
git-am fix for 4.1.next (16.61 KB, patch)
2014-05-02 22:23 UTC, Jeremy Allison
vl: review+
Details
git-am fix for 4.0.next (15.04 KB, patch)
2014-05-02 22:26 UTC, Jeremy Allison
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Schulz 2014-04-24 16:52:19 UTC
I am making the severity as major. If I did not have Samba 3.6.22 to fall back on, I would mark is as critical.

We have a problem with files being deleted instead of being saved when
making changes in Microsoft Excel. The log shows an error of:
'Lock order violation'. This is with Samba 4.1.6 and happens on both
Linux (2.6.25-14.fc9.i686) and Solaris 10. No such problem when running
Samba 3.6.22.

We have a setup where we need to allow any user to edit an excel file.
To allow this we have an ACL set on the directory and on the file to force
write permission for group. The problem only occurs when the user doing
the editing is not the current owner of the file. Before the attempt to
edit the file we have the file (as seen on Linux or Solaris):

-rwxrwxr--+  1 user1    users    2611248 Apr 23 14:02 glorp.xlsx*

The directory and the file have the following ACLs.
# file: .
# owner: root
# group: users
user::rwx
group::rwx              #effective:rwx
group:users:rwx         #effective:rwx
mask:rwx
other:r-x
default:user::rwx
default:group::rwx
default:group:users:rwx
default:mask:rwx
default:other:r-x

# file: glorp.xlsx
# owner: user1
# group: users
user::rwx
user:user1:rwx          #effective:rwx
group::rwx              #effective:rwx
group:users:rwx         #effective:rwx
mask:rwx
other:r--

After making changes and trying to save the file, a pop up box says
'file not saved'. At this point the file is gone but we have the following
files in the directroy.

-rwxrwxr--+  1 user1    users    2596791 Apr 23 13:39 F790B97D.tmp
-rwxrwxr--+  1 user2    users        165 Apr 23 14:01 ~$glorp.xlsx

The F790B97D.tmp looks to be the original file without the changes.

If the user now tries to exit Excel the user is again prompted to save
the file. If he now clicks on Don't Save than as far as he can tell the
file is lost. If he clicks on Save to try again to save the file, the
file will be saved correctly and will show up as:

-rwxrwxr--+  1 user2    users    2611248 Apr 23 14:02 glorp.xlsx

But the '~$glorp.xlsx' and 'F790B97D.tmp' files remain in the directory.
Unfortunately after having had one failure to save the file, the users
usually click on Don't Save the second time and the file is gone.

The system log file now shows the following:

Apr 23 13:58:39 bream smbd[29505]: [2014/04/23 13:58:39.412682,  0] ../lib/dbwrap/dbwrap.c:193(dbwrap_check_lock_order)
Apr 23 13:58:39 bream smbd[29505]:   Lock order violation: Trying /opt/local/samba4/var/lock/smbXsrv_open_global.tdb at 1 while /opt/local/samba4/var/lock/locking.tdb at 1 is locked
Apr 23 13:58:39 bream smbd[29505]: [2014/04/23 13:58:39.413837,  0] ../lib/dbwrap/dbwrap.c:133(debug_lock_order)
Apr 23 13:58:39 bream smbd[29505]:   lock order:  1:/opt/local/samba4/var/lock/locking.tdb 2:<none> 3:<none>
Apr 23 13:58:39 bream smbd[29505]: [2014/04/23 13:58:39.414353,  0] ../source3/lib/util.c:785(smb_panic_s3)
Apr 23 13:58:39 bream smbd[29505]:   PANIC (pid 29505): invalid lock_order
Apr 23 13:58:39 bream smbd[29505]: [2014/04/23 13:58:39.417636,  0] ../source3/lib/util.c:896(log_stack_trace)
Apr 23 13:58:39 bream smbd[29505]:   BACKTRACE: 35 stack frames:
Apr 23 13:58:39 bream smbd[29505]:    #0 /opt/local/samba4/lib/libsmbconf.so.0(log_stack_trace+0x2b) [0x6de8dd]
Apr 23 13:58:39 bream smbd[29505]:    #1 /opt/local/samba4/lib/libsmbconf.so.0(smb_panic_s3+0x87) [0x6de730]
Apr 23 13:58:39 bream smbd[29505]:    #2 /opt/local/samba4/lib/libsamba-util.so.0(smb_panic+0x2a) [0x1613c6]
Apr 23 13:58:39 bream smbd[29505]:    #3 /opt/local/samba4/lib/private/libdbwrap.so [0xfa7d2c]
Apr 23 13:58:39 bream smbd[29505]:    #4 /opt/local/samba4/lib/private/libdbwrap.so [0xfa7e5e]
Apr 23 13:58:39 bream smbd[29505]:    #5 /opt/local/samba4/lib/private/libdbwrap.so(dbwrap_fetch_locked+0x2f) [0xfa7f5e]
Apr 23 13:58:39 bream smbd[29505]:    #6 /opt/local/samba4/lib/private/libsmbd_base.so [0x2fb932]
Apr 23 13:58:39 bream smbd[29505]:    #7 /opt/local/samba4/lib/private/libsmbd_base.so(smbXsrv_open_create+0x145) [0x2fc742]
Apr 23 13:58:39 bream smbd[29505]:    #8 /opt/local/samba4/lib/private/libsmbd_base.so(file_new+0xcd) [0x2121c7]
Apr 23 13:58:39 bream smbd[29505]:    #9 /opt/local/samba4/lib/private/libsmbd_base.so [0x28ebc9]
Apr 23 13:58:39 bream smbd[29505]:    #10 /opt/local/samba4/lib/private/libsmbd_base.so(create_file_default+0x32b) [0x28f885]
Apr 23 13:58:39 bream smbd[29505]:    #11 /opt/local/samba4/lib/private/libsmbd_base.so [0x3bd5d9]
Apr 23 13:58:39 bream smbd[29505]:    #12 /opt/local/samba4/lib/private/libsmbd_base.so(smb_vfs_call_create_file+0xba) [0x29b3c9]
Apr 23 13:58:39 bream smbd[29505]:    #13 /opt/local/samba4/lib/private/libsmbd_base.so(open_file_fchmod+0xbd) [0x28bf7f]
Apr 23 13:58:39 bream smbd[29505]:    #14 /opt/local/samba4/lib/private/libsmbd_base.so(file_set_dosmode+0x647) [0x281294]
Apr 23 13:58:39 bream smbd[29505]:    #15 /opt/local/samba4/lib/private/libsmbd_base.so(rename_internals_fsp+0xa25) [0x254943]
Apr 23 13:58:39 bream smbd[29505]:    #16 /opt/local/samba4/lib/private/libsmbd_base.so(rename_internals+0x575) [0x25509b]
Apr 23 13:58:39 bream smbd[29505]:    #17 /opt/local/samba4/lib/private/libsmbd_base.so(reply_mv+0x558) [0x255efb]
Apr 23 13:58:39 bream smbd[29505]:    #18 /opt/local/samba4/lib/private/libsmbd_base.so [0x2b0806]
Apr 23 13:58:39 bream smbd[29505]:    #19 /opt/local/samba4/lib/private/libsmbd_base.so [0x2b0984]
Apr 23 13:58:39 bream smbd[29505]:    #20 /opt/local/samba4/lib/private/libsmbd_base.so [0x2b1996]
Apr 23 13:58:39 bream smbd[29505]:    #21 /opt/local/samba4/lib/private/libsmbd_base.so [0x2b2b6d]
Apr 23 13:58:39 bream smbd[29505]:    #22 /opt/local/samba4/lib/private/libsmbd_base.so [0x2b2bdd]
Apr 23 13:58:39 bream smbd[29505]:    #23 /opt/local/samba4/lib/libsmbconf.so.0(run_events_poll+0x487) [0x6fb4d6]
Apr 23 13:58:39 bream smbd[29505]:    #24 /opt/local/samba4/lib/libsmbconf.so.0 [0x6fb7ba]
Apr 23 13:58:39 bream smbd[29505]:    #25 /opt/local/samba4/lib/private/libtevent.so.0(_tevent_loop_once+0xf5) [0x4c6394]
Apr 23 13:58:39 bream smbd[29505]:    #26 /opt/local/samba4/lib/private/libsmbd_base.so(smbd_process+0x1480) [0x2b6360]
Apr 23 13:58:39 bream smbd[29505]:    #27 /opt/local/samba4/sbin/smbd [0xb7fee6d7]
Apr 23 13:58:39 bream smbd[29505]:    #28 /opt/local/samba4/lib/libsmbconf.so.0(run_events_poll+0x487) [0x6fb4d6]
Apr 23 13:58:39 bream smbd[29505]:    #29 /opt/local/samba4/lib/libsmbconf.so.0 [0x6fb7ba]
Apr 23 13:58:39 bream smbd[29505]:    #30 /opt/local/samba4/lib/private/libtevent.so.0(_tevent_loop_once+0xf5) [0x4c6394]
Apr 23 13:58:39 bream smbd[29505]:    #31 /opt/local/samba4/sbin/smbd [0xb7fef41c]
Apr 23 13:58:39 bream smbd[29505]:    #32 /opt/local/samba4/sbin/smbd(main+0x1857) [0xb7ff0ddf]
Apr 23 13:58:39 bream smbd[29505]:    #33 /lib/libc.so.6(__libc_start_main+0xe6) [0xcc35d6]
Apr 23 13:58:39 bream smbd[29505]:    #34 /opt/local/samba4/sbin/smbd [0xb7fe9e21]
Apr 23 13:58:39 bream smbd[29505]: [2014/04/23 13:58:39.426906,  0] ../source3/lib/dumpcore.c:317(dump_core)
Apr 23 13:58:39 bream smbd[29505]:   dumping core in /opt/local/samba4/var/logs/bream.adi.com/cores/smbd
Apr 23 13:58:39 bream smbd[29505]:
Comment 1 Jeremy Allison 2014-04-24 17:24:38 UTC
Can you attach a full debug level 10 log to the bug please, plus a wireshark capture trace from the affected client.

I think I understand the issue, but this would help confirm it.

Thanks,

Jeremy.
Comment 2 Jeremy Allison 2014-04-24 18:10:57 UTC
What is in your smb.conf ? Are you storing DOS
attributes in extended EA's ?

As far as I can see in the code the offending
open_file_fchmod() can only get called if
'store dos attributes = no' (i.e. smbd is
trying to store DOS attrs in the filesystem
mode bits, not an EA).

Jeremy.
Comment 3 Tom Schulz 2014-04-25 14:12:32 UTC
Created attachment 9872 [details]
log.smbd

There are 3 log files. I am not sure what to send, so I will send all three.
Ignore anything about printers. I copied the smb.conf file form another
machine that did have printers set up and did not edit the smb.conf file
to remove those lines.
Comment 4 Tom Schulz 2014-04-25 14:14:55 UTC
Created attachment 9873 [details]
log.fangtooth

The next log file.
Comment 5 Tom Schulz 2014-04-25 14:18:45 UTC
Created attachment 9874 [details]
log.192.168.2.226

The last log file.

I meant to mention earlier that the errors that I am getting look a lot like
those mentioned in Bug 10537.
Comment 6 Tom Schulz 2014-04-25 14:34:36 UTC
Now the smb.conf file and the file it includes. There is a netbios aliases
line that should not be there. I don't think that that has any effect other
than to fill up the log file with junk.

[global]
max protocol = NT1
   workgroup = adi
   server string =
   security = domain
   load printers = yes
   printcap name = /etc/printers.samba
   printing = sysv
  guest account = nobody2
   log file = /opt/local/samba4/var/logs/%h/log.%m
   max log size = 150
   password server = starfish2
   passdb backend = tdbsam
   socket options = TCP_NODELAY 
   dns proxy = no 
dos filemode = yes
delete readonly = yes
name resolve order = bcast host
host msdfs = yes
msdfs root = yes
unix extensions = no
wide links = yes
include = /opt/local/samba4/etc/smb.conf.%hnetbios aliases = crayfish

******** The included file *********

[homes]
   comment = Home Directories
   browseable = no
   writable = yes
   inherit permissions = yes

[foom]
   comment = Foom Project Directories
   path = /home/foom
   browseable = yes
   writable = yes
   inherit permissions = yes
Comment 7 Tom Schulz 2014-04-25 14:46:37 UTC
The wireshark will take awhile. I am not familiar with it.

According to the smb.conf man page, store dos attributes defaults to no.
It looks like my smb.conf file does not include a setting for that.
Comment 8 Tom Schulz 2014-04-25 19:08:22 UTC
Created attachment 9876 [details]
The wireshark file

This is the wireshark file. Hopefully it contains only packets between
the two machines involved. But just in case, the Samba server is running
on 192.168.2.141 and the PC running Excel is 192.168.2.226.

This contains both the first failed save and the second successful save.
Comment 9 Jeremy Allison 2014-04-25 19:10:51 UTC
Ok, thanks for the details. I think I know how to fix this - give me a little while to prepare a patch.

Cheers,

Jeremy.
Comment 10 Jeremy Allison 2014-04-25 22:50:36 UTC
Can you upload the wireshark file here, not a link to it. The link doesn't resolve for me :-(.
Comment 11 Jeremy Allison 2014-04-25 22:51:32 UTC
Also, there a no panic crashes in any of these logs. Are you sure you got the correct logs covering the problem case ?
Comment 12 Jeremy Allison 2014-04-25 23:40:16 UTC
Created attachment 9878 [details]
Patch that should get you up and running again..

Not sure about this for master, still looking at the options.

Jeremy.
Comment 13 Jeremy Allison 2014-04-25 23:52:16 UTC
*** Bug 10537 has been marked as a duplicate of this bug. ***
Comment 14 Tom Schulz 2014-04-25 23:58:17 UTC
I tried to upload the wireshark file but it  would not let me. It said that it was too big. Try
going to ftp://ftp.adi.com/samba. You should find it there.
Comment 15 Jeremy Allison 2014-04-26 00:13:57 UTC
When I click on the "wireshark" link from that url, it refuses to download on both firefox and chrome. Never mind - please try the attached patch. It should fix your issue I think.

Jeremy.
Comment 16 Tom Schulz 2014-04-26 00:35:46 UTC
I will try the patch. You could try a command line ftp from a linux machine for that file.
The command line would be 'ftp ftp.adi.com' You would use ftp for a user name and
anything you want for a password. You  would then cd to samba, type binary to specify
a binary transfer and then type get wireshark.
Comment 17 Tom Schulz 2014-04-26 00:47:54 UTC
Odd about the log files not showing the crash. The /var/log/messages file (the linux system log) shows  the crash at Apr 25 14:26:34. The other day, when running without debug, the Samba logs and  the system log both showed the crash and the dump.
Comment 18 Tom Schulz 2014-04-26 01:04:49 UTC
Well that ftp sure does not work! I moved it to another directory. Try just ftp://ftp.adi.com from a browser or 'ftp ftp.adi.com' from a command line. From a command line then cd pub and then get wireshark.

After  all  the trouble I went to the get  that trace, you should definitely get it!! :)
Comment 19 Tom Schulz 2014-04-28 23:18:32 UTC
The patch works. I rebuilt Samba 4.1.6 with the patch and Excel now saves the file correctly. The owner changes as expected, the ACLs remain on the file as expected and the temp files are removed.
Comment 20 Jeremy Allison 2014-04-28 23:28:25 UTC
Thanks for the test. I don't think this is appropriate for master because of the chmod race condition introduced. However, I'm working on an alternate fix with Volker that I might ask you to test in a day or so if you're willing.

Thanks !

Jeremy.
Comment 21 Tom Schulz 2014-04-29 01:42:16 UTC
I  will test your new patch when you have it. I have verified that the wireshark file can now be downloaded with firefox from the pub directory of ftp.adi.com. Let me know if you either have it or are not going to bother.
Comment 22 Jeremy Allison 2014-05-01 18:56:40 UTC
Created attachment 9885 [details]
git-am fix for 4.1.next.

Ok, Tom here is the "official" patch I'd like to put in 4.1.next.

Can you test this to make sure it fixes your problem ? It does under my tests here, but it would be good to get you to confirm.

Thanks a *lot* for your help on this !

Jeremy.
Comment 23 Tom Schulz 2014-05-02 14:23:11 UTC
Tested against 4.1.6 on Solaris 10 and it works.For completeness I will soon try with 4.1.7 and on Linux.
Comment 24 Jeremy Allison 2014-05-02 22:23:35 UTC
Created attachment 9895 [details]
git-am fix for 4.1.next

Backport of master fix for 4.1.next.
Comment 25 Jeremy Allison 2014-05-02 22:26:46 UTC
Created attachment 9896 [details]
git-am fix for 4.0.next

Backported from the fix that went into master.
Comment 26 Tom Schulz 2014-05-03 04:40:39 UTC
Tested with 4.1.7 on Linux, with both a NFS mounted disk and a local disk, and it works.

This does not matter any more, but I think I know what happened with the debug logs. I have a  maximum log size set in the smb.conf file and the debug log went over that size. If I ever try to do a debug level 10 log again, I will have to remember to change the max log size first.
Comment 27 Jeremy Allison 2014-05-05 20:08:56 UTC
*** Bug 10591 has been marked as a duplicate of this bug. ***
Comment 28 Jeremy Allison 2014-05-06 16:39:15 UTC
Re-assigning to Karolin for inclusion in 4.0.next, 4.1.next.

Jeremy.
Comment 29 Karolin Seeger 2014-05-19 10:20:37 UTC
(In reply to comment #28)
> Re-assigning to Karolin for inclusion in 4.0.next, 4.1.next.
> 
> Jeremy.

Pushed to autobuild-v4-[0|1]-test.
Comment 30 Karolin Seeger 2014-05-20 09:52:16 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > Re-assigning to Karolin for inclusion in 4.0.next, 4.1.next.
> > 
> > Jeremy.
> 
> Pushed to autobuild-v4-[0|1]-test.

Pushed to both branches.
Closing out bug report.

Thanks!