Bug 11390 - Timeout when writing a file which is currently read by a NFS 4 client
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.1.17
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Samba QA Contact
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2015-07-08 07:56 UTC by Olivier Monaco
Modified: 2015-07-17 07:03 UTC (History)

Attachments
Strace output (82.55 KB, text/plain)
2015-07-17 06:27 UTC, Olivier Monaco

Description Olivier Monaco 2015-07-08 07:56:43 UTC
Hello,

I'm running Debian Wheezy (x64) on a couple of servers and Windows 7 (x64) on workstations.

One server (server A) is a NAS server running Samba 4.1.17 (from Debian Backports) and NFS 1.2.6 (from Debian). It shares the "home" directories with the other servers through NFS and with the workstations through Samba.

Samba share configuration is:

[homes]
   comment = Home Directories
   valid users = %S
   # Read/Write/Delete
   browseable = no
   read only = no
   writable = yes
   create mask = 0664
   directory mask = 0775
   delete readonly = yes
   # Locking
   locking = no
   oplocks = yes
   # Attributes/ACL
   inherit permissions = yes
   map acl inherit = yes
   nt acl support = yes
   store dos attributes = yes
   inherit owner = yes
   map hidden = no
   map system = no
   map archive = no

NFS share configuration is:

/volumes         *(rw,async,fsid=0,no_subtree_check,no_root_squash,crossmnt)

All user files are in /volumes/users. Home directories are /volumes/users/<user>.

On a workstation running Windows 7, I mount the user "home" directory. I open a text editor to create/edit a text file.

On server B, I mount the NFS share with options: "_netdev,rw,noatime,nodiratime,vers=4,hard,intr,timeo=5,actimeo=5,retrans=2,bg,acl". So, I use version 4 of NFS.

On server B, I run "cat" in a loop on the text file being edited on the workstation, so I'm "always" reading the file through NFS. On the workstation, I save the file, so Samba issues a write to the file. The text editor freezes, the smbd process associated with the workstation exits 1 or 2 minutes later, another smbd process starts shortly afterwards, and the file is actually saved 4 or 5 minutes after the save command was issued. The editor comes back to life once the save is done.

No relevant info in log file using log level 2.
Editing file using vi on server A: no problem.
Switching to NFS 3: no problem.

We recently upgraded from Samba 3 to Samba 4 and NFS 3 to NFS 4.

I spent 2 months tracking this issue. It's now easy to reproduce but really hard to find what's going wrong...
Comment 1 Volker Lendecke 2015-07-08 10:25:13 UTC
Is this really a crash, as the subject line of the bug clearly states?
Comment 2 Olivier Monaco 2015-07-08 15:52:11 UTC
I have no clear way to answer, but the smbd processes are not the same before and after the problem.

Here are some lines from the log:

[2015/07/07 15:20:04.157575,  2, pid=23609, effective(21XXX, 21XXX), real(21XXX, 0)]   olivier opened file xxxxxx read=Yes write=No (numopen=1)
[2015/07/07 15:20:14.258504,  2, pid=23609, effective(21XXX, 21XXX) real(21XXX, 0)]   unix_mode(xxxxxx) inheriting from yyyyyy
[2015/07/07 15:20:14.258564,  2, pid=23609, effective(21XXX, 21XXX), real(21XXX, 0)]   unix_mode(xxxxxx) inherit mode 40775
[2015/07/07 15:20:15.934795,  2, pid=23609, effective(21XXX, 21XXX), real(21XXX, 0)]   olivier closed file xxxxxx (numopen=0) NT_STATUS_OK
[2015/07/07 15:24:21.939797,  1, pid=23609, effective(0, 0), real(0, 0)]   bs-p007 (ipv4:10.48.X.Y:52753) closed connection to service olivier
[2015/07/07 15:24:22.058982,  2, pid=24667, effective(0, 0), real(0, 0)]   bs-p007 (ipv4:10.48.X.Y:53366) connect to service olivier initially as user olivier (uid=21XXX, gid=21XXX) (pid 24667)
[2015/07/07 15:24:22.060956,  2, pid=24667, effective(21XXX, 21XXX), real(21XXX, 0)]   unix_mode(xxxxxx) inheriting from yyyyyy
[2015/07/07 15:24:22.060994,  2, pid=24667, effective(21XXX, 21XXX), real(21XXX, 0)]   unix_mode(xxxxxx) inherit mode 40775
[2015/07/07 15:24:22.061314,  2, pid=24667, effective(21XXX, 21XXX), real(21XXX, 0)]   olivier opened file xxxxxx read=No write=Yes (numopen=1)
[2015/07/07 15:24:22.080432,  2, pid=24667, effective(21XXX, 21XXX), real(21XXX, 0)]   olivier closed file xxxxxx (numopen=0) NT_STATUS_OK

At 15:20:15.934795 the editor freezes. At 15:24:21.939797 the editor is back. The write is done by the second process.
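
The length of the stall can be worked out from the two log timestamps quoted above. This is only a small sketch: the function name stall_seconds is made up for illustration, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# stall_seconds is a hypothetical helper, not part of Samba.
# It converts two HH:MM:SS.ffffff timestamps (same day assumed) to
# seconds since midnight and prints the difference.
stall_seconds() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        start = s[1] * 3600 + s[2] * 60 + s[3]
        end   = e[1] * 3600 + e[2] * 60 + e[3]
        printf "%.6f\n", end - start
    }'
}

# Freeze and recovery times taken from the log lines above.
stall_seconds 15:20:15.934795 15:24:21.939797   # prints 246.005002
```

That is about 4 minutes 6 seconds, which matches the "4 or 5 minutes" seen on the workstation.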
Comment 3 Olivier Monaco 2015-07-08 17:13:57 UTC
I've tried on another server and there is no problem there. I need to compare the configurations.
Comment 4 Olivier Monaco 2015-07-16 15:41:22 UTC
I now have a minimal test case: two new virtual machines installed with Debian Wheezy show the same problem, and I have a clean way to reproduce it.
Comment 5 Volker Lendecke 2015-07-16 15:43:44 UTC
(In reply to Olivier Monaco from comment #4)
Can you attach to the smbd that stalls with strace -ttT?
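
For reference, a possible way to capture and skim such a trace. With strace, -tt prefixes each line with a wall-clock time including microseconds, -T appends the time spent in each syscall as <seconds>, -p attaches to a running process and -o writes to a file. The helper name slow_syscalls, the output file name and the 1 second default threshold are assumptions chosen for illustration.

```shell
#!/bin/sh
# Capture (run as root on server A; PID is the smbd serving the client):
#   strace -ttT -p "$PID" -o smbd.strace
#
# slow_syscalls is a hypothetical helper: it reads strace -T output on
# stdin and prints the lines whose trailing <duration> is at least MIN
# seconds (default 1).
slow_syscalls() {
    awk -v min="${1:-1}" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the angle brackets to get the duration in seconds.
            d = substr($0, RSTART + 1, RLENGTH - 2)
            if (d + 0 >= min + 0) print
        }'
}

# Usage: slow_syscalls 1 < smbd.strace
```

Lines without a trailing <duration> (for example unfinished calls) are skipped by this filter, so a call that never returns has to be looked for separately at the end of the trace.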
Comment 6 Olivier Monaco 2015-07-17 06:27:04 UTC
Created attachment 11266 [details]
Strace output
Comment 7 Olivier Monaco 2015-07-17 06:33:35 UTC
I replaced the Windows workstation with smbclient. So now:
- server A has the Samba and NFS shares and uses smbclient to access the Samba share.
- server B has the NFS mount.

My use case:
1) Read a file in a loop on server B (while true; do cat a.txt; done)
2) Run smbclient from server A to open share of server A
3) Download the a.txt file
4) Upload the a.txt file
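
The steps above can be sketched as a small script. The host name serverA, the share name test and the default user are placeholders, not values confirmed in this report. The script only prints the smbclient command (a dry run), and the read loop discards the file contents instead of printing them. smbclient's -U option sets the user and -c runs a semicolon-separated command string; get and put are its download and upload commands.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust to the real setup.
SERVER="${SERVER:-serverA}"
SHARE="${SHARE:-test}"
SMB_USER="${SMB_USER:-olivier}"
FILE="${FILE:-a.txt}"

# Step 1, on server B inside the NFS mount: read the file forever.
# Defined here but not called, because it never returns.
read_loop() {
    while true; do cat "$FILE" > /dev/null; done
}

# Steps 2 to 4, on server A: print the smbclient command that opens the
# share, downloads the file and uploads it again.
smb_roundtrip_cmd() {
    printf 'smbclient //%s/%s -U %s -c "get %s; put %s"\n' \
        "$SERVER" "$SHARE" "$SMB_USER" "$FILE" "$FILE"
}

smb_roundtrip_cmd
```

Run read_loop on server B first, then paste the printed command on server A while the loop is still running.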

smbclient takes around 30 seconds and ends with "NT_STATUS_IO_TIMEOUT opening remote file \a.txt"

Log of samba:
...
[2015/07/17 08:24:40.235204,  2, pid=11322, effective(21136, 21176), real(21136, 0)]   unix_mode(a.txt) inheriting from .
[2015/07/17 08:24:40.236024,  2, pid=11322, effective(21136, 21176), real(21136, 0)]   unix_mode(a.txt) inherit mode 40775
[2015/07/17 08:25:16.284652,  2, pid=11322, effective(21136, 21176), real(21136, 0)]   unix_mode(a.txt) inheriting from .
[2015/07/17 08:25:16.285541,  2, pid=11322, effective(21136, 21176), real(21136, 0)]   unix_mode(a.txt) inherit mode 40775
[2015/07/17 08:25:16.290464,  2, pid=11322, effective(21136, 21176), real(21136, 0)]   olivier opened file a.txt read=Yes write=No (numopen=1)
[2015/07/17 08:25:16.293384,  2, pid=11322, effective(21136, 21176), real(21136, 0)]   olivier closed file a.txt (numopen=0) NT_STATUS_OK
...

Strace output attached.

So smbd does not crash; it's a timeout. When using a Windows box, there may be some "connection reset" from Windows that stops the smbd process and starts a new one.
Comment 8 Olivier Monaco 2015-07-17 07:03:50 UTC
The problem seems to be linked to the use of "mount -o bind" and not to NFS.

On server A, I have a folder named /volumes/data/test with files to share. I use "mount -o bind" to mount /volumes/data/test to /shares/test and then share this folder.

When I remove the "mount -o bind", no problem. If I run the "while..." on server A, no problem. If I write to the same file on server A (without samba), no problem.

In Samba, sharing /volumes/data/test (the original folder) or /shares/test (the "bind") gives the same timeout.
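
When comparing an affected server with an unaffected one, it may help to list which mount points are bind mounts of a subdirectory. A rough sketch, assuming the Linux /proc/self/mountinfo format, where field 4 is the root of the mount inside its filesystem and field 5 is the mount point. The helper name bind_candidates is made up, and a root other than "/" can have other causes (btrfs subvolumes, for example), so the output is only a list of candidates.

```shell
#!/bin/sh
# The bind mount described above would be created with:
#   mount -o bind /volumes/data/test /shares/test
#
# bind_candidates is a hypothetical helper: it reads mountinfo-formatted
# text on stdin and prints every mount point whose root inside the
# filesystem is not "/", which is how a bind mount of a subdirectory
# shows up.
bind_candidates() {
    awk '$4 != "/" { print $5 }'
}

# Usage on server A: bind_candidates < /proc/self/mountinfo
```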