Created attachment 8150 [details]
smbd.log

I've been running Samba 4 since rc2 and I've been encountering lockups and CIFS timeouts on Linux clients ever since. These timeouts seem to happen only when accessing multiple files at once. Windows clients are fine, btw.

Downgrading to Samba 3.6.8 on the server-side solves this issue, so this must be a regression.

Steps to reproduce:
1. Set up test share on Samba server
2. Put some FLAC files in test share (this is the easiest way to reproduce)
3. Mount test share on Linux client
4. On client run `metaflac --add-replay-gain *.flac`

Result:
All mounted CIFS shares from this server time out somehow. The metaflac process freezes. This is what dmesg shows on the client after a while:
> CIFS VFS: Server horst has not responded in 120 seconds. Reconnecting...
> CIFS VFS: Server horst has not responded in 120 seconds. Reconnecting...
> CIFS VFS: Error -32 sending data on socket to server

I have attached the smbd.log file. When the timeouts happen on the client-side I always get this message on the server:
> ../source3/smbd/server.c:436(remove_child_pid)
> Could not find child 20541 -- ignoring

Setup:
- Samba 4.0-rc4 on Linux (tried both 3.6 and 3.7-master)
- Samba acts as a standalone AD DC
- Linux clients: tried kernel 3.6 and 3.7-master
- Linux clients: tried with SMB2 and without SMB2 kernel option
- Linux clients: cifs-utils 5.6

Also applies to rc5
We need logs with "log level = 10", "debug hires timestamp = yes" and "debug pid = yes", plus network captures, for both 3.6 and 4.0. Please also upload your smb.conf.

See also:
https://wiki.samba.org/index.php/Client_specific_Log
https://wiki.samba.org/index.php/Capture_Packets

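
For anyone following along, the settings requested above could be collected roughly like this. It is only a sketch: the file name `debug-logging.conf` is a made-up example, and smb.conf(5) spells the second parameter `debug hires timestamp` (singular).

```shell
#!/bin/sh
# Sketch: write the requested debug settings into a standalone fragment.
# "debug-logging.conf" is a hypothetical file name; the same three lines can
# instead be added to the [global] section of smb.conf directly.
cat > debug-logging.conf <<'EOF'
[global]
    log level = 10
    debug hires timestamp = yes
    debug pid = yes
EOF

# Show the result for review before merging it into smb.conf.
cat debug-logging.conf
```

Level 10 logs grow quickly, so it is probably best to enable these settings only while reproducing the problem.
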
After debugging this a bit more I found out a few things:

* This is definitely NOT related to network/NIC issues, as the problem even occurs when mounting a share directly on the server (through //127.0.0.1/TestShare).
* After provisioning Samba4 with --use-ntvfs the issue disappears.

Since I was able to solve the issue with "--use-ntvfs" I can now no longer provide any debugging information. If none of the Samba devs have time to look into the remove_child_pid-error I was getting you may close this bug. As far as I understand the NTVFS mode is being used in future versions of Samba4 anyway, so I guess I'll be fine then.

(In reply to comment #3)
> After debugging this a bit more I found out a few things:
>
> * This is definitely NOT related to network/NIC issues, as the problem even
> occurs when mounting a share directly on the server (through
> //127.0.0.1/TestShare).
> * After provisioning Samba4 with --use-ntvfs the issue disappears.
>
> Since I was able to solve the issue with "--use-ntvfs" I can now no longer
> provide any debugging information. If none of the Samba devs have time to look
> into the remove_child_pid-error I was getting you may close this bug. As far as
> I understand the NTVFS mode is being used in future versions of Samba4 anyway,
> so I guess I'll be fine then.

No, it won't. We'll keep the current default and use the 'smbd' file server.

Created attachment 8245 [details]
Log/config from samba4 with s3fs

Sorry, my last comment was based on wrong information. While ntvfs works for me, I figured that s3fs really is the way to go.

Since I am also interested in getting this thing sorted out for s3fs I added the debug parameters to the smb.conf file and performed the following steps directly on the Samba server for (Samba3|Samba4-ntvfs|Samba4-s3fs):
1. mount -t cifs //127.0.0.1/TestShare /mnt/smb -o username=Administrator
2. metaflac --add-replay-gain /mnt/smb/cdparanoia/*.flac

The 2nd step works great with Samba3 and Samba4-ntvfs, but causes timeouts on Samba4-s3fs. I hope you'll be able to fix the bug with this information.

Created attachment 8246 [details] Log/config from samba4 with ntvfs
Created attachment 8247 [details] Log/config from samba3
Sorry for the noise, but I made an interesting discovery:

As soon as I disable UNIX extensions (either in smb.conf or as a mount option for mount.cifs) the timeouts are no longer an issue. BTW: Only using "nobrl,noposixpaths,noacl" as mount options is not sufficient to work around the issue - one definitely has to disable UNIX extensions completely.

Is there any further information I could provide? Do you still need network captures (maybe with and without UNIX extensions enabled)?

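
The workaround described above can be sketched as follows. On the client side, mount.cifs has a `nounix` option that disables the CIFS UNIX extensions for a mount; on the server side, the smb.conf global is `unix extensions = no`. The share and mount point are taken from the earlier reproduction steps, and the command is only printed here, not executed.

```shell
#!/bin/sh
# Sketch: compose a mount command that disables the CIFS UNIX extensions
# on the client side. Share and mount point follow the earlier reproduction
# steps; adjust them for your setup.
SHARE="//127.0.0.1/TestShare"
MOUNTPOINT="/mnt/smb"
OPTS="username=Administrator,nounix"

# Printed rather than executed, since mounting needs root and a running server.
MOUNT_CMD="mount -t cifs $SHARE $MOUNTPOINT -o $OPTS"
echo "$MOUNT_CMD"

# Server-side alternative (smb.conf, [global] section):
#   unix extensions = no
```

Either change alone should be enough for a test, since the extensions are only used when both sides agree on them.
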
(In reply to comment #8)
> Sorry for the noise, but I made an interesting discovery:
>
> As soon as I disable UNIX extensions (either in smb.conf or as a mount option
> for mount.cifs) the timeouts are no longer an issue. BTW: Only using
> "nobrl,noposixpaths,noacl" as mount options is not sufficient to work around
> the issue - one definitely has to disable UNIX extensions completely.
>
> Is there any further information I could provide? Do you still need network
> captures (maybe with and without UNIX extensions enabled)?

Ok, thanks for debugging this!

I think network captures would be really good together with level 10 logs:
- 3.6 as server (with and without unix extension)
- 4.0 as server (with and without unix extension)

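
One way to keep the four requested captures apart is to name each file after its test case. The sketch below only prints the tcpdump commands. The interface `lo` assumes the loopback mount used earlier in this report, and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: print one tcpdump command per requested test case
# (server 3.6 and 4.0, each with and without UNIX extensions).
# "lo" assumes the loopback mount from the earlier reproduction steps;
# the capture file names are hypothetical.
IFACE="lo"
COUNT=0
for server in 3.6 4.0; do
    for unix in unix nounix; do
        # -s 0 captures full packets, -w writes the raw packets to a file.
        echo "tcpdump -i $IFACE -s 0 -w samba-$server-$unix.pcap port 445 or port 139"
        COUNT=$((COUNT + 1))
    done
done
```

Each capture would be started before mounting the share and stopped once the timeout shows up in dmesg, so that it includes the session setup.
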
When you mount with unix extensions, it's likely that the client is negotiating a larger rsize and wsize with the server. It's possible that this bug is a duplicate of this one:

https://bugzilla.samba.org/show_bug.cgi?id=9422

You may want to make sure your samba binaries have the patch for that bug and retest before going to great lengths to debug this.

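
A quick way to probe the rsize/wsize theory, short of applying the patch, might be to keep UNIX extensions enabled but cap both sizes with the `rsize` and `wsize` options of mount.cifs. This is only a sketch: 61440 (60 KiB) is an assumed illustrative value, not a number taken from this report, and the command is printed rather than run.

```shell
#!/bin/sh
# Sketch: compose a mount command that keeps UNIX extensions enabled but caps
# the read/write sizes, to check whether large requests trigger the timeouts.
# 61440 (60 KiB) is an assumed illustrative value, not taken from this report.
SHARE="//127.0.0.1/TestShare"
MOUNTPOINT="/mnt/smb"
SIZE=61440
OPTS="username=Administrator,rsize=$SIZE,wsize=$SIZE"

# Printed rather than executed, since mounting needs root and a running server.
MOUNT_CMD="mount -t cifs $SHARE $MOUNTPOINT -o $OPTS"
echo "$MOUNT_CMD"
```

The sizes actually in effect appear as rsize= and wsize= in the cifs entry of /proc/mounts, which makes it easy to compare mounts with and without UNIX extensions.
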
(In reply to comment #10)
> You may want to make sure your samba binaries have the patch for that bug and
> retest before going to great lengths to debug this.

Thanks a lot for this hint. I've applied this patch to rc5:
> [PATCH] Fix Bug 9422 - large read requests cause server to issue malformed reply

The patch fixes the nasty timeouts and Samba4 now works as expected with UNIX extensions enabled.

*** This bug has been marked as a duplicate of bug 9422 ***