The test copies a file set from two XP machines into two separate subdirectories on the same share. After some minutes (1 to 20 in our case) one of the clients gets disconnected. In the Ethereal trace the change notify response looks strange, and right after it the XP client resets the connection.

192.168.192.14 is the XP client, 192.168.168.20 is the Samba server.

The log file and Ethereal traces are at ftp://208.69.180.86/samba
Logon: softeng@powerfile.com
Pass: SoftDev

client.pcap: Ethereal trace as seen on the XP client with the problem. Packet #10416 is the reset in question. Packet #10414 is the notify response that looks suspicious to me; note that the same file name appears several times.

logxpprosp2wan2: log.smbd file with debug=10 for the same transfer. Look for "connection reset by peer" at [2007/07/03 10:44:33, 0].

server.pcap1: Ethereal trace of the same transfer taken on the Samba server side. The RST is packet #27444. I cannot identify the notify response in this trace.
Not sure if you're working on July 4th, but just in case you are and it's urgent for you :-).

I see the problem: the reply size is 18k, almost certainly too big for the client buffer. Can you try changing the following code in smbd/notify.c:

	/*
	 * Someone has triggered a notify previously, queue the change for
	 * later.
	 */

	if ((fsp->notify->num_changes > 1000) || (name == NULL)) {
		/*
		 * The real number depends on the client buf, just provide a
		 * guard against a DoS here.
		 */
		TALLOC_FREE(fsp->notify->changes);
		fsp->notify->num_changes = -1;
		return;
	}

Change the line that says:

	if ((fsp->notify->num_changes > 1000)

to something like:

	if ((fsp->notify->num_changes > 100)

just as a test and see if you can reproduce the bug. I don't think that's the correct ultimate fix, but it may get you around this issue temporarily in case it's a customer-critical situation.

Let me know, I'm still looking at the logs + trace.

Thanks,
Jeremy.
Created attachment 2799
Patch

This is the first part of the fix. Limit responses to what the client told us it can accept.
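To illustrate what limiting the response to the client's buffer involves, here is a minimal sketch in C. It is not the content of the attached patch. The names `change_rec`, `rec_wire_size`, `changes_fit` and `max_len` are hypothetical. The only external fact assumed is the FILE_NOTIFY_INFORMATION wire layout: three 32-bit fields followed by a UTF-16 file name, with each entry padded to a 4-byte boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a queued change; not Samba's struct. */
struct change_rec {
	uint32_t action;       /* e.g. added / removed / modified */
	size_t name_utf16_len; /* file name length in bytes, as UTF-16 */
};

/*
 * Approximate wire size of one FILE_NOTIFY_INFORMATION entry:
 * NextEntryOffset, Action and FileNameLength (4 bytes each), then the
 * name, rounded up to a 4-byte boundary.
 */
static size_t rec_wire_size(const struct change_rec *r)
{
	size_t len = 12 + r->name_utf16_len;
	return (len + 3) & ~(size_t)3;
}

/*
 * Returns true and stores the total size in *total if all n records fit
 * into max_len bytes (the limit the client asked for). Returns false as
 * soon as the running total exceeds max_len; *total is left untouched.
 */
static bool changes_fit(const struct change_rec *recs, size_t n,
			size_t max_len, size_t *total)
{
	size_t sum = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		sum += rec_wire_size(&recs[i]);
		if (sum > max_len) {
			return false;
		}
	}
	*total = sum;
	return true;
}
```

A purely illustrative calculation: 1000 records with 3-character names take 1000 × 20 = 20000 bytes, so a cap on the record count alone does not bound the reply size. When `changes_fit` returns false, a server would presumably drop the queued list and tell the client to rescan the directory; the SMB protocol defines STATUS_NOTIFY_ENUM_DIR for that case, though this comment does not say whether the attached patch uses it.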
Created attachment 2800
Second patch

This should coalesce identical adjacent notify records, making the "too large" bug very rare indeed. Please test.

Jeremy.
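Coalescing identical adjacent records can be sketched as follows. This is an illustration of the idea only, not the attached patch: `named_change` and `coalesce_adjacent` are hypothetical names, and "identical" is assumed here to mean same action and same file name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified change record; not Samba's actual struct. */
struct named_change {
	uint32_t action;  /* e.g. added / removed / modified */
	const char *name; /* file name relative to the watched directory */
};

/*
 * Drop every record that is identical (same action, same name) to the
 * record kept immediately before it. Works in place and preserves order.
 * Returns the new number of records.
 */
static size_t coalesce_adjacent(struct named_change *recs, size_t n)
{
	size_t out = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    recs[out - 1].action == recs[i].action &&
		    strcmp(recs[out - 1].name, recs[i].name) == 0) {
			continue; /* duplicate of the previous kept record */
		}
		recs[out++] = recs[i];
	}
	return out;
}
```

Only adjacent duplicates are merged, so the order of distinct events is preserved. A file that is written in many small chunks during a copy, which is what the repeated file name in the client trace suggests, would collapse to a single record per run.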
We finally completed the test in our environment, and it ran to completion with no errors. So I guess this should be considered fixed. Thanks.
Created attachment 2824
Correct (I think) patch

Start with a clean 3.0.25b and replace these two patches with this one.

Jeremy.
*** Bug 4796 has been marked as a duplicate of this bug. ***
Created attachment 2826
Replacement patch. One line was deleted by accident.

This should be the correct fix.

Jeremy.
I am still encountering this problem (leonard.kroll@umb.edu). I am performing a copy from one share to a second share and get this error 1 out of 3 times. The copy consists of a mix of large and small files, 8GB in total.
We seem to have the same problem. It happens especially when working with large database files, but also with small files: Windows displays a small pop-up with the message "Delayed Write Failed".

Samba 3.0.28, kernel 2.6.24 on Gentoo. Samba is acting as domain controller for ~30 workstations. The workstations are all Windows XP 32-bit SP2.

This also seems to happen with cifs from another Linux box:

May 9 14:33:03 CIFS VFS: server not responding
May 9 14:33:03 CIFS VFS: No response to cmd 47 mid 61863
May 9 14:33:03 CIFS VFS: Write2 ret -11, wrote 0
May 9 14:33:06 CIFS VFS: No response to cmd 47 mid 39910
May 9 14:33:06 CIFS VFS: Write2 ret -11, wrote 0
May 9 14:33:06 CIFS VFS: Write2 ret -9, wrote 0
May 9 14:33:06 CIFS VFS: Write2 ret -11, wrote 0

This may be the same problem as in bug #3927.