Bug 4763 - TCP Reset from XP client while copying data from two clients.
Summary: TCP Reset from XP client while copying data from two clients.
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.25b
Hardware: Other Linux
Importance: P3 normal
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
Duplicates: 4796
Depends on:
Reported: 2007-07-03 21:26 UTC by Serge Pashenkov (mail address dead)
Modified: 2008-05-09 08:40 UTC
CC List: 2 users

See Also:

Patch (833 bytes, patch)
2007-07-05 13:27 UTC, Jeremy Allison
Second patch (1.12 KB, patch)
2007-07-06 16:46 UTC, Jeremy Allison
Correct (I think) patch (4.04 KB, patch)
2007-07-17 18:29 UTC, Jeremy Allison
Replacement patch. One line was deleted by accident. (3.86 KB, patch)
2007-07-17 18:59 UTC, Jeremy Allison

Description Serge Pashenkov (mail address dead) 2007-07-03 21:26:41 UTC
The test copies a set of files from two XP machines into two separate subdirectories on the same share. After some minutes (1 to 20 min in our case) one of the clients gets disconnected.

Looking at the ethereal trace, the change notify response appears strange, and right after it XP resets the connection. is XP client, is samba server.

Log file and ethereal traces are in:
Logon: softeng@powerfile.com
Pass: SoftDev

client.pcap, packet #10416 is the reset in question. This is the ethereal trace as seen on the XP client with the problem. Packet #10414 is the notify response that looks suspicious to me; note that the same file name appears several times.

logxpprosp2wan2 is the log.smbd file with debug=10 for the same transfer. Look for "connection reset by peer" at [2007/07/03 10:44:33, 0].

server.pcap1 is the ethereal trace of the same transfer taken from the samba server side. The RST is packet #27444. I cannot identify the notify response.
Comment 1 Jeremy Allison 2007-07-04 01:01:27 UTC
Not sure if you're working on July 4th, but just in case and it's urgent for you :-). I see the problem, the reply size is 18k - almost certainly too big for the client buffer. Can you try changing the following code in smbd/notify.c :

        /*
         * Someone has triggered a notify previously, queue the change for
         * later.
         */

        if ((fsp->notify->num_changes > 1000) || (name == NULL)) {
                /*
                 * The real number depends on the client buf, just provide a
                 * guard against a DoS here.
                 */
                TALLOC_FREE(fsp->notify->changes);
                fsp->notify->num_changes = -1;
                return;
        }

Change the line that says :

if ((fsp->notify->num_changes > 1000)

to something like :

if ((fsp->notify->num_changes > 100)

just as a test and see if you can reproduce the bug. I don't think that's the correct ultimate fix, but it may get you around this issue temporarily in case it's a customer critical situation.
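
For a sense of scale, here is a back-of-the-envelope sketch in C. It is illustrative only and is not Samba code. It assumes each change record is laid out like a Windows FILE_NOTIFY_INFORMATION entry: three 32-bit header fields followed by the name in 2-byte characters, padded to a multiple of 4 bytes. The function names and the uniform name length are made up for the example.

```c
#include <stddef.h>

/*
 * Bytes taken by one marshalled change record (assumed layout):
 * 12 bytes of header (next-entry offset, action, name length)
 * plus the name at 2 bytes per character, rounded up to 4 bytes.
 */
static size_t record_bytes(size_t name_chars)
{
    size_t len = 12 + 2 * name_chars;
    return (len + 3) & ~(size_t)3;
}

/*
 * Rough estimate for a reply carrying `count` records whose names
 * are all `name_chars` characters long (a deliberate simplification).
 */
static size_t estimate_reply_bytes(size_t count, size_t name_chars)
{
    return count * record_bytes(name_chars);
}
```

Under these assumptions, 1000 queued records with 20-character names come to 52000 bytes, while 100 records come to 5200 bytes. That fits the description of the lower guard as a temporary workaround rather than the real fix.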

Let me know, I'm still looking at the logs + trace.

Comment 2 Jeremy Allison 2007-07-05 13:27:28 UTC
Created attachment 2799 [details]

This is the first part of the fix. Limit responses to what the client told us it can accept.
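
A minimal sketch of that idea in C, for readers without the attachment at hand. It is not the attached patch. The struct, the function names and the `max_reply_bytes` parameter (standing in for whatever limit the client advertised in its request) are hypothetical, and the record layout is assumed to match FILE_NOTIFY_INFORMATION (12-byte header plus name, padded to 4 bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued change: an action code and the name length in bytes. */
struct sized_change {
    uint32_t action;
    size_t name_bytes;
};

/* Marshalled size of one entry: 12-byte header + name, padded to 4 bytes. */
static size_t entry_size(const struct sized_change *c)
{
    size_t len = 12 + c->name_bytes;
    return (len + 3) & ~(size_t)3;
}

/*
 * Returns 1 and stores the total size in *out_len if every entry fits
 * within max_reply_bytes; returns 0 as soon as the limit is exceeded.
 */
static int changes_fit(const struct sized_change *changes, size_t n,
                       size_t max_reply_bytes, size_t *out_len)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += entry_size(&changes[i]);
        if (total > max_reply_bytes) {
            return 0; /* the reply would overflow the client's buffer */
        }
    }
    *out_len = total;
    return 1;
}
```

What the server should send when the check fails is deliberately left out of the sketch; the attachment is the authority on that.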
Comment 3 Jeremy Allison 2007-07-06 16:46:24 UTC
Created attachment 2800 [details]
Second patch

This should coalesce identical adjacent notify records - making the "too large" bug very rare indeed. Please test.
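
Coalescing adjacent duplicates could look roughly like the C sketch below. It is not the attached patch. The `notify_rec` struct and `coalesce_adjacent` function are hypothetical names, and "identical" is assumed to mean the same action and the same file name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical notify record: an action code and a file name. */
struct notify_rec {
    uint32_t action;
    const char *name;
};

/*
 * Collapse runs of identical adjacent records in place and return the
 * new record count. Only neighbours are compared, so a record that
 * reappears after a different one is kept and event order is preserved.
 */
static size_t coalesce_adjacent(struct notify_rec *recs, size_t n)
{
    if (n == 0) {
        return 0;
    }
    size_t last = 0; /* index of the last record kept */
    for (size_t i = 1; i < n; i++) {
        if (recs[i].action == recs[last].action &&
            strcmp(recs[i].name, recs[last].name) == 0) {
            continue; /* same as the previous kept record: drop it */
        }
        recs[++last] = recs[i];
    }
    return last + 1;
}
```

A long run of "modified" events for one file being written, like the repeated file name seen in the client trace, would shrink to a single record under this scheme.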
Comment 4 Serge Pashenkov (mail address dead) 2007-07-11 13:46:32 UTC
We finally completed the test in our environment; it ran to completion with no errors. So I guess this should be considered fixed.

Comment 5 Jeremy Allison 2007-07-17 18:29:32 UTC
Created attachment 2824 [details]
Correct (I think) patch

Start with a clean 3.0.25b and replace these two patches with this one.
Comment 6 Jeremy Allison 2007-07-17 18:31:19 UTC
*** Bug 4796 has been marked as a duplicate of this bug. ***
Comment 7 Jeremy Allison 2007-07-17 18:59:29 UTC
Created attachment 2826 [details]
Replacement patch. One line was deleted by accident.

One line was deleted by accident. This should be the correct fix.
Comment 8 Leonard Kroll 2008-04-30 15:38:45 UTC
I am still encountering this problem (leonard.kroll@umb.edu)
I am performing a copy from one share to a 2nd share and get this error 1 out of 3 times. The copy is made up of a collection of large and small files for a total copy size of 8GB.
Comment 9 Syed Amer Gilani 2008-05-09 08:40:00 UTC
We seem to have the same problem.
Especially when working with large database files, but also with small files, Windows displays a small pop-up with the message "Delayed Write Failed".

Samba 3.0.28, Kernel 2.6.24 on Gentoo
Samba is acting as Domain Controller for ~30 Workstations.
The workstations are all Windows XP 32-bit SP2.

This also seems to happen with cifs from another Linux box:
May  9 14:33:03 CIFS VFS: server not responding
May  9 14:33:03 CIFS VFS: No response to cmd 47 mid 61863
May  9 14:33:03 CIFS VFS: Write2 ret -11, wrote 0
May  9 14:33:06 CIFS VFS: No response to cmd 47 mid 39910
May  9 14:33:06 CIFS VFS: Write2 ret -11, wrote 0
May  9 14:33:06 CIFS VFS: Write2 ret -9, wrote 0
May  9 14:33:06 CIFS VFS: Write2 ret -11, wrote 0

This may be the same problem as in bug #3927