The Samba-Bugzilla – Bug 5138
Samba needs throttle capability for DOS Clients
Last modified: 2015-05-19 08:31:31 UTC
We are testing a new samba server for use as a repository of disk images.
We noticed that when ghosting to the server, the DOS client freezes after a while and then disconnects from the server (as indicated by the smbd log).
If we slow down the physical network between client and server by replacing the Gigabit server switch with a 100Mb switch, the problem does NOT happen.
As we increase the samba log level, the file transfer rate slows down (as one would expect) and the problem is less likely to occur. The problem seems to disappear totally at Log Level 10 - please do not suggest that this is a solution to my problem :)
We happen to have *identical* server hardware running MS NT Server 4.0 SP6 which we always find useful for doing comparisons with Samba. The NT box does *not seem* to have this problem but since it transfers files at about 1/4 the speed of Samba, we might never know!!!
The problem also happens with different network cards so it is *not* due to a buggy NDIS driver.
We also tried doing a straight copy of large files using the DOS COPY command and the problem happened again, so it is *not* a Norton Ghost issue.
The problem does *not* happen with Windows clients, perhaps because their TCP/IP stack is more sophisticated.
It seems that there is one of two possibilities:
1. There is a genuine Samba bug that only manifests itself with DOS clients at high sustained transfer rates
2. It is an MS Client bug that only happens with Samba because Samba is so wonderfully fast!!!
I have tried various socket options and played with the DOS and LM settings in smb.conf, but the problem keeps happening. Assuming we never find the real cause of this bug, what I would like is a "throttle" setting for Samba that imposes an absolute upper limit on the file transfer rate and does not need to be tuned for different hardware and network topologies.
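The thread does not say how such a throttle would work, so the following is only a sketch of what an "absolute upper limit on file transfer rate" could look like: a token bucket that caps the average byte rate regardless of hardware. It is not Samba code. The class name `TokenBucket` and its `rate`/`burst` parameters are hypothetical and chosen purely for illustration.

```python
import time


class TokenBucket:
    """Cap throughput at `rate` bytes/second, allowing bursts of `burst` bytes.

    Hypothetical illustration only; this is not part of Samba.
    """

    def __init__(self, rate, burst, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)      # long-run ceiling in bytes per second
        self.burst = float(burst)    # largest amount that may go out at once
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last call, up to `burst`.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Wait until `nbytes` may be sent; return the seconds spent waiting."""
        waited = 0.0
        self._refill()
        if nbytes > self.tokens:
            # Not enough credit: sleep for exactly the time needed to earn it.
            waited = (nbytes - self.tokens) / self.rate
            self.sleep(waited)
            self._refill()
        self.tokens -= nbytes
        return waited
```

A sender would call `consume(len(chunk))` before each write. Because the limit is expressed in bytes per second rather than derived from link speed, the same value would apply on a 100Mb or a Gigabit path, which is the property asked for above.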
BTW, I am very familiar with the many foibles of the MS DOS Client and would be reluctant to cast any blame on samba because even recent versions of MS Windows server require tricks to work reliably with DOS clients.
Can you try "reset on zero vc = yes" please?
(In reply to comment #1)
> Can you try "reset on zero vc = yes" please?
Sorry, Volker, "reset on zero vc" did not work... but it did get me excited, as I had never noticed this option before :)
My take on this is that the DOS Client stack is old and unsophisticated and no longer maintained by Microsoft. Yet it is still in widespread use in network imaging and restore scenarios. I get the feeling that this old stack is more likely to break as CPUs get faster and network transfer rates increase. This is why I believe a throttle feature in Samba would be very useful.
I just spotted the following post (relating to a windows client)
To make sure this bug report relates to DOS clients only, I have just performed a bulk copy from a Windows XP SP2 client running on the same hardware (fast gigabit path between server and client). I dragged 24 x 650MB files from the local c: drive to my samba image share in about 12 minutes without a hitch. I also checked the Windows event viewer just to make sure nothing untoward happened. My DOS client would have failed well before the first file had been copied.
So my assertion that this problem relates to DOS clients only still stands.
I might also note that yesterday I performed a successful image of another slightly slower machine on the same Samba server with no problem.
DOS client anomalies such as this are becoming more common. They may be chipset related, BIOS dependent, SATA interrupt issues, NDIS driver bugs, etc etc. The list goes on and on. Yet we are more dependent than ever on DOS boot disks for performing images and restores. Samba is an ideal solution for an Image repository/server given that Microsoft is pretty unsympathetic to folks using DOS clients and disk imaging.
Perhaps this bug report should be changed to a feature request rather than a bug report.
A samba throttle feature...please!
Have you tried setting "use sendfile = no" ? This will slow down any client reads.
I don't think throttling will help. Try "oplocks = no" please. At a customer site I had these dos clients acquire an oplock for a file and then reboot hard. While in the BIOS, the oplock break request came in but could not be replied to -- no IP address yet. The conflicting open then timed out. "oplocks = no" might help here.
Created attachment 3039 [details]
smbd log (tail)
Created attachment 3040 [details]
nmbd log (tail)
Disabling oplocks and setting "use sendfile = no" also did nothing for me. I am careful to issue a DOS net stop command at the end of the imaging process to close the connection, so hard reboots on the DOS side are not the problem.
But you are correct, Volker, about a throttle not solving my problem, because I re-ran my log level = 10 test on a direct gigabit-to-gigabit connection and I finally hit pay dirt. See attached logs and smb.conf.
Here is the pertinent section of smbd log (I have inserted comments with ***)
[2007/12/12 23:00:11, 8] lib/util.c:fcntl_lock(1820)
fcntl_lock 19 12 135178240 32768 1
[2007/12/12 23:00:11, 8] locking/posix.c:posix_fcntl_lock(689)
posix_fcntl_lock: Lock call failed
[2007/12/12 23:00:11, 10] locking/locking.c:is_locked(121)
is_locked: posix start=135178240 len=32768 unlocked for file opx/D1000004.GHS
[2007/12/12 23:00:11, 10] smbd/fileio.c:real_write_file(137)
real_write_file (opx/D1000004.GHS): pos = 135178240, size = 16585, returned 16585
[2007/12/12 23:00:11, 3] smbd/reply.c:reply_writebraw(2756)
writebraw1 fnum=7864 start=135178240 num=16585 wrote=16585 sync=0
[2007/12/12 23:00:11, 5] lib/util.c:show_msg(454)
**** GHOST FREEZES (STOPS WRITING TO SERVER) AT THIS POINT *****
[2007/12/12 23:00:11, 5] lib/util.c:show_msg(464)
smb_vwv[ 0]=65535 (0xFFFF)
[2007/12/12 23:00:56, 5] lib/util_sock.c:read_socket_with_timeout(480)
read_socket_with_timeout: timeout read. EOF from client.
[2007/12/12 23:00:56, 3] smbd/sec_ctx.c:set_sec_ctx(288)
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2007/12/12 23:00:56, 5] auth/auth_util.c:debug_nt_user_token(433)
NT user token: (NULL)
[2007/12/12 23:00:56, 5] auth/auth_util.c:debug_unix_user_token(454)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
[2007/12/12 23:00:56, 5] smbd/uid.c:change_to_root_user(324)
change_to_root_user: now uid=(0,0) gid=(0,0)
[2007/12/12 23:00:56, 2] smbd/server.c:exit_server(614)
**** GHOST ABORTS WITH WRITE FAIL *AROUND* THIS POINT *****
[2007/12/12 23:00:56, 10] locking/locking.c:parse_share_modes(442)
parse_share_modes: delete_on_close: 0, num_share_modes: 1
[2007/12/12 23:00:56, 10] locking/locking.c:parse_share_modes(488)
parse_share_modes: share_mode_entry: pid = 4320, share_access = 0x3, private_options = 0x0, access_mask = 0x12019f, mid = 0x0, type= 0x0, file_id = 9, dev = 0x801, inode = 41156609
[2007/12/12 23:00:56, 3] smbd/sec_ctx.c:push_sec_ctx(256)
push_sec_ctx(0, 0) : sec_ctx_stack_ndx = 1
[2007/12/12 23:00:56, 3] smbd/uid.c:push_conn_ctx(393)
Created attachment 3041 [details]
Created attachment 3042 [details]
Relevant part of smbd log
At this stage, I have managed to ascertain that the problem arises with at least two types of network card (Broadcom B44 and RTL8169) on two different PC hardware platforms BUT DOES NOT happen with an on-board Intel E1000. I am using the most up-to-date NDIS drivers in all cases.
It's possible that many NDIS drivers are designed from readymade templates and NDIS driver testing is at best perfunctory. Perhaps Intel is that bit more proactive about stress testing with DOS clients or maybe I am just lucky e.g. in the choice of interrupt.
Volker, could you please verify from the attached log segment that Samba is behaving itself? It seems to me, from a very naive reading of the log, that the misdemeanor is perpetrated by the client, i.e. a sudden timeout or EOF possibly due to a driver bug... but maybe you will spot something more obvious.
BTW, the problem seems very random but is more likely as network path gets faster.
Nothing obvious around.
[2007/12/12 23:00:56, 5] lib/util_sock.c:read_socket_with_timeout(480)
read_socket_with_timeout: timeout read. EOF from client.
indicates that the client has disconnected without telling us why. Before that, no unusual error message around. The fcntl call failed msgs should not affect what the client sees.
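The silence before the disconnect can be measured from the log timestamps. Below is a small sketch that scans smbd log headers and reports any pause longer than a threshold. It assumes the older header layout seen in the excerpt above (`[YYYY/MM/DD HH:MM:SS, level] file:function(line)`); the function name `find_gaps` is made up for this example.

```python
import re
from datetime import datetime

# Matches headers such as:
# [2007/12/12 23:00:11, 8] lib/util.c:fcntl_lock(1820)
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),\s*(\d+)\]\s+(\S+)")


def find_gaps(lines, min_gap_seconds=5):
    """Return (seconds, location_before, location_after) for each long pause."""
    gaps = []
    prev = None
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue  # message body, not a header line
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        where = match.group(3)
        if prev is not None:
            delta = (stamp - prev[0]).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((delta, prev[1], where))
        prev = (stamp, where)
    return gaps


# Lines copied from the log excerpt in this report.
SAMPLE = """\
[2007/12/12 23:00:11, 3] smbd/reply.c:reply_writebraw(2756)
writebraw1 fnum=7864 start=135178240 num=16585 wrote=16585 sync=0
[2007/12/12 23:00:11, 5] lib/util.c:show_msg(464)
smb_vwv[ 0]=65535 (0xFFFF)
[2007/12/12 23:00:56, 5] lib/util_sock.c:read_socket_with_timeout(480)
read_socket_with_timeout: timeout read. EOF from client.
""".splitlines()

for delta, before, after in find_gaps(SAMPLE):
    print(f"{delta:.0f}s of silence between {before} and {after}")
```

On the excerpt this reports a single 45-second pause between the last write and the timeout read, which matches the reading that the client simply stopped talking.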
While the smbd log seems to indicate that nothing is amiss on the Samba side, this irritating "bug" has now become the difference between using Samba or Windows as an Image Repository for access by DOS-based network boot disks! If we can get to the bottom of this, I'm sure all those imaging heads out there looking for a Samba solution will be most pleased.
I decided to do what I should probably have done at the very start. I have removed the hard-drives (a separate drive is used for the image share) from my Ubuntu/Samba Image Server and replaced with identical hard drives on which I have installed a clean copy of Windows XP, which has identically configured user accounts, workgroup, computer name, share and permissions.
In other words, Windows XP is now running on the same hardware as my Ubuntu/Samba installation and I can switch back and forth between the two OSes by simply swapping out the drives.
I have just performed a straight image push to the XP "server" flawlessly with the same boot disk I originally used with Ubuntu/Samba. I note that the image transfer rate was marginally faster than Ubuntu/Samba. This is a good thing from the point of view of the test, as a slower rate might have reduced the likelihood of the problem occurring, as we have seen before with NT. (BTW, my waxing lyrical about Samba's performance vs NT was probably unfair on NT, whose ancient drivers are unable to leverage the performance of modern hardware).
It is pretty obvious to me now that the problem is due to some subtle difference(s) between my Ubuntu/Samba installation and Windows XP. The problem may very well be on the Ubuntu side (Kernel? EXT3 file system? IP stack? E1000 driver?) although I did not notice anything unusual in the syslog and no errors show up in ifconfig.
The obvious next step is to try a Samba installation on a completely different distro e.g. CentOS(RedHat) 5. Before I embark on this, it occurred to me that it would be nice if Samba provided ready-made configuration templates and OS tweaks to emulate the different flavours of Windows as closely as possible. The solution could turn out to be as simple as a tunable TCP/IP parameter (although I have played around with all the socket options in smb.conf).
While we often see a lot of "fine-grained" advice such as "reset on zero vc" (emulating Windows 2003 behaviour), etc, something more global in scope would be preferable for those of us who just want to use Samba as a "drop-in" replacement for windows server with a minimum of fuss.
Wow, detailed analysis!
If it is that repeatable, can you get us sniffs of both XP and the faulty Samba server?
You might want to install wireshark on the XP server, under Linux please do a
tcpdump -i eth0 -n -s 1500 -w /tmp/sniff
and upload /tmp/sniff.
If it's huge, please send it to me directly. My mailbox can stand a couple of MB.
Whoa! There is nothing like an invitation to sniff to send the mob back to the slums. In the absence of a Samba version for Windows (old joke by now I'm sure), which would allow me to test Windows and Samba with identical IP stacks, network drivers, etc, I am reduced to first checking out a hunch.
I'm sure many "bugs" get solved by the reporter indulging in a bit of one-way conversation on-line with an odd prod from a third party to keep him going. Well this thread is something like that...but it's better than muttering incoherently to myself on the street :)
If I look back over this thread, I note that the only time this problem did *not* happen with a Samba server is when the client NIC was an e1000. It so happens that my server NIC is also an e1000. I originally thought that, since the problem is already marginal and only happens at high data rates, it would probably be least likely to occur in the case of two identical NICs which are of course optimally matched.
But this might also be indicative of a bug in the server E1000 driver, the kind of bug that makes an assumption about the "far side" which is only valid in the case of an identical NIC.
The server driver/module is a standard packaged e1000.ko? Now I've been using identical hardware with Ubuntu/Samba for the last year and have seen no problems with NFS, ftp or Samba. It is after all an Intel chip on a Dell PowerEdge Server, not something I bought down at the flea market! But, to settle the argument, I installed a Belkin PCI Gigabit NIC (bought down at the flea market), adjusted the configuration and....ohmygawd...the problem has disappeared!
I have tested with a number of clients and on a direct gigabit-to-gigabit link and it works fine every time. This means one of two things:
1. The ubuntu-packaged e1000.ko Intel driver is buggy
2. An on-board LAN chip is more prone to the problem than a PCI card (my bios is up-to-date and mature).
But that is for another bug report or forum. Samba is off the hook on this one, sorry I ever doubted you :(
BUT THERE IS AN INTERESTING LESSON HERE...
I hear a lot in Samba forums about torture tests etc. Unfortunately, TCP/IP's ability to cope with lost packets etc. can hide many problems in the underlying stack, bad drivers and so on. It may even be possible that a Windows Explorer drag-copy operation has its own layer of "if there is a problem, try again a few times before giving up".
This issue has persuaded me that there is no better way of stressing Samba than with an MS-DOS client which, in my experience, will go belly up with even the slightest hiccup by the server.
Thanks for all the help!
Without sniffs we can't do much more here. If you think that asking for sniffs is an offense, then you might find better support somewhere else, maybe via http://www.samba.org/samba/support/.
Sorry Volker, I was over-indulgent in my last post...let me give it to you in a one-liner:
Using a different server NIC seems to cure problem!!
...but don't close this bug yet until I know for sure it's not Samba. As a novice, I can only sniff as a last resort ;-)
(In reply to comment #16)
> Without sniffs we can't do much more here. If you think that asking for sniffs
> is an offense, then you might find better support somewhere else, maybe via
Volker, he was thanking us and telling us it wasn't our fault, but a possible hardware or kernel driver fault. I think you got the wrong end of the stick here.
Okay, sorry :-)
A little postscript for anybody who runs into the same problem:
Samba was indeed impeccable in this whole business :)
The problem was eventually solved by overriding the default off-load parameters for the e1000 driver using ethtool. As a happy side-effect of actually solving the problem, the transfer rate also increased by about 10%, so that my Ubuntu/Samba setup runs at pretty much identical speed to Windows XP/2003 on the same hardware!
Whether the setting change indicated a driver, kernel or stack problem, well that remains to be seen. (I did upgrade from Dapper to Edgy and then to Feisty and performed a custom build of latest version of e1000 driver just to be sure that everything was reasonably up to date).
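For readers unfamiliar with ethtool, here is a hedged sketch of how offload settings are inspected and changed. The post does not say which off-load parameters were overridden, so the feature names below (`tso`, `gso`) and the interface name `eth0` are assumptions picked only to show the command shape. `ethtool -k` lists the current settings and `ethtool -K` changes them. The helper names are hypothetical.

```python
import shlex
import subprocess

# Assumption: which offloads were changed is NOT stated in this report.
# These two are placeholders to illustrate the syntax.
EXAMPLE_FEATURES = ("tso", "gso")


def show_offloads_cmd(iface):
    """Command that lists the current offload settings of an interface."""
    return ["ethtool", "-k", iface]


def set_offloads_cmd(iface, features, state="off"):
    """Command that switches the given offload features on or off."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (needs root)."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(show_offloads_cmd("eth0"))
    run(set_offloads_cmd("eth0", EXAMPLE_FEATURES))
```

Settings changed this way do not survive a reboot on their own, so they normally need to be reapplied from the distribution's network configuration.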
See full solution (and any follow ups) at:
I would emphasise to anybody who runs into a similar problem that Samba's behaviour is characterised by more than just the smb.conf file. While nobody can legislate for a driver bug, kernel issue or innocuous driver default setting, there are a number of tunable kernel TCP/IP parameters (see sysctl) whose defaults do not necessarily come anywhere close to the Windows equivalents. If your Samba server is primarily a Windows drop-in replacement, then you might want to adjust these as well. Links anybody?
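As a starting point for anyone wanting to look at those kernel parameters, the sketch below reads a handful of TCP-related sysctl values from `/proc/sys`. The post names no specific parameters, so the list here is an assumption: these are simply commonly examined Linux tunables, and nothing is implied about which values would match Windows. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: an illustrative selection, not a list taken from this report.
TCP_TUNABLES = (
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_timestamps",
    "net.ipv4.tcp_sack",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.core.rmem_max",
    "net.core.wmem_max",
)


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


def snapshot(names=TCP_TUNABLES, root="/proc/sys"):
    """Collect the current values of several tunables into a dict."""
    return {name: read_sysctl(name, root) for name in names}


if __name__ == "__main__":
    for name, value in snapshot().items():
        shown = value if value is not None else "(unavailable)"
        print(f"{name} = {shown}")
```

Taking a snapshot before and after any change makes it easy to record exactly what was altered when comparing transfer behaviour.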
Samba is a fine solution for an Image server (for pushing and retrieving Norton Ghost images using a DOS Network Boot disk). This also happens to be an excellent means of stressing your Samba box and ensuring not only that Samba is doing its thing but that the kernel, drivers and network stack are also up to scratch.
Thanks for the summary, and sorry again for being rude.
No offence taken...will teach me to be more concise in future ;-)
*** Bug 4861 has been marked as a duplicate of this bug. ***
*** Bug 9030 has been marked as a duplicate of this bug. ***
Created attachment 11066 [details]
Full log for NT_STATUS_CONNECTION_RESET
full log for NT_STATUS_CONNECTION_RESET.
I found the log message below just before getting NT_STATUS_CONNECTION_RESET.
[2015/05/19 13:50:49.764268, 10, pid=28254, effective(0, 0), real(0, 0), class=tdb] ../source3/lib/gencache.c:296(gencache_set_data_blob)
Adding cache entry with key=[IDMAP/SID2XID/S-1-5-21-2137076357-3068380017-3353387290-1000] and timeout=[Thu Jan 1 08:00:00 1970 CST] (-1432014649 seconds in the past)
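A quick arithmetic check on that log line: the "seconds in the past" figure is exactly the Unix time of the log entry, which means the cache entry's timeout value is 0 (the epoch). The sketch below assumes that "CST" in this log means China Standard Time (UTC+8), which is consistent with the 08:00:00 1970 timestamp shown. It draws no conclusion about whether this is related to the connection reset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in this log is China Standard Time, i.e. UTC+8.
CST = timezone(timedelta(hours=8))

# Timestamp of the log entry (fractional seconds dropped).
logged_at = datetime(2015, 5, 19, 13, 50, 49, tzinfo=CST)

# Magnitude of the "(-1432014649 seconds in the past)" figure.
seconds_in_past = 1432014649

# Step back from the log time by that many seconds to get the timeout.
expiry = logged_at - timedelta(seconds=seconds_in_past)

print(expiry.isoformat())
print(int(expiry.timestamp()))
```

The computed expiry is midnight UTC on 1 January 1970, printed here as 08:00 in the UTC+8 zone.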