I have just discussed this on IRC and have been directed here. Logs, to save me retyping, are as follows:

(09:38:47) <Phil-Work> we're having major speed issues with Samba
(09:39:10) <Phil-Work> on first boot of the system, it benchmarks at ~45mbit/s write speed over a gbit network
(09:39:27) <Phil-Work> following a day or so of use by 25 concurrent users, it drops to < 1mbit/s
(09:39:37) <Phil-Work> and causes us to reboot the server almost daily
(09:39:40) <Phil-Work> why might this be? :S
(09:41:59) <Phil-Work> it might be useful to note that a samba restart doesn't fix this - it takes the whole server being rebooted
(09:44:25) * Flechmen is now known as Flechmen_Disconn
(09:44:44) * merzo (~merzo@193.254.217.44) Quit (Ping timeout: 258 seconds)
(09:46:20) <kai> Phil-Work: interesting, what system?
(09:48:39) * abartlet (~abartlet@fn.samba.org) Quit (Quit: Leaving.)
(09:50:50) <Phil-Work> kai, CentOS
(09:51:14) <Phil-Work> "CentOS release 5.6 (Final)"
(09:53:59) <kai> Phil-Work: that's not the current release, right?
(09:54:06) * aggelos_ (~aggelos@p5DDBAFDB.dip.t-dialin.net) Quit (Ping timeout: 260 seconds)
(09:54:46) <Phil-Work> kai, 6 was released 20 days ago, apparently
(09:54:54) <Phil-Work> prior to that, it was the latest
(09:55:01) <Phil-Work> but these issues have been ongoing for months now
(09:55:06) <kai> fair enough
(09:55:19) <kai> just so I get the kernel versions straight :)
(09:55:49) <Phil-Work> [root@local:~]$ uname -a
(09:55:49) <Phil-Work> Linux linux.local.propelleremail.co.uk 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
(09:56:38) <Phil-Work> admittedly, that kernel is crazy-old
(09:57:02) <Phil-Work> the redhat folks aren't big fans of updates
(09:57:05) <kai> well, it's an enterprise distro, that's to be expected
(09:58:06) <kai> seeing how a recent ubuntu kernel upgrade broke my LXC containers, I totally subscribe to changing the kernel as little as possible :)
(09:58:13) <Phil-Work> lol
(10:00:17) <kai> anyhow, I'm a bit at a loss on how to best debug this. I suspect this is one of the performance-optimized code paths, doing something weird that causes some list in the kernel to fill up
(10:00:40) <kai> at least that seems what best would explain the symptoms you're seeing
(10:00:50) <Phil-Work> yeh - we've upgraded Samba from a way-old repo version to near latest
(10:01:00) <Phil-Work> and even then it took a total system reboot to speed it back up
(10:01:28) <kai> but that's the low level file server stuff, that's a bit outside my league
(10:01:38) <kai> what samba version are you using?
(10:01:48) <Phil-Work> Version 3.5.4-0.70.el5_6.1
(10:02:06) <Phil-Work> it could probably use an upgrade
(10:02:10) <Phil-Work> but it's not too old
(10:02:52) <Phil-Work> we were on 2.something previously and had the same issues
(10:03:03) <kai> oh, ok
(10:03:34) <kai> probably someone with more insight to the fileserver code needs to have a look at that
(10:03:55) <Phil-Work> thanks for your help so far :)
(10:03:56) <kai> can you please file a bug report at bugzilla.samba.org?
(10:04:05) <Phil-Work> yarp
(10:05:11) <kai> great. if you put all the information you gave me in there, including that you saw this on 2.whatever already, that'll give the fileserver gurus something interesting to sink their teeth into :)
Kai, please take care of the initial analysis (network traces, strace, etc.).

Thanks,
Volker
Config is as follows:

[global]
   workgroup = PROPELLERCOMMUN
   os level = 20
   load printers = no
   show add printer wizard = no
   printing = none
   printcap name = /dev/null
   disable spoolss = yes
   cups options = raw
   netbios name = LinuxServer
   server string = Samba Server Version %v
   security = user
   passdb backend = tdbsam
   vfs objects = full_audit
   full_audit:failure = none
   full_audit:success = mkdir rename unlink rmdir pwrite
   full_audit:prefix = %u|%I|%m|%S
   full_audit:facility = local5
   full_audit:priority = notice

[homes]
   comment = Home Directories
   browseable = no
   writable = yes

[Propeller Files]
   writeable = yes
   invalid users = steph
   path = /mnt/pdrive
   force directory mode = 775
   force group = propusers
   revalidate = yes
   force create mode = 664
   comment = Propeller Files
   create mode = 664
   directory mode = 775

[ecommerce]
   writeable = yes
   path = /mnt/pdrive/General/E-commerce
   force directory mode = 775
   force group = propusers
   force create mode = 664
   comment = E-Commerce Files
   create mode = 664
   directory mode = 775
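With pwrite listed in full_audit:success (inherited by every share from [global]), roughly every successful client write that goes through the VFS produces one syslog message. The following is a minimal Python sketch for spotting this kind of setting in an smb.conf. It is not a Samba tool: HIGH_VOLUME_OPS is a hypothetical selection of operation names, and the parsing only approximates Samba's (shares inherit [global] values they don't override; include files, parameter synonyms and "!op" exclusions are ignored).

```python
import configparser
import re

# Hypothetical selection of full_audit operation names that fire on (almost)
# every client read or write; this is NOT an official Samba list.
HIGH_VOLUME_OPS = {"read", "pread", "write", "pwrite", "sendfile", "recvfile"}


def _split_list(value):
    """smb.conf list values may be separated by spaces and/or commas."""
    return [item for item in re.split(r"[,\s]+", value.strip()) if item]


def audited_hot_ops(smb_conf_text):
    """Map each section to the high-volume ops full_audit logs on success."""
    parser = configparser.ConfigParser(
        delimiters=("=",),           # names like full_audit:success contain ':'
        comment_prefixes=("#", ";"),
        strict=False,                # tolerate repeated sections/options
        interpolation=None,          # keep %u, %I, %m, %S as literal text
    )
    parser.read_string(smb_conf_text)
    glob = parser["global"] if parser.has_section("global") else {}

    result = {}
    for name in parser.sections():
        section = parser[name]
        # Simplification: a share uses its own value, else the [global] one.
        vfs = _split_list(section.get("vfs objects", glob.get("vfs objects", "")))
        if "full_audit" not in vfs:
            continue
        success = set(_split_list(
            section.get("full_audit:success", glob.get("full_audit:success", ""))
        ))
        # full_audit also accepts the special name "all" for every operation.
        hot = HIGH_VOLUME_OPS if "all" in success else HIGH_VOLUME_OPS & success
        if hot:
            result[name] = sorted(hot)
    return result
```

Fed the configuration above, this sketch would flag pwrite for [global] and for each share, since none of them overrides vfs objects.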
Created attachment 6739: strace of copying the last ~500 MB of a 2 GB file from Windows at ~50-70 Mbit/s
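If a trace is captured with strace's -T option, each completed call ends with the time spent in it, written as <seconds>. Summing those per syscall shows where smbd spends its time (strace -c prints a similar summary directly when tracing live). The thread doesn't say whether attachment 6739 was recorded with -T, so this is only a minimal Python sketch, assuming -T output optionally combined with -f (PID prefixes) and -t/-tt/-ttt timestamps.

```python
import re
from collections import defaultdict

# One strace line: optional PID prefix (-f), optional timestamp (-t/-tt/-ttt),
# then either "name(" or "<... name resumed>", and the "<seconds>" suffix
# that -T appends to every completed call.
_LINE = re.compile(
    r"^\s*(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+|\d+\.\d+\s+)?"
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"
    r".*<(?P<secs>\d+\.\d+)>\s*$"
)


def summarize(lines):
    """Return {syscall: (calls, total_seconds)}, largest total time first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = _LINE.match(line)
        if match is None:
            # Signals, exit notices and "<unfinished ...>" halves carry no
            # -T time; the matching "resumed" line is counted instead.
            continue
        entry = totals[match.group("name") or match.group("resumed")]
        entry[0] += 1
        entry[1] += float(match.group("secs"))
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return {name: (count, secs) for name, (count, secs) in ranked}
```

Usage, with trace.txt standing in for the saved trace: iterate over summarize(open("trace.txt")).items() and print each syscall's call count and total seconds.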
It seems that turning off the audit module might fix the observed performance drop. Waiting for the reporter to confirm after a few more days of testing.
Configuring the syslog daemon not to sync() the log file after every message from the audit module might be another way to work around the performance penalty without disabling auditing completely. We should also add a hint to the man page ...
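With the sysklogd daemon that CentOS 5 ships by default, syslog.conf(5) allows prefixing a file action with "-" to skip syncing after each message, e.g. local5.notice -/var/log/samba-audit.log (facility and priority taken from the configuration above; the log path is hypothetical). To get a feel for what per-message syncing costs, here is a rough Python sketch that appends audit-style records to a scratch file with and without an fsync after each one. It only illustrates the per-message cost. It does not reproduce the gradual slowdown reported in this bug, whose cause the thread never established.

```python
import os
import tempfile
import time

# Made-up audit-style record, loosely following the "%u|%I|%m|%S" prefix
# from the configuration above.
RECORD = b"user|192.0.2.10|PC01|Propeller Files|pwrite|ok|file.dat\n"


def append_records(path, n_records, sync_each):
    """Append n_records lines to path and return elapsed seconds.

    sync_each=True imitates a syslog daemon that syncs its log file after
    every message; sync_each=False leaves write-back to the kernel.
    """
    start = time.perf_counter()
    with open(path, "ab") as log:
        for _ in range(n_records):
            log.write(RECORD)
            if sync_each:
                log.flush()             # hand Python's buffer to the kernel
                os.fsync(log.fileno())  # wait until the data reaches storage
    return time.perf_counter() - start


def compare(n_records=200):
    """Time both strategies in a scratch directory: (synced_s, buffered_s)."""
    with tempfile.TemporaryDirectory() as scratch:
        synced = append_records(os.path.join(scratch, "synced.log"), n_records, True)
        buffered = append_records(os.path.join(scratch, "buffered.log"), n_records, False)
    return synced, buffered


if __name__ == "__main__":
    synced, buffered = compare()
    print(f"fsync per record: {synced:.3f} s, buffered: {buffered:.3f} s")
```

The gap between the two timings depends heavily on the storage underneath, so treat any numbers it prints as indicative only.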
It might be important to note that the server starts out extremely fast and then slows down steadily until it is rebooted. Is this a known and unavoidable symptom of audit logging? I would have expected the speed to stay constant, since the same amount of logging is done whether the server is fast or slow.

Phil
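To confirm that the slowdown is gradual rather than a sudden drop, it may help to sample write throughput from a client at fixed intervals over a day or so, in the same units as the ~45 Mbit/s first-boot figure. A minimal Python sketch, assuming the share is reachable as a local path (for example a mapped drive on a Windows client or a CIFS mount on Linux). The probe file name and the default sizes are arbitrary choices.

```python
import os
import time


def write_throughput_mbit(target_dir, size_mb=64, chunk_kb=64):
    """Write size_mb MiB into a probe file under target_dir; return Mbit/s."""
    chunk = os.urandom(chunk_kb * 1024)          # incompressible payload
    n_chunks = (size_mb * 1024) // chunk_kb
    probe = os.path.join(target_dir, "throughput_probe.tmp")  # hypothetical name
    start = time.perf_counter()
    with open(probe, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())                     # count time to commit the data
    elapsed = max(time.perf_counter() - start, 1e-9)
    os.remove(probe)
    return n_chunks * len(chunk) * 8 / elapsed / 1e6


def monitor(target_dir, rounds, interval_s, size_mb=64):
    """Yield (unix_time, Mbit/s) samples, one every interval_s seconds."""
    for i in range(rounds):
        yield time.time(), write_throughput_mbit(target_dir, size_mb)
        if i + 1 < rounds:
            time.sleep(interval_s)
```

For example, looping over monitor("Z:\\", rounds=96, interval_s=900) would take one sample every 15 minutes for a day. Z: is a placeholder for wherever the share is mapped.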
Holger has expressed interest in looking at this; reassigning the bug.
on comment#6: Does it happen when the audit module isn't enabled?
> on comment#6: Does it happen when the audit module isn't enabled?

It doesn't seem so. The server has been running fine for a week now with the module unloaded; before that, it typically had to be rebooted daily.

Phil
We need to close this bug, as this seems to be specific to your setup; no such problem is known from other setups using the audit module.