Bug 8339 - Severe slow down issues
Status: NEEDINFO
Product: Samba 3.5
Classification: Unclassified
Component: File services
Version: 3.5.4
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Holger Hetterich
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2011-08-01 09:07 UTC by Phil Lavin
Modified: 2011-08-05 07:31 UTC (History)

See Also:


Attachments
Strace of copying the last approx 500mb of a 2GB file from Windows at ~50-70mbit/s (1.01 MB, application/bzip2)
2011-08-01 10:11 UTC, Phil Lavin

Description Phil Lavin 2011-08-01 09:07:51 UTC
I have just discussed this on IRC and have been directed here. Logs, to save me retyping, are as follows:

(09:38:47) <Phil-Work> we're having major speed issues with Samba
(09:39:10) <Phil-Work> on first boot of the system, it benchmarks at ~45mbit/s write speed over a gbit network
(09:39:27) <Phil-Work> following a day or so of use by 25 concurrent users, it drops to < 1mbit/s
(09:39:37) <Phil-Work> and causes us to reboot the server almost daily
(09:39:40) <Phil-Work> why might this be? :S
(09:41:59) <Phil-Work> it might be useful to note that a samba restart doesn't fix this - it takes the whole server being rebooted
(09:46:20) <kai> Phil-Work: interesting, what system?
(09:50:50) <Phil-Work> kai, CentOS
(09:51:14) <Phil-Work> "CentOS release 5.6 (Final)"
(09:53:59) <kai> Phil-Work: that's not the current release, right?
(09:54:46) <Phil-Work> kai, 6 was released 20 days ago, apparently
(09:54:54) <Phil-Work> prior to that, it was the latest
(09:55:01) <Phil-Work> but these issues have been ongoing for months now
(09:55:06) <kai> fair enough
(09:55:19) <kai> just so I get the kernel versions straight :)
(09:55:49) <Phil-Work> [root@local:~]$ uname -a
(09:55:49) <Phil-Work> Linux linux.local.propelleremail.co.uk 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
(09:56:38) <Phil-Work> admittedly, that kernel is crazy-old
(09:57:02) <Phil-Work> the redhat folks aren't big fans of updates
(09:57:05) <kai> well, it's an enterprise distro, that's to be expected
(09:58:06) <kai> seeing how a recent ubuntu kernel upgrade broke my LXC containers, I totally subscribe to changing the kernel as little as possible :)
(09:58:13) <Phil-Work> lol
(10:00:17) <kai> anyhow, I'm a bit at a loss on how to best debug this. I suspect this is one of the performance-optimized code paths, doing something weird that causes some list in the kernel to fill up
(10:00:40) <kai> at least that seems what best would explain the symptoms you're seeing
(10:00:50) <Phil-Work> yeh - we've upgraded Samba from a way-old repo version to near latest
(10:01:00) <Phil-Work> and even then it took a total system reboot to speed it back up
(10:01:28) <kai> but that's the low level file server stuff, that's a bit outside my league
(10:01:38) <kai> what samba version are you using?
(10:01:48) <Phil-Work> Version 3.5.4-0.70.el5_6.1
(10:02:06) <Phil-Work> it could probably use an upgrade
(10:02:10) <Phil-Work> but it's not too old
(10:02:52) <Phil-Work> we were on 2.something previously and had the same issues
(10:03:03) <kai> oh, ok
(10:03:34) <kai> probably someone with more insight to the fileserver code needs to have a look at that
(10:03:55) <Phil-Work> thanks for your help so far :)
(10:03:56) <kai> can you please file a bug report at bugzilla.samba.org?
(10:04:05) <Phil-Work> yarp
(10:05:11) <kai> great. if you put all the information you gave me in there, including that you saw this on 2.whatever already, that'll give the fileserver gurus something interesting to sink their teeth into :)
Comment 1 Volker Lendecke 2011-08-01 09:27:18 UTC
Kai, please take care of the initial analysis (network traces, strace etc).

Thanks,

Volker
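
A minimal sketch of how the requested traces could be captured on the server, assuming strace and tcpdump are installed and run as root. The function names, the default output paths and the eth0 interface are hypothetical; the PID of the smbd process serving the slow client has to be looked up first (for example with smbstatus).

```shell
# Attach strace to one running smbd process. -f follows child processes,
# -tt adds wall-clock timestamps, -T shows the time spent in each syscall.
# $1 = smbd PID, $2 = output file (hypothetical default path).
capture_strace() {
    pid="$1"
    out="${2:-/tmp/smbd-$1.strace}"
    strace -f -tt -T -o "$out" -p "$pid"
}

# Record SMB traffic with full packet contents (-s 0) into a pcap file.
# $1 = network interface (assumed eth0), $2 = output file (hypothetical).
capture_network() {
    iface="${1:-eth0}"
    out="${2:-/tmp/smb.pcap}"
    tcpdump -i "$iface" -s 0 -w "$out" port 445 or port 139
}
```

Both functions run until interrupted with Ctrl-C. Capturing once right after a reboot and once when the server has slowed down would give two traces to compare.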
Comment 2 Phil Lavin 2011-08-01 10:01:42 UTC
Config is as follows...

[global]
        workgroup = PROPELLERCOMMUN
        os level = 20
        load printers = no
        show add printer wizard = no
        printing = none
        printcap name = /dev/null
        disable spoolss = yes
        cups options = raw
        netbios name = LinuxServer
        server string = Samba Server Version %v
        security = user
        passdb backend = tdbsam
        vfs objects = full_audit
        full_audit:failure = none
        full_audit:success = mkdir rename unlink rmdir pwrite
        full_audit:prefix = %u|%I|%m|%S
        full_audit:facility = local5
        full_audit:priority = notice

[homes]
        comment = Home Directories
        browseable = no
        writable = yes

[Propeller Files]
        writeable = yes
        invalid users = steph
        path = /mnt/pdrive
        force directory mode = 775
        force group = propusers
        revalidate = yes
        force create mode = 664
        comment = Propeller Files
        create mode = 664
        directory mode = 775

[ecommerce]
        writeable = yes
        path = /mnt/pdrive/General/E-commerce
        force directory mode = 775
        force group = propusers
        force create mode = 664
        comment = E-Commerce Files
        create mode = 664
        directory mode = 775
Comment 3 Phil Lavin 2011-08-01 10:11:56 UTC
Created attachment 6739
Strace of copying the last approx 500mb of a 2GB file from Windows at ~50-70mbit/s
Comment 4 Kai Blin 2011-08-02 08:18:40 UTC
Seems like turning off the audit module might fix the observed performance drop. Waiting for the reporter to confirm after a few more days of testing.
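
For a rough sense of why the audit module could be expensive with this config: pwrite is listed under full_audit:success, so every successful write call produces one syslog message. The sketch below estimates how many messages a single file copy generates. The function name and the 64 KiB write size are assumptions for illustration only; the real size of each pwrite depends on the client.

```shell
# Hypothetical helper: estimate the number of audit log lines for one copy.
# $1 = file size in bytes, $2 = bytes per pwrite call (assumed, default 65536).
audit_lines_for_copy() {
    size="$1"
    chunk="${2:-65536}"
    # Round up so that a final partial write still counts as one logged call.
    echo $(( (size + chunk - 1) / chunk ))
}

# A 2 GiB copy at an assumed 64 KiB per write prints 32768.
audit_lines_for_copy 2147483648 65536
```

This only puts a number on the logging volume. It does not by itself explain why the slowdown grows over time.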
Comment 5 Björn Jacke 2011-08-02 20:49:46 UTC
Configuring the syslog daemon not to sync() messages from the audit module might be another option to work around the performance penalty without disabling auditing completely. We should also add a hint to the man page ...
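
As an illustration of the syslog side, assuming the classic sysklogd syslog.conf syntax: a log file path prefixed with "-" is not synced after every message. The sketch below lists rules for the local5 facility (the one set in full_audit:facility above) that still write to a synced file. The function name and the default config path are assumptions, and rules that catch local5 through a wildcard selector such as *.notice are not detected.

```shell
# Hypothetical helper: print local5 rules whose target file lacks the "-"
# prefix, meaning the file is synced after every logged message.
# $1 = path to the syslog config (assumed default /etc/syslog.conf).
synced_local5_rules() {
    conf="${1:-/etc/syslog.conf}"
    # Uncommented lines with a local5 selector followed by a plain "/path";
    # selectors that exclude the facility (local5.none) are skipped.
    grep -E '^[^#]*local5\.[^[:space:]]*[[:space:]]+/' "$conf" | grep -v 'local5\.none'
}

# An unsynced rule would look like this (the log path is made up):
#   local5.notice    -/var/log/samba-audit.log
```

If the function prints nothing, no directly matching local5 rule writes to a synced file.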
Comment 6 Phil Lavin 2011-08-03 08:14:45 UTC
It might be important to note that the server starts extremely fast and then steadily slows down until a total reboot happens. Is this a known and unavoidable symptom of audit logging? I would have thought that the server would be of consistent speed because the same amount of logging is done when it's fast as when it's slow.

Phil
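
One way to put numbers on the gradual slowdown would be to measure write throughput at regular intervals, with and without the audit module loaded. The following is a sketch only, assuming GNU date and dd; the function name and the temporary file name are made up. Pointing it at a local directory measures the disk path, while pointing it at a mounted share from a client measures the path through Samba.

```shell
# Hypothetical helper: write test data into a directory, force it to disk,
# and print the achieved throughput in Mbit/s (integer, rounded down).
# $1 = target directory, $2 = amount to write in MiB (default 64).
measure_write_mbit() {
    dir="$1"
    size_mb="${2:-64}"
    tmp="$dir/.throughput-test.$$"
    start=$(date +%s%N)   # nanoseconds since the epoch (GNU date)
    dd if=/dev/zero of="$tmp" bs=1M count="$size_mb" conv=fsync 2>/dev/null
    end=$(date +%s%N)
    rm -f "$tmp"
    elapsed_ms=$(( (end - start) / 1000000 ))
    # Avoid division by zero on very fast, very small writes.
    [ "$elapsed_ms" -gt 0 ] || elapsed_ms=1
    # 1 MiB = 8388608 bits; bits per millisecond divided by 1000 is Mbit/s.
    echo $(( size_mb * 8388608 / (elapsed_ms * 1000) ))
}
```

Running it from cron every few minutes and appending the result with a timestamp to a file would show whether throughput decays steadily or drops at a particular moment.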
Comment 7 Kai Blin 2011-08-04 13:14:42 UTC
Holger expressed interest in looking at this, so I am reassigning the bug.
Comment 8 Holger Hetterich 2011-08-04 18:56:23 UTC
on comment#6: Does it happen when the audit module isn't enabled?
Comment 9 Phil Lavin 2011-08-05 07:31:53 UTC
> on comment#6: Does it happen when the audit module isn't enabled?

It doesn't seem so. The server has been running fine for a week now with the module unloaded - before it typically had to be rebooted daily.

Phil