Bug 250 - After approx. 1 hour smbd uses lots of CPU doing nothing
Status: CLOSED FIXED
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.0preX
Hardware: All Linux
Importance: P1 normal
Target Milestone: 3.0.0rc2
Assigned To: Gerald (Jerry) Carter
Depends on:
Blocks: 289
Reported: 2003-07-24 14:19 UTC by Ole Gjerde
Modified: 2005-11-14 09:24 UTC (History)

Attachments
samba config file (4.36 KB, text/plain)
2003-08-13 07:13 UTC, Ole Gjerde

Description Ole Gjerde 2003-07-24 14:19:45 UTC
After about 1 hour, there are 8 smbd processes (one not shown) using lots of CPU:
root     23850 20.3  1.6  8512 4136 ?        R    14:51  16:14 smbd -D
root     23971 12.1  1.4  8040 3652 ?        R    15:44   3:13 smbd -D
root     23979 11.6  1.4  8016 3624 ?        R    15:46   2:54 smbd -D
root     23987 10.8  2.6 10484 6700 ?        R    15:47   2:34 smbd -D
root     23998 10.9  1.3  7848 3452 ?        R    15:51   2:10 smbd -D
root     24007 10.2  1.4  7988 3812 ?        R    15:54   1:41 smbd -D
root     24025 10.6  1.3  7916 3520 ?        R    15:56   1:33 smbd -D

There are no users using the server at this point.

An strace reveals that each one just keeps calling the same syscall (a LOT):
mremap(0x4034a000, 937984, 937984, MREMAP_MAYMOVE) = 0x4034a000
and nothing else (the numbers are slightly different for the other processes,
but otherwise the same).

OS: Redhat Linux 7.3 (up-to-date)
Samba: 3.0.0beta3 installed from the binary RPM on the Samba website (for RH 7.3)

Worked fine in beta1.
Comment 1 Ole Gjerde 2003-07-30 21:07:27 UTC
I went back to beta1 and that now has the same problem.
The *only* thing I changed after upgrading to beta3 was the group mappings,
because they stopped working after the upgrade.

For other reasons, I upgraded that server to RedHat 9 tonight and installed
the beta3 RPM for RH9 from www.samba.org. It has the same problem as well.
Comment 2 Ole Gjerde 2003-08-01 06:55:05 UTC
Some more information:
It also seems like they start in twos (within 1-2 minutes of each other or so).

Samba is usable during this time, but everything is insanely slow.  Programs
that use bdbs hang for minutes at a time and things like that.
If I kill all the CPU-hogging processes, Samba runs just fine for about an hour,
until it spawns some more of them.

root      5982 20.3  0.7 12916 1928 ?        R    Jul31 252:06 smbd -D
root      5984 20.3  0.7 12908 1912 ?        R    Jul31 251:01 smbd -D

root      6005 19.4  0.6 11988 1688 ?        R    Jul31 233:34 smbd -D
root      6007 19.4  0.6 11836 1564 ?        R    Jul31 233:01 smbd -D

root      6254 16.5  0.6 11684 1632 ?        R    Jul31 130:10 smbd -D
root      6256 16.5  0.5 11488 1328 ?        R    Jul31 129:50 smbd -D

root      7038 12.7  1.2 10716 3192 ?        R    08:24   1:47 smbd -D
root      7040 12.3  1.0 10520 2564 ?        R    08:26   1:31 smbd -D

Like I said in the original comment, the only change made was the group
mappings. Now they look like this (note that these don't seem to work right:
I'm in both staff + admin, but it seems to notice only that I'm in staff):

System Operators (S-1-5-32-549) -> -1
Replicators (S-1-5-32-552) -> -1
Guests (S-1-5-32-546) -> -1
Domain Users (S-1-5-21-2489638659-2949135044-1763653089-513) -> staff
Power Users (S-1-5-32-547) -> staff
Print Operators (S-1-5-32-550) -> -1
Administrators (S-1-5-32-544) -> admin
Domain Admins (S-1-5-21-2489638659-2949135044-1763653089-512) -> admin
Domain Guests (S-1-5-21-2489638659-2949135044-1763653089-514) -> -1
Account Operators (S-1-5-32-548) -> -1
Backup Operators (S-1-5-32-551) -> -1
Users (S-1-5-32-545) -> staff

If there is something else that I should be doing to help debug this problem,
I'll do it.. I just need a little direction :)
Comment 3 Ole Gjerde 2003-08-06 07:09:34 UTC
More info: 
The "stuck" smbd processes are always "connected" to at least 2 shares, 
netlogon and IPC$. 
 
From smbstatus: 
netlogon     17050   pios2         Wed Aug  6 00:48:19 2003 
IPC$         17050   pios2         Wed Aug  6 00:48:19 2003 
 
but sometimes even more: 
netlogon     17751   default_wks   Wed Aug  6 08:53:28 2003 
IPC$         17751   default_wks   Wed Aug  6 08:53:28 2003 
profiles     17751   default_wks   Wed Aug  6 08:53:18 2003 
IPC$         17751   default_wks   Wed Aug  6 08:53:18 2003 
IPC$         17751   default_wks   Wed Aug  6 08:52:57 2003 
 
The clients are all Windows XP SP1 (and they have been since SP1 came out,
without problems).
 
Comment 4 Gerald (Jerry) Carter 2003-08-11 13:07:44 UTC
can you get an strace (or truss) of the smbd taking up CPU?  Perhaps also set
the debug level to 10 and get a log?

   log file = /var/log/samba/log.%m

and send 'smbcontrol <pid> debug 10'.
 
Mail the log files to me at jerry@samba.org
Comment 5 Gerald (Jerry) Carter 2003-08-12 17:33:10 UTC
I've seen this occur when smbd has died and is trying to log the
gdb backtrace.  Can you do an strace of the process and see if it
is stuck in an endless mremap() loop?
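
For anyone repeating this check, here is a rough sketch of one way to see whether a process is doing nothing but mremap(). It is not part of Samba. The helper name count_mremap is made up for illustration, and the usage line assumes the coreutils "timeout" command is available. It reads strace output on stdin and prints the number of mremap() lines followed by the total number of lines.

```shell
#!/bin/sh
# count_mremap is a hypothetical helper, not a Samba or strace tool.
# It reads strace output on stdin and prints "<mremap lines> <total lines>".
count_mremap() {
    awk '/^mremap\(/ { m++ } END { printf "%d %d\n", m, NR }'
}

# Example use (needs root; PID 23850 is taken from the ps listing in the
# description; strace writes its trace to stderr, hence the 2>&1):
#   timeout 5 strace -p 23850 2>&1 | count_mremap
```

If the two numbers are nearly equal and large after a few seconds, the process is most likely in the loop described above.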
Comment 6 Ole Gjerde 2003-08-12 21:17:08 UTC
Indeed, as I mentioned in the initial comment, it is stuck in a loop like this: 
mremap(0x4034a000, 937984, 937984, MREMAP_MAYMOVE) = 0x4034a000 
over, and over again.... 
 
This last weekend I downgraded to 2.2, and everything works great (with the
same config).
 
I will upgrade to 3.0beta3 again this coming weekend and get the debug log, 
etc. 
Comment 7 Björn Jacke 2003-08-13 00:35:16 UTC
Ole, can you please attach your smb.conf?
Comment 8 Ole Gjerde 2003-08-13 07:13:44 UTC
Created attachment 73 [details]
samba config file

Here's my smb.conf file
Comment 9 Björn Jacke 2003-08-13 07:59:49 UTC
you also have one variable multiple times in a preexec:

root preexec = /home/shares/scripts/makelogonscript.1 %U %L %G %U

try writing a wrapper script, which you can call with just one %U.
That will probably make the process disappear (see bug #289).
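
A minimal sketch of such a wrapper, assuming makelogonscript.1 only needs the user name repeated as its fourth argument. The wrapper file name, the function name run_makelogonscript, and the MAKELOGON_SCRIPT override are made up for illustration. Only the path to makelogonscript.1 comes from the smb.conf line above.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /home/shares/scripts/makelogonscript.wrapper
# smb.conf would then reference each variable only once:
#   root preexec = /home/shares/scripts/makelogonscript.wrapper %U %L %G

run_makelogonscript() {
    user=$1
    server=$2
    group=$3
    # Rebuild the original argument order (%U %L %G %U) for the real script.
    # MAKELOGON_SCRIPT is an assumed override, handy for dry runs.
    "${MAKELOGON_SCRIPT:-/home/shares/scripts/makelogonscript.1}" \
        "$user" "$server" "$group" "$user"
}

# Only call the real script when smbd passed all three values.
if [ "$#" -eq 3 ]; then
    run_makelogonscript "$@"
fi
```

For a dry run, set MAKELOGON_SCRIPT=echo and call the wrapper with three arguments to print what would be passed on.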
Comment 10 Gerald (Jerry) Carter 2003-08-25 13:43:53 UTC
This should be fixed now.  Turns out to be a 
duplicate of bug 289.
Comment 11 Gerald (Jerry) Carter 2005-02-07 08:41:21 UTC
originally reported against 3.0.0beta3.  Cleaning out 
non-production release versions.
Comment 12 Gerald (Jerry) Carter 2005-08-24 10:16:04 UTC
sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.
Comment 13 Gerald (Jerry) Carter 2005-11-14 09:24:58 UTC
database cleanup