3515 – Extremely high cpu utilization

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 3.0
Classification:	Unclassified
Component:	File Services
Version:	3.0.21b
Hardware:	SGI IRIX

Importance:	P3 critical
Target Milestone:	none
Assignee:	Samba Bugzilla Account
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-02-14 11:07 UTC by Jason Mader (mail bounces back)
Modified:	2006-02-17 17:15 UTC
CC List:	1 user

See Also:

Attachments
par -s (10.89 KB, application/x-gzip) 2006-02-14 12:23 UTC, Jason Mader (mail bounces back)
Patch (4.62 KB, patch) 2006-02-14 16:49 UTC, Jeremy Allison
strace (169.68 KB, application/octet-stream) 2006-02-17 17:12 UTC, Frederico Gendorf


Description Jason Mader (mail bounces back) 2006-02-14 11:07:47 UTC

       PID       PGRP USERNAME PRI  SIZE   RES STATE    TIME WCPU% CPU% COMMAND
    522319     522385 root      20 1635M  989M ready   86:32 69.1 69.10 smbd
    522167     522385 root      20 1222M  800M run/1   63:25 68.9 68.91 smbd
    522397     522385 root      20 7808K 1328K ready   80:11 66.9 66.89 smbd

Comment 1 Jason Mader (mail bounces back) 2006-02-14 11:11:54 UTC

Since updating from 3.0.20b to 3.0.21b on IRIX 6.5.23f, the smbd processes have been causing heavy CPU utilization.

Before I revert to 3.0.20b, I hope someone can take a look at this and let me know what further information I should provide.  Thanks.

Comment 2 Jeremy Allison 2006-02-14 12:02:28 UTC

Can you attach to the high-CPU process with a tool like strace or truss and give me some feedback on the set of system calls it is making, please?
Thanks,
Jeremy.

Comment 3 Frederico Gendorf 2006-02-14 12:07:28 UTC

I see the same problem on x86 Linux.

Comment 4 Jeremy Allison 2006-02-14 12:11:23 UTC

Great, this should be easier to track down than on IRIX, as we all have Linux boxes to hand. Please attach with ltrace or strace and let me know the output for a process that is eating CPU.
Jeremy.

Comment 5 Jason Mader (mail bounces back) 2006-02-14 12:23:39 UTC

Created attachment 1729
par -s

Here's 5 seconds of "par -s" output.  It is mostly:

  0mS[  0] : select(26, [6:23:25], 0, 0, {sec=60, usec=0})
  0mS[  0] : END-select(26, [6:23], 0, 0, 0x7fff2cd8) = 2
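For readers who want to tally such lines from the attached trace, here is a minimal Python sketch. It only understands the two line shapes quoted above; the regular expressions and the helper names (parse_select_line, idle_fds) are illustrative assumptions, not a real par(1) parser. It also assumes that the bracketed list on the END-select line is the set of descriptors reported ready, which is consistent with the return value of 2.

```python
import re

# Matches only the two line shapes quoted in this comment (an assumption).
CALL = re.compile(r"^\s*\d+mS\[\s*\d+\]\s*:\s*(END-)?select\((\d+),\s*\[([\d:]*)\]")
RESULT = re.compile(r"=\s*(-?\d+)\s*$")


def parse_select_line(line):
    """Return (is_end, nfds, fd_list, return_value_or_None), or None if no match."""
    m = CALL.match(line)
    if m is None:
        return None
    fds = [int(x) for x in m.group(3).split(":") if x]
    r = RESULT.search(line)
    return (m.group(1) is not None, int(m.group(2)), fds,
            int(r.group(1)) if r else None)


def idle_fds(call, end):
    """Descriptors that were waited on but not reported ready."""
    return [fd for fd in call[2] if fd not in end[2]]


if __name__ == "__main__":
    call = parse_select_line("  0mS[  0] : select(26, [6:23:25], 0, 0, {sec=60, usec=0})")
    end = parse_select_line("  0mS[  0] : END-select(26, [6:23], 0, 0, 0x7fff2cd8) = 2")
    print("waited on:", call[2], "ready:", end[2], "idle:", idle_fds(call, end))
```

Under those assumptions, the quoted pair says that descriptors 6 and 23 were reported ready and 25 was not.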

Comment 6 Jeremy Allison 2006-02-14 12:34:28 UTC

OK, this (select(26, [6:23:25], 0, 0, {sec=60, usec=0})) is *not* high CPU utilization unless the select is returning prematurely for some reason (the select is trying to sleep for 60 seconds waiting for a new network packet).
Is it returning prematurely, and if so, what is triggering it? That is, if it's a real network packet, then smbd shouldn't go straight back into select; it should do some other processing first.
Jeremy.
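The "returning prematurely" case can be reproduced outside Samba. The following is a minimal Python sketch, not Samba's C code: a pipe stands in for the network socket, the function name count_immediate_wakeups is hypothetical, and a short timeout replaces the 60 seconds seen in the trace so the demonstration finishes quickly. As long as the pending byte is never read, every select() call reports the descriptor as ready and returns at once, which is what a busy loop of selects looks like.

```python
import os
import select


def count_immediate_wakeups(drain, rounds=5, timeout=0.2):
    """Call select() `rounds` times on a pipe that starts with one unread byte.

    Returns how many of those calls reported the read end as ready.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, b"x")  # one pending "message"
        wakeups = 0
        for _ in range(rounds):
            ready, _, _ = select.select([read_fd], [], [], timeout)
            if ready:
                wakeups += 1
                if drain:
                    os.read(read_fd, 1)  # consume the byte, clearing readiness
        return wakeups
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("never read:", count_immediate_wakeups(drain=False))
    print("read once: ", count_immediate_wakeups(drain=True))
```

With drain=False every round wakes up immediately; with drain=True only the first round does, and the remaining calls sleep for the full timeout.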

Comment 7 Jason Mader (mail bounces back) 2006-02-14 12:50:12 UTC

But it is not doing any other processing, just thousands of selects.

Comment 8 Jeremy Allison 2006-02-14 13:25:18 UTC

Can you attach using a debugger and give me a backtrace so I can see exactly which select is looping, please? It looks like a "read ready" condition isn't being read and cleared.
Jeremy

Comment 9 Jason Mader (mail bounces back) 2006-02-14 15:06:16 UTC

# dbx -p 522911
dbx version 7.3.4 (86441_Nov11 MR) Nov 11 2002 11:31:55
Process 522911 (smbd) stopped at [__select:17 +0x8,0xfa43c38]
         Source (of /xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/select.s) not available for Process 522911
(dbx) where
>  0 __select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/select.s":17, 0xfa43c38]
   1 _select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/selectSCI.c":30, 0xfa43cc4]
   2 sys_select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/lib/select.c":93, 0x102237c4]
   3 receive_message_or_smb(0x1036c6f0, 0x20041, 0xea60, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/process.c":550, 0x100b28b8]
   4 smbd_process(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/process.c":1734, 0x100b4bb4]
   5 main(0xffffffff, 0x7fff2f54, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/server.c":976, 0x102a19a0]
   6 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x1003f8f8]
(dbx) q

Comment 10 Jeremy Allison 2006-02-14 15:35:33 UTC

OK, this code looks the same between 3.0.14 and 3.0.21. The only thing that has changed is the way oplocks are processed. Once you've attached to the process, can you step through the code in dbx or gdb and find out why, after the select returns, it doesn't go on to process either an oplock break message or an incoming SMB packet? That's what it should be doing, not looping round immediately into select again.
Jeremy.
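The expected control flow described here (wait, then handle whichever source became readable) can be sketched as follows. This is an illustrative Python sketch only: service_once and demo are hypothetical names, and two pipes stand in for the oplock notification source and the client socket. It does not reproduce receive_message_or_smb().

```python
import os
import select


def service_once(handlers, timeout):
    """Wait once, then run the handler of every descriptor that is readable.

    `handlers` maps a file descriptor to a callable that must consume the
    pending data. Returns the sorted list of descriptors that were serviced.
    """
    ready, _, _ = select.select(list(handlers), [], [], timeout)
    for fd in ready:
        handlers[fd](fd)
    return sorted(ready)


def demo():
    oplock_r, oplock_w = os.pipe()  # stand-in for the oplock break source
    smb_r, smb_w = os.pipe()        # stand-in for the client socket
    seen = []
    handlers = {
        oplock_r: lambda fd: seen.append(("oplock", os.read(fd, 64))),
        smb_r: lambda fd: seen.append(("smb", os.read(fd, 64))),
    }
    try:
        os.write(oplock_w, b"break")
        os.write(smb_w, b"packet")
        first = service_once(handlers, 0.05)   # both sources are pending
        second = service_once(handlers, 0.05)  # nothing left, so this times out
        return seen, len(first), len(second)
    finally:
        for fd in (oplock_r, oplock_w, smb_r, smb_w):
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

Because each handler reads what woke the loop up, the second wait has nothing to report and sleeps until its timeout instead of spinning.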

Comment 11 Jeremy Allison 2006-02-14 15:47:07 UTC

Quick test for me: can you turn off kernel oplocks in smb.conf and see if the problem is fixed? I think I might see the problem...
Jeremy.
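A minimal sketch of the smb.conf change being asked for, assuming the setting goes in the [global] section as it does in the Samba 3.0 series; the rest of the file is left as it is:

```
[global]
    kernel oplocks = no
```

Note that "kernel oplocks" and "oplocks" are separate smb.conf parameters, so setting one does not set the other.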

Comment 12 Jason Mader (mail bounces back) 2006-02-14 15:55:38 UTC

Maybe.  After I used SWAT to set oplocks = No, the one high-CPU process on the system crashed with:

[2006/02/14 16:50:04, 0] lib/fault.c:(36)
  ===============================================================
[2006/02/14 16:50:04, 0] lib/fault.c:(37)
  INTERNAL ERROR: Signal 10 in pid 523552 (3.0.21b)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2006/02/14 16:50:04, 0] lib/fault.c:(39)
  
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2006/02/14 16:50:04, 0] lib/fault.c:(40)
  ===============================================================
[2006/02/14 16:50:04, 0] lib/util.c:(1554)
  PANIC: internal error
[2006/02/14 16:50:05, 0] lib/util.c:(1608)
  BACKTRACE: 7 stack frames:
   #0 0x1021731c smb_panic2
   #1 0x102171ac smb_panic
   #2 0x101f9518 fault_report
   #3 0x101f96f0 sig_fault
   #4 0xfaee79c _sigtramp
   #5 0x100b4c10 smbd_process
   #6 0x102a19a0 main

Comment 13 Jeremy Allison 2006-02-14 16:49:20 UTC

Created attachment 1730
Patch

Please try this patch against 3.0.21b.
Jeremy.

Comment 14 Jason Mader (mail bounces back) 2006-02-15 09:41:14 UTC

It looks like the patch works.  No smbd process has started using a lot of CPU since I applied it.

Comment 15 Lars Müller 2006-02-15 10:18:05 UTC

Frederico: Are you also able to verify Jeremy's fix from comment #13?

Comment 16 Jeremy Allison 2006-02-15 10:32:21 UTC

Great! Thanks, I'll make sure this gets into the next release.
Jeremy.

Comment 17 Frederico Gendorf 2006-02-15 10:34:29 UTC

Sorry, I had downgraded to Samba 3.0.20a when I had the problem. I have now gone back to 3.0.21b and the problem has stopped!

Comment 18 Frederico Gendorf 2006-02-17 17:12:55 UTC

Created attachment 1736
strace

strace -p

Comment 19 Frederico Gendorf 2006-02-17 17:15:37 UTC

Hi, the problem occurred again!
This time I got the strace.