Bug 3515 - Extremely high cpu utilization
Summary: Extremely high cpu utilization
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.21b
Hardware: SGI IRIX
Importance: P3 critical
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-14 11:07 UTC by Jason Mader (mail bounces back)
Modified: 2006-02-17 17:15 UTC
CC List: 1 user

See Also:


Attachments
par -s (10.89 KB, application/x-gzip)
2006-02-14 12:23 UTC, Jason Mader (mail bounces back)
Patch (4.62 KB, patch)
2006-02-14 16:49 UTC, Jeremy Allison
strace (169.68 KB, application/octet-stream)
2006-02-17 17:12 UTC, Frederico Gendorf

Description Jason Mader (mail bounces back) 2006-02-14 11:07:47 UTC
       PID       PGRP USERNAME PRI  SIZE   RES STATE    TIME WCPU%  CPU% COMMAND
    522319     522385 root      20 1635M  989M ready   86:32  69.1 69.10 smbd
    522167     522385 root      20 1222M  800M run/1   63:25  68.9 68.91 smbd
    522397     522385 root      20 7808K 1328K ready   80:11  66.9 66.89 smbd
Comment 1 Jason Mader (mail bounces back) 2006-02-14 11:11:54 UTC
Since updating to 3.0.21b from 3.0.20b on IRIX 6.5.23f, smbd has been causing heavy CPU utilization.

Before I revert to 3.0.20b, I'd like someone to take a look at this and let me know what further information they need. Thanks.
Comment 2 Jeremy Allison 2006-02-14 12:02:28 UTC
Can you attach to the high-CPU process with a tool like strace or truss and give me some feedback on the set of system calls, please?
Thanks,
Jeremy.
Comment 3 Frederico Gendorf 2006-02-14 12:07:28 UTC
This problem is the same for me on x86 Linux.
Comment 4 Jeremy Allison 2006-02-14 12:11:23 UTC
Great, this should be easier to track down than on IRIX, as we all have Linux boxes to hand. Please attach with ltrace or strace and let me know the output for a process that's eating CPU.
Jeremy.
Comment 5 Jason Mader (mail bounces back) 2006-02-14 12:23:39 UTC
Created attachment 1729
par -s

Here's 5 seconds of "par -s" output. It's mostly:

  0mS[  0] : select(26, [6:23:25], 0, 0, {sec=60, usec=0})
  0mS[  0] : END-select(26, [6:23], 0, 0, 0x7fff2cd8) = 2
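For reference when reading this trace: the first argument to select() (26) is the number of descriptors to scan, i.e. the highest watched descriptor plus one. The bracketed list appears to be the read set (descriptors 6, 23 and 25). The "= 2" on the END line is the count of descriptors that came back ready, and the returned set suggests those were 6 and 23. Below is a minimal C sketch, using only POSIX <sys/select.h>, of how a caller builds such a read set. build_read_set is a hypothetical helper, not Samba code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper (not Samba code): add every descriptor in fds[]
 * to *set and return the nfds argument select() expects, i.e. the
 * highest descriptor plus one. */
int build_read_set(const int *fds, size_t count, fd_set *set)
{
    int maxfd = -1;

    FD_ZERO(set);
    for (size_t i = 0; i < count; i++) {
        /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

When called with the descriptors {6, 23, 25}, it returns 26, the same nfds value seen in the trace.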
Comment 6 Jeremy Allison 2006-02-14 12:34:28 UTC
Ok, this (select(26, [6:23:25], 0, 0, {sec=60, usec=0})) is *not* high CPU utilization unless the select is returning prematurely for some reason (the select is trying to sleep for 60 seconds waiting for a new network packet).
Is it returning prematurely, and if so, what is triggering it? I.e., if it's a real network packet then it shouldn't go directly into select again, but should do some other processing.
Jeremy.
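One way to answer "is it returning prematurely, and what is triggering it?" is to log which descriptor ended the wait on every wakeup. The following is a minimal C sketch of such a check. The enum, the function name classify_wakeup, and the assumption of one SMB socket plus one internal message descriptor are all illustrative; they are not Samba's actual structure.

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical labels (not Samba's) for what ended a select() wait. */
enum wake_reason {
    WAKE_ERROR,    /* select() returned -1 (check errno, e.g. EINTR) */
    WAKE_TIMEOUT,  /* full timeout elapsed with nothing ready */
    WAKE_SMB,      /* client socket readable */
    WAKE_MESSAGE,  /* internal message/oplock descriptor readable */
    WAKE_BOTH,     /* both descriptors readable */
    WAKE_UNKNOWN   /* positive count but neither descriptor is set */
};

/* Classify one select() result, given the fd_set it filled in.
 * Logging this on each wakeup shows whether select() returns early
 * and, if so, which descriptor is responsible. */
enum wake_reason classify_wakeup(int ret, fd_set *ready,
                                 int smb_fd, int msg_fd)
{
    int smb, msg;

    if (ret < 0)
        return WAKE_ERROR;
    if (ret == 0)
        return WAKE_TIMEOUT;
    assert(smb_fd >= 0 && msg_fd >= 0);
    smb = FD_ISSET(smb_fd, ready);
    msg = FD_ISSET(msg_fd, ready);
    if (smb && msg)
        return WAKE_BOTH;
    if (smb)
        return WAKE_SMB;
    if (msg)
        return WAKE_MESSAGE;
    return WAKE_UNKNOWN;
}
```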
Comment 7 Jason Mader (mail bounces back) 2006-02-14 12:50:12 UTC
But it is not doing any other processing, just thousands of selects?
Comment 8 Jeremy Allison 2006-02-14 13:25:18 UTC
Can you attach using a debugger and give me a backtrace so I can see exactly which select is looping, please? It looks like a "read ready" condition isn't being read and cleared.
Jeremy
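The diagnosis in the previous comment can be reproduced in isolation. select() readiness is level-triggered: if a descriptor is reported readable and the data is never read, the next select() returns immediately for the same reason. The process then spins at full CPU without ever sleeping for its timeout. The sketch below (plain POSIX, not Samba code; count_ready_wakeups is a hypothetical name) writes one byte into a pipe and counts how many of a fixed number of select() calls report it readable, both with and without reading the byte.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Write one byte into a fresh pipe, then call select() `rounds` times
 * with a 10 ms timeout, counting how many calls report the pipe
 * readable. If `drain` is zero the byte is never read, so every call
 * returns at once. Returns the count, or -1 if setup fails. */
int count_ready_wakeups(int drain, int rounds)
{
    int p[2];
    int ready = 0;
    char c = 'x';

    if (pipe(p) != 0)
        return -1;
    assert(p[0] < FD_SETSIZE);
    if (write(p[1], &c, 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    for (int i = 0; i < rounds; i++) {
        fd_set rfds;
        struct timeval tv = { .tv_sec = 0, .tv_usec = 10000 };

        FD_ZERO(&rfds);
        FD_SET(p[0], &rfds);
        if (select(p[0] + 1, &rfds, NULL, NULL, &tv) > 0 &&
            FD_ISSET(p[0], &rfds)) {
            ready++;
            /* Reading the byte clears the "read ready" condition. */
            if (drain && read(p[0], &c, 1) != 1)
                break;
        }
    }
    close(p[0]);
    close(p[1]);
    return ready;
}
```

With 20 rounds, the undrained case reports 20 immediate wakeups. The drained case reports 1 and then sleeps through each remaining 10 ms timeout.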
Comment 9 Jason Mader (mail bounces back) 2006-02-14 15:06:16 UTC
# dbx -p 522911
dbx version 7.3.4 (86441_Nov11 MR) Nov 11 2002 11:31:55
Process 522911 (smbd) stopped at [__select:17 +0x8,0xfa43c38]
         Source (of /xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/select.s) not available for Process 522911
(dbx) where
>  0 __select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/select.s":17, 0xfa43c38]
   1 _select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/selectSCI.c":30, 0xfa43cc4]
   2 sys_select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/lib/select.c":93, 0x102237c4]
   3 receive_message_or_smb(0x1036c6f0, 0x20041, 0xea60, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/process.c":550, 0x100b28b8]
   4 smbd_process(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/process.c":1734, 0x100b4bb4]
   5 main(0xffffffff, 0x7fff2f54, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/server.c":976, 0x102a19a0]
   6 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x1003f8f8]
(dbx) q
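Frame 3 of this backtrace shows 0xea60 as the third value passed to receive_message_or_smb(). 0xea60 is 60000 in decimal. If that argument is a timeout in milliseconds (an assumption based on the value, not checked against the 3.0.21b source), it corresponds to the {sec=60, usec=0} timeout in the par -s output from comment 5. Here is a small C sketch of that millisecond-to-struct timeval conversion; ms_to_timeval is a hypothetical helper.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper (not Samba code): convert a millisecond timeout
 * into the struct timeval that select() takes. */
struct timeval ms_to_timeval(long timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return tv;
}
```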
Comment 10 Jeremy Allison 2006-02-14 15:35:33 UTC
OK, this code looks the same between 3.0.14 and 3.0.21. The only thing that has changed is the way oplocks are processed. Once you've attached, can you step through the code with dbx or gdb and find out why, after the select returns, it doesn't go on to process either an oplock break message or an incoming SMB packet? That's what it should be doing, not looping straight back into select.
Jeremy.
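Below is a minimal C sketch of the behaviour described above. After select() returns, exactly one pending item is consumed from whichever descriptor is ready, so the next select() blocks instead of waking again for the same data. Several details are assumptions made for illustration, not Samba's implementation: the names, the one-byte reads standing in for real oplock and SMB processing, and the choice to check the message descriptor first.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical outcomes of one wakeup (not Samba's API). */
enum dispatch {
    DISPATCH_ERROR = -1,
    DISPATCH_TIMEOUT,     /* nothing arrived before the timeout */
    DISPATCH_OPLOCK_MSG,  /* one oplock message consumed */
    DISPATCH_SMB          /* one unit of client data consumed */
};

/* One pass of a receive loop: wait on both descriptors, then consume one
 * item from whichever is ready so the next select() sleeps instead of
 * waking again for the same data. The message descriptor is checked
 * first (an assumed priority). A one-byte read stands in for real work. */
enum dispatch process_one_wakeup(int msg_fd, int smb_fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    char buf;
    int maxfd = msg_fd > smb_fd ? msg_fd : smb_fd;
    int ret;

    assert(msg_fd >= 0 && smb_fd >= 0 && maxfd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(msg_fd, &rfds);
    FD_SET(smb_fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ret = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (ret < 0)
        return DISPATCH_ERROR;
    if (ret == 0)
        return DISPATCH_TIMEOUT;
    if (FD_ISSET(msg_fd, &rfds))
        return read(msg_fd, &buf, 1) == 1 ? DISPATCH_OPLOCK_MSG : DISPATCH_ERROR;
    if (FD_ISSET(smb_fd, &rfds))
        return read(smb_fd, &buf, 1) == 1 ? DISPATCH_SMB : DISPATCH_ERROR;
    return DISPATCH_ERROR;  /* count > 0 but neither descriptor set */
}
```

Two pipes can stand in for the oplock message channel and the client socket. A call with nothing pending times out. When both have data, the oplock message is handled on the first call and the SMB data on the second, after which the loop goes back to sleeping.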

Comment 11 Jeremy Allison 2006-02-14 15:47:07 UTC
Quick test for me: can you turn off kernel oplocks in smb.conf and see if that fixes it? I think I might see the problem...
Jeremy.
Comment 12 Jason Mader (mail bounces back) 2006-02-14 15:55:38 UTC
Maybe. After I used SWAT to set oplocks = No, the one high-CPU process on the system crashed with:

[2006/02/14 16:50:04, 0] lib/fault.c:(36)
  ===============================================================
[2006/02/14 16:50:04, 0] lib/fault.c:(37)
  INTERNAL ERROR: Signal 10 in pid 523552 (3.0.21b)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2006/02/14 16:50:04, 0] lib/fault.c:(39)
  
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2006/02/14 16:50:04, 0] lib/fault.c:(40)
  ===============================================================
[2006/02/14 16:50:04, 0] lib/util.c:(1554)
  PANIC: internal error
[2006/02/14 16:50:05, 0] lib/util.c:(1608)
  BACKTRACE: 7 stack frames:
   #0 0x1021731c smb_panic2
   #1 0x102171ac smb_panic
   #2 0x101f9518 fault_report
   #3 0x101f96f0 sig_fault
   #4 0xfaee79c _sigtramp
   #5 0x100b4c10 smbd_process
   #6 0x102a19a0 main
Comment 13 Jeremy Allison 2006-02-14 16:49:20 UTC
Created attachment 1730
Patch

Please try this patch against 3.0.21b.
Jeremy.
Comment 14 Jason Mader (mail bounces back) 2006-02-15 09:41:14 UTC
It looks like the patch works. I haven't had any smbd processes start to use a lot of CPU.
Comment 15 Lars Müller 2006-02-15 10:18:05 UTC
Frederico: Are you also able to verify Jeremy's fix from comment #13?
Comment 16 Jeremy Allison 2006-02-15 10:32:21 UTC
Great! Thanks, I'll make sure this gets into the next release.
Jeremy.
Comment 17 Frederico Gendorf 2006-02-15 10:34:29 UTC
Sorry, I had downgraded to Samba 3.0.20a when I had the problem; now I've gone back to 3.0.21b and the problem has stopped!
I
Comment 18 Frederico Gendorf 2006-02-17 17:12:55 UTC
Created attachment 1736
strace

strace -p
Comment 19 Frederico Gendorf 2006-02-17 17:15:37 UTC
Hi, the problem occurred again!
Now I have the strace!
it