   PID   PGRP USERNAME PRI  SIZE   RES STATE   TIME WCPU%  CPU% COMMAND
522319 522385 root      20 1635M  989M ready  86:32  69.1 69.10 smbd
522167 522385 root      20 1222M  800M run/1  63:25  68.9 68.91 smbd
522397 522385 root      20 7808K 1328K ready  80:11  66.9 66.89 smbd
Since updating from 3.0.20b to 3.0.21b on IRIX 6.5.23f, the smbd process has been causing heavy CPU utilization. Before I revert to 3.0.20b, I'm hoping someone can take a look at this and let me know what I can do to provide more information. Thanks.
Can you attach to the high CPU process with a tool like strace or truss and give me some feedback on the set of system calls please ? Thanks, Jeremy.
I see the same problem on x86 Linux.
Great, should be easier to track down than on IRIX, as we all have Linux boxes to hand. Please attach with ltrace or strace and let me know the output for a process eating CPU please. Jeremy.
Created attachment 1729 [details]
par -s

Here's a 5s "par -s" output. Mostly,

0mS[ 0] : select(26, [6:23:25], 0, 0, {sec=60, usec=0})
0mS[ 0] : END-select(26, [6:23], 0, 0, 0x7fff2cd8) = 2
Ok, this (select(26, [6:23:25], 0, 0, {sec=60, usec=0})) is *not* high CPU utilization unless the select is returning prematurely for some reason (the select is trying to sleep for 60 seconds waiting for a new network packet). Is it returning prematurely and if so what is triggering it ? ie. If it's a real network packet then it shouldn't go directly into select again, but should do some other processing. Jeremy.
But it is not doing any other processing, just thousands of selects?
Can you attach using a debugger and give me a backtrace so I can see exactly which select is looping please ? Looks like a "read ready" condition isn't being read and cleared. Jeremy
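To illustrate the suspected failure mode (a "read ready" condition that is never read and cleared), here is a minimal sketch in Python rather than Samba's C. It is not the smbd code path: the pipe, the helper name `count_immediate_returns`, and the short timeout are all assumptions made for the demonstration. It shows that select() keeps reporting a descriptor as readable on every call until the pending data is actually consumed, which is what turns a 60-second sleep into a busy loop.

```python
import os
import select


def count_immediate_returns(drain: bool, rounds: int = 5, timeout: float = 0.05) -> int:
    """Count how many of `rounds` select() calls report the pipe as readable.

    Hypothetical helper for illustration only; it does not model smbd itself.
    """
    read_fd, write_fd = os.pipe()
    try:
        # One pending byte makes read_fd "read ready".
        os.write(write_fd, b"x")
        ready_count = 0
        for _ in range(rounds):
            readable, _, _ = select.select([read_fd], [], [], timeout)
            if readable:
                ready_count += 1
                if drain:
                    # Consuming the byte clears the read-ready condition,
                    # so later select() calls block until the timeout.
                    os.read(read_fd, 1)
        return ready_count
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

With `drain=False` every call returns immediately with the descriptor still set, mirroring the back-to-back selects in the trace; with `drain=True` only the first call reports it and the rest wait out the timeout.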
# dbx -p 522911
dbx version 7.3.4 (86441_Nov11 MR) Nov 11 2002 11:31:55
Process 522911 (smbd) stopped at [__select:17 +0x8,0xfa43c38]
Source (of /xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/select.s) not available for Process 522911
(dbx) where
> 0 __select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/select.s":17, 0xfa43c38]
  1 _select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/xlv42/6.5.23m/work/irix/lib/libc/libc_n32_M4/sys/selectSCI.c":30, 0xfa43cc4]
  2 sys_select(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/lib/select.c":93, 0x102237c4]
  3 receive_message_or_smb(0x1036c6f0, 0x20041, 0xea60, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/process.c":550, 0x100b28b8]
  4 smbd_process(0x1a, 0x7fff2ce0, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/process.c":1734, 0x100b4bb4]
  5 main(0xffffffff, 0x7fff2f54, 0x0, 0x0, 0x61c2a000, 0x4000, 0x68a1, 0x20) ["/home/ncac/jason/ports/samba/samba-3.0.21b/source/smbd/server.c":976, 0x102a19a0]
  6 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x1003f8f8]
(dbx) q
Ok, this code looks the same between 3.0.14 and 3.0.21. The only thing that has changed is the way oplocks are processed. Once you've attached to the process, can you step through the code with dbx or gdb and find out why, after the select returns, it doesn't go on to process either an oplock break message or an incoming smb packet? That's what it should be doing, not looping round immediately into select again. Jeremy.
Quick test for me - can you turn off kernel oplocks in smb.conf and see if the problem is fixed. I think I might see the problem... Jeremy.
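For reference, the requested test corresponds to a configuration fragment along these lines. This is a sketch that assumes the setting goes in the [global] section of smb.conf and that smbd is restarted or reloads its configuration afterwards; the rest of the file is left unchanged.

```
[global]
    # Assumption: turn off only the kernel-level oplock integration,
    # as requested in the comment above.
    kernel oplocks = no
```

Note that `kernel oplocks` is a separate parameter from the per-share `oplocks` setting.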
Maybe. After I used SWAT to set oplocks = No, the one high cpu utilization process on the system crashed with:

[2006/02/14 16:50:04, 0] lib/fault.c:(36)
  ===============================================================
[2006/02/14 16:50:04, 0] lib/fault.c:(37)
  INTERNAL ERROR: Signal 10 in pid 523552 (3.0.21b)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2006/02/14 16:50:04, 0] lib/fault.c:(39)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2006/02/14 16:50:04, 0] lib/fault.c:(40)
  ===============================================================
[2006/02/14 16:50:04, 0] lib/util.c:(1554)
  PANIC: internal error
[2006/02/14 16:50:05, 0] lib/util.c:(1608)
  BACKTRACE: 7 stack frames:
   #0 0x1021731c smb_panic2
   #1 0x102171ac smb_panic
   #2 0x101f9518 fault_report
   #3 0x101f96f0 sig_fault
   #4 0xfaee79c _sigtramp
   #5 0x100b4c10 smbd_process
   #6 0x102a19a0 main
Created attachment 1730 [details] Patch Please try this patch against 3.0.21b. Jeremy.
It looks like the patch works. I haven't had any smbd processes start to use a lot of cpu.
Frederico: Are you also able to verify Jeremy's fix from comment #13?
Great ! Thanks - I'll make sure this gets into the next release. Jeremy.
Sorry, I had downgraded to Samba 3.0.20a when I had the problem. Now I have gone back to 3.0.21b and the problem has stopped!
Created attachment 1736 [details] strace strace -p
Hi, the problem occurred again! This time I got the strace.