The problem: the number of smbd processes is unusually high. We serve an office of about 200 workstations from a samba-3.0.11 PDC, but the number of smbd processes grows to 500 over a period of two days (with deadtime=20).

The symptoms look like those described in http://lists.samba.org/archive/samba/2004-December/096674.html:

"smbstatus reports approximately the right number of clients, but ps shows a much larger number of smbd processes active. Smbstatus reports a list of active smbd processes ... but there is a block of smbd processes ... that are not in the smbstatus report. The hung processes need to be kill -9'ed."

Fortunately, we have no file access, domain logon or memory shortage problems.

netstat -an reports a large number of sockets in the CLOSE_WAIT state:

samba_pdc.63688 ldap_server.389 49640 0 49640 0 CLOSE_WAIT

(We use Solaris pam_unix with an LDAP-backed nsswitch.conf.)

samba-3.0.11, configured "--with-pam", smbpasswd backend
SunOS samba_pdc 5.9 Generic_117171-17 sun4u sparc SUNW,UltraAX-i2
gcc 3.4.2
openldap 2.2.17

We see this situation only in the main office. Other, smaller offices (20 workstations) with the same hardware and software (except ldapsam) work fine.
For the processes that are hung and not in the status list, can you attach to them with gdb and get a backtrace of where they are? Also attach with strace -p <pid> and see if they're doing any system calls. Please post the results here.

Thanks,
Jeremy.
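One possible way to pick out the smbd processes that ps shows but the status list does not is sketched below. It is only a sketch under assumptions: it assumes `ps -e -o pid,comm` prints the PID in the first column, and it treats any line of `smbstatus -p` output that starts with a number as a listed PID, which may not match every smbstatus version. The function names are made up for illustration. Note that the parent smbd listener is normally not in the status list either, so it will show up in the result.

```python
import subprocess


def pids_from_text(text):
    """Collect every integer that appears as the first token of a line."""
    pids = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.add(int(fields[0]))
    return pids


def smbd_pids_from_ps(ps_text):
    """Parse `ps -e -o pid,comm` style output; keep PIDs whose command ends in smbd."""
    pids = set()
    for line in ps_text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip().endswith("smbd"):
            pids.add(int(fields[0]))
    return pids


def unlisted_smbd(ps_text, smbstatus_text):
    """Return smbd PIDs seen by ps that do not appear in the smbstatus output."""
    return sorted(smbd_pids_from_ps(ps_text) - pids_from_text(smbstatus_text))


def capture(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    ps_out = capture(["ps", "-e", "-o", "pid,comm"])
    status_out = capture(["smbstatus", "-p"])
    for pid in unlisted_smbd(ps_out, status_out):
        # Candidates to attach to with a debugger; includes the parent smbd.
        print(pid)
```

The PIDs it prints are only candidates; each one still has to be inspected by hand before drawing conclusions.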
Here is gdb output: # gdb GNU gdb 6.0 Copyright 2003 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.9". (gdb) attach 10110 Attaching to process 10110 Reading symbols from /usr/local/samba3/sbin/smbd...done. Reading symbols from /usr/lib/libthread.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libthread.so.1 Reading symbols from /usr/local/lib/libldap-2.2.so.7...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/local/lib/libldap-2.2.so.7 Reading symbols from /usr/local/lib/liblber-2.2.so.7...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/local/lib/liblber-2.2.so.7 Reading symbols from /usr/lib/libpam.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libpam.so.1 Reading symbols from /usr/lib/libsendfile.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libsendfile.so.1 Reading symbols from /usr/lib/libsec.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libsec.so.1 Reading symbols from /usr/lib/libgen.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libgen.so.1 Reading symbols from /usr/lib/libresolv.so.2...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libresolv.so.2 Reading symbols from /usr/lib/libsocket.so.1...done. 
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libsocket.so.1 Reading symbols from /usr/lib/libnsl.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libnsl.so.1 Reading symbols from /usr/lib/libdl.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libdl.so.1 Reading symbols from /usr/local/lib/libiconv.so.2...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/local/lib/libiconv.so.2 Reading symbols from /usr/local/lib/libpopt.so.0...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/local/lib/libpopt.so.0 Reading symbols from /usr/lib/libc.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libc.so.1 Reading symbols from /usr/lib/libcmd.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libcmd.so.1 Reading symbols from /usr/lib/libmp.so.2...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libmp.so.2 Reading symbols from /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1 Reading symbols from /usr/lib/nss_files.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/nss_files.so.1 Reading symbols from /usr/lib/nss_ldap.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/nss_ldap.so.1 Reading symbols from /usr/lib/libsldap.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libsldap.so.1 Reading symbols from /usr/lib/libldap.so.5...done. 
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libldap.so.5 Reading symbols from /usr/lib/libdoor.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libdoor.so.1 Reading symbols from /usr/lib/librt.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/librt.so.1 Reading symbols from /usr/lib/libmd5.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libmd5.so.1 Reading symbols from /usr/lib/libaio.so.1...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/lib/libaio.so.1 Reading symbols from /usr/local/samba3/lib/vfs/extd_audit.so...done. warning: sol_thread_new_objfile: td_ta_new: Debugger service failed Loaded symbols for /usr/local/samba3/lib/vfs/extd_audit.so Retry #1: Retry #2: Retry #3: Retry #4: [New LWP 1] Symbols already loaded for /usr/lib/libthread.so.1 Symbols already loaded for /usr/local/lib/libldap-2.2.so.7 Symbols already loaded for /usr/local/lib/liblber-2.2.so.7 Symbols already loaded for /usr/lib/libpam.so.1 Symbols already loaded for /usr/lib/libsendfile.so.1 Symbols already loaded for /usr/lib/libsec.so.1 Symbols already loaded for /usr/lib/libgen.so.1 Symbols already loaded for /usr/lib/libresolv.so.2 Symbols already loaded for /usr/lib/libsocket.so.1 Symbols already loaded for /usr/lib/libnsl.so.1 Symbols already loaded for /usr/lib/libdl.so.1 Symbols already loaded for /usr/local/lib/libiconv.so.2 Symbols already loaded for /usr/local/lib/libpopt.so.0 Symbols already loaded for /usr/lib/libc.so.1 Symbols already loaded for /usr/lib/libcmd.so.1 Symbols already loaded for /usr/lib/libmp.so.2 Symbols already loaded for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1 Symbols already loaded for /usr/lib/nss_files.so.1 Symbols already loaded for /usr/lib/nss_ldap.so.1 Symbols already 
loaded for /usr/lib/libsldap.so.1 Symbols already loaded for /usr/lib/libldap.so.5 Symbols already loaded for /usr/lib/libdoor.so.1 Symbols already loaded for /usr/lib/librt.so.1 Symbols already loaded for /usr/lib/libmd5.so.1 Symbols already loaded for /usr/lib/libaio.so.1 Symbols already loaded for /usr/local/samba3/lib/vfs/extd_audit.so 0xff375e88 in __lwp_park () from /usr/lib/libthread.so.1 (gdb) bt #0 0xff375e88 in __lwp_park () from /usr/lib/libthread.so.1 #1 0xff371c10 in mutex_lock_queue () from /usr/lib/libthread.so.1 #2 0xff372610 in slow_lock () from /usr/lib/libthread.so.1 #3 0xfef46d00 in malloc () from /usr/lib/libc.so.1 Here is truss output for a long period: #truss -fp 10110 10110: *** SUID: ruid/euid/suid = 0 / 60001 / 60001 *** 10110: *** SGID: rgid/egid/sgid = 0 / 60001 / 60001 *** 10110: lwp_park(0x00000000, 0) (sleeping...) Here is output of p* tools: # pfiles 10110 10110: /usr/local/samba3/sbin/smbd -D -s/usr/local/samba3/lib/smb.conf Current rlimit: 10020 file descriptors 0: S_IFCHR mode:0666 dev:32,0 ino:21557 uid:0 gid:3 rdev:13,2 O_RDWR|O_LARGEFILE 1: S_IFCHR mode:0666 dev:32,0 ino:21557 uid:0 gid:3 rdev:13,2 O_RDWR|O_LARGEFILE 2: S_IFREG mode:0644 dev:32,0 ino:715682 uid:0 gid:0 size:2263066 O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE 3: S_IFCHR mode:0644 dev:32,0 ino:35196 uid:0 gid:3 rdev:190,1 O_RDONLY|O_LARGEFILE 4: S_IFREG mode:0600 dev:32,0 ino:3621724 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE 5: S_IFSOCK mode:0666 dev:239,0 ino:42453 uid:0 gid:0 size:0 O_RDWR sockname: AF_INET 0.0.0.0 port: 0 6: S_IFDOOR mode:0444 dev:245,0 ino:44 uid:0 gid:0 size:0 O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[21267] 7: S_IFDOOR mode:0444 dev:245,0 ino:45 uid:0 gid:0 size:0 O_RDONLY FD_CLOEXEC door to ldap_cachemgr[20284] 8: S_IFREG mode:0600 dev:32,0 ino:3621727 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE advisory write lock set by process 29362 9: S_IFREG mode:0600 dev:32,0 ino:3621728 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE 10: S_IFREG mode:0644 
dev:32,0 ino:3621744 uid:0 gid:0 size:6 O_WRONLY|O_NONBLOCK|O_CREAT|O_EXCL|O_LARGEFILE advisory write lock set by process 13198 11: S_IFREG mode:0600 dev:32,0 ino:3621730 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE advisory read lock set by process 13201 12: S_IFREG mode:0644 dev:32,0 ino:3621731 uid:0 gid:0 size:253952 O_RDWR|O_LARGEFILE advisory read lock set by process 13198 13: S_IFREG mode:0644 dev:32,0 ino:3621732 uid:0 gid:0 size:614400 O_RDWR|O_LARGEFILE advisory read lock set by process 13198 14: S_IFREG mode:0644 dev:32,0 ino:3621733 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE advisory read lock set by process 13198 15: S_IFREG mode:0644 dev:32,0 ino:3621734 uid:0 gid:0 size:65536 O_RDWR|O_LARGEFILE advisory read lock set by process 13198 16: S_IFREG mode:0600 dev:32,0 ino:3621735 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE 17: S_IFREG mode:0644 dev:32,0 ino:3621736 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE 18: S_IFREG mode:0600 dev:32,0 ino:3621737 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE 19: S_IFREG mode:0600 dev:32,0 ino:3621740 uid:0 gid:0 size:8192 O_RDWR|O_LARGEFILE 20: S_IFREG mode:0600 dev:32,0 ino:3621741 uid:0 gid:0 size:16384 O_RDWR|O_LARGEFILE 21: S_IFREG mode:0600 dev:32,0 ino:3621742 uid:0 gid:0 size:696 O_RDWR|O_LARGEFILE 22: S_IFREG mode:0644 dev:32,0 ino:715682 uid:0 gid:0 size:2263066 O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE 23: S_IFSOCK mode:0666 dev:239,0 ino:33411 uid:0 gid:0 size:0 O_RDWR sockname: AF_INET 127.0.0.1 port: 45860 24: S_IFIFO mode:0000 dev:240,0 ino:808983 uid:0 gid:0 size:0 O_RDWR|O_NONBLOCK 25: S_IFIFO mode:0000 dev:240,0 ino:808983 uid:0 gid:0 size:0 O_RDWR|O_NONBLOCK 26: S_IFIFO mode:0000 dev:240,0 ino:818715 uid:0 gid:0 size:1 O_RDWR|O_NONBLOCK 27: S_IFIFO mode:0000 dev:240,0 ino:818715 uid:0 gid:0 size:0 O_RDWR|O_NONBLOCK 28: S_IFCHR mode:0666 dev:32,0 ino:21553 uid:0 gid:3 rdev:21,0 O_WRONLY FD_CLOEXEC 29: S_IFSOCK mode:0666 dev:239,0 ino:27063 uid:0 gid:0 size:0 O_RDWR sockname: AF_INET our_samba_pdc port: 59434 peername: 
AF_INET our_ldap_server port: 389 # pflags 10110 10110: /usr/local/samba3/sbin/smbd -D -s/usr/local/samba3/lib/smb.conf data model = _ILP32 flags = PR_ORPHAN /1: flags = PR_PCINVAL|PR_ASLEEP [ lwp_park(0x0,0x0,0x0) ] sigmask = 0x00011280,0x00000000 # pstack 10110 10110: /usr/local/samba3/sbin/smbd -D -s/usr/local/samba3/lib/smb.conf ff375e88 lwp_park (0, 0, 0) ff371c08 mutex_lock_queue (ff388b44, 0, fefc0590, ff388000, 35, 0) + 104 ff372608 slow_lock (fefc0590, fefd0000, 252e3e, fefbc000, 0, 0) + 58 fef46cf8 malloc (37, 0, 252e28, ffbfdc28, 0, 0) + 18 0018d93c vasprintf (36, 252e28, ffbfdc28, fef97ec4, 13, 296bab) + 2c 0018df0c x_vfprintf (2a7670, 252e28, ffbfdc28, fefc27b0, 1, ff00) + c 001837d8 Debug1 (0, 0, 28e000, 252e28, 252e40, 252e50) + ec 00183acc dbghdr (1, 252e40, 252e50, 24, 0, 0) + 128 00183b84 fault_report (a, 0, 0, 0, 0, 0) + 58 00183d00 sig_fault (a, 0, ffbfdfc0, 0, 0, 0) + 4 ff3760a0 __sighndlr (a, 0, ffbfdfc0, 183cfc, 0, 0) + c ff36fdd8 call_user_handler (a, 0, ffbfdfc0, 0, 0, 0) + 234 ff36ff88 sigacthandler (a, 0, ffbfdfc0, 1, 1, ffbfe2f4) + 64 --- called from signal handler with signal 10 (SIGBUS) --- fef47c08 _free_unlocked (185, 0, fec8c9a0, fefbc000, 1, ff00) + 40 fef47bb8 free (185, 0, 370888, 0, 0, 0) + 20 fec95a10 ldap_set_lderrno (33ddc8, 0, 0, 0, ffbfe480, 0) + ec fecb0098 ldap_create_virtuallist_control (33ddc8, ffbfe460, ffbfe480, ffbfe484, 652c, ff00) + 198 feda9b38 setup_vlv_params (33dc38, fedaa4d0, 3719c0, 8, fedaa4d0, ff00) + 110 fedaa808 search_state_machine (5, 1, fedc0000, 8, d, e) + 2e8 fedab5d8 __ns_ldap_firstEntry (33dc1c, 0, feddb3e4, 3, 0, 0) + 258 feddaf34 _nss_ldap_getent (33dc00, ffbfeee0, feddaeb4, 0, 0, 0) + 80 fef4ee14 nss_getent_u (fefc0608, 2f9798, fefc0628, ffbfeee0, 0, 0) + c8 fef4e948 nss_getent (fefc0608, fef98cc0, fefc0628, ffbfeee0, 62, ff00) + 34 fef99200 getpwent_r (2e7204, 2e7228, 400, 18a5a4, 2000, ffbfe9d0) + 4c 0018a6a4 getpwent_list (2a7da0, 33fd8c, 33fd9c, 1c80, ff, ffbff1b8) + 168 0010bd34 
get_memberuids (0, ffbff054, ffbff050, ffbff05c, 33ecc0, 400) + 30 0010c00c _samr_query_groupmem (33b038, ffbff2a0, ffbff278, 1, 14, 2e1510) + 20c 001016c4 api_samr_query_groupmem (33b038, 1015e0, 28dcf0, 33b046, 0, 33fd78) + e4 001196a8 api_rpcTNP (33b038, 33b046, 28dc84, 0, 0, 33dbe4) + 2c4 0011932c api_pipe_request (33b038, 1, 33cae0, 14, 65, 2f74a0) + e8 00113d18 process_request_pdu (33b038, 0, 1c, 0, 0, 33ba20) + 564 00113f74 process_complete_pdu (33b038, 1c, 0, 2, 0, 33ba20) + 218 001142dc process_incoming_data (1c, 2f74b0, 1c, fefbc000, ffbfef9e, 3fa) + 210 00114540 write_to_internal_pipe (ffffffff, 2f74b0, 2c, 1144b0, ffbfef98, 400) + 90 001144a0 write_to_pipe (2f7360, 2f74a0, 2c, 400, 45, 45) + 128 00050dd4 api_fd_reply (2f7360, 65, 31abe8, 26, 2f74a0, 0) + 2e4 00051044 named_pipe (2f8eb0, 65, 31abe8, ffbff9b6, 2edc60, 2f74a0) + 1c8 00051aa4 reply_trans (400, 2fa798, 0, 2, 65, 2f74a0) + 9f0 000984f8 switch_message (2f8eb0, 2fa798, 31abe8, 84, 20000, 0) + 5c0 00098584 construct_reply (2fa798, 31abe8, 84, 20000, ffbffb88, 2a3c00) + 5c 000988ec process_smb (2fa798, 31abe8, 31abe8, 20441, 0, 0) + 1d8 0009958c smbd_process (bba, 4ecf4, 16, 0, 3, ffbffd20) + 178 001fdd00 main (ffffffff, ffbffe94, ffbffea4, 2a6198, 0, 0) + 824 0003c0c4 _start (0, 0, 0, 0, 0, 0) + 5c
This looks like a problem in the nss_ldap libraries. How large is your LDAP tree?

I'm currently working on improving the Samba->LDAP connection. In particular, _samr_query_groupmem, which seems to be the problematic call for you, has been vastly improved recently. This is post-3.0.11, however, and you need to activate the option 'ldapsam:trusted = yes'.

On the other hand, you should try to reproduce the problem by issuing several 'getent group' calls and seeing whether you can get one of them to stall. If you can, ask your friendly Solaris support to fix the problem.

Volker
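The repeated 'getent group' test could be scripted roughly as follows. This is a minimal sketch, not a definitive procedure: the helper names, the 50 attempts and the 30-second timeout are arbitrary assumptions, and the calls run one after another rather than in parallel.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd once. Return (finished, seconds); finished is False on timeout."""
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
        finished = True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        finished = False
    return finished, time.monotonic() - start


def count_stalls(cmd, attempts, timeout_s):
    """Run cmd `attempts` times and count how many runs hit the timeout."""
    stalls = 0
    for _ in range(attempts):
        finished, _elapsed = run_with_timeout(cmd, timeout_s)
        if not finished:
            stalls += 1
    return stalls


if __name__ == "__main__":
    # Attempt count and timeout are illustrative values only.
    stalled = count_stalls(["getent", "group"], attempts=50, timeout_s=30.0)
    print("stalled runs:", stalled)
```

A non-zero count would suggest the stall can be reproduced outside Samba; a zero count does not rule the problem out.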
(In reply to comment #3)
> This looks like a problem in the nss_ldap libraries. How large is your LDAP
> tree? I'm currently working on improving the Samba->LDAP connection. In
> particular the _samr_query_groupmem that seems to be problematic for you has
> been vastly improved recently. This is post-3.0.11 however, and you need to
> activate the option 'ldapsam:trusted = yes'. On the other hand you should try to
> reproduce the problem by issuing several 'getent group' calls and see whether
> you can get one of them into a stall. If you can, ask your friendly solaris
> support for fixing the problem.
>
> Volker

We have an LDAP tree of ~1000 user entries in Sun One DS 5.2. The PDC does not use ldapsam, only smbpasswd, so I think "ldapsam:trusted = yes" is useless here. Unix accounts are resolved through pam_unix->nss_ldap without problems; groups are resolved through files only. I cannot get getent group to stall.
getpwent_list is the last Samba function in the call chain. From there we call getpwent(), which gives the next entry from /etc/passwd and the corresponding nss equivalent, probably nss_ldap. This could possibly be slow, but it should never hang indefinitely.

The way I read your backtrace, the process gets a SIGBUS signal from within either nss_ldap or the LDAP libraries. It seems to use an, AFAIK, quite novel LDAP feature called virtual list view that might have bugs in its implementation. Getting a SIGBUS from within free() really sounds like an nss_ldap and/or LDAP library bug.

Closing this bug: close inspection of the relevant Samba code does not show any bugs. The only way this could be a Samba bug is general memory corruption. There is not enough info in your bug report to trace that, and I'm afraid I don't know a way to really diagnose it.

Again: I *really* don't believe it's a Samba bug; this smells too much like nss_ldap and/or the LDAP libs.

Volker
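To exercise the same passwd enumeration path outside smbd, something like the following could be tried. It is a sketch under assumptions: it relies on Python's `pwd.getpwall()`, which walks the whole passwd database through the system's name service, and the function name `enumerate_passwd` is made up for illustration. It says nothing about whether the virtual list view code is actually reached on a given system.

```python
import pwd
import time


def enumerate_passwd():
    """Walk the whole passwd database once; return (entry_count, seconds)."""
    start = time.monotonic()
    entries = pwd.getpwall()  # Enumerates every entry the name service returns.
    return len(entries), time.monotonic() - start


if __name__ == "__main__":
    count, elapsed = enumerate_passwd()
    print("entries:", count, "seconds:", round(elapsed, 3))
```

If this crashes or hangs on the affected machine, that would point further away from Samba; if it runs cleanly, the result is inconclusive.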