Joined 30 clients as member servers and started a login/logoff test from 25 Windows XP clients. winbindd dumped core during the test:

Core was generated by `/usr/sbin/winbindd -s /etc/samba/smb.conf'.
Program terminated with signal 6, Aborted.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7ccb8d0 in raise () from /lib/libc.so.6
#2  0xb7cccff3 in abort () from /lib/libc.so.6
#3  0x800c06f9 in dump_core () at lib/fault.c:192
#4  0x800d5c55 in smb_panic (why=0x802282f4 "internal error") at lib/util.c:1649
#5  0x800c0bfa in sig_fault (sig=11) at lib/fault.c:47
#6  <signal handler called>
#7  0x8006f79a in async_request (mem_ctx=0x802967d0, child=0x8026e3c0, request=0x806e6814, response=0x806e703c, continuation=0x80070d00 <do_async_recv>, private_data=0x806e6810) at nsswitch/winbindd_dual.c:135
#8  0x80070be5 in do_async (mem_ctx=0x802967d0, child=0x8026e3c0, request=0xbfa1f298, cont=0x8006fe40 <winbindd_uid2sid_recv>, c=0x80042c10, private_data=0x8033f7f0) at nsswitch/winbindd_async.c:84
#9  0x80074a53 in winbindd_uid2sid_async (mem_ctx=0x802967d0, uid=81, cont=0x80042c10 <getpwuid_recv>, private_data=0x8033f7f0) at nsswitch/winbindd_async.c:1502
#10 0x80042a37 in winbindd_getpwuid (state=0x8033f7f0) at nsswitch/winbindd_user.c:432
#11 0x8003fb70 in request_recv (private_data=0x8033f7f0, success=1) at nsswitch/winbindd.c:343
#12 0x80040298 in rw_callback (event=0x8033f7fc, flags=1) at nsswitch/winbindd.c:426
#13 0x80040ccd in main (argc=Cannot access memory at address 0x807a03f8
) at nsswitch/winbindd.c:882

I am attaching /var/log/samba/log.winbindd.

We analyzed the core and found that, in async_request(), the child from the domain list had a junk next pointer. The call that allocates the state structure was not zeroing the structure's members, so we changed TALLOC_P() to TALLOC_ZERO_P(); the latter does a memset to zero after the allocation. I will attach the patch to this bug.
Created attachment 3502: winbindd log from when winbindd dumped core.
Created attachment 3503: patch to fix the winbindd core dump in winbindd_dual.c. The patch zeroes the members of the state structure by using TALLOC_ZERO_P instead of TALLOC_P.
Using TALLOC_ZERO_P is a safe fix, but please check whether this bug is the same as the one Volker found (and fixed) in http://gitweb.samba.org/?p=samba.git;a=commit;h=c70e2b6476d2d99c79624e15a4a3cfcdc850fc7c and Jeremy backported to the 3-0 tree in http://gitweb.samba.org/?p=samba.git;a=commit;h=c93d42969451949566327e7fdbf29bfcee2c8319
Is that different from the bug fixed with c93d42969451949566327e7fdbf29bfcee2c8319? Volker
I went through the description of commit c93d42969451949566327e7fdbf29bfcee2c8319. It says the core dump is caused by a race condition, and we also faced the same issue before winbindd dumped core. Since the next pointer in child->requests was pointing to an invalid memory address, I thought zeroing the state structure with memset would be safer. I will backport the commit to our code base, which is 3.0.28, and will ask our team to consider moving to the latest Samba code base so that we do not duplicate work. Thanks, Jerry and Volker, for the fast response. I will verify the changes and update the status.
Assuming it's fixed. Please re-open if it is not.