Bug 5711 - winbind dumped core during windows login test from 30 physical clients
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.28a
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assignee: Gerald (Jerry) Carter (dead mail address)
QA Contact: Samba QA Contact
Depends on:
Reported: 2008-08-22 09:54 UTC by Tukaram
Modified: 2009-01-01 14:45 UTC

See Also:

winbindd log contents when winbindd dumped core. (733.97 KB, text/plain)
2008-08-22 09:56 UTC, Tukaram
Patch to resolve the winbindd core in winbindd_dual.c (705 bytes, patch)
2008-08-22 10:00 UTC, Tukaram

Description Tukaram 2008-08-22 09:54:55 UTC
We joined 30 clients as member servers and started a login/logoff test from 25 Windows
XP clients.

winbindd dumped core during this test:

Core was generated by `/usr/sbin/winbindd -s /etc/samba/smb.conf'.
Program terminated with signal 6, Aborted.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7ccb8d0 in raise () from /lib/libc.so.6
#2  0xb7cccff3 in abort () from /lib/libc.so.6
#3  0x800c06f9 in dump_core () at lib/fault.c:192
#4  0x800d5c55 in smb_panic (why=0x802282f4 "internal error") at
#5  0x800c0bfa in sig_fault (sig=11) at lib/fault.c:47
#6  <signal handler called>
#7  0x8006f79a in async_request (mem_ctx=0x802967d0, child=0x8026e3c0,
    request=0x806e6814, response=0x806e703c,
    continuation=0x80070d00 <do_async_recv>, private_data=0x806e6810) at
#8  0x80070be5 in do_async (mem_ctx=0x802967d0, child=0x8026e3c0,
    request=0xbfa1f298, cont=0x8006fe40 <winbindd_uid2sid_recv>,
    c=0x80042c10, private_data=0x8033f7f0) at nsswitch/winbindd_async.c:84
#9  0x80074a53 in winbindd_uid2sid_async (mem_ctx=0x802967d0, uid=81,
    cont=0x80042c10 <getpwuid_recv>, private_data=0x8033f7f0)
    at nsswitch/winbindd_async.c:1502
#10 0x80042a37 in winbindd_getpwuid (state=0x8033f7f0) at
#11 0x8003fb70 in request_recv (private_data=0x8033f7f0, success=1) at
#12 0x80040298 in rw_callback (event=0x8033f7fc, flags=1) at
#13 0x80040ccd in main (argc=Cannot access memory at address 0x807a03f8
) at nsswitch/winbindd.c:882

I am attaching /var/log/samba/log.winbindd.

We analyzed the core and found that, in async_request(), the child taken from the domain list had a junk next pointer in its request list.
The call that allocates the state structure did not zero the structure's members.
We changed TALLOC_P() to TALLOC_ZERO_P(); the latter zeroes the memory after allocating it.
I will attach the patch to this bug.
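To illustrate why an unzeroed allocation can leave a junk next pointer, here is a minimal sketch in portable C. It assumes calloc() and malloc() as stand-ins for Samba's TALLOC_ZERO_P() and TALLOC_P(); the struct and function names (request_state, alloc_state_zero, append_state, count_states, free_states) are hypothetical and are not winbindd's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a winbindd request state: only the
 * intrusive "next" link matters for this illustration. */
struct request_state {
    struct request_state *next;
    int id;
};

/* Analogue of TALLOC_ZERO_P(): calloc() zero-fills the object, so
 * "next" starts out NULL (all-bits-zero is a null pointer on every
 * mainstream platform). A plain malloc(), like TALLOC_P(), would leave
 * "next" holding whatever bytes were in that memory before. */
static struct request_state *alloc_state_zero(int id)
{
    struct request_state *s = calloc(1, sizeof(*s));
    if (s != NULL)
        s->id = id;
    return s;
}

/* Append to the tail by walking the "next" links. The new node must be
 * NULL-terminated, otherwise every later walk runs off into junk. */
static void append_state(struct request_state **head, struct request_state *s)
{
    struct request_state **pp = head;

    assert(s->next == NULL);
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = s;
}

/* Count the nodes; this terminates only if every "next" link is valid. */
static int count_states(const struct request_state *head)
{
    int n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Release the whole list. */
static void free_states(struct request_state *head)
{
    while (head != NULL) {
        struct request_state *next = head->next;
        free(head);
        head = next;
    }
}
```

Appending three nodes from alloc_state_zero() yields a list for which count_states() returns 3. If the nodes came from an uninitialised malloc() instead, the result of the walk would be undefined, which matches the kind of junk pointer seen in the core.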
Comment 1 Tukaram 2008-08-22 09:56:18 UTC
Created attachment 3502 [details]
winbindd log contents when winbindd dumped core.
Comment 2 Tukaram 2008-08-22 10:00:27 UTC
Created attachment 3503 [details]
Patch to resolve the winbindd core in winbindd_dual.c

The patch zeroes the members of the state structure by using TALLOC_ZERO_P instead of TALLOC_P.
Comment 3 Gerald (Jerry) Carter (dead mail address) 2008-08-22 10:14:17 UTC
TALLOC_ZERO_P is a safe fix, but are you sure this bug is not
similar to the one Volker found (and fixed) in

and Jeremy backported to the 3-0 tree?
Comment 4 Volker Lendecke 2008-08-22 10:15:54 UTC
Is that different from the bug fixed with c93d42969451949566327e7fdbf29bfcee2c8319?

Comment 5 Tukaram 2008-08-22 10:33:02 UTC
I went through the description of commit c93d42969451949566327e7fdbf29bfcee2c8319. It says the crash is caused by race conditions, and we also ran into the same issue before winbindd dumped core. Since the next pointer of child->requests pointed to an invalid memory address, I thought zeroing the state structure would be safer. I will backport that commit to our code base, which is 3.0.28, and will encourage our team to consider moving to the latest Samba code base so that we avoid duplicate work. Thanks, Jerry and Volker, for the fast response. I will verify the changes and update the status.
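Zeroing only fixes a node's fields at allocation time. If the crash comes from a race, as the commit description suggests, a request can be freed while it is still linked on child->requests, and its neighbour's next pointer then dangles no matter how the node was allocated. The sketch below shows the generic unlink-before-free discipline on a doubly linked queue. It is an illustration under stated assumptions (hypothetical names, plain malloc()/free() instead of talloc), not a description of what that commit actually changed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request node kept on a per-child, doubly linked queue. */
struct queued_req {
    struct queued_req *prev, *next;
    int id;
};

/* Hypothetical per-child queue head (stand-in for child->requests). */
struct child_queue {
    struct queued_req *head;
};

/* Push at the head; both links are set explicitly here. */
static void queue_push(struct child_queue *q, struct queued_req *r)
{
    r->prev = NULL;
    r->next = q->head;
    if (q->head != NULL)
        q->head->prev = r;
    q->head = r;
}

/* Detach a node so neither the head nor any neighbour points at it. */
static void queue_unlink(struct child_queue *q, struct queued_req *r)
{
    assert(q != NULL && r != NULL);
    if (r->prev != NULL)
        r->prev->next = r->next;
    else
        q->head = r->next;
    if (r->next != NULL)
        r->next->prev = r->prev;
    r->prev = r->next = NULL;
}

/* Free a request only after unlinking it. Freeing while still linked
 * would leave a neighbour's "next" pointing at released memory, which
 * zero-initialising the node at allocation time cannot prevent. */
static void queue_free_req(struct child_queue *q, struct queued_req *r)
{
    queue_unlink(q, r);
    free(r);
}

/* Walk the queue and count its entries. */
static int queue_len(const struct child_queue *q)
{
    int n = 0;

    for (const struct queued_req *r = q->head; r != NULL; r = r->next)
        n++;
    return n;
}
```

With talloc, the same effect is usually obtained by unlinking in a destructor, so that freeing a request can never leave it on the list.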
Comment 6 Volker Lendecke 2009-01-01 14:45:52 UTC
Assuming it's fixed. Please re-open if it is not.