Bug 7526 - talloc: double free error - first free may be at ../dsdb/common/util.c:2705
Summary: talloc: double free error - first free may be at ../dsdb/common/util.c:2705
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.0
Classification: Unclassified
Component: DCE-RPCs and pipes
Version: unspecified
Hardware: x64 Linux
Importance: P3 critical
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: samba4-qa@samba.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-21 05:27 UTC by Michael Wood
Modified: 2012-10-24 21:39 UTC

See Also:


Attachments
Full backtrace from crashed samba process. (11.94 KB, text/plain)
2010-06-21 05:28 UTC, Michael Wood
valgrind /usr/local/samba/sbin/samba -i -M single (18.08 KB, text/plain)
2010-06-24 09:23 UTC, Michael Wood

Description Michael Wood 2010-06-21 05:27:39 UTC
Last week I compiled 65ca3e4, vampired from a Win2k3 machine, and left
it running over the weekend.  There have been a couple of other
instances of Samba4 that have vampired the domain before.  Nothing was
happening on the domain over the weekend, i.e. no added/deleted users
etc.

This morning there's a zombie samba process and its parent is:

/usr/local/samba/sbin/samba -i -M single -d10

In the window where I started Samba I see this:

[...]
queued DsReplicaSync for DC=xyz to
bd0034a8-6252-4b42-ba54-1270a66630c4._msdcs.xyz (urgent=false)
uSN=0:3244
queued DsReplicaSync for DC=xyz to
4a86865e-403e-4aa1-b3b5-f8508ba9c873._msdcs.xyz (urgent=false)
uSN=0:3244
dreplsrv_notify_schedule(5) scheduled for: Fri Jun 18 19:18:57 2010 SAST
Timed out smb_krb5 packet
Received smb_krb5 packet of length 154
talloc: double free error - first free may be at ../dsdb/common/util.c:2705
Bad talloc magic value - double free
PANIC: Bad talloc magic value - double free
*** glibc detected *** /usr/local/samba/sbin/samba: corrupted
double-linked list: 0x0000000002691e20 ***

I've attached to the process with gdb and done a bt full.  I'll attach it.
Comment 1 Michael Wood 2010-06-21 05:28:43 UTC
Created attachment 5801
Full backtrace from crashed samba process.
Comment 2 Matthias Dieter Wallnöfer 2010-06-24 08:16:32 UTC
Is this reproducible, or does it happen randomly? Either way, a valgrind report would be much appreciated.
Comment 3 Michael Wood 2010-06-24 08:38:45 UTC
I am not sure if the following is the same problem.  It looks similar, though.

I started Samba up again and let it run.  This time I ran it as a daemon, so it just died and did not give me the chance to attach with gdb.  (No panic action.)

If this is the same thing, then, yes it seems to be repeatable.  Looks like it might have something to do with network timeouts?  The Samba box is in South Africa behind a sometimes congested and fairly low bandwidth ADSL line while the Windows box is in the US.

I am not sure how best to run Samba under valgrind.  Just "valgrind samba -i -M single"?  Should I have any panic action defined?

[Wed Jun 23 23:08:46 2010 SAST, 4 ../dsdb/repl/drepl_notify.c:363:dreplsrv_notify_check()]
queued DsReplicaSync for DC=x,DC=y,DC=z to bd0034a8-6252-4b42-ba54-1270a66630c4._msdcs.x.y.z (urgent=false) uSN=0:3244
[Wed Jun 23 23:08:46 2010 SAST, 4 ../dsdb/repl/drepl_notify.c:363:dreplsrv_notify_check()]
queued DsReplicaSync for DC=x,DC=y,DC=z to 4a86865e-403e-4aa1-b3b5-f8508ba9c873._msdcs.x.y.z (urgent=false) uSN=0:3244
[Wed Jun 23 23:08:46 2010 SAST, 4 ../dsdb/repl/drepl_notify.c:439:dreplsrv_notify_schedule()]
dreplsrv_notify_schedule(5) scheduled for: Wed Jun 23 23:08:51 2010 SAST
[Wed Jun 23 23:08:47 2010 SAST, 5 ../auth/kerberos/krb5_init_context.c:132:smb_krb5_request_timeout()]
Timed out smb_krb5 packet
[Wed Jun 23 23:08:50 2010 SAST, 5 ../auth/kerberos/krb5_init_context.c:132:smb_krb5_request_timeout()]
Timed out smb_krb5 packet
[Wed Jun 23 23:08:50 2010 SAST, 3 ../auth/gensec/gensec_gssapi.c:557:gensec_gssapi_update()]
[Wed Jun 23 23:08:50 2010 SAST, 0 ../../lib/util/debug.c:188:talloc_log_fn()]
Bad talloc magic value - unknown value
[Wed Jun 23 23:08:50 2010 SAST, 0 ../../lib/util/fault.c:143:smb_panic()]
PANIC: Bad talloc magic value - unknown value
[Wed Jun 23 23:08:50 2010 SAST, 0 ../../lib/util/fault.c:62:call_backtrace()]
BACKTRACE: 32 stack frames:
 #0 /usr/local/samba/lib/libsamba-util.so.0(call_backtrace+0x1f) [0x7f994c7f5a1f]
 #1 /usr/local/samba/lib/libsamba-util.so.0(smb_panic+0x235) [0x7f994c7f5d0c]
 #2 /usr/local/samba/lib/libtalloc-samba4.so.2(+0x1ff9) [0x7f994b916ff9]
 #3 /usr/local/samba/lib/libtalloc-samba4.so.2(+0x2087) [0x7f994b917087]
 #4 /usr/local/samba/lib/libtalloc-samba4.so.2(+0x20fe) [0x7f994b9170fe]
 #5 /usr/local/samba/lib/libtalloc-samba4.so.2(+0x2351) [0x7f994b917351]
 #6 /usr/local/samba/lib/libtalloc-samba4.so.2(+0x47c0) [0x7f994b9197c0]
 #7 /usr/local/samba/lib/libtalloc-samba4.so.2(talloc_strndup+0x54) [0x7f994b9198b9]
 #8 /usr/local/samba/lib/libgensec.so.0(+0x26300) [0x7f994b2e3300]
 #9 /usr/local/samba/lib/libgensec.so.0(+0x276a0) [0x7f994b2e46a0]
 #10 /usr/local/samba/lib/libgensec.so.0(gensec_update+0x4b) [0x7f994b2ec5fb]
 #11 /usr/local/samba/lib/libdcerpc.so.0(dcerpc_bind_auth_send+0x5ce) [0x7f994bf549e8]
 #12 /usr/local/samba/lib/libdcerpc.so.0(dcerpc_pipe_auth_send+0x44b) [0x7f994bf56d39]
 #13 /usr/local/samba/lib/libdcerpc.so.0(+0x3332c) [0x7f994bf5c32c]
 #14 /usr/local/samba/lib/libdcerpc.so.0(+0x3317d) [0x7f994bf5c17d]
 #15 /usr/local/samba/lib/libldb-samba4.so.0(composite_done+0xb8) [0x7f994d35d14a]
 #16 /usr/local/samba/lib/libdcerpc.so.0(+0x326db) [0x7f994bf5b6db]
 #17 /usr/local/samba/lib/libldb-samba4.so.0(composite_done+0xb8) [0x7f994d35d14a]
 #18 /usr/local/samba/lib/libdcerpc.so.0(+0x31795) [0x7f994bf5a795]
 #19 /usr/local/samba/lib/libldb-samba4.so.0(composite_done+0xb8) [0x7f994d35d14a]
 #20 /usr/local/samba/lib/libdcerpc.so.0(+0x312eb) [0x7f994bf5a2eb]
 #21 /usr/local/samba/lib/libldb-samba4.so.0(composite_done+0xb8) [0x7f994d35d14a]
 #22 /usr/local/samba/lib/libldb-samba4.so.0(+0x11a082) [0x7f994d38c082]
 #23 /usr/local/samba/lib/libtevent-samba4.so.0(+0x7833) [0x7f994c192833]
 #24 /usr/local/samba/lib/libtevent-samba4.so.0(+0x7f9f) [0x7f994c192f9f]
 #25 /usr/local/samba/lib/libtevent-samba4.so.0(_tevent_loop_once+0xe8) [0x7f994c18ebb8]
 #26 /usr/local/samba/lib/libtevent-samba4.so.0(tevent_common_loop_wait+0x25) [0x7f994c18edf5]
 #27 /usr/local/samba/lib/libtevent-samba4.so.0(_tevent_loop_wait+0x2b) [0x7f994c18eec0]
 #28 /usr/local/samba/sbin/samba() [0x791009]
 #29 /usr/local/samba/sbin/samba() [0x79104f]
 #30 /lib/libc.so.6(__libc_start_main+0xfd) [0x7f994905fc4d]
 #31 /usr/local/samba/sbin/samba() [0x439919]
Comment 4 Michael Wood 2010-06-24 09:23:53 UTC
Created attachment 5807
valgrind /usr/local/samba/sbin/samba -i -M single

I ran it through valgrind and it crashed very soon afterwards.  Hope this is what you wanted.
Comment 5 Matthias Dieter Wallnöfer 2010-06-30 02:23:27 UTC
abartlet, can you look into this? I've really no clue.
Comment 6 Andrew Bartlett 2010-07-01 20:56:29 UTC
So, this is probably the semi-async code biting us.  See how we have a Kerberos request to the KDC half-way down this code?  While we are blocked waiting for this, we have had a callback free the parent context we are on, and so we go boom.

The fix is probably either to take a reference to whatever is being freed until we exit the call stack, or to ensure the talloc_free() waits for this part of the request to end.
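
A minimal sketch of the first option (holding a reference until the call stack unwinds) might look like the following. It is only an illustration: `struct ctx`, `ctx_ref()`, `ctx_unref()`, `on_connection_error()` and `blocking_request()` are hypothetical names invented for this example. It uses plain malloc-based reference counting rather than the talloc API, and it is not the code that was committed to Samba.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reference-counted context; this is NOT the talloc API. */
struct ctx {
    int refs;        /* number of holders still using the context */
    int *freed_flag; /* set to 1 when the memory is really released */
};

static struct ctx *ctx_new(int *freed_flag)
{
    struct ctx *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->refs = 1;              /* the owner's reference */
    c->freed_flag = freed_flag;
    *freed_flag = 0;
    return c;
}

static void ctx_ref(struct ctx *c)
{
    c->refs++;
}

/* Drop one reference; release the memory only when the last holder is gone. */
static void ctx_unref(struct ctx *c)
{
    if (--c->refs == 0) {
        *c->freed_flag = 1;
        free(c);
    }
}

/* Stand-in for a callback (for example a broken-connection handler)
 * that fires during the nested wait and drops the owner's reference. */
static void on_connection_error(struct ctx *c)
{
    ctx_unref(c);
}

/* Stand-in for the blocking KDC request: callbacks run before it returns. */
static void blocking_request(struct ctx *c)
{
    on_connection_error(c);
}

/* Returns 1 if the context was still alive after the nested wait. */
static int update_with_reference(struct ctx *c, const int *freed_flag)
{
    int alive_after_wait;

    ctx_ref(c);                            /* pin the context for this call stack */
    blocking_request(c);                   /* the owner's reference is dropped in here */
    alive_after_wait = (*freed_flag == 0); /* still safe to touch c at this point */
    ctx_unref(c);                          /* last reference: memory is released now */
    return alive_after_wait;
}
```

Without the `ctx_ref()`/`ctx_unref()` pair in `update_with_reference()`, the callback would release the context while the caller is still using it, which is the pattern described above.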
Comment 7 Matthias Dieter Wallnöfer 2010-10-04 10:43:39 UTC
Metze, is this now fixed by your patch: http://gitweb.samba.org/samba.git/?p=samba.git;a=commitdiff;h=4423aa59abda50c8b71815f922ea03e2009f9e50?
Comment 8 Matthias Dieter Wallnöfer 2010-10-16 12:48:45 UTC
Well, I think this has been fixed now - if not, please reopen!
Comment 9 Michael Wood 2010-10-17 14:49:14 UTC
Unfortunately I no longer have access to the remote Win2k3 machine, so I can't test this now.  If I run into the problem in future I will reopen this bug.
Comment 10 Stefan Metzmacher 2010-10-18 02:51:02 UTC
My fix has nothing to do with this bug,
so I don't think it's fixed.

I think it's just our well-known problem:
we have bugs in the handling of broken connections.

I hope to fix this correctly while creating
the new dcerpc library.
http://wiki.samba.org/index.php/DCERPC
Comment 11 John Westerlund 2010-11-05 09:13:41 UTC
(In reply to comment #10)
> My fix has nothing to do with this bug,
> so I don't think it's fixed.
> 
> I think it's just our wellknown bug, that
> we have bugs in handling of broken connections.
> 
> I hope to fix this correctly while creating
> the new dcerpc library.
> http://wiki.samba.org/index.php/DCERPC
> 


Hello,

Do you know the current status of this bug, or when it is going to be fixed?
Is there any workaround?

/
John
Comment 12 Matthias Dieter Wallnöfer 2012-03-15 08:57:20 UTC
metze,

shouldn't this have been fixed by your recent rpc library rework?
Comment 13 Stefan Metzmacher 2012-03-15 15:40:16 UTC
I'm not sure; I fixed the rpc layer, not the gensec layer.
Comment 14 Stefan Metzmacher 2012-03-15 15:47:11 UTC
I'm not sure; I fixed the rpc layer, not the gensec layer.
Comment 15 Matthias Dieter Wallnöfer 2012-03-15 20:45:21 UTC
Michael, does this still bite you, or can we close this?
Comment 16 Michael Wood 2012-03-16 06:21:04 UTC
(In reply to comment #15)
> Michael does this still bite you? Or can we close this.

My comment from comment #9 still applies, but maybe John can comment.
Comment 17 Andrew Bartlett 2012-08-23 04:27:32 UTC
I think this is fixed, as we now delay the shutdown of the DCE/RPC pipe while we are doing kerberos ops.
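
The idea of delaying the shutdown can be sketched roughly as below. This is an assumption-laden illustration, not the actual Samba change: `struct pipe_state`, `pipe_request_shutdown()`, `pipe_do_shutdown()` and `pipe_auth_step()` are hypothetical names, and the "Kerberos operation" is reduced to a flag that simulates the connection breaking during the nested wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pipe state; field and function names are illustrative only. */
struct pipe_state {
    bool in_auth;          /* true while a nested authentication step runs */
    bool shutdown_pending; /* a shutdown was requested during that step */
    bool closed;           /* true once the teardown has really happened */
};

static void pipe_do_shutdown(struct pipe_state *p)
{
    p->closed = true;
}

/* Request a shutdown: performed at once, or deferred while auth is running. */
static void pipe_request_shutdown(struct pipe_state *p)
{
    if (p->in_auth) {
        p->shutdown_pending = true;
        return;
    }
    pipe_do_shutdown(p);
}

/* Runs one authentication step. Returns true if the pipe was still open
 * when the step finished, i.e. before any deferred shutdown was applied. */
static bool pipe_auth_step(struct pipe_state *p, bool connection_breaks)
{
    bool open_during_auth;

    p->in_auth = true;
    if (connection_breaks) {
        /* Simulates an error callback firing during the nested wait. */
        pipe_request_shutdown(p);
    }
    open_during_auth = !p->closed;
    p->in_auth = false;

    /* Apply the deferred shutdown now that the call stack is unwinding. */
    if (p->shutdown_pending) {
        p->shutdown_pending = false;
        pipe_do_shutdown(p);
    }
    return open_during_auth;
}
```

With this shape, a connection error that arrives in the middle of an authentication step only marks the pipe for teardown; the state stays valid until the step has finished.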
Comment 18 Stefan Metzmacher 2012-10-24 21:39:08 UTC
(In reply to comment #17)
> I think this is fixed, as we now delay the shutdown of the DCE/RPC pipe while
> we are doing kerberos ops.

I agree that this is very likely fixed.