Bug 6793 - winbindd crash with "INTERNAL ERROR: Signal 6" (double-free of "entry_dn")
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: Winbind
Version: 3.4.1
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Reported: 2009-10-08 13:38 UTC by Pavel May
Modified: 2009-10-31 04:36 UTC
CC List: 1 user

See Also:

Details of compile options, smb.conf, et cetera. (30.69 KB, text/plain)
2009-10-08 13:39 UTC, Pavel May
no flags
WinbindD under valgrind. (18.82 KB, text/plain)
2009-10-09 08:08 UTC, Pavel May
no flags
Patch for 3.4 (1.58 KB, patch)
2009-10-09 15:05 UTC, Volker Lendecke
no flags
valgrind output of winbindd -d 10 (23.89 KB, text/plain)
2009-10-13 13:49 UTC, Pavel May
no flags
winbindd -d 10, under valgrind. (153.02 KB, application/bzip2)
2009-10-13 13:50 UTC, Pavel May
no flags
patch (671 bytes, patch)
2009-10-13 13:58 UTC, Volker Lendecke
no flags
git-am format patch for 3.4.3 to fix problem with attachment #5 (946 bytes, patch)
2009-10-14 13:13 UTC, Jeremy Allison
vl: review+
gd: review+
git-am patch for 3.3.9 (1.64 KB, patch)
2009-10-14 13:48 UTC, Jeremy Allison
vl: review+
gd: review+
winbindd + both patches, -d 10, fails to look up AD users. (28.61 KB, application/bzip2)
2009-10-15 07:57 UTC, Pavel May
no flags

Description Pavel May 2009-10-08 13:38:47 UTC
Symptoms: when I try to SSH in (PAM), winbindd dumps core. It may dump core at other times as well; I haven't checked.

OS: CentOS 5.3 x86_64

Details in attached log file.
Comment 1 Pavel May 2009-10-08 13:39:15 UTC
Created attachment 4821
Details of compile options, smb.conf, et cetera.
Comment 2 Volker Lendecke 2009-10-09 01:45:43 UTC
Can you compile with -g and run winbindd under valgrind?


Comment 3 Pavel May 2009-10-09 08:06:14 UTC

I've recompiled the whole 3.4.1 tree with "--enable-developer" tacked onto ./configure's list of options, then fed winbindd to valgrind.

With winbindd running under valgrind, SSHing in actually worked (I logged in successfully), and I did not notice winbindd dying.

Re-ran winbindd in stand-alone mode, and the crash was reproduced.

valgrind's output is attached.
Comment 4 Pavel May 2009-10-09 08:08:40 UTC
Created attachment 4825
WinbindD under valgrind.
Comment 5 Volker Lendecke 2009-10-09 15:05:15 UTC
Created attachment 4829
Patch for 3.4

Can you try the attached patch?


Comment 6 Pavel May 2009-10-10 08:41:16 UTC

Thanks for the patch. I tried it: WinbindD no longer crashes when I SSH in. When I start smbd/nmbd and try to access a share, however, WinbindD still crashes:
INTERNAL ERROR: Signal 6 in pid 5731 (3.4.1)
Please read the Trouble-Shooting section of the Samba3-HOWTO

From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
smb_panic: clobber_region() last called from [sid_to_fstring(178)]
PANIC (pid 5731): internal error
BACKTRACE: 35 stack frames:
 #0 winbindd(log_stack_trace+0x1c) [0x7f031e5242de]
 #1 winbindd(smb_panic+0x153) [0x7f031e5240b9]
 #2 winbindd [0x7f031e50d7ac]
 #3 winbindd [0x7f031e50d7bf]
 #4 /lib64/libc.so.6 [0x7f031c0a8280]
 #5 /lib64/libc.so.6(gsignal+0x35) [0x7f031c0a8215]
 #6 /lib64/libc.so.6(abort+0x110) [0x7f031c0a9cc0]
 #7 /usr/lib64/libtalloc.so.1 [0x7f031ca0186e]
 #8 /usr/lib64/libtalloc.so.1 [0x7f031ca0188d]
 #9 /usr/lib64/libtalloc.so.1 [0x7f031ca0198a]
 #10 /usr/lib64/libtalloc.so.1 [0x7f031ca02587]
 #11 /usr/lib64/libtalloc.so.1(talloc_free+0x15) [0x7f031ca031d2]
 #12 /usr/lib64/nss_info/adex.so [0x7f03172396dc]
 #13 /usr/lib64/nss_info/adex.so [0x7f0317239b11]
 #14 /usr/lib64/nss_info/adex.so [0x7f0317239ee5]
 #15 /usr/lib64/nss_info/adex.so [0x7f031723b6c4]
 #16 /usr/lib64/nss_info/adex.so [0x7f0317234f25]
 #17 winbindd(idmap_backends_sid_to_unixid+0x172) [0x7f031e99fb7a]
 #18 winbindd(idmap_sid_to_gid+0x3fd) [0x7f031e9a1509]
 #19 winbindd(winbindd_dual_sid2gid+0x1b5) [0x7f031e4764ae]
 #20 winbindd [0x7f031e468785]
 #21 winbindd [0x7f031e46c34e]
 #22 winbindd [0x7f031e468259]
 #23 winbindd(async_request+0x348) [0x7f031e4679ee]
 #24 winbindd(do_async+0x183) [0x7f031e46c69c]
 #25 winbindd(winbindd_sid2uid_async+0x25e) [0x7f031e475d08]
 #26 winbindd [0x7f031e427a79]
 #27 winbindd [0x7f031e4702f4]
 #28 winbindd [0x7f031e46c517]
 #29 winbindd [0x7f031e4681f5]
 #30 winbindd [0x7f031e424492]
 #31 winbindd [0x7f031e425a09]
 #32 winbindd(main+0xde7) [0x7f031e426851]
 #33 /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f031c095974]
 #34 winbindd [0x7f031e422ba9]
smb_panic(): calling panic action [/bin/sleep 999999999]
[ 5905]: request interface version
[ 5905]: request location of privileged pipe
[ 5905]: getpwnam pmay
[ 5727]: lookupname DELACY\pmay
[ 5727]: lookupsid S-1-5-21-79843086-108998794-1039276024-4393
[ 5908]: request interface version
[ 5908]: request location of privileged pipe
final write to client failed: Broken pipe
Comment 7 Pavel May 2009-10-10 08:41:42 UTC
Additionally, SSH/PAM auth fails.
Comment 8 Volker Lendecke 2009-10-11 15:18:21 UTC
I'm tempted to say that you should contact Likewise Software about that bug, it's their code. But as they have stopped supporting Samba, I guess it is now up to the Samba Team to clean up what's there.

Again -- does valgrind show anything significant? Alternatively, can you please send a full debug level 10 log up to that crash?


Comment 9 Pavel May 2009-10-12 07:23:45 UTC
I strongly suspect that Likewise Software would come back with something like:

1) Sure. The solution is "Likewise Open (tm)(r)(c)(q)(z)(v)(f)(p)"


I'll re-run the new binary under valgrind and will post debug level 10 logs.

My previous valgrind attempt suggests this is a heisenbug, but my valgrind-fu is weak.
Comment 10 Pavel May 2009-10-13 13:48:51 UTC
Ok. Attached two files, as requested.
Comment 11 Pavel May 2009-10-13 13:49:35 UTC
Created attachment 4835
valgrind output of winbindd -d 10
Comment 12 Pavel May 2009-10-13 13:50:29 UTC
Created attachment 4836
winbindd -d 10, under valgrind.
Comment 13 Pavel May 2009-10-13 13:52:12 UTC
The behavior of winbindd is still:
1) PAM Auth succeeds (smbd/nmbd off)
2) Accessing a share fails with winbindd crashing (smbd/nmbd on)
Comment 14 Volker Lendecke 2009-10-13 13:58:34 UTC
Created attachment 4837

Can you try the attached patch?


Comment 15 Jeremy Allison 2009-10-13 19:06:26 UTC
+1 - this is obviously correct. The entry_dn variable is being freed twice, once when frame is deleted, and then again. The "talloc_tos()" reference at line 371 should be renamed to "frame" as well to make this clearer (IMHO). I think we need this for 3.4.3.
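Jeremy's explanation can be illustrated with a plain-C model of talloc-style hierarchical ownership. This is a sketch, not the adex code and not the talloc API: "struct frame", "frame_strdup()", "frame_free()" and "lookup_dn()" are hypothetical names, and a fixed-size child array stands in for talloc's parent/child tree. Because freeing the frame already frees "entry_dn", the caller must not free it again; if the value has to outlive the frame, one option is to copy it out first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a talloc stackframe: a parent that owns its
 * children and frees all of them when it is freed. Not the talloc API. */
#define MAX_CHILDREN 8

struct frame {
    void *children[MAX_CHILDREN];
    int n;
};

/* Allocate a copy of s owned by frame f (compare talloc_strdup(frame, s)). */
static char *frame_strdup(struct frame *f, const char *s)
{
    assert(f != NULL && s != NULL);
    if (f->n == MAX_CHILDREN) {
        return NULL;
    }
    char *p = malloc(strlen(s) + 1);
    if (p == NULL) {
        return NULL;
    }
    strcpy(p, s);
    f->children[f->n++] = p;
    return p;
}

/* Freeing the frame frees every child it owns (compare TALLOC_FREE(frame)). */
static void frame_free(struct frame *f)
{
    for (int i = 0; i < f->n; i++) {
        free(f->children[i]);
        f->children[i] = NULL;
    }
    f->n = 0;
}

/*
 * Buggy shape, shown for contrast only:
 *
 *     char *entry_dn = frame_strdup(&frame, dn);
 *     frame_free(&frame);   // entry_dn is freed here ...
 *     free(entry_dn);       // ... and again here: double free (undefined
 *                           // behaviour; often abort(), i.e. signal 6)
 */

/* Fixed shape: the frame is the only owner of entry_dn; anything that must
 * outlive the frame is copied to caller-owned memory before the frame goes. */
static char *lookup_dn(const char *dn)
{
    struct frame frame = { .n = 0 };
    char *entry_dn = frame_strdup(&frame, dn);
    char *result = NULL;

    if (entry_dn != NULL) {
        result = malloc(strlen(entry_dn) + 1);
        if (result != NULL) {
            strcpy(result, entry_dn);
        }
    }
    frame_free(&frame);   /* entry_dn is released exactly once, here */
    return result;        /* caller must free() this */
}
```

Usage: lookup_dn("CN=Pavel May,CN=Users,DC=DELACY,DC=com") returns a heap copy that the caller releases with a single free(); the frame's own copy is released inside lookup_dn() and never escapes.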
Comment 16 Volker Lendecke 2009-10-14 03:15:40 UTC
Karo, I think both patches are required for 3.4.3. The second one has been formally reviewed; the first one has not yet, but the reporter got much further with that patch applied.

Comment 17 Karolin Seeger 2009-10-14 03:31:09 UTC
Pushed both patches to v3-4-test.
Closing out bug report.
Please re-open if it's still an issue.

Comment 18 Guenther Deschner 2009-10-14 09:29:04 UTC
Shouldn't that go into 3.3 as well?
Comment 19 Jeremy Allison 2009-10-14 13:07:38 UTC
I think there is a bug in attachment #5.


The new code doesn't initialize the fstring mapped_user from the state->request->data.auth.user value, in fact it doesn't initialize it at all.

Further patch for 3.4.3 to follow.
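The missing initialization can be sketched in plain C. In Samba, "fstring" is a fixed-size char array (FSTRING_LEN is 256); "struct auth_data", "map_username_sketch()" and "resolve_user()" are hypothetical stand-ins for "state->request->data.auth" and the 3.4 code path, and the mapping entry is invented for illustration. The point is the order: copy the client-supplied name into "mapped_user" first, then let a successful mapping overwrite it, so an unmapped name never leaves the buffer uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mirrors Samba's definition: fstring is a fixed-size char array. */
#define FSTRING_LEN 256
typedef char fstring[FSTRING_LEN];

/* Hypothetical stand-in for state->request->data.auth. */
struct auth_data {
    fstring user;
};

/* Hypothetical username map: writes a new name into out and returns true only
 * when a mapping exists; otherwise it leaves out untouched. */
static bool map_username_sketch(const char *in, char *out, size_t outlen)
{
    if (strcmp(in, "EXAMPLE\\alias") == 0) {
        snprintf(out, outlen, "%s", "EXAMPLE\\realuser");
        return true;
    }
    return false;
}

/* mapped_user decays to char *, so the buffer length is passed explicitly. */
static void resolve_user(const struct auth_data *auth, fstring mapped_user)
{
    assert(auth != NULL && mapped_user != NULL);

    /* Initialize from the request first; without this, an unmapped name
     * leaves mapped_user holding whatever was on the stack. */
    snprintf(mapped_user, FSTRING_LEN, "%s", auth->user);

    /* Overwrite only if a mapping applies. */
    (void)map_username_sketch(auth->user, mapped_user, FSTRING_LEN);
}
```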

Comment 20 Jeremy Allison 2009-10-14 13:13:43 UTC
Created attachment 4844
git-am format patch for 3.4.3 to fix problem with attachment #5
Comment 21 Jeremy Allison 2009-10-14 13:48:06 UTC
Created attachment 4845
git-am patch for 3.3.9

Here is the same patch for the 3.3.9 codebase as attachments 4829 and 4844 combined. It should fix the issue for 3.3.9. Note that the patch in attachment 4837 is not needed, as the code in 3.3.x doesn't use talloc here.

I think this needs to go into 3.3.9 as it's a nasty interface misuse that can easily lead to winbindd crashes.

Comment 22 Karolin Seeger 2009-10-15 07:38:32 UTC
Sorry, it's too late to include it in 3.3.9.
Can be shipped with 3.3.10 once review has been granted.
Comment 23 Pavel May 2009-10-15 07:50:58 UTC
If my summary is objectionable, please feel free to fix it or suggest a better one.

Re: Volker's patch to "idmap_adex/provider_unified.c": 
New behavior: WinbindD no longer recognizes any AD-based user or group. I will try leaving AD, deleting the host account and re-joining, but this is a data point.

When running winbindd -d 10 under valgrind, are the default CLI switches to valgrind sufficient or are there others which will make the output more useful?
Comment 24 Pavel May 2009-10-15 07:57:23 UTC
Created attachment 4854
winbindd + both patches, -d 10, fails to look up AD users.
Comment 25 Pavel May 2009-10-15 08:08:34 UTC
"net ads testjoin -P" claims that the "Join is OK".

"id" or "id pmay" shows a lack of knowledge about AD-based users/groups.

"getent passwd" pauses, there is a lot of activity in winbindd's log (running -F -S -i -d 3), and then nothing from AD is listed.

$ find /usr/lib64 -name adex* -exec ls -alF {} \;
-rwxr-xr-x 1 root root 159413 Oct 13 16:02 /usr/lib64/idmap/adex.so*
lrwxrwxrwx 1 root root 16 Sep 24 09:29 /usr/lib64/nss_info/adex.so -> ../idmap/adex.so

The above output *seems* sane.

Looking through the output, saw this:
  cell_do_search: Base = ,  Filter = (|(&(uid=pmay)(objectclass=User))(&(displayName=pmay)(objectclass=Group))), Scope = 2, GC = yes
  cell_do_search: Located 0 entries
"ldapsearch" below, however, returns my user record:
ldapsearch -h nyc-wdc-000 -x -LLL -b "CN=Users,DC=Delacy,DC=COM" -v -D "cn=a_pmay,ou=Service Accounts,DC=Delacy,DC=COM" -W "(|(&(uid=pmay)(objectclass=User))(&(displayName=pmay)(objectclass=Group)))" dn
Enter LDAP Password: 
filter: (|(&(uid=pmay)(objectclass=User))(&(displayName=pmay)(objectclass=Group)))
requesting: dn 
dn: CN=Pavel May,CN=Users,DC=DELACY,DC=com

Colour me confused. (A lovely shade of mauve taupe, in case you're wondering).
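For comparison with the cell_do_search log line, the same filter can be composed from a user name in C. This is a sketch: "build_user_filter()" and "escape_filter_value()" are hypothetical helpers, not the adex code, and the escaping follows RFC 4515 ("*", "(", ")" and "\" become "\2a", "\28", "\29" and "\5c"; NUL cannot occur inside a C string, so it is not handled). The filter text in the two searches is identical; what differs is the search base (empty, with GC = yes, versus "CN=Users,DC=Delacy,DC=COM") and the credentials used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: escape an assertion value per RFC 4515.
 * Returns 0 on success, -1 if out is too small. */
static int escape_filter_value(const char *in, char *out, size_t outlen)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0) {
        return -1;
    }
    for (const char *p = in; *p != '\0'; p++) {
        const char *rep = NULL;
        switch (*p) {
        case '*':  rep = "\\2a"; break;
        case '(':  rep = "\\28"; break;
        case ')':  rep = "\\29"; break;
        case '\\': rep = "\\5c"; break;
        default:   break;
        }
        size_t need = rep ? 3 : 1;
        if (o + need >= outlen) {   /* keep room for the terminating NUL */
            return -1;
        }
        if (rep) {
            memcpy(out + o, rep, 3);
        } else {
            out[o] = *p;
        }
        o += need;
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical helper: build the user-or-group filter seen in the log. */
static int build_user_filter(const char *name, char *out, size_t outlen)
{
    char esc[256];

    if (escape_filter_value(name, esc, sizeof esc) != 0) {
        return -1;
    }
    int n = snprintf(out, outlen,
                     "(|(&(uid=%s)(objectclass=User))"
                     "(&(displayName=%s)(objectclass=Group)))",
                     esc, esc);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Calling build_user_filter("pmay", buf, sizeof buf) yields exactly the filter string shown in both the winbindd log and the ldapsearch run.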
Comment 26 Guenther Deschner 2009-10-15 10:11:37 UTC
Comment on attachment 4844
git-am format patch for 3.4.3 to fix problem with attachment #5

Yes, absolutely correct. Otherwise, any PAM_AUTH request to a trusted domain will end up being sent to the local SAM child.
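Guenther's remark can be pictured with a simplified routing model. Everything here is an assumption for illustration, not winbindd's actual logic: "route_pam_auth()" and the child enum are hypothetical, the domain is taken from the text before a "\" separator, and a name without a recognized trusted domain falls back to the local SAM child. Under those assumptions, if the name being routed is empty or garbage (as an uninitialized "mapped_user" can be) instead of the "DOMAIN\user" the client sent, a trusted-domain request lands on the local SAM child.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical child identifiers. */
enum child { CHILD_LOCAL_SAM, CHILD_TRUSTED_DOMAIN };

/* Hypothetical routing rule: "DOMAIN\user" goes to the trusted-domain child
 * when DOMAIN is in the trusted list; anything else falls back to local SAM.
 * Domain names are compared case-sensitively here for simplicity. */
static enum child route_pam_auth(const char *name,
                                 const char *const *trusted, size_t ntrusted)
{
    assert(name != NULL);

    const char *sep = strchr(name, '\\');
    if (sep == NULL) {
        return CHILD_LOCAL_SAM;     /* no domain part at all */
    }
    size_t dlen = (size_t)(sep - name);
    for (size_t i = 0; i < ntrusted; i++) {
        if (strlen(trusted[i]) == dlen &&
            strncmp(name, trusted[i], dlen) == 0) {
            return CHILD_TRUSTED_DOMAIN;
        }
    }
    return CHILD_LOCAL_SAM;         /* unknown domain */
}
```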
Comment 27 Guenther Deschner 2009-10-15 10:12:39 UTC
Comment on attachment 4845
git-am patch for 3.3.9

same here.
Comment 28 Guenther Deschner 2009-10-15 10:13:38 UTC
Karolin, please pull the additional patch from Jeremy for 3.4.3 (and for 3.3.9).
Comment 29 Pavel May 2009-10-15 16:55:21 UTC
Switching from the "adex" backend to "ad", with attachment 4829 applied, fixes the crashes under the previous "crud, it crashed" conditions.
Comment 30 Guenther Deschner 2009-10-15 17:01:05 UTC
Pavel, I am reopening so that Karolin can pick the remaining patch which we need in any case.
Comment 31 Pavel May 2009-10-15 17:03:37 UTC

Far be it from me to object. Thanks for all the help.
Comment 32 Karolin Seeger 2009-10-16 07:56:47 UTC
(In reply to comment #28)
> Karolin, please pull the additional patch from Jeremy for 3.4.3 (and for
> 3.3.9).

Closing out bug report.