Bug 9820 - crash of winbind after "ls -l /usr/local/samba/var/locks/sysvol"
Status: RESOLVED FIXED
Product: Samba 4.0
Classification: Unclassified
Component: Winbind
Version: 4.0.5
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Karolin Seeger
QA Contact: Samba QA Contact
Duplicates: 9842
Depends on:
Blocks: 9842
 
Reported: 2013-04-21 12:03 UTC by François Lafont
Modified: 2013-08-07 09:10 UTC (History)

Attachments
the log without valgrind and with valgrind (162.43 KB, application/x-bzip)
2013-04-22 23:07 UTC, François Lafont
no flags Details
log from the mailing list, without 'lost memory' noise (117.61 KB, text/plain)
2013-05-28 11:04 UTC, Andrew Bartlett
no flags Details
proposed, but untested patch to fix this (1.07 KB, patch)
2013-06-15 11:14 UTC, Andrew Bartlett
no flags Details
rework how BUILTIN domains are handled in s4-winbind (4.24 KB, patch)
2013-06-15 13:39 UTC, Andrew Bartlett
no flags Details
Patches for v4-0-test and v4-1-test (19.12 KB, patch)
2013-07-12 09:15 UTC, Stefan Metzmacher
abartlet: review+
metze: review+
Details
correct 4.1 patch cherry-picked from master (14.77 KB, patch)
2013-07-29 23:47 UTC, Andrew Bartlett
metze: review+
Details

Description François Lafont 2013-04-21 12:03:22 UTC
I am using Samba 4.0.5 on Debian Wheezy. Here is what I did:

---------------------------------------------------------------
samba-tool domain provision --realm=CHEZMOI.PRIV --domain=CHEZMOI \
    --server-role=dc --dns-backend=SAMBA_INTERNAL --adminpass='+toto123'
echo "nameserver 192.168.0.21" > /etc/resolv.conf
samba

ln -s /usr/local/samba/lib/libnss_winbind.so /lib/libnss_winbind.so
ln -s /lib/libnss_winbind.so /lib/libnss_winbind.so.2

# I put "winbind" in the nsswitch.conf file.
sed -i -r -e 's/^(passwd:.*)$/\1 winbind/g' -e 's/^(group:.*)$/\1 winbind/g' /etc/nsswitch.conf
---------------------------------------------------------------
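
Note that the sed command above appends "winbind" every time it is run, so repeating the setup leaves "winbind winbind" on the passwd and group lines. A minimal sketch of a variant that can be rerun safely, assuming GNU sed; the sample file and the add_winbind helper are hypothetical and only serve the illustration:

```shell
#!/bin/sh
# Sketch: append "winbind" to the passwd/group lines only when it is missing.
# The sample file is hypothetical; try this on a copy before touching
# the real /etc/nsswitch.conf.
NSSWITCH=$(mktemp)
cat > "$NSSWITCH" <<'EOF'
passwd:         compat
group:          compat
shadow:         compat
EOF

add_winbind() {
    # /winbind/! skips lines that already mention winbind, so reruns are no-ops.
    sed -i -r -e '/^(passwd|group):/{/winbind/!s/$/ winbind/;}' "$1"
}

add_winbind "$NSSWITCH"
add_winbind "$NSSWITCH"   # second run changes nothing
cat "$NSSWITCH"
```

Only the passwd and group lines are touched; the shadow line is left as it was.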

After the seemingly harmless command below, winbind crashes:

---------------------------------------------------------------
# time ls -l /usr/local/samba/var/locks/sysvol
total 8
drwxrws---+ 4 root 3000000 4096 Apr 14 01:40 chezmoi.priv

real	0m33.483s # <---- ***33 seconds !***
user	0m0.012s
sys	0m0.000s

# wbinfo -u
Error looking up domain users

# wbinfo -g
failed to call wbcListGroups: WBC_ERR_WINBIND_NOT_AVAILABLE
Error looking up domain groups

# wbinfo -i Guest
failed to call wbcGetpwnam: WBC_ERR_WINBIND_NOT_AVAILABLE
Could not get info for user Guest

# wbinfo -p
Ping to winbindd failed
could not ping winbindd!
---------------------------------------------------------------
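
The symptom above (winbindd no longer answering once the slow command has finished) can be checked mechanically. A rough sketch, assuming that "wbinfo -p" exits non-zero when the ping fails; the WBINFO override and both function names are hypothetical, added so the check can be tried on a machine without Samba:

```shell
#!/bin/sh
# Sketch: report whether winbindd still answers after a command has run.
# WBINFO is a hypothetical override; it defaults to the real wbinfo.
WBINFO=${WBINFO:-wbinfo}

winbind_alive() {
    # "wbinfo -p" pings winbindd; a failed ping is assumed to exit non-zero.
    "$WBINFO" -p >/dev/null 2>&1
}

check_after() {
    # Run the given command (its own status is ignored), then ping winbindd.
    "$@" >/dev/null 2>&1
    if winbind_alive; then
        echo "winbindd still answers after: $*"
    else
        echo "winbindd is gone after: $*"
        return 1
    fi
}
```

With Samba running, it would be called as: check_after ls -l /usr/local/samba/var/locks/sysvol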

I have to restart samba:

---------------------------------------------------------------
# killall samba; sleep 2; samba

# wbinfo -u
Administrator
Guest
krbtgt
test1

# wbinfo -g
Enterprise Read-Only Domain Controllers
Domain Admins
Domain Users
Domain Guests
Domain Computers
Domain Controllers
Schema Admins
Enterprise Admins
Group Policy Creator Owners
Read-Only Domain Controllers
DnsUpdateProxy

# wbinfo -p
Ping to winbindd succeeded
---------------------------------------------------------------

Here is the output of the "samba -i -M single -d 10" command during the "ls -l /usr/local/samba/var/locks/sysvol/" problem:

http://sisco.laf.free.fr/codes/samba4_gid_3000000.log

I have tried this too:

---------------------------------------------------------------
apt-get install valgrind

./configure --enable-debug  # <--- I added --enable-debug
make 
make install

samba-tool domain provision --realm=CHEZMOI.PRIV --domain=CHEZMOI \
    --server-role=dc --dns-backend=SAMBA_INTERNAL --adminpass='+toto123'
echo "nameserver 192.168.0.21" > /etc/resolv.conf
samba

ln -s /usr/local/samba/lib/libnss_winbind.so /lib/libnss_winbind.so
ln -s /lib/libnss_winbind.so /lib/libnss_winbind.so.2

# I put "winbind" in the nsswitch.conf file.
sed -i -r -e 's/^(passwd:.*)$/\1 winbind/g' -e 's/^(group:.*)$/\1 winbind/g' /etc/nsswitch.conf
---------------------------------------------------------------

and I have done this:

---------------------------------------------------------------
valgrind --leak-check=full samba -i -M single > out 2>&1
---------------------------------------------------------------

Here is the output during the "ls -l /usr/local/samba/var/locks/sysvol/" problem:

http://sisco.laf.free.fr/codes/samba4_gid_3000000_valgrind.log
Comment 1 François Lafont 2013-04-22 23:07:31 UTC
Created attachment 8805 [details]
the log without valgrind and with valgrind
Comment 2 François Lafont 2013-04-22 23:09:09 UTC
(In reply to comment #0)

> I have tried this too:
> 
> ---------------------------------------------------------------
> apt-get install valgrind
> 
> ./configure --enable-debug  #<--- I add the --enable-debug
> make 
> make install
> 
> samba-tool domain provision --realm=CHEZMOI.PRIV --domain=CHEZMOI \
>     --server-role=dc --dns-backend=SAMBA_INTERNAL --adminpass='+toto123'
> echo "nameserver 192.168.0.21" > /etc/resolv.conf
> samba

Ignore the "samba" line just above, because I actually started samba with...

> 
> ln -s /usr/local/samba/lib/libnss_winbind.so /lib/libnss_winbind.so
> ln -s /lib/libnss_winbind.so /lib/libnss_winbind.so.2
> 
> # I put "winbind" in the nsswitch.conf file.
> sed -i -r -e 's/^(passwd:.*)$/\1 winbind/g' -e 's/^(group:.*)$/\1 winbind/g'
> /etc/nsswitch.conf
> ---------------------------------------------------------------
> 
> and I have done this:
> 
> ---------------------------------------------------------------
> valgrind --leak-check=full samba -i -M single > out 2>&1
> ---------------------------------------------------------------

... the valgrind command just above.

I have attached the two logs directly to the bug report here.
Comment 3 Volker Lendecke 2013-04-23 07:21:22 UTC
Hmm. There's no crash mentioned in those logs. Probably the logs did not catch the right winbind component.
Comment 4 François Lafont 2013-04-24 20:46:14 UTC
(In reply to comment #3)
> Hmm. There's no crash mentioned in those logs. Probably the logs did not catch
> the right winbind component.

Damn it. Still, after the long "ls -l /.../sysvol" command, the "wbinfo -u" command fails every time.

Is there anything I can do to get this problem to show up in the logs?
Comment 5 philippe.simonet 2013-05-28 07:18:01 UTC
Same problem here, present in 4.0.5 and 4.0.6. I've done a bisect between 4.0.4 and 4.0.5. Steps to reproduce the problem (3.2.0-4-amd64 #1 SMP Debian 3.2.39-2 x86_64 GNU/Linux):
---------------------
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --enable-fhs 
make install 
samba -i -M single 
wbinfo --uid-info 3000000

---------------------
last bisect : 
---------------------
git bisect good
f77d5d6479c879c8770fbc9a6ca5656ef3e41019 is the first bad commit
commit f77d5d6479c879c8770fbc9a6ca5656ef3e41019
Author: Timur Bakeyev <timur@FreeBSD.org>
Date:   Wed Feb 27 16:25:07 2013 -0800

    Fix bug # 9666 - Broken filtering of link-local addresses.
    
    This patch should address the problem with Link Local addresses
    on FreeBSD and Linux.
    
    Reviewed-by: Jeremy Allison <jra@samba.org>
    
    Autobuild-User(v4-0-test): Karolin Seeger <kseeger@samba.org>
    Autobuild-Date(v4-0-test): Fri Mar  1 18:21:19 CET 2013 on sn-devel-104

:040000 040000 e022079ce7298f5cfa9d99e51e7afedb35048b02 164c1aba0559999b0179d3b47f415f6e3e5b3cd7 M      lib
---------------------
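
A bisect like this one can also be driven by "git bisect run", which reads the exit status of a test script: 0 marks the commit good, 125 asks git to skip it, and any other status from 1 to 127 marks it bad. A minimal sketch of such a step, not the procedure used above; the bisect_step function and its three command-string arguments are hypothetical, and starting and stopping samba around each step is left out:

```shell
#!/bin/sh
# Sketch of a step function for "git bisect run".
# Exit 0 = good, 125 = skip this commit, other codes from 1 to 127 = bad.
# The three command strings are placeholders supplied by the caller.
bisect_step() {
    build=$1; repro=$2; check=$3
    sh -c "$build" >/dev/null 2>&1 || return 125  # unbuildable commit: skip
    sh -c "$repro" >/dev/null 2>&1                # outcome ignored on purpose
    sh -c "$check" >/dev/null 2>&1 || return 1    # service no longer answers: bad
    return 0                                      # still answering: good
}
```

A wrapper script could call it as bisect_step 'make install' 'wbinfo --uid-info 3000000' 'wbinfo -p' and exit with its status.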

And here is the samba console log during the wbinfo call:

wbinfo OK, no crash:
--------------------------------------------
Terminating connection - 'wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
interpret_string_addr_internal: getaddrinfo failed for name (null) (flags 4) [Name or service not known]
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
interpret_addr: host address is invalid for host fe80::5246:5dff:fea3:7167%eth0
Terminating connection - 'wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]

wbinfo making samba crash:
-------------------------------------------------
Terminating connection - 'wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
interpret_string_addr_internal: getaddrinfo failed for name (null) (flags 4) [Name or service not known]
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
not adding non-broadcast interface tun0
not adding non-broadcast interface tun1
/usr/sbin/smbd: Allowed connection from 192.168.1.113 (192.168.1.113)
/usr/sbin/smbd: init_oplocks: initializing messages.
/usr/sbin/smbd: Transaction 0 of length 194 (0 toread)
/usr/sbin/smbd: switch message SMBnegprot (pid 14995) conn 0x0
/usr/sbin/smbd: Requested protocol [PC NETWORK PROGRAM 1.0]
/usr/sbin/smbd: Requested protocol [MICROSOFT NETWORKS 1.03]
/usr/sbin/smbd: Requested protocol [MICROSOFT NETWORKS 3.0]
/usr/sbin/smbd: Requested protocol [LANMAN1.0]
/usr/sbin/smbd: Requested protocol [LM1.2X002]
/usr/sbin/smbd: Requested protocol [DOS LANMAN2.1]
/usr/sbin/smbd: Requested protocol [LANMAN2.1]
/usr/sbin/smbd: Requested protocol [Samba]
/usr/sbin/smbd: Requested protocol [NT LANMAN 1.0]
/usr/sbin/smbd: Requested protocol [NT LM 0.12]
/usr/sbin/smbd: GENSEC backend 'gssapi_spnego' registered
/usr/sbin/smbd: GENSEC backend 'gssapi_krb5' registered
/usr/sbin/smbd: GENSEC backend 'gssapi_krb5_sasl' registered
/usr/sbin/smbd: GENSEC backend 'schannel' registered
/usr/sbin/smbd: GENSEC backend 'spnego' registered
/usr/sbin/smbd: GENSEC backend 'ntlmssp' registered
/usr/sbin/smbd: GENSEC backend 'krb5' registered
/usr/sbin/smbd: GENSEC backend 'fake_gssapi_krb5' registered
/usr/sbin/smbd: ldb_wrap open of secrets.ldb
/usr/sbin/smbd: AUTH backend 'sam' registered
/usr/sbin/smbd: AUTH backend 'sam_ignoredomain' registered
/usr/sbin/smbd: AUTH backend 'anonymous' registered
/usr/sbin/smbd: AUTH backend 'winbind' registered
/usr/sbin/smbd: AUTH backend 'winbind_wbclient' registered
/usr/sbin/smbd: AUTH backend 'name_to_ntstatus' registered
/usr/sbin/smbd: AUTH backend 'unix' registered
/usr/sbin/smbd: using SPNEGO
/usr/sbin/smbd: Selected protocol NT LANMAN 1.0
Kerberos: AS-REQ GWNOIS03$@TEST.CH from ipv4:192.168.1.113:57556 for krbtgt/TEST.CH@TEST.CH
Kerberos: No preauth found, returning PREAUTH-REQUIRED -- GWNOIS03$@TEST.CH
Kerberos: AS-REQ GWNOIS03$@TEST.CH from ipv4:192.168.1.113:42916 for krbtgt/TEST.CH@TEST.CH
Kerberos: Client sent patypes: encrypted-timestamp
Kerberos: Looking for PKINIT pa-data -- GWNOIS03$@TEST.CH
Kerberos: Looking for ENC-TS pa-data -- GWNOIS03$@TEST.CH
Kerberos: ENC-TS Pre-authentication succeeded -- GWNOIS03$@TEST.CH using arcfour-hmac-md5
Kerberos: AS-REQ authtime: 2013-04-30T22:18:57 starttime: unset endtime: 2013-05-01T08:18:57 renew till: unset
Kerberos: Client supported enctypes: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96, des3-cbc-sha1, des3-cbc-md5, arcfour-hmac-md5, using arcfour-hmac-md5/arcfour-hmac-md5
Kerberos: Requested flags: proxiable, forwardable
Kerberos: TGS-REQ GWNOIS03$@TEST.CH from ipv4:192.168.1.113:53697 for cifs/gwnois03.test.ch@TEST.CH [canonicalize]
Kerberos: TGS-REQ authtime: 2013-04-30T22:18:57 starttime: 2013-04-30T22:18:57 endtime: 2013-05-01T08:18:57 renew till: unset
Kerberos: TGS-REQ GWNOIS03$@TEST.CH from ipv4:192.168.1.113:45930 for krbtgt/TEST.CH@TEST.CH [forwarded, forwardable]
Kerberos: TGS-REQ authtime: 2013-04-30T22:18:57 starttime: 2013-04-30T22:18:57 endtime: 2013-05-01T08:18:57 renew till: unset
/usr/sbin/smbd: Transaction 1 of length 2688 (0 toread)
/usr/sbin/smbd: switch message SMBsesssetupX (pid 14995) conn 0x0
/usr/sbin/smbd: wct=12 flg2=0xc803
/usr/sbin/smbd: Doing spnego session setup
/usr/sbin/smbd: NativeOS=[Unix] NativeLanMan=[Samba 4.0.4-GIT-f77d5d6] PrimaryDomain=[TEST]
/usr/sbin/smbd: ldb_wrap open of secrets.ldb
/usr/sbin/smbd: ldb_wrap open of privilege.ldb
Terminating connection - 'wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
/usr/sbin/smbd: Adding homes service for user 'TEST\GWNOIS03$' using home directory: '/srv1/home/%U'
/usr/sbin/smbd: adding home's share [GWNOIS03$] for user 'TEST\GWNOIS03$' at '/srv1/home/%U'
/usr/sbin/smbd: Transaction 2 of length 102 (0 toread)
/usr/sbin/smbd: switch message SMBtconX (pid 14995) conn 0x0
/usr/sbin/smbd: Allowed connection from 192.168.1.113 (192.168.1.113)
/usr/sbin/smbd: Connect path is '/tmp' for service [IPC$]
/usr/sbin/smbd: Initialising default vfs hooks
/usr/sbin/smbd: Initialising custom vfs hooks from [/[Default VFS]/]
/usr/sbin/smbd: Initialising custom vfs hooks from [acl_xattr]
/usr/sbin/smbd: Module 'acl_xattr' loaded
/usr/sbin/smbd: Initialising custom vfs hooks from [dfs_samba4]
/usr/sbin/smbd: connect_acl_xattr: setting 'inherit acls = true' 'dos filemode = true' and 'force unknown acl user = true' for service IPC$
/usr/sbin/smbd: 192.168.1.113 (ipv4:192.168.1.113:54676) connect to service IPC$ initially as user TEST\GWNOIS03$ (uid=3000022, gid=3000023) (pid 14995)
/usr/sbin/smbd: tconX service=IPC$
/usr/sbin/smbd: Transaction 3 of length 108 (0 toread)
/usr/sbin/smbd: switch message SMBntcreateX (pid 14995) conn 0x2723c70
/usr/sbin/smbd: Transaction 4 of length 160 (0 toread)
/usr/sbin/smbd: switch message SMBtrans (pid 14995) conn 0x2723c70
/usr/sbin/smbd: trans <\PIPE\> data=72 params=0 setup=2
/usr/sbin/smbd: named pipe command on <> name
/usr/sbin/smbd: Got API command 0x26 on pipe "netlogon" (pnum 8102)
/usr/sbin/smbd: Transaction 5 of length 104 (0 toread)
/usr/sbin/smbd: switch message SMBntcreateX (pid 14995) conn 0x2723c70
Terminating connection - 'wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
Terminating connection - 'wbsrv_samba3_send_reply_done: tstream_writev_queue_recv() - 32:Broken pipe'
single_terminate: reason[wbsrv_samba3_send_reply_done: tstream_writev_queue_recv() - 32:Broken pipe]
Terminating connection - 'NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[NT_STATUS_CONNECTION_DISCONNECTED]
Terminating connection - 'NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[NT_STATUS_CONNECTION_DISCONNECTED]
/usr/sbin/smbd: 192.168.1.113 (ipv4:192.168.1.113:54676) closed connection to service IPC$
Kerberos: TGS-REQ GWNOIS03$@TEST.CH from ipv4:192.168.1.113:53290 for host/gwnois03.test.ch@TEST.CH [canonicalize]
Kerberos: TGS-REQ authtime: 2013-04-30T22:18:57 starttime: 2013-04-30T22:19:32 endtime: 2013-05-01T08:18:57 renew till: unset
/usr/sbin/smbd: Server exit (failed to receive smb request)
Terminating connection - 'wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
single_terminate: reason[wbsrv_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
===============================================================
INTERNAL ERROR: Signal 11 in pid 14988 (4.0.4-GIT-f77d5d6)
Please read the Trouble-Shooting section of the Samba HOWTO
===============================================================
PANIC: internal error
Aborted
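
Since the first logs on this bug showed no crash at all, a quick scan for the panic markers seen above can help confirm that a captured log covers the crashing process. A minimal sketch; the has_crash helper is hypothetical, and the sample lines are taken from the log above:

```shell
#!/bin/sh
# Sketch: look for the panic markers that appear in the crashing log above.
has_crash() {
    # -E enables alternation, -q only sets the exit status.
    grep -Eq 'INTERNAL ERROR: Signal [0-9]+|PANIC: internal error' "$1"
}

log=$(mktemp)
cat > "$log" <<'EOF'
single_terminate: reason[NT_STATUS_CONNECTION_DISCONNECTED]
INTERNAL ERROR: Signal 11 in pid 14988 (4.0.4-GIT-f77d5d6)
PANIC: internal error
EOF

if has_crash "$log"; then
    echo "crash found"
else
    echo "no crash in log"
fi
```

A log that contains only the "Terminating connection" lines would report no crash, which matches what was observed in comment 3.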
Comment 6 Andrew Bartlett 2013-05-28 11:04:32 UTC
Created attachment 8926 [details]
log from the mailing list, without 'lost memory' noise
Comment 7 Timur Bakeyev 2013-05-28 12:30:57 UTC
(In reply to comment #6)
> Created attachment 8926 [details]
> log from the mailing list, without 'lost memory' noise

I see "access after free" in this log. Could it be connected to
https://bugzilla.samba.org/show_bug.cgi?id=9832 ?
Comment 8 Andrew Bartlett 2013-05-28 12:36:52 UTC
It would be very useful to rule in or out that this still reproduces with current master.  That said, I don't think it is bug 9832, because this would show up in printing the message, not deep in kerberos routines.
Comment 9 Thys Nel 2013-06-06 09:04:09 UTC
This is still broken on the current master (as of 6 June).

Another way to reproduce this (mentioned on mailing list) is to do: 

wbinfo --uid-info 3000000
Comment 10 Andrew Bartlett 2013-06-15 11:14:12 UTC
Created attachment 8969 [details]
proposed, but untested patch to fix this

This patch attempts to ensure the context variables being used in the update do not go away during the gensec_update() call.

It certainly isn't the only fix, and I would really like to know why we are able to process an EOF on the winbind pipe while processing on it, but if confirmed it may be able to fix this for folks in the meantime.

Sadly while I have reproduced the issue locally, I can't any more (with or without the change), which makes it much harder to be certain I've even fixed or worked around the issue.
Comment 11 Andrew Bartlett 2013-06-15 13:39:59 UTC
Created attachment 8970 [details]
rework how BUILTIN domains are handled in s4-winbind

This patch may not fully address the issue, but should make the BUILTIN domain here no more special than the normal domain.

More work needs to be done to actually return group (no users in BUILTIN) entries for the aliases.
Comment 12 Stefan Metzmacher 2013-07-12 09:15:27 UTC
Created attachment 9043 [details]
Patches for v4-0-test and v4-1-test
Comment 13 Andrew Bartlett 2013-07-12 12:02:18 UTC
Comment on attachment 9043 [details]
Patches for v4-0-test and v4-1-test

Thanks, this should make winbind much less painful for folks.
Comment 14 Andrew Bartlett 2013-07-12 12:03:10 UTC
Assigning to Karolin for 4.0 and 4.1
Comment 15 Andrew Bartlett 2013-07-13 07:41:19 UTC
*** Bug 9842 has been marked as a duplicate of this bug. ***
Comment 16 Karolin Seeger 2013-07-15 18:48:43 UTC
(In reply to comment #14)
> Assigning to Karolin for 4.0 and 4.1

Waiting for second review flag.
Comment 17 Karolin Seeger 2013-07-29 19:35:36 UTC
Pushed to autobuild-v4-0-test.
Comment 18 Karolin Seeger 2013-07-29 19:36:35 UTC
Patch does not apply on current v4-1-test branch:

Applying: s4-winbind: Add special case for BUILTIN domain
/data/git/samba/v4-1-test/.git/rebase-apply/patch:16: trailing whitespace.
	if (dom_sid_equal(sid, &global_sid_Builtin) || 
/data/git/samba/v4-1-test/.git/rebase-apply/patch:52: trailing whitespace.
		
/data/git/samba/v4-1-test/.git/rebase-apply/patch:61: trailing whitespace.
		
error: patch failed: source4/winbind/wb_dom_info.c:67
error: source4/winbind/wb_dom_info.c: patch does not apply
error: patch failed: source4/winbind/wb_init_domain.c:369
error: source4/winbind/wb_init_domain.c: patch does not apply
Patch failed at 0001 s4-winbind: Add special case for BUILTIN domain

Please provide a version for v4-1-test.
Thanks!
Comment 19 Andrew Bartlett 2013-07-29 23:47:25 UTC
Created attachment 9087 [details]
correct 4.1 patch cherry-picked from master

The issue with the patch is that the first bit (the BUILTIN domain handling) got into 4.1 before it branched.

Attached is a patch I've confirmed builds and tests with v4-1-test.
Comment 20 Karolin Seeger 2013-08-05 18:07:03 UTC
(In reply to comment #19)
> Created attachment 9087 [details]
> correct 4.1 patch cherry-picked from master
> 
> The issue with the patch is that the first bit (the BUILTIN domain handling)
> got into 4.1 before it branched.
> 
> Attached is a patch I've confirmed builds and tests with v4-1-test.

Pushed to autobuild-v4-1-test.
Thanks, Andrew!
Comment 21 Karolin Seeger 2013-08-05 18:07:52 UTC
(In reply to comment #17)
> Pushed to autobuild-v4-0-test.

Pushed to v4-0-test.
Comment 22 Karolin Seeger 2013-08-07 09:10:45 UTC
Pushed to v4-1-test.
Closing out bug report.

Thanks!