Bug 11808 - smbd hangs while libtdb is initializing
Summary: smbd hangs while libtdb is initializing
Alias: None
Product: TDB
Classification: Unclassified
Component: libtdb
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Uri Simchoni
QA Contact: Samba QA Contact
Depends on:
Reported: 2016-03-23 10:35 UTC by Uri Simchoni
Modified: 2017-03-09 10:14 UTC
CC List: 4 users

Description Uri Simchoni 2016-03-23 10:35:33 UTC
On a specific system (x86_64, multiple cores) running a file server based on Samba 4.3.6 with some vendor patches, smbd often hangs while loading. A stack trace shows the following:

#0  0x00007fb98b4ef69e in waitpid () from rootfs/lib64/libpthread.so.0
#1  0x00007fb98556d4c5 in tdb_runtime_check_for_robust_mutexes () at ../lib/tdb/common/mutex.c:890
#2  0x00007fb980926163 in tdb_wrap_open (mem_ctx=0x0, name=0x958610 "/var/vol/12/.ctera/samba/lock/gencache_notrans.tdb", hash_size=0, tdb_flags=6337, open_flags=66, mode=420) at ../lib/tdb_wrap/tdb_wrap.c:151
#3  0x00007fb988c2ba83 in gencache_init () at ../source3/lib/gencache.c:126
#4  0x00007fb988c2c686 in gencache_parse (keystr=0x7fff135bcfe0 "IDMAP/GID2SID/4", parser=0x7fb988c32ed6 <idmap_cache_xid2sid_parser>, private_data=0x7fff135bcfc0) at ../source3/lib/gencache.c:489
#5  0x00007fb988c330e8 in idmap_cache_find_gid2sid (gid=4, sid=0x7fff135bd140, expired=0x7fff135bd11f) at ../source3/lib/idmap_cache.c:270
#6  0x00007fb9894fc574 in gid_to_sid (psid=0x7fff135bd140, gid=4) at ../source3/passdb/lookup_sid.c:1267
#7  0x00007fb988e7a4f6 in add_local_groups (result=0x9579b0, is_guest=true) at ../source3/auth/token_util.c:470
#8  0x00007fb988e7a622 in finalize_local_nt_token (result=0x9579b0, is_guest=true) at ../source3/auth/token_util.c:495
#9  0x00007fb988e79f4b in create_local_nt_token_from_info3 (mem_ctx=0x957810, is_guest=true, info3=0x957e90, extra=0x957d20, ntok=0x957810) at ../source3/auth/token_util.c:314
#10 0x00007fb988e84f9f in create_local_token (mem_ctx=0x955240, server_info=0x957cd0, session_key=0x0, smb_username=0x958020 "nobody", session_info_out=0x7fb989098058) at ../source3/auth/auth_util.c:555
#11 0x00007fb988e85a08 in make_new_session_info_guest (session_info=0x7fb989098058, server_info=0x7fb989098060) at ../source3/auth/auth_util.c:831
#12 0x00007fb988e867e3 in init_guest_info () at ../source3/auth/auth_util.c:1128
#13 0x000000000040ac9b in main (argc=5, argv=0x7fff135bdb78) at ../source3/smbd/server.c:1518

(gdb) info thread
* 1 Thread 1779  0x00007fb98b4ef69e in waitpid () from rootfs/lib64/libpthread.so.0

When this happens, there are two smbd processes: the main one, with this stack trace, and notifyd.

Close examination of the code shows there's a race condition between a signal handler and the main thread code.
Comment 1 Garri 2017-01-16 17:29:40 UTC
I'm experiencing the same issue using Firefox with WINS name resolution
enabled in nsswitch.conf. While I was reporting details on the Mozilla bug
tracker [1], it became evident that the issue is related to libtdb.


(gdb) bt
#0  0x00007ffff6d40576 in sigsuspend () from /lib64/libc.so.6
#1  0x00007fffdd1967c9 in tdb_runtime_check_for_robust_mutexes () from
#2  0x00007fffddeecfc5 in tdb_wrap_open () from
#3  0x00007fffe11525f0 in ?? () from /usr/lib64/libsmbconf.so.0
#4  0x00007fffe1152975 in gencache_parse () from
#5  0x00007fffe11531a2 in gencache_get_data_blob () from
#6  0x00007fffe1153249 in gencache_get () from
#7  0x00007fffe114e65d in wins_srv_is_dead () from
#8  0x00007fffe0d073ee in resolve_wins_send () from
#9  0x00007fffe0d07831 in resolve_wins () from /usr/lib64/samba/libgse-
#10 0x00007fffe15c3014 in _nss_wins_gethostbyname_r () from
#11 0x00007ffff6e06062 in gethostbyname_r () from /lib64/libc.so.6
#12 0x00007fffe6dd0825 in PR_GetHostByName () from
#13 0x00007fffe9b9ab5a in ?? () from /usr/lib64/firefox/libxul.so
#14 0x00007fffe9b9b547 in ?? () from /usr/lib64/firefox/libxul.so
#15 0x00007fffe9b9b5dd in ?? () from /usr/lib64/firefox/libxul.so
#16 0x00007fffe9b9b844 in ?? () from /usr/lib64/firefox/libxul.so
#17 0x00007fffe9b9b8dc in ?? () from /usr/lib64/firefox/libxul.so
#18 0x00007fffe9ba11bb in ?? () from /usr/lib64/firefox/libxul.so
#19 0x00007fffe9ba1f61 in ?? () from /usr/lib64/firefox/libxul.so
#20 0x00007fffe9ba21f0 in XRE_main () from /usr/lib64/firefox/libxul.so
#21 0x0000000000405868 in ?? ()
#22 0x0000000000404f32 in ?? ()
#23 0x00007ffff6d2d670 in __libc_start_main () from /lib64/libc.so.6
#24 0x00000000004051a9 in _start ()

System details:

OS: Gentoo
Kernel: Linux 4.9.3
libtdb: 1.3.12

The issue is almost 100% reproducible in my environment. Feel free to
request supplementary details. Thanks. 

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1308997

Comment 2 Volker Lendecke 2017-01-16 19:30:10 UTC
The nss_wins problem should be fixed as part of bug 11563, which was fixed with Samba 4.2.6. Since that bugfix, we don't call directly into gencache from libnss_wins. Are you sure that you are on any recent Samba version?
Comment 3 Garri 2017-01-17 16:29:13 UTC
(In reply to Volker Lendecke from comment #2)
>Are you sure that you are on any recent Samba version?

Thank you, Volker. I thought I was using a currently supported version of Samba, but it was in fact the EOL 4.2.14. Other branches are currently masked by default in Gentoo. I've installed the current release, 4.5.3, and I can no longer reproduce the issue.
Comment 4 Stefan Metzmacher 2017-03-09 10:13:51 UTC
*** Bug 12593 has been marked as a duplicate of this bug. ***
Comment 5 Stefan Metzmacher 2017-03-09 10:14:37 UTC
This should be fixed in tdb version 1.3.9.