Bug 11381 - winbind failing to respond
winbind failing to respond
Status: RESOLVED FIXED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Winbind
4.2.2
All All
: P5 normal
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-07-03 11:19 UTC by David Woodhouse
Modified: 2015-10-19 08:47 UTC (History)
7 users (show)

See Also:


Attachments
log.wb-GER (107.35 KB, text/plain)
2015-07-03 11:20 UTC, David Woodhouse
no flags Details
winbind backtrace from process 1003 (parent) (4.01 KB, text/plain)
2015-07-03 11:22 UTC, David Woodhouse
no flags Details
All backtraces (3.64 KB, application/gzip)
2015-07-03 11:23 UTC, David Woodhouse
no flags Details
Reproducer (6.39 KB, patch)
2015-07-06 09:46 UTC, Volker Lendecke
no flags Details
possible fix (1.14 KB, patch)
2015-07-06 12:28 UTC, Volker Lendecke
no flags Details
Patch for 4.2 cherry-picked from master (1.92 KB, patch)
2015-08-03 13:08 UTC, Ralph Böhme
slow: review? (metze)
jra: review+
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Woodhouse 2015-07-03 11:19:35 UTC
Winbind seems to have locked up. I am unable to log in remotely via SSH, as my password isn't accepted. Even wbinfo --online-status' doesn't work. After a delay it reports:

failed to call wbcListTrusts: WBC_ERR_WINBIND_NOT_AVAILABLE
Could not show online-status

The last thing in /var/log/samba/log.winbindd is from when I joined the VPN yesterday and it will have been prodded to go online:

[2015/07/02 13:19:11.586610, 10, pid=1003, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_dual.c:667(winbind_msg_online)
  winbind_msg_online: got online message.
[2015/07/02 13:19:11.586646, 10, pid=1003, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cache.c:3566(set_global_winbindd_state_online)
  set_global_winbindd_state_online: online requested.
[2015/07/02 13:19:11.586657,  5, pid=1003, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_dual.c:685(winbind_msg_online)
  winbind_msg_online: requesting GER to go online.
Comment 1 David Woodhouse 2015-07-03 11:20:25 UTC
Created attachment 11229 [details]
log.wb-GER
Comment 2 David Woodhouse 2015-07-03 11:21:07 UTC
# ps axf | grep winbind
 1003 ?        SLs    0:48 /usr/sbin/winbindd
 1027 ?        SL     0:02  \_ /usr/sbin/winbindd
20258 ?        S      0:00  |   \_ /usr/sbin/winbindd
 3209 ?        S      0:00  \_ /usr/sbin/winbindd
 3827 ?        S      0:01  \_ /usr/sbin/winbindd
20256 ?        S      0:00      \_ /usr/sbin/winbindd
20257 ?        S      0:00      \_ /usr/sbin/winbindd
# for a in 1003 1027 20258 3209 3827 20256 20257  ; do gdb /usr/sbin/winbindd $a --batch -ex 't a a bt' > winbind-bt.$a ; done
Comment 3 David Woodhouse 2015-07-03 11:22:20 UTC
Created attachment 11230 [details]
winbind backtrace from process 1003 (parent)
Comment 4 David Woodhouse 2015-07-03 11:23:50 UTC
Created attachment 11231 [details]
All backtraces

Rest of them in a tarball rather than individual attachments...
Comment 5 David Woodhouse 2015-07-03 11:32:45 UTC
Process 3827 and 1027 are in epoll_wait() and probably not involved.

Process 1003, 20256, 20257 and 20258 all have a backtrace which looks much like this:

#1  0x00007f8e7c355558 in __pthread_mutex_lock_full (mutex=0x7f8e7ca650a8) at ../nptl/pthread_mutex_lock.c:259
#2  0x00007f8e7c3558b5 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f8e7ca650a8) at ../nptl/pthread_mutex_lock.c:73
#3  0x00007f8e7551608d in allrecord_mutex_lock (m=0x7f8e7ca65000, waitflag=<optimized out>) at ../common/mutex.c:201
#4  0x00007f8e75516346 in tdb_mutex_lock (tdb=tdb@entry=0x7f8e7ea80bb0, rw=rw@entry=1, off=off@entry=168, len=len@entry=1, waitflag=waitflag@entry=true, pret=pret@entry=0x7ffdc96d0b20) at ../common/mutex.c:307
#5  0x00007f8e7550eaf6 in fcntl_lock (waitflag=true, len=1, off=168, rw=1, tdb=0x7f8e7ea80bb0) at ../common/lock.c:44
#6  tdb_brlock (tdb=0x7f8e7ea80bb0, rw_type=1, offset=168, len=1, flags=TDB_LOCK_WAIT) at ../common/lock.c:174
#7  0x00007f8e7550efab in tdb_nest_lock (tdb=0x7f8e7ea80bb0, offset=168, ltype=1, flags=<optimized out>) at ../common/lock.c:346

Process 3209 looks like this:
#0  0x00007f8e7c35b6b7 in do_fcntl (arg=<optimized out>, cmd=7, fd=29) at ../sysdeps/unix/sysv/linux/fcntl.c:31
#1  __libc_fcntl (fd=29, cmd=cmd@entry=7) at ../sysdeps/unix/sysv/linux/fcntl.c:71
#2  0x00007f8e7550eb33 in fcntl_lock (waitflag=true, len=1, off=8, rw=1, tdb=0x7f8e7ea80bb0) at ../common/lock.c:58
#3  tdb_brlock (tdb=0x7f8e7ea80bb0, rw_type=1, offset=8, len=1, flags=TDB_LOCK_WAIT) at ../common/lock.c:174
#4  0x00007f8e7550efab in tdb_nest_lock (tdb=0x7f8e7ea80bb0, offset=8, ltype=1, flags=<optimized out>) at ../common/lock.c:346


Is this an ABBA deadlock with the fcntl lock vs. the pthread mutex? I'm assuming the mutex is shared across processes?
Comment 6 David Woodhouse 2015-07-03 11:41:19 UTC
# egrep 1003\|1027\|20258\|3209\|3827\|20256\|20257 /proc/locks
1: POSIX  ADVISORY  WRITE 1003 08:07:23072398 8 8
1: -> POSIX  ADVISORY  WRITE 3209 08:07:23072398 8 8
2: POSIX  ADVISORY  READ  3827 08:07:23076578 4 4
5: POSIX  ADVISORY  READ  1003 08:07:23076578 4 4
6: POSIX  ADVISORY  READ  1027 08:07:23076578 4 4
8: POSIX  ADVISORY  WRITE 20256 08:07:23079460 0 EOF
9: POSIX  ADVISORY  READ  1003 08:07:23076457 4 4
11: POSIX  ADVISORY  READ  1003 08:07:23072403 4 4
14: POSIX  ADVISORY  READ  1003 08:07:23072398 4 4
15: POSIX  ADVISORY  READ  1027 08:07:23072398 4 4
16: POSIX  ADVISORY  WRITE 1027 08:07:23076899 0 EOF
18: POSIX  ADVISORY  WRITE 20258 08:07:23079464 0 EOF
21: POSIX  ADVISORY  WRITE 20257 08:07:23079462 0 EOF
22: POSIX  ADVISORY  WRITE 3209 08:07:23072398 692 EOF
23: POSIX  ADVISORY  WRITE 3827 08:07:23076998 0 EOF
32: POSIX  ADVISORY  WRITE 3209 08:07:23076981 0 EOF
38: POSIX  ADVISORY  WRITE 1003 08:07:23076872 0 EOF
39: POSIX  ADVISORY  READ  1003 08:07:23076392 4 4
40: POSIX  ADVISORY  WRITE 1003 00:13:21430 0 0

The second line you can see is process 3209 attempting to get a lock on inode #23072398 (/var/lib/samba/gencache_notrans.tdb) for the range 8-8. But it's already held by process 1003.
Comment 7 Volker Lendecke 2015-07-03 11:50:47 UTC
Thanks for the backtraces! I need to take a look how this can deadlock :-(
Comment 8 David Woodhouse 2015-07-03 12:06:17 UTC
In case there was any doubt, it's the same mutex being accessed by all four waiting processes:

winbind-bt.1003:#1  0x00007f8e7c355558 in __pthread_mutex_lock_full (mutex=0x7f8e7ca650a8) at ../nptl/pthread_mutex_lock.c:259
winbind-bt.20256:#1  0x00007f8e7c355558 in __pthread_mutex_lock_full (mutex=0x7f8e7ca650a8) at ../nptl/pthread_mutex_lock.c:259
winbind-bt.20257:#1  0x00007f8e7c355558 in __pthread_mutex_lock_full (mutex=0x7f8e7ca650a8) at ../nptl/pthread_mutex_lock.c:259
winbind-bt.20258:#1  0x00007f8e7c355558 in __pthread_mutex_lock_full (mutex=0x7f8e7ca680a8) at ../nptl/pthread_mutex_lock.c:259
[root@shinybook samba]# grep -H /var/lib/samba/gencache_notrans.tdb /proc/{1003,20256,20257,20258}/maps 
/proc/1003/maps:7f8e7ca65000-7f8e7ca6b000 rw-s 00000000 08:07 23072398                   /var/lib/samba/gencache_notrans.tdb
/proc/20256/maps:7f8e7ca65000-7f8e7ca6b000 rw-s 00000000 08:07 23072398                   /var/lib/samba/gencache_notrans.tdb
/proc/20257/maps:7f8e7ca65000-7f8e7ca6b000 rw-s 00000000 08:07 23072398                   /var/lib/samba/gencache_notrans.tdb
/proc/20258/maps:7f8e7ca64000-7f8e7ca68000 rw-s 00002000 08:07 23072398                   /var/lib/samba/gencache_notrans.tdb
/proc/20258/maps:7f8e7ca68000-7f8e7ca6a000 rw-s 00000000 08:07 23072398                   /var/lib/samba/gencache_notrans.tdb

And (debugging from the POV of pid 1003, not that it should matter) it seems to be owned by process 3209:
(gdb) up
#1  0x00007f8e7c355558 in __pthread_mutex_lock_full (mutex=0x7f8e7ca650a8) at ../nptl/pthread_mutex_lock.c:259
259		  oldval = LLL_ROBUST_MUTEX_LOCK (mutex, id);
(gdb) p *mutex
$1 = {__data = {__lock = -2147480439, __count = 1, __owner = 3209, __nusers = 1, __kind = 146, __spins = 0, __elision = 0, __list = {__prev = 0x7f8e7ca13ae0, 
      __next = 0x7f8e7ca13ae0}}, 
  __size = "\211\f\000\200\001\000\000\000\211\f\000\000\001\000\000\000\222\000\000\000\000\000\000\000\340:\241|\216\177\000\000\340:\241|\216\177\000", 
  __align = 6442454153}


So yes, process 3209 is holding the pthread mutex while waiting for a fcntl lock that's held by process 1003. And process 1003 is holding that fcntl lock while trying to obtain the pthread mutex that's held by process 3209.
Comment 9 Volker Lendecke 2015-07-06 09:46:23 UTC
Created attachment 11234 [details]
Reproducer

This is a reproducer for the problem:

* Process 1: get the allrecord_lock on a tdb.
* Process 2: Start a traverse, this will stall waiting for first chainlock
* Process 1: Start a traverse: This will get EDEADLK in trying to
*            get the TRANSACTION_LOCK. It will deadlock for mutexes

... now to fix it...
Comment 10 Volker Lendecke 2015-07-06 12:28:48 UTC
Created attachment 11235 [details]
possible fix

prototype fix
Comment 11 David Woodhouse 2015-07-06 15:28:40 UTC
From the point of view of a complete outsider, unfamiliar with the code at all, this seems a little odd. I would normally expect there to be a strict *order* for locks to be obtained in, which avoids the ABBA deadlock.

This looks like it'll just make tdb_traverse() spuriously fail when it gets unlucky. It wasn't the approach I was expecting...
Comment 12 Volker Lendecke 2015-07-06 15:33:46 UTC
Sorry for my bad patch. It was the best I could come up with.
Comment 13 Stefan Metzmacher 2015-07-07 10:38:15 UTC
(In reply to Volker Lendecke from comment #10)

What about holding the transaction lock while holding the all record lock?

We already do fcntl() locks for all records, while holding the
allrecord lock.

In order to make this work would need convert the transaction lock
to use a mutex or somehow merge the allrecord and transaction lock.
As we would need to get the transaction lock before the all record
lock, so the contention would be on the transaction lock.

In the meantime we could revert the following commit:

  commit ad16a9a5b30bc0c026aea380402d6072c3d62e72
  Author: Michael Adam <obnox@samba.org>
  Date:   Wed Jul 2 07:44:04 2014 +0200

    s3:gencache: don't use transaction non non-persistent gencache_notrans.tdb

as transactions work fine on tdbs with mutexes since tdb-1.3.4.
Comment 14 Volker Lendecke 2015-07-07 10:48:30 UTC
(In reply to Stefan (metze) Metzmacher from comment #13)

Why is there a transaction lock separate from the allrecord lock?
Comment 15 Stefan Metzmacher 2015-07-08 08:37:43 UTC
(In reply to Volker Lendecke from comment #14)

I think because traverses should return a consistent view,
either before a transaction or after a transaction, but multiple
traverses should be possible and in the non transaction case,
the traverse run should be able to modify records, while another
traverse is in progress. If we would always grab the allrecord lock
for a traverse, we would get bad performance for non-persistent tdbs.
smbstatus would block all smbd processes.
Comment 16 David Woodhouse 2015-07-08 08:41:28 UTC
(In reply to Stefan (metze) Metzmacher from comment #13)
> In the meantime we could revert the following commit:
>  commit ad16a9a5b30bc0c026aea380402d6072c3d62e72

I've seen no 'yes, this is a good idea' or 'no, that will break xxx...' in response to this suggestion, so I'm going to ask:

Does it makes sense for me to do this as an interim measure in my own package (or even in the Fedora package, if a real fix is going to take longer)?

My users tend to get fractious when they can't log in to their own systems... :)
Comment 17 Volker Lendecke 2015-07-08 10:27:54 UTC
(In reply to David Woodhouse from comment #16)

Sorry, but as the patch I proposed (which removes the deadlock) does not meet your expectations and got a reject, we need to discuss more thoroughly how to properly fix this.
Comment 18 David Woodhouse 2015-07-08 11:04:26 UTC
(In reply to Volker Lendecke from comment #17)

I'm sorry; I didn't meant to be overly critical in comment #11 and it certainly wasn't a reject — I'm not at all familiar with this code and just wanted to check my understanding was correct. I'm extremely grateful for your attention to this issue!

Unfortunately I'm not really sure how long it takes to reproduce this problem. A number of days, certainly. There was a Linux kernel bug in 4.0 kernels before 4.0.6 which caused similar symptoms, so after diagnosing *one* winbind lockup as caused by that, I just started killing winbind and restarting it without paying attention, until the kernel bug was fixed. And *then* I looked harder and found this.

So I don't actually know how long I'd have to run with a proposed fix before calling it successful. If my concerns in comment #11 are unfounded and that's the preferred approach, I can test with that instead of the revert suggested in comment #13.
Comment 19 Volker Lendecke 2015-07-08 11:36:06 UTC
(In reply to David Woodhouse from comment #18)

The behaviour that got the reject (namely that tdb_traverse can fail) is much older than the deadlock. It did not result in a deadlock because the kernel checks fcntl locks for such conditions. It also went unnoticed because the traverse that triggers it is not critical. The conversion to mutexes for gencache removed the kernel deadlock protection, so this condition turns into an actual deadlock. My patch makes tdb with mutexes in this condition exactly the same as without.

Now we are in a situation that this deadlock exposed a subtle and not easy to fix problem in tdb, and just applying my patch is not what you expected, because it just properly fails when this ABBA happens and not fully avoids ABBA. So I guess you have to wait until we have a patch that fully avoid ABBA. But with all locking hierarchy problems this is subtle stuff and you will have to wait, sorry for that.
Comment 20 David Woodhouse 2015-07-08 12:09:33 UTC
(In reply to Volker Lendecke from comment #19)

> The behaviour that got the reject (namely that tdb_traverse can fail) is much
> older than the deadlock. … My patch makes tdb with mutexes in this condition 
> exactly the same as without.

Ah, that makes a lot more sense now. Thanks for the explanation. So, should I get people to test with that patch instead of the revert?

As I noted before, it's difficult to establish success criteria since the deadlock was not trivial to reproduce in real life in the first place.

In email you'd suggested increasing the values of gencache:stablize_{count,interval} to help make it less likely. Perhaps I should do the *opposite* for testing? And run with a sufficient level of debugging that I can look for the error that's reported when EDEADLK happens, and know I've successfully avoided the deadlock? What debugging should I enable (in tdb?) and what would I be looking for?
Comment 21 Volker Lendecke 2015-07-08 12:18:39 UTC
(In reply to David Woodhouse from comment #20)

Sure, decreasing those values I mentioned will expose this condition more often. 

Just FYI: I've proposed the patch officially for upstream as an interim emergency fix until we got the ABBA properly redone:

https://lists.samba.org/archive/samba-technical/2015-July/108454.html
Comment 22 David Woodhouse 2015-07-09 12:20:52 UTC
Thanks.

In the downstream Red Hat bug (https://bugzilla.redhat.com/1239051) I've put a scratch build with your fix, for testing.
Comment 23 Ralph Böhme 2015-08-03 13:08:37 UTC
Created attachment 11304 [details]
Patch for 4.2 cherry-picked from master

This fix should go in 4.2 asap, shouldn't it?
Comment 24 Jeremy Allison 2015-10-06 23:40:30 UTC
Re-assigning to Karolin for inclusion in 4.2.next. This is already in 4.3.next.
Comment 25 Karolin Seeger 2015-10-12 19:35:21 UTC
(In reply to Jeremy Allison from comment #24)
Pushed to autobuild-v4-2-test.
Comment 26 Karolin Seeger 2015-10-19 08:47:24 UTC
(In reply to Karolin Seeger from comment #25)
Pushed to v4-2-test.
Closing out bug report.

Thanks!