Bug 12465 - winbind terminates after machine password change and needs domain rejoin
Status: RESOLVED DUPLICATE of bug 12262
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Winbind
Version: 4.4.7
Hardware: x64 Linux
Importance: P5 regression
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
Depends on: 12
Reported: 2016-12-08 10:33 UTC by Alban Rodriguez
Modified: 2017-02-22 19:34 UTC

Description Alban Rodriguez 2016-12-08 10:33:49 UTC
Hello,

Samba 4.4.7 AD member on Linux SLES 12 here ...

We had been running version 4.4.5 flawlessly for weeks until we updated to 4.4.6 and hit this bug: https://bugzilla.samba.org/show_bug.cgi?id=12369
So we updated to 4.4.7, in which that issue was fixed, after an interim downgrade to version 4.4.5 until 4.4.7 became available.

Now we're experiencing another issue, and it seems related to the machine (trust account) password change.
When this happens:
- users get an 'access denied' error when accessing their home directory.
- winbindd is no longer running on the Samba server.
- restarting winbindd is not enough to fix the issue. We also need to join the domain again.

We first had the issue on Monday the 28th, early in the afternoon, and then again yesterday early in the afternoon, which is exactly 7 days later.

log.wb-{DOMAINNAME} showed the same lines in either case:
[2016/11/30 10:25:26.114186,  1] ../source3/libsmb/trusts_util.c:264(trust_pw_change)
  2016/11/30 10:25:26 : trust_pw_change(UNIV-LR): Changed password locally
[2016/11/30 10:25:26.179269,  1] ../source3/libsmb/trusts_util.c:278(trust_pw_change)
  2016/11/30 10:25:26 : trust_pw_change(UNIV-LR): Changed password remotely.
[2016/11/30 10:25:26.516562,  0] ../source3/winbindd/winbindd.c:280(winbindd_sig_term_handler)
  Got sig[15] terminate (is_parent=0)

The 'machine password timeout' parameter has the default value of 604800 seconds which is exactly 7 days.
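
As a rough check of the timing described above, the following sketch converts the default timeout to days and projects the next expected change times from the timestamp in the log excerpt. It assumes changes recur at exactly the timeout interval, which the real winbindd scheduling may not guarantee; the function name `next_change_times` is made up for illustration.

```python
from datetime import datetime, timedelta

# Default 'machine password timeout' quoted in the report, in seconds.
MACHINE_PASSWORD_TIMEOUT = 604800


def next_change_times(last_change, timeout_s=MACHINE_PASSWORD_TIMEOUT, count=2):
    """Return the next `count` expected change times after `last_change`.

    Assumption: changes happen exactly `timeout_s` seconds apart.
    """
    step = timedelta(seconds=timeout_s)
    return [last_change + step * i for i in range(1, count + 1)]


# Timestamp of the "Changed password remotely" line in the log excerpt.
last = datetime(2016, 11, 30, 10, 25, 26)
upcoming = next_change_times(last)

print(timedelta(seconds=MACHINE_PASSWORD_TIMEOUT).days)  # 7
for t in upcoming:
    print(t.isoformat(sep=" "))
```

Under that assumption, the projected times fall on 2016-12-07 and 2016-12-14 at 10:25:26.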

We have 2 systems showing this behavior.
Both of them had the issue in two consecutive weeks, 7 days apart.
But interestingly, we have another system with the exact same version NOT showing the issue.
The first two have in common a configuration where unix extensions are enabled because they serve home directories to UNIX clients (Linux and macOS), while the third one does not (maybe a hint?).
Today I had to downgrade the first two back to version 4.4.5 to avoid a new outage on Monday 12th and Wednesday 14th respectively, because we have a test period starting today.
But I'm going to set up a VM with the same configuration and monitor the winbindd process.
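
One possible way to monitor the winbindd process on such a test VM is to poll `/proc` for it. This is only a minimal sketch, assuming a Linux system with a standard `/proc` layout; the helper names `pids_by_name` and `is_running` are hypothetical and not part of Samba.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose <proc_root>/<pid>/comm equals `name`.

    Linux only. The kernel truncates comm to 15 characters, which is
    long enough for "winbindd".
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def is_running(name, proc_root="/proc"):
    """True if at least one process with this name is currently listed."""
    return bool(pids_by_name(name, proc_root))


if os.path.isdir("/proc"):
    print("winbindd running:", is_running("winbindd"))
```

Run from cron or a loop with a timestamped log, this would show whether all winbindd processes have gone away around the time of a password change.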

Please let me know how I can help.

Thank you
Alban
Comment 1 Alban Rodriguez 2016-12-09 08:33:00 UTC
Hello,

This is quite unexpected, but with a new test VM running Samba 4.4.7 and the same configuration as one of the failing servers, I can't reproduce the issue.
I set the machine password change interval to five minutes and nothing went wrong; it just works.

Any idea what can cause the winbindd process to terminate on SIGTERM?
I mean: it doesn't crash; it stops gracefully after the password has been changed.
Why is that?


Thanks 
Alban
Comment 2 Alban Rodriguez 2016-12-14 11:14:13 UTC
OK, so since we downgraded to 4.4.5, we have had no issues on machine account password changes.

But I did some more testing last Sunday and updated one of the two boxes to 4.4.7 again.
Then I set the machine password timeout to 300 (a change every five minutes).
Nothing bad happened. I could see the consecutive password changes in the winbind log.
So maybe it's related to the machine password change but does not occur every time.
On Sunday I had no user sessions anyway ...
Now I may have to resync the password change date: since I set the default value back (1 week), the next change will happen next Sunday, which is not very useful for tracking the issue.

I now understand the log lines I submitted are not relevant because the winbindd process terminating on signal 15 is a child process. So this is a normal termination.
Now I wonder whether the winbindd process crashes when the issue occurs.
I've enabled core dumps in my Samba startup script in order to find out.
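
To tell child terminations apart from a parent termination when scanning the logs, the "Got sig[...]" line can be parsed as in the sketch below. It assumes the message format seen in the excerpt from the description and that the parent process would log `is_parent=1`; the function name `describe_termination` is made up for illustration.

```python
import re
import signal

# Pattern modelled on the one line quoted in the report:
#   "Got sig[15] terminate (is_parent=0)"
SIG_LINE = re.compile(r"Got sig\[(\d+)\] terminate \(is_parent=(\d)\)")


def describe_termination(line):
    """Return the signal name and process role, or None if no match."""
    m = SIG_LINE.search(line)
    if m is None:
        return None
    signum = int(m.group(1))
    try:
        signame = signal.Signals(signum).name
    except ValueError:
        signame = str(signum)  # unknown signal number on this platform
    # Assumption: is_parent=1 marks the parent winbindd, 0 a child.
    role = "parent" if m.group(2) == "1" else "child"
    return {"signal": signame, "process": role}


info = describe_termination("  Got sig[15] terminate (is_parent=0)")
print(info)
```

For the quoted line this reports SIGTERM received by a child process, which matches the reading above that the logged termination is a normal one.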

Any thoughts?

Thanks

Alban
Comment 3 Stefan Metzmacher 2016-12-14 11:45:26 UTC
(In reply to Alban Rodriguez from comment #2)

Currently I have no idea what might go wrong.

bug #12262 is also a very strange problem after the machine password
changed...
Comment 4 Stefan Metzmacher 2017-02-22 19:34:39 UTC

*** This bug has been marked as a duplicate of bug 12262 ***