Hi,

We are currently trying to let our Linux users reset their Samba/AD password via SSH. They can log in to a specific Samba server set up with pam_winbind fine, and can reset their domain password with /usr/bin/password without issues.

If the "must change password at next logon" checkbox is ticked, the SSH login just hangs, and winbindd CPU usage creeps up until it reaches 100%, even if the SSH connection attempt is killed. wbinfo -u/-g, and everything else that uses winbind, stops working as well until the process is killed with "kill -9 <PID>".

Not sure if it is relevant, but according to the uid number, the process that was running at 100% CPU was running as the user trying to log on.

auth.log error (that's the only error we're getting):

    Jan 8 10:59:54 test-smbpasswd sshd[7186]: pam_winbind(sshd:auth): user 'DOMAIN\testfrancois' denied access (incorrect password or invalid membership)

Relevant configuration:

/etc/pam.d/common-account:
    account sufficient pam_winbind.so debug

/etc/pam.d/common-auth:
    auth [success=1 default=ignore] pam_winbind.so krb5_auth krb5_ccache_type=FILE cached_login try_first_pass
    auth sufficient pam_winbind.so use_first_pass

/etc/pam.d/common-password:
    password [success=1 default=ignore] pam_winbind.so use_authtok try_first_pass
    password sufficient pam_winbind.so debug

/etc/pam.d/common-session:
    session optional pam_winbind.so

/etc/pam.d/common-session-noninteractive:
    session optional pam_winbind.so

smb.conf:
    [global]
    netbios name = test-smbpasswd
    workgroup = DOMAIN
    realm = DOMAIN.NET.AU
    security = ads
    domain logons = no
    template homedir = /srv/
    template shell = /bin/bash
    winbind enum groups = yes
    winbind enum users = yes
    winbind use default domain = no
    winbind nested groups = yes
    domain master = no
    local master = no
    prefered master = no
    root preexec = /usr/local/bin/mkhomedir.sh %U
    interfaces = 10.51.10.74
    idmap config DOMAIN:backend = rid
    idmap config DOMAIN:base_rid = 0
    idmap config DOMAIN:range = 50000 - 100000
    idmap uid = 10000-20000
    idmap gid = 10000-20000
Can you get a stack backtrace of the spinning process ?
Hi Jeremy,

I'm not sure it's going to be useful to you, but here goes:

    strace -p 8546
    Process 8546 attached - interrupt to quit

And that's it. No waiting for anything, or doing anything. I get the same result with strace winbindd -F:

    <...>
    gettimeofday({1420678614, 630943}, NULL) = 0
    epoll_wait(3, {{EPOLLIN, {u32=3651421424, u64=140552161215728}}}, 1, 248225) = 1
    recvfrom(27, "0\10\0\0", 4, 0, NULL, NULL) = 4
    gettimeofday({1420678614, 631135}, NULL) = 0
    epoll_wait(3, {{EPOLLIN, {u32=3651421424, u64=140552161215728}}}, 1, 248225) = 1
    recvfrom(27, "\r\0\0\0\0\0\0\0\212&\0\0\0\0\0\0\236\360\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2092, 0, NULL, NULL) = 2092
    epoll_ctl(3, EPOLL_CTL_DEL, 27, {0, {u32=0, u64=0}}) = 0
    gettimeofday({1420678614, 631428}, NULL) = 0
    epoll_wait(3, ^C <unfinished ...>

with the last line staying identical and the process not exiting after ^C.
That's an strace, not a backtrace. If the trace ends in epoll_wait(3, ^C <unfinished ...> then the process is waiting, not spinning on the CPU. Is this the correct process you're looking at ? I need to see traces from an actual CPU spinning process.
Sorry, I thought running strace -p <PID> would be the way to do it. What do you suggest I do to get you the required back trace?
Attach to the spinning process using gdb, then use the "bt" command to get a backtrace. Disconnect, and do that a few times so we can see where it is spinning. If it's in epoll_wait, then it isn't spinning.
As requested.

    (gdb) bt
    #0  0x00007fe5f2257bb7 in krb5_get_init_creds_password () from /usr/lib/x86_64-linux-gnu/libkrb5.so.26
    #1  0x00007fe5f42d44b9 in kerberos_kinit_password_ext () from /usr/lib/x86_64-linux-gnu/samba/libgse.so.0
    #2  0x00007fe5f82f56e0 in kerberos_return_pac ()
    #3  0x00007fe5f8307f48 in winbindd_dual_pam_auth ()
    #4  0x00007fe5f831dcfc in ?? ()
    #5  0x00007fe5f179786b in ?? () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #6  0x00007fe5f1795d56 in ?? () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #7  0x00007fe5f17923ed in _tevent_loop_once () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #8  0x00007fe5f8320200 in ?? ()
    #9  0x00007fe5f8320915 in ?? ()
    #10 0x00007fe5f1792ca2 in tevent_common_loop_immediate () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #11 0x00007fe5f1797601 in ?? () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #12 0x00007fe5f1795d56 in ?? () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #13 0x00007fe5f17923ed in _tevent_loop_once () from /usr/lib/x86_64-linux-gnu/libtevent.so.0
    #14 0x00007fe5f82ecf3b in main ()
    (gdb) #0 0x00
I'm seeing a similar problem on 4.1.18 with expired passwords (I guess the flag is set for those automatically?) – winbindd hangs, which in turn hangs nss_winbind, making it nigh impossible to use the server (login shells hang, etc.). Are more backtraces/logs required?
What is your Unix system and your Heimdal version? I just tried on Debian 7.8 (according to /etc/debian_version) and could not reproduce the issue.
Does this reproduce when the in-tree Heimdal is used, rather than the system one?
Created attachment 11378 [details] git-am fix for 4.3.0, 4.2.next, 4.1.next Cherry-pick from master applies cleanly.
Comment on attachment 11378 [details] git-am fix for 4.3.0, 4.2.next, 4.1.next I'm fine with it, but I would like gd to bless it too
OK, here's a private email Rowland sent to me reporting a problem with the patch. Not sure how repeatable this is. Gunther, can you check this out? Thanks!

Jeremy.

---------------------------------------------------

OK Jeremy, I downloaded the source for sernet samba 4.2.3 onto the client, added Volker's patch and then rebuilt the sernet packages. I then installed the new sernet winbind & libs packages, put sshd_config back to what it was and tried to log in via ssh as a user that is set to change password at next logon. I got this:

    rowland@ThinkPad ~ $ ssh user3@192.168.0.239
    user3@192.168.0.239's password:
    Password expired. You must change it now.
    Password change rejected: Password is already in password history. New password must not match any of your 24 previous passwords.. Please try again.
    Password change rejected: Password is already in password history. New password must not match any of your 24 previous passwords.. Please try again.
    Your password has expired
    Creating directory '/home/user3'.
    Linux client 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    user3@client:~$

So it logged me in with the old password. I then checked the user's object in AD and found this:

    pwdLastSet: 0

So, Volker has made it so that you can log in with the original password, but it is totally ignoring the fact that the user is supposed to change their password.

Rowland

---------------------------------------------------
Rowland, can you attach your pam config file? Thanks!
Created attachment 11385 [details] /etc/pam.d dir from test client
(In reply to Jeremy Allison from comment #12) Right, I noticed too that I can log in. I get a message that I have to change my password, but I'm not forced to. That's the reason why I want the Kerberos gurus to take a look. Until then, shall we revert the patch in master?
Can I point out, as I already pointed out to Jeremy, that if you set 'ChallengeResponseAuthentication yes' in /etc/ssh/sshd_config, you don't need Volker's patch. You get asked for the original password, are then told it has expired and asked to enter a new password (twice); you are then logged in and 'pwdLastSet' is updated in AD. At no time does winbind wind up the CPU use.
(In reply to Rowland Penny from comment #16) Right. The 100% CPU spin only happens with Kerberos. I have no clue how this is supposed to work with Kerberos. I can only repeat: Kerberos GURUs, please please speak up. If there is no resolution, I would rather disable direct kinit from within winbind instead of leaving this security restriction bypass open.
(In reply to Volker Lendecke from comment #17) Ah, light begins to dawn (I am always a bit slow on the uptake ;-) ). It is Kerberos churning away that is causing the excessive CPU use, and when I get prompted to change the password, it is NTLM or similar that is doing the password change and not Kerberos. So, as I see it, you need Volker's patch (or something similar) to really stop the churning AND the change I made to sshd_config to get prompted to change the password. (By the way, the config change works with Volker's patch.)
(In reply to Volker Lendecke from comment #15) Let's wait until Gunther has taken a look first, don't like to use reverts in master unless we have to. If not correct, your fix is certainly along the right lines (I'm going to drag ab into this bug too for comments :-). It's not in any release branches yet so I don't think there's any urgency in revert (yet :-). Alexander, can you examine this issue as well please (added him to CC: list). Jeremy.
The whole idea of Volker's patch is that we don't want to change passwords in the place where kerb_prompter() is called. I guess we should return some other Kerberos error than KRB5KDC_ERR_KEY_EXPIRED. Looking at the default prompter (krb5_prompter_posix) in Heimdal, I see that it always returns 1 for any error. The MIT Kerberos prompter does return KRB5_LIBOS_CANTREADPWD for such cases. The same error code exists in Heimdal. So, I guess we can go by returning KRB5_LIBOS_CANTREADPWD instead of KRB5KDC_ERR_KEY_EXPIRED.
(In reply to Alexander Bokovoy from comment #20) Not sure that's right. KRB5KDC_ERR_KEY_EXPIRED is mapped into NT_STATUS_PASSWORD_EXPIRED inside source3/libads/krb5_errs.c; KRB5_LIBOS_CANTREADPWD isn't mapped into anything.

So when we return KRB5KDC_ERR_KEY_EXPIRED, it becomes NT_STATUS_PASSWORD_EXPIRED inside source3/libads/kerberos.c:kerberos_kinit_password_ext(), which is called from kerberos_return_pac(), which should then return NT_STATUS_PASSWORD_EXPIRED to the caller in winbindd_raw_kerberos_login(), which should end up here in winbindd_dual_pam_auth():

    1714         result = winbindd_dual_pam_auth_kerberos(domain, state, &info3);
    1715         /* save for later */
    1716         krb5_result = result;
    1717
    1718
    1719         if (NT_STATUS_IS_OK(result)) {
    1720                 DEBUG(10,("winbindd_dual_pam_auth_kerberos succeeded\n"));
    1721                 goto process_result;
    1722         } else {
    1723                 DEBUG(10,("winbindd_dual_pam_auth_kerberos failed: %s\n", nt_errstr(result)));
    1724         }
    1725
    1726         if (NT_STATUS_EQUAL(result, NT_STATUS_NO_LOGON_SERVERS) ||
    1727             NT_STATUS_EQUAL(result, NT_STATUS_IO_TIMEOUT) ||
    1728             NT_STATUS_EQUAL(result, NT_STATUS_DOMAIN_CONTROLLER_NOT_FOUND)) {
    1729                 DEBUG(10,("winbindd_dual_pam_auth_kerberos setting domain to offline\n"));
    1730                 set_domain_offline( domain );
    1731                 goto cached_logon;
    1732         }
    1733
    1734         /* there are quite some NT_STATUS errors where there is no
    1735          * point in retrying with a samlogon, we explictly have to take
    1736          * care not to increase the bad logon counter on the DC */
    1737
    1738         if (NT_STATUS_EQUAL(result, NT_STATUS_ACCOUNT_DISABLED) ||
    1739             NT_STATUS_EQUAL(result, NT_STATUS_ACCOUNT_EXPIRED) ||
    1740             NT_STATUS_EQUAL(result, NT_STATUS_ACCOUNT_LOCKED_OUT) ||
    1741             NT_STATUS_EQUAL(result, NT_STATUS_INVALID_LOGON_HOURS) ||
    1742             NT_STATUS_EQUAL(result, NT_STATUS_INVALID_WORKSTATION) ||
    1743             NT_STATUS_EQUAL(result, NT_STATUS_LOGON_FAILURE) ||
    1744             NT_STATUS_EQUAL(result, NT_STATUS_NO_SUCH_USER) ||
    1745             NT_STATUS_EQUAL(result, NT_STATUS_PASSWORD_EXPIRED) ||
    1746             NT_STATUS_EQUAL(result, NT_STATUS_PASSWORD_MUST_CHANGE) ||
    1747             NT_STATUS_EQUAL(result, NT_STATUS_WRONG_PASSWORD)) {
    1748                 goto done;
    1749         }
    1750
    1751         if (state->request->flags & WBFLAG_PAM_FALLBACK_AFTER_KRB5) {
    1752                 DEBUG(3,("falling back to samlogon\n"));
    1753                 goto sam_logon;
    1754         } else {
    1755                 goto cached_logon;
    1756         }

So I don't understand how the login is proceeding when NT_STATUS_PASSWORD_EXPIRED comes back from this?

Rowland, can you post a winbindd debug level 10 log from when you're able to log in, to see if we get the:

    1723                 DEBUG(10,("winbindd_dual_pam_auth_kerberos failed: %s\n", nt_errstr(result)));

message we should see in the log?
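For readers following the branch logic in the quoted excerpt, here is a minimal sketch of it as a standalone C function. It is a model, not Samba code: the enum names (SK_*, GO_*) and the function sketch_after_kerberos() are hypothetical stand-ins for the NTSTATUS values and goto labels, and SK_OTHER_ERROR stands for any status the excerpt does not name.

```c
#include <stdbool.h>

/* Stand-in status codes for this sketch only; Samba uses NTSTATUS values. */
enum sketch_status {
	SK_OK,
	SK_NO_LOGON_SERVERS,
	SK_IO_TIMEOUT,
	SK_DOMAIN_CONTROLLER_NOT_FOUND,
	SK_ACCOUNT_DISABLED,
	SK_ACCOUNT_EXPIRED,
	SK_ACCOUNT_LOCKED_OUT,
	SK_INVALID_LOGON_HOURS,
	SK_INVALID_WORKSTATION,
	SK_LOGON_FAILURE,
	SK_NO_SUCH_USER,
	SK_PASSWORD_EXPIRED,
	SK_PASSWORD_MUST_CHANGE,
	SK_WRONG_PASSWORD,
	SK_OTHER_ERROR /* hypothetical: any status not named in the excerpt */
};

/* These mirror the goto targets in the quoted excerpt. */
enum sketch_next {
	GO_PROCESS_RESULT,
	GO_CACHED_LOGON,
	GO_DONE,
	GO_SAM_LOGON
};

/* Decide where control goes after the Kerberos attempt returns `result`. */
enum sketch_next sketch_after_kerberos(enum sketch_status result,
				       bool fallback_after_krb5)
{
	switch (result) {
	case SK_OK:
		return GO_PROCESS_RESULT;
	case SK_NO_LOGON_SERVERS:
	case SK_IO_TIMEOUT:
	case SK_DOMAIN_CONTROLLER_NOT_FOUND:
		/* The excerpt also marks the domain offline at this point. */
		return GO_CACHED_LOGON;
	case SK_ACCOUNT_DISABLED:
	case SK_ACCOUNT_EXPIRED:
	case SK_ACCOUNT_LOCKED_OUT:
	case SK_INVALID_LOGON_HOURS:
	case SK_INVALID_WORKSTATION:
	case SK_LOGON_FAILURE:
	case SK_NO_SUCH_USER:
	case SK_PASSWORD_EXPIRED:
	case SK_PASSWORD_MUST_CHANGE:
	case SK_WRONG_PASSWORD:
		/* No point in retrying with samlogon for these. */
		return GO_DONE;
	default:
		break;
	}
	/* Anything else falls back, depending on the request flag. */
	return fallback_after_krb5 ? GO_SAM_LOGON : GO_CACHED_LOGON;
}
```

In this model an expired password always ends at GO_DONE, so a login that proceeds anyway suggests the status reaching this point is something other than the expected one.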
If we *DO* see the "winbindd_dual_pam_auth_kerberos failed:" message in the debug log, I'd dearly love to see what NT status error message is being printed...

If KRB5KDC_ERR_KEY_EXPIRED *ISN'T* getting mapped into NT_STATUS_PASSWORD_EXPIRED inside source3/libads/krb5_errs.c (and there are some #ifdef's around there that might be dodgy), then the pam login might proceed into:

    1751         if (state->request->flags & WBFLAG_PAM_FALLBACK_AFTER_KRB5) {
    1752                 DEBUG(3,("falling back to samlogon\n"));
    1753                 goto sam_logon;
    1754         } else {
    1755                 goto cached_logon;
    1756         }

which would explain a lot...
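To illustrate the concern about an unmapped error code, here is a small sketch of a table-driven translation from Kerberos errors to NT status codes. Everything in it is hypothetical: the enums, the table and sketch_krb5_to_nt() only model the idea, the table holds just the one mapping this thread states exists, and the generic fallback value for unmapped codes is an assumption rather than a statement about what krb5_errs.c returns.

```c
#include <stddef.h>

/* Stand-ins for the two Kerberos error codes discussed in this thread. */
enum sketch_krb5_err {
	SKETCH_KEY_EXPIRED, /* models KRB5KDC_ERR_KEY_EXPIRED */
	SKETCH_CANTREADPWD  /* models KRB5_LIBOS_CANTREADPWD */
};

enum sketch_nt_status {
	SKETCH_NT_PASSWORD_EXPIRED, /* models NT_STATUS_PASSWORD_EXPIRED */
	SKETCH_NT_GENERIC_FAILURE   /* assumed default for unmapped codes */
};

struct sketch_map_entry {
	enum sketch_krb5_err krb5_code;
	enum sketch_nt_status nt_code;
};

/* Only the mapping that the thread says is present. */
static const struct sketch_map_entry sketch_map[] = {
	{ SKETCH_KEY_EXPIRED, SKETCH_NT_PASSWORD_EXPIRED },
};

/* Look the code up in the table; unmapped codes get the generic failure. */
enum sketch_nt_status sketch_krb5_to_nt(enum sketch_krb5_err code)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++) {
		if (sketch_map[i].krb5_code == code) {
			return sketch_map[i].nt_code;
		}
	}
	return SKETCH_NT_GENERIC_FAILURE;
}
```

Under these assumptions, a code that misses the table comes back as a generic failure, which is not one of the statuses that stop the login attempt, so the fallback branch quoted above would be taken.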
Created attachment 11387 [details] logfiles from client
Drat, I will get the hang of this ;-)

I can log in with a user that has had their password set to 'must change password at next login' and I always get this:

    rowland@ThinkPad ~ $ ssh user4@192.168.0.239
    Password:
    Password expired. You must change it now.
    Enter new password:
    Enter it again:
    Warning: Your password will expire in 42 days on Tue Oct 13 09:02:16 2015
    Creating directory '/home/user4'.
    Linux client 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    user4@client:~$

I also found this in /var/log/auth.log:

    Sep 1 09:02:21 client sshd[2646]: pam_krb5(sshd:auth): user user4 authenticated as user4@EXAMPLE.COM
    Sep 1 09:02:21 client sshd[2644]: Accepted keyboard-interactive/pam for user4 from 192.168.0.119 port 48422 ssh2
    Sep 1 09:02:21 client sshd[2644]: pam_unix(sshd:session): session opened for user user4 by (uid=0)
(In reply to Rowland Penny from comment #23) Can we also get network captures matching the log files? Thanks!
(In reply to Alexander Bokovoy from comment #20) Do we really need the prompter at all? As far as I can see it was added to work around bugs in old heimdal or MIT versions back in 2002. As we require recent versions, we may be able to drop the prompter. Otherwise instead of blacklisting some commands we should whitelist the commands we are able to handle.
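The difference between the two policies mentioned above can be sketched as follows. This is only an illustration: it reads "commands" as prompt types, which is an assumption, and the enum and both functions are hypothetical names loosely modelled on the kind of prompt types a Kerberos library passes to a prompter. The choice of which types the blacklist refuses (the new-password prompts) is likewise assumed from the discussion, not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical prompt types, for illustration only. */
enum sketch_prompt_type {
	PT_PASSWORD,
	PT_NEW_PASSWORD,
	PT_NEW_PASSWORD_AGAIN,
	PT_PREAUTH,
	PT_UNKNOWN_FUTURE /* a type added by some later library version */
};

/* Blacklist: answer everything except the types known to cause trouble. */
bool sketch_blacklist_answers(enum sketch_prompt_type type)
{
	return !(type == PT_NEW_PASSWORD || type == PT_NEW_PASSWORD_AGAIN);
}

/* Whitelist: answer only the one type the caller can actually handle. */
bool sketch_whitelist_answers(enum sketch_prompt_type type)
{
	return type == PT_PASSWORD;
}
```

The practical difference shows up with a prompt type nobody anticipated: the blacklist would still try to answer it, while the whitelist would refuse.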
(In reply to Stefan Metzmacher from comment #25) If you can tell me just how to do this, I am very willing to attempt it, please base your reply on as if you are talking to an idiot (i.e. me)
(In reply to Rowland Penny from comment #27) Assuming you're just on a test network: on the server where winbindd runs, have two separate windows open as root and do the following:

    WINDOW1: # stop the winbindd processes (maybe 'killall winbindd')
    WINDOW1: # (re)move old files under /var/log/samba
    WINDOW2: tcpdump -p -s 0 -w /var/log/samba/capture-for-bug-11038-01.pcap
    WINDOW1: # start the winbindd daemon with "log level = 10" or -d 10 (maybe 'winbindd -d 10')

Reproduce the problem.

    WINDOW2: # stop tcpdump using 'ctrl+c'
    WINDOW1: # stop winbindd again
    WINDOW1: # copy the smb.conf, sshd_config and /etc/pam.d to /var/log/samba/conf/
    WINDOW1: # if this is really a test network with no secret passwords,
             # create a keytab file containing every password in the whole domain!
             net rpc vampire keytab /var/log/samba/capture-for-bug-11038-01.keytab \
                 -I <ip_of_domain_controller> -U <user_with_admin_rights>
    WINDOW1: tar czf /tmp/bug-11038-logs-and-capture-01.tar.gz /var/log/samba

Upload /tmp/bug-11038-logs-and-capture-01.tar.gz to the bug report. Thanks!
Created attachment 11388 [details] logs etc as requested Files as requested
(In reply to Rowland Penny from comment #29) Hmmm. I can't see any pam debugs in those winbind logs at all.. Did you see the 'Password must be changed' prompt ?
(In reply to Jeremy Allison from comment #30) Hmmm. I do see the STATUS_PASSWORD_MUST_CHANGE in packet 344 in the wireshark trace, I just don't see it in the winbindd logs.
(In reply to Jeremy Allison from comment #30) well yes and no ;-) what I get is this: Password: Password expired. You must change it now. Enter new password: Enter it again:
(In reply to Jeremy Allison from comment #31) I did what I was asked to do and what I posted was the result, but I have looked at the other, earlier logs and was struggling to see any errors. What I can say is that at the moment I am using Volker's patch. If needed, I can create another test client and do it all again, but this time without Volker's patch.
Interesting what Samba considers "urgent attention". What more data do we need to get this fixed? More tests of the patch? More backtraces? …?
Sorry, PAM is a black art to me. I'm digging now. I don't know how to get sshd to request a password change. Maybe you have more docs on that available?
(In reply to Volker Lendecke from comment #35) It is in comment #16 , but you just need to set 'ChallengeResponseAuthentication yes' in /etc/ssh/sshd_config.
(In reply to Rowland Penny from comment #36) It's not that I don't get it reproduced. It's the PAM API documentation I am missing.
(In reply to Volker Lendecke from comment #37) With e551cdb37d3e re-applied, the problem is gone both with and without Kerberos. Moreover, if correctly configured, sshd requests that you change your password at logon time, which then succeeds. The reason I had this reverted was that I had not gone through the pain of correctly configuring all the PAM services (in particular the "account" section), which led to sshd letting the user in when the password had to be changed. That made me think I had introduced a security problem. So, everyone listening: re-apply e551cdb37d3e, and I believe the problem is gone.
(In reply to Volker Lendecke from comment #38) Ok, I'm outta here. Sorry for raising this again. There is doubt on the ML that this is the right fix. Please contact Andrew Bartlett, our main kerberos expert about this defect.
Nobody has time right now, closing as LATER.
Sigh. Reopening as this is a vital fix. Volker, Michael has already re-pushed your fix to master (and if he didn't get to it, I will). I really appreciate the work you did on getting a reproducible environment for this, and the fix is an essential one (as you know :-). Jeremy.
Marking this as a blocker bug. We must not ship another Samba release with this bug unfixed.
Created attachment 11471 [details] git-am fix for 4.3.next, 4.2.next. Cherry-picked from master.
Created attachment 11473 [details] git-am back-port for 4.3.next Adds both patches needed.
Created attachment 11474 [details] git-am back-port for 4.2.next
Comment on attachment 11473 [details] git-am back-port for 4.3.next LGTM, Thanks!
Comment on attachment 11474 [details] git-am back-port for 4.2.next LGTM, Thanks!
Re-assigning to Karolin for inclusion in 4.3.next, 4.2.next.
(In reply to Jeremy Allison from comment #49) I was confused by the bug number in the file names, but the commit message belongs to this one ;-). Pushed to autobuild-v4-[3|2]-test.
(In reply to Karolin Seeger from comment #50) Pushed to both branches. Closing out bug report. Thanks!