I have no idea why winbind thinks it's offline. That will have to be the subject of a different bug. But why is it *deleting* my valid Kerberos credentials...?

[dwoodhou@i7 f22]$ kinit dwoodhou
Password for dwoodhou@GER.CORP.INTEL.COM:
Warning: Your password will expire in 6 days on Mon 13 Apr 2015 17:49:56 BST
[dwoodhou@i7 f22]$ ls -l /tmp/krb5cc_500
-rw-------. 1 dwoodhou dwoodhou 3804 Apr 7 12:35 /tmp/krb5cc_500
[dwoodhou@i7 f22]$ wbinfo -K dwoodhou
Enter dwoodhou's password:
plaintext kerberos password authentication for [dwoodhou] succeeded (requesting cctype: FILE)
user_flgs: NETLOGON_CACHED_ACCOUNT
credentials were put in: FILE:/tmp/krb5cc_500
[dwoodhou@i7 f22]$ ls -l /tmp/krb5cc_500
ls: cannot access /tmp/krb5cc_500: No such file or directory

I seem to recall it also does this if I attempt 'su' and get my password wrong, even when it *is* correctly online.
I can no longer trigger this with su or sudo and an incorrect password, but it still happens when I drop off the VPN briefly and have to authenticate to something while I'm offline. The Kerberos TGT, which *should* have remained valid, is being deleted.

This is painful because applications like Evolution get notified as soon as the VPN comes up again and try to communicate... while they don't have a valid TGT, because winbind hasn't *quite* managed to replace it in time.

child_process_request: request fn PAM_AUTH
[17087]: dual pam auth GER\dwoodhou
winbindd_dual_pam_auth: domain: GER last was online
winbindd_dual_pam_auth_kerberos
is_myname("GER") returns 0
using ccache: FILE:/tmp/krb5cc_500
winbindd_raw_kerberos_login: uid is 500
kerberos_kinit_password: as dwoodhou@GER.CORP.INTEL.COM using [FILE:/tmp/krb5cc_500] as ccache and config [(null)]
no krb5_error
kinit failed for 'dwoodhou@GER.CORP.INTEL.COM' with: Cannot contact any KDC for requested realm (-1765328228)
winbindd_dual_pam_auth_kerberos failed: NT_STATUS_NO_LOGON_SERVERS
winbindd_dual_pam_auth_kerberos setting domain to offline
set_domain_offline: called for domain GER
set_domain_offline: added event handler for domain GER
messaging_dgm_send: Sending message to 17087
winbindd_dual_pam_auth_cached
get_cache: Setting ADS methods for domain GER
centry_expired: Key NS/GER/DWOODHOU for domain GER valid as domain is offline.
wcache_fetch: returning entry NS/GER/DWOODHOU for domain GER
name_to_sid: [Cached] - cached name for domain GER status: NT_STATUS_OK
messaging_recv_cb: Received message 0x40c len 4 (num_fds:0) from 17089
centry_expired: Key CRED/S-1-5-21-2052111302-1275210071-1644491937-279532 for domain GER valid as domain is offline.
wcache_fetch: returning entry CRED/S-1-5-21-2052111302-1275210071-1644491937-279532 for domain GER
Domain GER is marked as offline now.
wcache_get_creds: [Cached] - cached creds for user S-1-5-21-2052111302-1275210071-1644491937-279532 status: NT_STATUS_OK
...
wcache_tdc_fetch_domain: Searching for domain GER
wcache_tdc_fetch_domain: Found domain GER
using ccache: FILE:/tmp/krb5cc_500
add_ccache_to_list: successfully destroyed krb5 ccache FILE:/tmp/krb5cc_500 for user GER\dwoodhou
add_ccache_to_list: ref count on entry GER\dwoodhou is now 2
winbindd_add_memory_creds_internal: ref count for user GER\dwoodhou is now 2
winbindd_add_memory_creds returned: NT_STATUS_OK
wcache_save_creds: S-1-5-21-2052111302-1275210071-1644491937-279532
This code is doing it, in source3/winbindd/winbindd_cred_cache.c:add_ccache_to_list():

519 	/* If it is cached login, destroy krb5 ticket
520 	 * to avoid surprise. */
521 #ifdef HAVE_KRB5
522 	if (postponed_request) {
523 		/* ignore KRB5_FCC_NOFILE error here */
524 		ret = ads_kdestroy(ccname);
525 		if (ret == KRB5_FCC_NOFILE) {
526 			ret = 0;
527 		}
528 		if (ret) {
529 			DEBUG(0, ("add_ccache_to_list: failed to destroy "
530 				"user krb5 ccache %s with %s\n", ccname,
531 				error_message(ret)));
532 			return krb5_to_nt_status(ret);
533 		}
534 		DEBUG(10, ("add_ccache_to_list: successfully destroyed "
535 			"krb5 ccache %s for user %s\n", ccname,
536 			username));
537 	}
538 #endif

The details are in commit f389b97c6 (git show f389b97c6).
(In reply to Jeremy Allison from comment #2)
> 519 	/* If it is cached login, destroy krb5 ticket
> 520 	 * to avoid surprise. */

That's a... rather opaque comment. It's not entirely clear what form this "surprise" would take. One might normally expect such things to be expounded in the commit message... but no, that's somewhat taciturn too.

What would the failure mode be if we *didn't* destroy the existing krb5 ticket? And why is there no better workaround, such as actually inspecting the ticket to see what its renew/refresh times are and setting our timers accordingly?
(In reply to David Woodhouse from comment #3)

Yeah, I'm not sure I understand precisely the logic here. The f389b97c6 commit adds this comment:

+ /* This is evil, if the ticket was already expired.
+ * renew ticket function returns KRB5KRB_AP_ERR_TKT_EXPIRED.
+ * But there is still a chance that we can rekinit it.
+ *
+ * This happens when user login in online mode, and then network
+ * down or something cause winbind goes offline for a very long time,
+ * and then goes online again. ticket expired, renew failed.
+ * This happens when machine are put to sleep for a long time,
+ * but shorter than entry->renew_util.
+ * NB
+ * Looks like the KDC is reachable, we want to rekinit as soon as
+ * possible instead of waiting some time later. */

I'm not sure I follow it. Can you explain exactly the logic you want here?
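David's suggestion of inspecting the existing ticket and setting timers from its lifetime, combined with the case the commit comment describes, might look roughly like the sketch below. It is illustrative only: struct tgt_times, enum refresh_action and plan_refresh() are hypothetical names, not winbind or krb5 code, and firing the timer half-way through the remaining lifetime is an assumed policy. The one behaviour taken from the thread is that an already-expired ticket cannot be renewed (KRB5KRB_AP_ERR_TKT_EXPIRED), so it needs a fresh kinit.

```c
#include <time.h>

/* Hypothetical view of the lifetime fields of a TGT in the ccache;
 * not a winbind or krb5 structure. */
struct tgt_times {
    time_t endtime;    /* the ticket is unusable from this time on */
    time_t renew_till; /* renewable until this time; 0 if not renewable */
};

enum refresh_action {
    REFRESH_RENEW_LATER,   /* valid and renewable: renew before endtime */
    REFRESH_REKINIT_LATER, /* valid but not renewable: kinit before endtime */
    REFRESH_REKINIT_NOW    /* already expired: renewal cannot work */
};

/* Decide how to refresh an existing TGT and when the next timer should
 * fire. Acting half-way through the remaining lifetime is an assumed
 * policy chosen for illustration. */
enum refresh_action plan_refresh(const struct tgt_times *t, time_t now,
                                 time_t *next_event)
{
    if (now >= t->endtime) {
        /* Renewing an expired ticket fails with
         * KRB5KRB_AP_ERR_TKT_EXPIRED, so only a fresh kinit (e.g. with
         * the cached password) can help, as soon as a KDC is reachable. */
        *next_event = now;
        return REFRESH_REKINIT_NOW;
    }

    *next_event = now + (t->endtime - now) / 2;

    /* Renewal only helps if it can push endtime further out. */
    if (t->renew_till > t->endtime) {
        return REFRESH_RENEW_LATER;
    }
    return REFRESH_REKINIT_LATER;
}
```

Nothing in this decision requires destroying the existing ccache; it only determines when to try to obtain something better.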
(In reply to Jeremy Allison from comment #4)
> Can you explain exactly the logic you want here ?

That's simple: never delete a valid TGT and leave me with nothing.

If you can get a *new* one, fine. Please do it atomically. But if you temporarily can't communicate with the server and delete my valid TGT in a fit of pique, that's bad.

This was *really* painful a few weeks ago when we had some infrastructure problems. Anyone with an existing TGT was OK, but once it expired you couldn't get a new one and everything stopped working. And then I *really* hated this bug, and vowed to chase it up :)
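For a FILE: ccache, "do it atomically" can be met with the usual write-to-a-temporary-file-then-rename pattern. The sketch below is a generic POSIX illustration, not what winbind or the Kerberos libraries actually do; replace_file_atomically() is a hypothetical helper that treats the ccache as an opaque blob of bytes. With this pattern a process such as Evolution reads either the old TGT or the new one; /tmp/krb5cc_500 never disappears in between.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and fsync() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the file at 'path' with 'len' bytes of
 * 'data' so that a concurrent reader sees either the complete old file
 * or the complete new one, never a missing or half-written file. */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int fd;

    /* The temporary file must live in the same directory (and hence the
     * same filesystem) as the target for rename() to be atomic. */
    int n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        return -1;
    }
    fd = mkstemp(tmp);          /* created with mode 0600 */
    if (fd < 0) {
        return -1;
    }
    /* A short write is treated as failure in this sketch. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    /* rename() swaps the new file in over the old one in a single step. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

In practice the new credentials would be produced by the Kerberos library into the temporary ccache; the point is only that the old file is replaced, never deleted first.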
OK, but the key phrase here is "a valid TGT". How do we know whether it's valid and unexpired when we're offline? What logic should we use here?
I'm perfectly happy to remove the word 'valid'. If there's already a TGT when you're authenticating in offline mode, don't delete it at all. Who cares if it's valid or not? Later on when you come back online, you can get a shiny new TGT, which might replace the one that already existed. If your renew/refresh logic is "surprised" by the existence of a TGT which you didn't expect, let's fix that. Although like you, I still don't quite see what the problem was there.
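The policy described above boils down to a small decision, sketched here under the assumption that the caller already knows whether the domain is online and whether an online kinit succeeded. enum ccache_op and ccache_policy() are hypothetical names, not winbind's API. Note that no branch returns "destroy": the only way the old ccache goes away is by being replaced with newer credentials.

```c
#include <stdbool.h>

enum ccache_op {
    CCACHE_REPLACE,        /* online kinit succeeded: install new creds */
    CCACHE_KEEP,           /* online, but kinit failed (e.g. wrong password) */
    CCACHE_KEEP_AND_RETRY  /* offline: keep the ccache, re-kinit once online */
};

/* Hypothetical policy function capturing "never delete an existing TGT
 * on an offline login". */
enum ccache_op ccache_policy(bool domain_online, bool kinit_succeeded)
{
    if (domain_online && kinit_succeeded) {
        /* Fresh credentials exist, so replacing the old ccache loses nothing. */
        return CCACHE_REPLACE;
    }
    if (!domain_online) {
        /* Cached (offline) login: whatever is in the ccache, valid or
         * expired, stays untouched until we can get something newer. */
        return CCACHE_KEEP_AND_RETRY;
    }
    /* Online but authentication failed: still no reason to destroy anything. */
    return CCACHE_KEEP;
}
```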
Created attachment 11395
git-am possible patch for master.

David, can you check whether this does what you want? Alexander, can you take a look and see if this looks OK to you?
Comment on attachment 11395
git-am possible patch for master.

We discussed this with Jakub, and keeping the existing ccache is what SSSD does as well: in offline mode it injects a placeholder TGT (with its expiry set to the start of the Unix epoch) because the ccache path is exposed to the environment via KRB5CCNAME.

My only worry is whether there are other places that depend on a valid ticket being in the user's ccache. If that code is not expecting an expired ticket, it might fail.
Comment on attachment 11395
git-am possible patch for master.

Forgot my RB+
Please note that the ccache might contain not only TGTs but service tickets as well. Even though the KDC may be unreachable, which is what triggers the transition into offline mode, there may still be valid service tickets in the ccache for services that remain reachable; think of NFS, for example.
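To illustrate why destroying the whole ccache on an offline login throws away more than the TGT, here is a sketch that counts which entries are still usable at a given moment. struct cred_entry and count_usable() are hypothetical simplifications (a real ccache would be read through the Kerberos library), and treating any server principal that starts with "krbtgt/" as a TGT is an assumption based on the usual naming. An epoch placeholder TGT like SSSD's, assuming its expiry is stored as 0, is never counted as usable.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Simplified, hypothetical view of one ccache entry. */
struct cred_entry {
    const char *server;   /* e.g. "krbtgt/REALM@REALM" or "nfs/host@REALM" */
    time_t endtime;       /* expiry time */
};

/* Assumed convention: TGTs are issued for krbtgt/... principals. */
static int is_tgt(const struct cred_entry *c)
{
    return strncmp(c->server, "krbtgt/", 7) == 0;
}

/* Count entries still usable at 'now' and report how many of them are
 * service tickets. A KDC outage prevents getting a new TGT, but service
 * tickets that have not expired keep working against reachable services. */
size_t count_usable(const struct cred_entry *creds, size_t n, time_t now,
                    size_t *usable_service_tickets)
{
    size_t usable = 0, services = 0;

    for (size_t i = 0; i < n; i++) {
        if (now < creds[i].endtime) {
            usable++;
            if (!is_tgt(&creds[i])) {
                services++;
            }
        }
    }
    *usable_service_tickets = services;
    return usable;
}
```

For example, with a TGT that has already expired and an NFS service ticket that has not, count_usable() reports one usable entry, and that entry is the service ticket.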
Created attachment 11410
winbind log

(In reply to Jeremy Allison from comment #8)
> David can you check if this does what you want ?

It certainly doesn't delete the valid TGT when I do an offline login. But it also didn't immediately get me a *new* one when I subsequently went online, which wasn't what I expected.

Further testing shows that even if I have no existing TGT, or if I have an expired TGT, it never gets me a new one. After I go online, I still see:

[dwoodhou@i7 1.1.fc22]$ wbinfo --online-status
BUILTIN : online
DWOODHOU-LINUX : online
GER : online
[dwoodhou@i7 1.1.fc22]$ wbinfo -K dwoodhou
Enter dwoodhou's password:
plaintext kerberos password authentication for [dwoodhou] succeeded (requesting cctype: FILE)
user_flgs: NETLOGON_CACHED_ACCOUNT
credentials were put in: FILE:/tmp/krb5cc_500

But it lies. The credentials *weren't* put in FILE:/tmp/krb5cc_500.

Winbind log attached. I start winbind offline, run 'wbinfo -K dwoodhou' while offline, then join the VPN (which prods it to go online) and then do the above.
Hm, I think you can disregard comment 12; I cannot reproduce it. I didn't change anything. Instead of running the packaged build (Fedora's 4.2.2-1 with the patch applied), I tried running the *same* build from its build directory, as a prelude to reverting the patch and double-checking the 'gain TGT after going online' behaviour. It worked fine. At some point I disabled and re-enabled SELinux, and double-checked yet again that it was enabled. And the packaged build that I originally tested no longer shows the same behaviour. The only thing that's changed between then and now is that I had another cup of tea. Therefore I have to blame comment #12 on the fact that I was insufficiently caffeinated. Will continue to run with this build and report any issues that arise.
(In reply to David Woodhouse from comment #13)

Thanks. The fix has gone into master. Once you confirm it's good, I'll back-port it for 4.2.next, 4.3.0 and 4.1.next.