Bug 9020 - winbind failure on ssh login, failed to mlock memory: Cannot allocate memory (12)
Summary: winbind failure on ssh login, failed to mlock memory: Cannot allocate memory ...
Status: RESOLVED INVALID
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Winbind (show other bugs)
Version: 3.6.5
Hardware: All FreeBSD
: P5 critical
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-01 15:32 UTC by Michael Letzgus-Koppmann
Modified: 2012-07-16 14:36 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Letzgus-Koppmann 2012-07-01 15:32:31 UTC
When trying to login via ssh and pam_winbind.so, sometimes (after a few hours without errors) this error occurs and login is not possible:


Jul  1 17:26:22 uche-pc1-pc1 sshd[33333]: pam_winbind(sshd): internal module error (retval = PAM_AUTHINFO_UNAVAIL(12), user = 'ad\mletzgus')
Jul  1 17:26:22 uche-pc1-pc1 sshd[33331]: error: PAM: authentication information is unavailable for ad\\mletzgus from pc1.uni-bielefeld.de

Jul  1 17:26:22 uche-pc1-pc1 winbindd[28465]: [2012/07/01 17:26:22.118219,  0] winbindd/winbindd_cred_cache.c:845(delete_memory_creds)
Jul  1 17:26:22 uche-pc1-pc1 winbindd[28465]:   failed to munlock memory: Cannot allocate memory (12)
Jul  1 17:26:22 uche-pc1-pc1 sshd[33333]: pam_winbind(sshd): internal module error (retval = PAM_AUTHINFO_UNAVAIL(12), user = 'ad\mletzgus')
Jul  1 17:26:22 uche-pc1-pc1 sshd[33331]: error: PAM: authentication information is unavailable for ad\\mletzgus from pc1.uni-bielefeld.de


But:

wbinfo -a works and domain membership is fine:

Status:
  net ads testjoin:                          Join is OK
  net ads workgroup:                         Workgroup: AD
  wbinfo -p:                                 Ping to winbindd succeeded
  wbinfo -P:                                 checking the NETLOGON dc connection succeeded
  wbinfo -t:                                 checking the trust secret for domain AD via RPC calls succeeded

WINS lookup:
  wbinfo -N uche-pc1-pc1:                    129.70.70.250      uche-pc1-pc1

Winbind:
  smbcontrol winbindd ping:                  PONG from pid 28465
  smbcontrol winbindd onlinestatus:          PID 28465: global:Online BUILTIN:Online UCHE-PC1-PC1:Online AD:Online
  smbcontrol winbindd validate-cache:        Winbindd cache is valid. (answer from pid 33583)

smbd/nmbd:
  smbcontrol smbd ping:                      PONG from pid 28462
  smbcontrol nmbd ping:                      PONG from pid 28459


===========================================

#
# $FreeBSD: release/9.0.0/etc/pam.d/sshd 197769 2009-10-05 09:28:54Z des $
#
# PAM configuration for the "sshd" service
#

# auth
#auth           sufficient      pam_opie.so             no_warn no_fake_prompts
#auth           requisite       pam_opieaccess.so       no_warn allow_local
#auth           sufficient      pam_krb5.so             no_warn try_first_pass
#auth           sufficient      pam_ssh.so              no_warn try_first_pass

auth            sufficient      pam_unix.so             no_warn try_first_pass
auth            required        /usr/local/lib/pam_winbind.so          try_first_pass cached_login krb5_auth krb5_ccache_type=FILE require_membership_of=AD\UCHE-PC1


# account
account         required        pam_nologin.so
#account        required        pam_krb5.so
account         required        pam_login_access.so
account         sufficient      /usr/local/lib/pam_winbind.so          try_first_pass cached_login krb5_auth krb5_ccache_type=FILE require_membership_of=AD\UCHE-PC1
account         required        pam_unix.so

# session
#session        optional        pam_ssh.so              want_agent
session         required        pam_permit.so

# password
#password       sufficient      pam_krb5.so             no_warn try_first_pass
password        required        pam_unix.so             no_warn try_first_pass


===========================================

smb.conf:

[global]

    netbios name = uche-pc1-pc1

    server string = PC1-Server reloaded

    bind interfaces only = yes
    interfaces = 127.0.0.0/8 129.70.70.0/24
    socket options = SO_KEEPALIVE IPTOS_LOWDELAY TCP_NODELAY

#    use sendfile = yes
    aio read size = 16384

    realm = ad.uni-bielefeld.de
    workgroup = AD
    security = ads

    name resolve order = host wins bcast

    wins server = 129.70.4.79 129.70.4.80 129.70.4.106

    allow trusted domains = No

    winbind use default domain = no
    winbind refresh tickets = yes

    winbind offline logon = yes
    winbind reconnect delay = 30

    winbind cache time = 1800

    winbind enum users = no
    winbind enum groups = yes
    
    winbind expand groups = 2
    winbind nested groups = yes

    idmap config *: backend = rid
    idmap config *: range = 100000-9000000

    idmap config AD: backend = rid
    idmap config AD: range = 100000-9000000

    template shell = /bin/bash
    template homedir = /home/%D/%U


    time server = yes
    unix charset = utf8

    
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = yes

    syslog only = yes

    min protocol = NT1

    create mask = 0600
    directory mask = 0700

    log level = 1 passdb:1 auth:1 winbind:1

    unix extensions = no
    



[rootz$]
    hosts allow = 129.70.172.0/23 129.70.70.100

    read only = yes
    browseable = no
    Valid Users = root

    path = /


[homes]
    hosts allow = 129.70.70.0/24 129.70.172.0/23 127.0.0.1/8

    read only = no
    browseable = yes
    Valid Users = @AD\uche-pc1

    comment = home directory

    follow symlinks = yes
    wide links      = yes


    ## Solaris + ZFS ACL (NFSv4 ACL) with aclinherit=passthrough
    vfs objects = zfsacl
    nfs4:acedup = merge
    nfs4:chown = yes
    nfs4:mode = special

    ## ACL inheritance is done by ZFS
    inherit acls = no

    ## Avoid chmod(2) that breaks ACL
    inherit permissions = no
    force create mode = 00000
    force security mode = 00000
    force directory mode = 00000
    force directory security mode = 00000
    store dos attributes = yes

    ## ZFS ACL implements "write_acl" and "write_owner" permissions that
    ## is compatible with Windows (NT) ACL better than "dos filemode = yes"
    dos filemode = no
Comment 1 Michael Letzgus-Koppmann 2012-07-01 16:43:16 UTC
output from winbindd with log level 9 when trying to login via ssh/pam_winbind:


accepted socket 23
[34487]: request interface version
[34487]: request location of privileged pipe
accepted socket 24
closing socket 23, client exited
getpwnam ad\mletzgus
accepted socket 23
[34489]: request interface version
[34489]: request location of privileged pipe
closing socket 23, client exited
accepted socket 23
getpwnam ad\mletzgus
getpwnam ad\mletzgus
accepted socket 25
[34489]: request interface version
[34489]: request location of privileged pipe
closing socket 25, client exited
accepted socket 25
lookupname AD\UCHE-PC1
[34489]: pam auth ad\mletzgus
child daemon request 13
[34478]: dual pam auth AD\mletzgus
is_myname("AD") returns 0
ads_cleanup_expired_creds: Ticket in ccache[FILE:/tmp/krb5cc_104494] expiration Mon, 02 Jul 2012 04:42:05 CEST
Found account name from PAC: mletzgus []
Setting unix username to [AD\mletzgus]
Plain-text authentication for user AD\mletzgus returned NT_STATUS_OK (PAM: 0)
Finished processing child request 13
failed to mlock memory: Resource temporarily unavailable (35)
closing socket 25, client exited
closing socket 23, client exited
accepted socket 23
[34490]: request interface version
[34490]: request location of privileged pipe
closing socket 23, client exited
accepted socket 23
getpwnam ad\mletzgus
Comment 2 Michael Letzgus-Koppmann 2012-07-01 18:23:27 UTC
even higher debug level:

Plain-text authentication for user AD\mletzgus2 returned NT_STATUS_OK (PAM: 0)
Finished processing child request 13
Writing 3893 bytes to parent
timed_events_timeout: 3599/504701
select will use timeout of 3599.504701 seconds
s3_event: Destroying timer event 0x80582c990 "tevent_req_timedout"
failed to mlock memory: Resource temporarily unavailable (35)
winbindd_add_memory_creds returned: NT_STATUS_NETWORK_BUSY
wb_request_done[59289:PAM_AUTH]: NT_STATUS_NETWORK_BUSY
s3_event: Schedule immediate event "tevent_queue_immediate_trigger": 0x80582c390
s3_event: Run immediate event "tevent_queue_immediate_trigger": 0x80582c390
winbind_client_response_written[59289:PAM_AUTH]: delivered response to client
closing socket 26, client exited
Comment 3 Michael Letzgus-Koppmann 2012-07-01 20:41:42 UTC
It's not onyl related do SSH/PAM,

wbinfo -K ad\\mletzgus

stops working, too:

plaintext kerberos password authentication for [ad\mletzgus] failed (requesting cctype: FILE)
Could not authenticate user [ad\mletzgus] with Kerberos (ccache: FILE)


A couple of minutes befor the answer was:

plaintext kerberos password authentication for [ad\mletzgus] succeeded (requesting cctype: FILE)
credentials were put in: FILE:/tmp/krb5cc_0

So in principle it works...

Just a guess: My System has 16GB RAM and was running rsync on a ZFS filesystem.
ZFS may use a lot of memory - and winbinds error message was "failed to mlock memory: Resource temporarily unavailable (35)". Maybe there are not enough resources of something?

CPU:  0.3% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.3% idle
Mem: 42M Active, 83M Inact, 7770M Wired, 664K Cache, 218M Buf, 7897M Free
Swap: 10G Total, 10G Free

There are still 8 GB left... should be enough for winbind...
Comment 5 Michael Letzgus-Koppmann 2012-07-03 20:39:38 UTC
It seems to be not a winbind specific problem but a problem with FreeBSDs default config.

Good summarization of the problem:
http://ximalas.info/2012/04/10/not-enough-room-for-wired-pages/
Comment 6 Michael Letzgus-Koppmann 2012-07-04 15:01:04 UTC
Finally, it was not a winbindd-problem.

Problem was caused by FreeBSD, there was not enough wired-pages memory, it was exhausted by the ZFS filesystem. But winbindd (and other software) needs to lock wired pages ( mlock() ).

Solution for FreeBSD:
http://forums.freebsd.org/showthread.php?t=33111
Comment 7 Michael Adam 2012-07-16 14:36:47 UTC
Thanks for the report and for solving it yourself, before we (I) really looked.. :)