I've got an odd problem with Samba 3 using domain authentication: occasionally smbd reports that it can't find the logon server. This only appears to happen when 30+ users are mapping drives, and it is easily remedied by stopping and starting smbd. The problem is only happening on a couple of my machines, and the problematic times don't correspond between the pair. Every other machine pointing at this particular password server hasn't had a single problem. I've tried 3.0.0 and 3.0.1rc3, and both yield the same results.

The errors look like this in my logs:

[2003/11/26 08:53:25, 3] auth/auth.c:(216)
  check_ntlm_password: Checking password for unmapped user [EIS459]\[be6]@[EIS459] with the new password interface
[2003/11/26 08:53:25, 3] auth/auth.c:(219)
  check_ntlm_password: mapped user is: [GT]\[be6]@[EIS459]
[2003/11/26 08:53:25, 3] smbd/sec_ctx.c:(256)
  push_sec_ctx(0, 0) : sec_ctx_stack_ndx = 1
[2003/11/26 08:53:25, 3] smbd/uid.c:(287)
  push_conn_ctx(0) : conn_ctx_stack_ndx = 0
[2003/11/26 08:53:25, 3] smbd/sec_ctx.c:(288)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 1
[2003/11/26 08:53:25, 3] smbd/sec_ctx.c:(386)
  pop_sec_ctx (0, 0) - sec_ctx_stack_ndx = 0
[2003/11/26 08:53:25, 4] passdb/secrets.c:(255)
  Using cleartext machine password
[2003/11/26 08:53:25, 4] passdb/secrets.c:(255)
  Using cleartext machine password
[2003/11/26 08:53:25, 4] libsmb/namequery.c:(1350)
  get_dc_list: returning 1 ip addresses in an ordered list
[2003/11/26 08:53:25, 4] libsmb/namequery.c:(1351)
  get_dc_list: 130.207.165.193:0
[2003/11/26 08:53:29, 2] auth/auth.c:(309)
  check_ntlm_password: Authentication for user [be6] -> [be6] FAILED with error NT_STATUS_NO_LOGON_SERVERS

I've made several days' worth of logs available here: http://www.daloft.com/samba3.errors.tar.gz

The relevant sections of my smb.conf are:

[global]
   server string = psoft devel
   netbios name = pool17
   workgroup = GT
   security = domain
   password server = gate3.gatech.edu
   name resolve order = host
   smb ports = 445 139
   hosts allow = localhost .gatech.edu .oit.gatech.edu .vpn.gatech.edu
   hosts deny = .resnet.gatech.edu .eastnet.gatech.edu
   socket options = TCP_NODELAY
   log file = /var/log/samba3/log.%m
   log level = 4
   max log size = 500
   idmap uid = 60003-60004
   idmap gid = 60003-60004
   share modes = yes
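To compare failure times across the two affected machines, a small sketch like the following could help. It scans Samba log files and counts NT_STATUS_NO_LOGON_SERVERS messages per hour, assuming the Samba 3 layout shown above, where each message follows a "[date time, level]" header line. The function name and the hourly bucketing are my own choices, not anything Samba provides.

```python
import re
import sys
from collections import Counter

# Samba 3 log header, e.g. "[2003/11/26 08:53:29, 2] auth/auth.c:(309)"
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2}) (\d{2}):\d{2}:\d{2}, \d+\]")


def failures_by_hour(lines, needle="NT_STATUS_NO_LOGON_SERVERS"):
    """Count lines containing `needle`, bucketed by (date, hour) of the latest header."""
    counts = Counter()
    current = None  # (date, hour) taken from the most recent header line
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = (m.group(1), m.group(2))
        if current is not None and needle in line:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 nologon.py /var/log/samba3/log.*
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for (day, hour), n in sorted(failures_by_hour(fh).items()):
                print(f"{path}\t{day} {hour}:00\t{n}")
```

Running it over each server's log directory and lining up the hourly counts would show whether the failures really are independent between the pair.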
You are disabling NetBIOS name resolution when you set the following:

   name resolve order = host

Since you are using domain-mode security, Samba must resolve the DOMAIN<0x1b> and DOMAIN<0x1c> names. Set this to:

   name resolve order = host wins bcast
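For reference, DOMAIN<0x1b> is the domain master browser name (registered by the PDC) and DOMAIN<0x1c> is the domain controllers group name. These are NetBIOS names with a type suffix and have no DNS equivalent, so "host" resolution cannot answer for them. The sketch below (helper names are hypothetical) shows how such a name is laid out on the wire per RFC 1001/1002: upper-cased, space-padded to 15 characters, a type byte appended, then first-level encoded.

```python
def netbios_name(name: str, suffix: int) -> bytes:
    """Raw 16-byte NetBIOS name: upper-cased, space-padded to 15 bytes, plus the type byte."""
    raw = name.upper().encode("ascii")
    if len(raw) > 15:
        raise ValueError("NetBIOS names are limited to 15 characters")
    return raw.ljust(15, b" ") + bytes([suffix])


def first_level_encode(raw: bytes) -> str:
    """RFC 1001 first-level encoding: each half-byte becomes the letter 'A' + nibble."""
    return "".join(chr(ord("A") + (b >> 4)) + chr(ord("A") + (b & 0x0F)) for b in raw)


if __name__ == "__main__":
    # The GT<1b> and GT<1c> names discussed above.
    for suffix in (0x1B, 0x1C):
        print(f"GT<{suffix:02x}>", first_level_encode(netbios_name("GT", suffix)))
```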
BTW, the URL you gave returns a 404 error.
OK, I'm changing name resolve order, but it's odd that I was getting away with using name resolve order = host at all. I fixed the link to my logs, which show it working for a long while before dying mysteriously.
Created attachment 348: log files from error period

This file contains my samba-3.0.0 log directory (with log level = 10) during a period when users were seeing "NT_STATUS_NO_LOGON_SERVERS". The log files have been pruned to save space: I discarded any that didn't contain these errors.
Changed the config file to read:

   name resolve order = host wins bcast

and I still spuriously get NT_STATUS_NO_LOGON_SERVERS. Note that this machine is but one of many Samba servers contacting gate3; none of the others experience this problem (or if they did, I dropped them back to security = server). I just attached the relevant log files, with log level = 10.
Still there in 3.0.2. Will send up detailed logs tomorrow morning.
What kind of DC is this? It is not responding to a name_status_find() request on udp/137? Looks like your DC is getting overloaded. I would suggest trying 'password server = gate3.gatech.edu *'.

get_sorted_dc_list: attempting lookup using [host wins bcast]
internal_resolve_name: looking up gate3.gatech.edu#20
Returning valid cache entry: key = NBT/GATE3.GATECH.EDU#20, value = 130.207.165.193:0, timeout = Mon Jan 5 08:49:43 2004
name gate3.gatech.edu#20 found.
remove_duplicate_addrs2: looking for duplicate address/port pairs
get_dc_list: returning 1 ip addresses in an ordered list
get_dc_list: 130.207.165.193:0
name_status_find: looking up GT#1c at 130.207.165.193
Cache entry with key = NBT/GT#1C.20.130.207.165.193 couldn't be found
namecache_status_fetch: no entry for NBT/GT#1C.20.130.207.165.193 found.
Deleting cache entry (key = NBT/GT#1C.20.130.207.165.193)
bind succeeded on port 0
Sending a packet of len 50 to (130.207.165.193) on port 137
Sending a packet of len 50 to (130.207.165.193) on port 137
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
name_status_find: name not found
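To test udp/137 on the DC independently of smbd, a rough probe could send the same kind of NetBIOS node status request that name_status_find() issues and see whether anything answers. This is a sketch assuming the RFC 1002 packet layout with no scope ID; the function names, transaction ID, timeout, and retry count are illustrative, not Samba's. A request built this way is 50 bytes long, the same length as the packets in the log above.

```python
import socket
import struct

NBSTAT = 0x0021    # RFC 1002 question type for a node status request
IN_CLASS = 0x0001  # Internet class


def encode_question_name(name: str, suffix: int) -> bytes:
    """Length-prefixed, first-level-encoded NetBIOS name (no scope ID)."""
    raw = name.upper().encode("ascii").ljust(15, b" ") + bytes([suffix])
    encoded = bytes(ord("A") + nibble for b in raw for nibble in (b >> 4, b & 0x0F))
    return bytes([len(encoded)]) + encoded + b"\x00"


def node_status_request(name: str, suffix: int, trn_id: int = 0x1234) -> bytes:
    # Header: transaction id, flags 0 (unicast query), 1 question, no resource records.
    header = struct.pack(">HHHHHH", trn_id, 0, 1, 0, 0, 0)
    return header + encode_question_name(name, suffix) + struct.pack(">HH", NBSTAT, IN_CLASS)


def probe(host: str, name: str = "GT", suffix: int = 0x1C,
          timeout: float = 2.0, tries: int = 2) -> bool:
    """Return True if a reply carrying our transaction id arrives on udp/137."""
    packet = node_status_request(name, suffix)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(tries):
            sock.sendto(packet, (host, 137))
            try:
                data, _addr = sock.recvfrom(4096)
            except socket.timeout:
                continue  # no answer this round; resend
            if data[:2] == packet[:2]:
                return True
    return False
```

Calling probe("130.207.165.193") from the file server while the errors are occurring would show whether nmbd on the DC answers at all; False after both tries would be consistent with the "name not found" above.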
This DC is:

gate3.gatech.edu: uname -a
SunOS gate3.gatech.edu 5.8 Generic_108528-20 sun4u sparc SUNW,UltraAX-i2

gate3.gatech.edu: prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Netra X1 (UltraSPARC-IIe 400MHz)
System clock frequency: 100 MHz
Memory size: 128 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0    0      0      400     0.2     13     1.4

========================= IO Cards =========================

No failures found in System
===========================

========================= HW Revisions =========================

ASIC Revisions:
---------------

System PROM revisions:
----------------------
CORE 1.0.1 2001/02/19 09:55

gate3.gatech.edu: uptime
  2:06pm  up 195 day(s),  5:46,  1 user,  load average: 0.01, 0.02, 0.05

It's using an LDAP backend, which is uglier than you want to know about. Basically, we hacked MIT Kerberos to store NT and LANMAN hashes, which get dumped to this local LDAP database for Samba authentication. Note that we did this well before Active Directory was dreamt up. Here's the config:

[global]
   debug level = 2
   workgroup = GT
   server string = Campus Samba Authentication Server
   security = user
   smb ports = 445 139
   log file = /var/log/samba3/log.%m
   max log size = 2000
   ldap admin dn = "cn=samba,dc=auth,dc=gatech,dc=edu"
   ldap ssl = off
   ldap delete dn = yes
   ldap user suffix = ou=people,dc=auth,dc=gatech,dc=edu
   ldap machine suffix = ou=machines,dc=auth,dc=gatech,dc=edu
   ldap suffix = "dc=auth,dc=gatech,dc=edu"
   passdb backend = ldapsam:ldap://localhost
   socket options = TCP_NODELAY
   local master = yes
   os level = 90
   domain master = yes
   preferred master = yes
   netbios name = gate3
   domain logons = yes
   wins support = yes
   dns proxy = no

[netlogon]
   comment = Network Logon Service
   path = /usr/local/samba-3.0.0/lib/netlogon
   guest ok = yes
   writable = no
   share modes = no

gate3 is the only machine of this type; we're using it for domain authentication as we transition to AD.
In the meantime, gate1 and gate2 are referred to by Samba machines still using security=server.
Rick, it is possible that the other 2 Samba boxes (with security = server) are draining the sockets on your Samba PDC, since each smbd on the file server will maintain a connection to the DC. Unless you have some other idea why nmbd is not responding on port 137? Maybe check the output from netstat on the Samba PDC?
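Along the lines of the netstat suggestion, a rough helper could tally established SMB sessions per client from saved "netstat -an" output taken on the PDC. It assumes Solaris-style "a.b.c.d.port" or Linux-style "a.b.c.d:port" endpoint columns and treats ports 139 and 445 as SMB; both are assumptions about the output format, and the script name in the usage comment is hypothetical.

```python
import sys
from collections import Counter

SMB_PORTS = {"139", "445"}  # NetBIOS session service and direct SMB over TCP


def split_endpoint(token):
    """Split 'a.b.c.d.port' (Solaris) or 'a.b.c.d:port' (Linux) into (host, port)."""
    sep = ":" if ":" in token else "."
    host, _, port = token.rpartition(sep)
    return host, port


def smb_clients(netstat_lines):
    """Count ESTABLISHED connections to local ports 139/445, keyed by remote host."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if not fields or fields[-1] != "ESTABLISHED":
            continue
        # The first two address-like fields are the local and remote endpoints.
        endpoints = [f for f in fields if "." in f or ":" in f]
        if len(endpoints) < 2:
            continue
        _, local_port = split_endpoint(endpoints[0])
        remote_host, _ = split_endpoint(endpoints[1])
        if local_port in SMB_PORTS:
            counts[remote_host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 smb_clients.py netstat-output.txt
    for path in sys.argv[1:]:
        with open(path) as fh:
            for host, n in smb_clients(fh).most_common():
                print(f"{n:5d}  {host}")
```

A file server holding an unusually large number of sessions to the PDC would stand out at the top of the list.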
The other 2 gate machines are using an smbpasswd file, which is dumped out of the MIT database periodically. gate3 (which does domain authentication) is also being used by a few other Samba servers around campus, but none of them ever see this transient problem.
Never could reproduce. If you still see this in 3.0.11, let me know.