Bug 870 - smbd spuriously reports NT_STATUS_NO_LOGON_SERVERS
Status: RESOLVED WORKSFORME
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.2
Hardware: All Solaris
Importance: P3 critical
Target Milestone: none
Assigned To: Gerald (Jerry) Carter
URL: http://www.daloft.com/rick/samba3.err...
Depends on:
Blocks: 807 1294
Reported: 2003-12-09 12:01 UTC by Rick Brown
Modified: 2005-02-08 20:42 UTC (History)

Attachments
log files from error period (750.52 KB, application/octet-stream)
2004-01-05 06:13 UTC, Rick Brown

Description Rick Brown 2003-12-09 12:01:17 UTC
I've got an odd problem with samba3 using domain authentication: occasionally
smbd will report that it can't find the logon server.  This condition only
appears to happen when 30+ users are mapping drives, and is easily remedied
by stopping and starting smbd.  This problem is only happening on a couple
of my machines, and the problematic times don't correspond between the pair.
No other machine pointing at this particular password server has had a
single problem.  I've tried 3.0.0 and 3.0.1rc3, and both yield the same
results.

The errors look like this in my logs:
[2003/11/26 08:53:25, 3] auth/auth.c:(216)
  check_ntlm_password:  Checking password for unmapped user
[EIS459]\[be6]@[EIS459] with the new password interface
[2003/11/26 08:53:25, 3] auth/auth.c:(219)
  check_ntlm_password:  mapped user is: [GT]\[be6]@[EIS459]
[2003/11/26 08:53:25, 3] smbd/sec_ctx.c:(256)
  push_sec_ctx(0, 0) : sec_ctx_stack_ndx = 1
[2003/11/26 08:53:25, 3] smbd/uid.c:(287)
  push_conn_ctx(0) : conn_ctx_stack_ndx = 0
[2003/11/26 08:53:25, 3] smbd/sec_ctx.c:(288)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 1
[2003/11/26 08:53:25, 3] smbd/sec_ctx.c:(386)
  pop_sec_ctx (0, 0) - sec_ctx_stack_ndx = 0
[2003/11/26 08:53:25, 4] passdb/secrets.c:(255)
  Using cleartext machine password
[2003/11/26 08:53:25, 4] passdb/secrets.c:(255)
  Using cleartext machine password
[2003/11/26 08:53:25, 4] libsmb/namequery.c:(1350)
  get_dc_list: returning 1 ip addresses in an ordered list
[2003/11/26 08:53:25, 4] libsmb/namequery.c:(1351)
  get_dc_list: 130.207.165.193:0
[2003/11/26 08:53:29, 2] auth/auth.c:(309)
  check_ntlm_password:  Authentication for user [be6] -> [be6] FAILED with error
NT_STATUS_NO_LOGON_SERVERS
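
To check whether the failures cluster around busy periods, log entries like the ones above can be tallied per minute. This is a minimal sketch in Python, assuming the header format shown in this excerpt ("[date time, level] file:(line)"). The names HEADER and failures_per_minute are hypothetical helpers, not part of Samba.

```python
import re
from collections import Counter

# Matches a Samba log header such as "[2003/11/26 08:53:29, 2] auth/auth.c:(309)"
# and captures the timestamp truncated to the minute.
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}, *\d+\]")


def failures_per_minute(lines, needle="NT_STATUS_NO_LOGON_SERVERS"):
    """Count lines containing `needle`, keyed by the minute of the
    most recent log header seen before each match."""
    counts = Counter()
    current_minute = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            current_minute = match.group(1)
        # The error text may sit on a continuation line below the header,
        # so it is attributed to the last header seen.
        if needle in line and current_minute is not None:
            counts[current_minute] += 1
    return counts
```

Feeding it an open log file, for example failures_per_minute(open("log.smbd")), gives a per-minute tally that can be compared between the two affected machines; the file name here is only an example.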

I've made several days' worth of logs available at the URL above:
http://www.daloft.com/samba3.errors.tar.gz

The relevant sections of my smb.conf are:
[global]
   server string = psoft devel
   netbios name = pool17
   workgroup = GT
   security = domain
   password server = gate3.gatech.edu
   name resolve order = host
   smb ports = 445 139
   hosts allow = localhost .gatech.edu .oit.gatech.edu .vpn.gatech.edu
   hosts deny = .resnet.gatech.edu .eastnet.gatech.edu
   socket options = TCP_NODELAY
   log file = /var/log/samba3/log.%m
   log level = 4
   max log size = 500
   idmap uid = 60003-60004
   idmap gid = 60003-60004
   share modes = yes
Comment 1 Gerald (Jerry) Carter 2003-12-22 13:37:07 UTC
You are disabling netbios name resolution when 
you set the following:

   name resolve order = host

Since you are in domain mode security, Samba must resolve the 
DOMAIN<0x1b> and DOMAIN<0x1c> names.

Set this to 

   name resolve order = host wins bcast
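
With several file servers to check, the same test can be scripted. This is a minimal sketch in Python, assuming a simple smb.conf with one parameter per line and no includes or line continuations. The functions parse_global and missing_netbios_methods are hypothetical helpers, not Samba tools.

```python
def parse_global(text):
    """Return the [global] parameters of an smb.conf as a dict.
    Keys are lower-cased with internal whitespace collapsed."""
    params = {}
    in_global = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            in_global = line[1:-1].strip().lower() == "global"
            continue
        if in_global and "=" in line:
            key, value = line.split("=", 1)
            params[" ".join(key.lower().split())] = value.strip()
    return params


def missing_netbios_methods(params):
    """For security = domain, list which of 'wins' and 'bcast' are absent
    from an explicitly set 'name resolve order'."""
    if params.get("security", "").lower() != "domain":
        return []
    order = params.get("name resolve order")
    if order is None:
        # Assumption: when unset, the built-in default is used and
        # already includes the NetBIOS methods.
        return []
    methods = order.lower().split()
    return [m for m in ("wins", "bcast") if m not in methods]
```

For the configuration in the description this reports ['wins', 'bcast']; with the suggested setting it reports an empty list.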
Comment 2 Gerald (Jerry) Carter 2003-12-22 13:38:22 UTC
btw.  the URL you gave returns a 404 error.
Comment 3 Rick Brown 2004-01-03 10:11:52 UTC
ok, changing name resolve order, but it's odd that I was getting away
with using 
   name resolve order = host
at all.   I fixed the link to my logs, which show it working for a long
while before dying mysteriously.

Comment 4 Rick Brown 2004-01-05 06:13:49 UTC
Created attachment 348
log files from error period

this file contains my samba-3.0.0 log directory (with log level = 10) during a
period where users were seeing "NT_STATUS_NO_LOGON_SERVERS"

the log files have been pruned to save space - I discarded any that didn't have
these errors.
Comment 5 Rick Brown 2004-01-05 06:17:55 UTC
changed config file to read:
   name resolve order = host wins bcast

and still spuriously get NT_STATUS_NO_LOGON_SERVERS.   Note, this machine is 
but one of many samba servers contacting gate3; none of the others experience
this problem (or if they did, I dropped them back to security=server).

I just attached relevant log files, with log level = 10.
Comment 6 Rick Brown 2004-02-10 06:36:48 UTC
still there in 3.0.2.   Will send up detailed logs tomorrow morning. 
Comment 7 Gerald (Jerry) Carter 2004-03-08 10:12:28 UTC
What kind of DC is this?  It is not responding to a 
name_status_find() request on udp/137.  Looks like your DC 
is getting overloaded.  I would suggest trying
'password server = gate3.gatech.edu *'


  get_sorted_dc_list: attempting lookup using [host wins bcast]
  internal_resolve_name: looking up gate3.gatech.edu#20
  Returning valid cache entry: key = NBT/GATE3.GATECH.EDU#20, value =
130.207.165.193:0, timeout = Mon Jan  5 08:49:43 2004

  name gate3.gatech.edu#20 found.
  remove_duplicate_addrs2: looking for duplicate address/port pairs
  get_dc_list: returning 1 ip addresses in an ordered list
  get_dc_list: 130.207.165.193:0
  name_status_find: looking up GT#1c at 130.207.165.193
  Cache entry with key = NBT/GT#1C.20.130.207.165.193 couldn't be found
  namecache_status_fetch: no entry for NBT/GT#1C.20.130.207.165.193 found.
  Deleting cache entry (key = NBT/GT#1C.20.130.207.165.193)
  bind succeeded on port 0
  Sending a packet of len 50 to (130.207.165.193) on port 137
  Sending a packet of len 50 to (130.207.165.193) on port 137
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  name_status_find: name not found
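
For testing the DC independently of smbd, the same kind of probe can be sent by hand. The 50-byte length in the log above is consistent with a standard NetBIOS node status request (RFC 1002): a 12-byte header, a 34-byte encoded name, and 4 bytes of type and class. The Python sketch below builds such a request for the wildcard name "*" and waits for any reply on udp/137. It assumes a wildcard query; the function names and the 2-second timeout are illustrative and are not taken from Samba's source.

```python
import socket
import struct


def encode_netbios_name(name: bytes) -> bytes:
    """First-level encode a 16-byte NetBIOS name: each byte is split into
    two nibbles and each nibble is added to ASCII 'A'."""
    if len(name) != 16:
        raise ValueError("NetBIOS names are exactly 16 bytes")
    out = bytearray()
    for byte in name:
        out.append(0x41 + (byte >> 4))
        out.append(0x41 + (byte & 0x0F))
    return bytes(out)


def build_node_status_request(txid: int = 0x1234) -> bytes:
    """Build a node status (NBSTAT) request for the wildcard name '*'."""
    # Header: transaction id, flags, then question/answer/authority/additional counts.
    header = struct.pack(">HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    wildcard = b"*" + b"\x00" * 15
    # Length byte (32), the encoded name, and a terminating zero-length label.
    qname = b"\x20" + encode_netbios_name(wildcard) + b"\x00"
    # Question type NBSTAT (0x0021), class IN (0x0001).
    return header + qname + struct.pack(">HH", 0x0021, 0x0001)


def node_status_replies(host: str, timeout: float = 2.0) -> bool:
    """Send one request to udp/137 on `host`; True if any datagram comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_node_status_request(), (host, 137))
            data, _ = sock.recvfrom(1024)
        except OSError:
            # Timeout or network error: treat both as "no reply".
            return False
        return len(data) > 0
```

Running node_status_replies("130.207.165.193") in a loop during a busy period would show whether nmbd on the DC stops answering independently of smbd.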

Comment 8 Rick Brown 2004-03-08 11:19:08 UTC
This DC is:
gate3.gatech.edu: uname -a
SunOS gate3.gatech.edu 5.8 Generic_108528-20 sun4u sparc SUNW,UltraAX-i2
gate3.gatech.edu: prtdiag -v
System Configuration:  Sun Microsystems  sun4u Sun Netra X1 (UltraSPARC-IIe
400MHz)
System clock frequency: 100 MHz
Memory size: 128 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      400     0.2   13       1.4


========================= IO Cards =========================


No failures found in System
===========================

========================= HW Revisions =========================

ASIC Revisions:
---------------

System PROM revisions:
------------------
  CORE 1.0.1 2001/02/19 09:55   


gate3.gatech.edu: uptime
  2:06pm  up 195 day(s),  5:46,  1 user,  load average: 0.01, 0.02, 0.05


using an ldap backend - which is uglier than you want to know about... basically
we hacked MIT kerberos to store NT and LANMAN hashes, which get dumped to 
this local ldap database for samba authentication.   Note, we did this way
before Active Directory was dreamt up.   Here's the config:
[global]
   debug level = 2
   workgroup = GT
   server string = Campus Samba Authentication Server
   security = user
   smb ports = 445 139
   log file = /var/log/samba3/log.%m
   max log size = 2000
   ldap admin dn = "cn=samba,dc=auth,dc=gatech,dc=edu"
   ldap ssl = off
   ldap delete dn = yes
   ldap user suffix = ou=people,dc=auth,dc=gatech,dc=edu
   ldap machine suffix = ou=machines,dc=auth,dc=gatech,dc=edu
   ldap suffix = "dc=auth,dc=gatech,dc=edu"
   passdb backend = ldapsam:ldap://localhost
   socket options = TCP_NODELAY
   local master = yes
   os level = 90
   domain master = yes
   preferred master = yes
   netbios name = gate3
   domain logons = yes
   wins support = yes
   dns proxy = no
[netlogon]
   comment = Network Logon Service
   path = /usr/local/samba-3.0.0/lib/netlogon
   guest ok = yes
   writable = no
   share modes = no

gate3 is the only machine of this type -- we're using it for domain
authentication as we transition to AD.   In the meantime, gate1 and gate2 
are referred to by samba machines still using security=server. 

Comment 9 Gerald (Jerry) Carter 2004-03-17 05:50:07 UTC
Rick,

It is possible that the other 2 samba boxes (with security = 
server) are draining the sockets on your Samba PDC, since each smbd 
on the file server will maintain a connection to the DC.

Unless you have some other idea why nmbd is not responding 
on port 137?   Maybe check the output from netstat on the 
Samba PDC?
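
One way to read that netstat output is to tally TCP connections by state. This is a rough sketch in Python, assuming the state is the last whitespace-separated field of each TCP line in "netstat -an" output; the set of state names and the function tally_tcp_states are assumptions for illustration and may need adjusting for a given platform.

```python
from collections import Counter

# State names as printed by common netstat implementations (assumed list;
# spellings differ between platforms, so several variants are included).
TCP_STATES = {
    "ESTABLISHED", "LISTEN", "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK",
    "CLOSING", "CLOSED", "SYN_SENT", "SYN_RECV", "SYN_RCVD",
    "FIN_WAIT1", "FIN_WAIT2", "FIN_WAIT_1", "FIN_WAIT_2", "IDLE", "BOUND",
}


def tally_tcp_states(netstat_output: str) -> Counter:
    """Count lines whose final field is a known TCP state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts
```

A large and growing ESTABLISHED count on the PDC while clients see failures would support the socket-exhaustion theory; a small, stable one would argue against it.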
Comment 10 Rick Brown 2004-03-17 07:30:14 UTC
the other 2 gate machines are using an smbpasswd file, which is dumped
out of the MIT database periodically.  

gate3 (which does domain authentication) is also being used by a few other
samba servers around campus, but none of them ever see this transient problem.
Comment 11 Gerald (Jerry) Carter 2005-02-08 20:42:50 UTC
never could reproduce.  If you still see this in 3.0.11, let me know.