Bug 10614 - a few times per day nobody can login: Windows cannot obtain the domain controller name
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Hardware: All Linux
Importance: P5 critical
Target Milestone: ---
Assigned To: Samba QA Contact
QA Contact: Samba QA Contact
Reported: 2014-05-17 08:38 UTC by Bram Matthys
Modified: 2015-07-07 09:09 UTC (History)

smb.conf (8.40 KB, text/plain)
2014-05-17 09:01 UTC, Bram Matthys

Description Bram Matthys 2014-05-17 08:38:24 UTC
A few times per day, seemingly when a lot of users try to log in, tens of people/computers suddenly can't log in. The issue seems to resolve itself after 5-20 minutes without intervention. Existing connections seem unaffected (that is, other computers where users are already logged on keep working).

This problem started when we migrated from Samba 3.x to Samba 4.1 two weeks ago. It happens every day at roughly the same time, when a lot of people turn on their computers or log in, and also at a few other times during the day; I am not sure whether those times are the same every day.

These are all Windows XP clients.

Event log shows two variants of the error:
Windows cannot obtain the domain controller name for your computer network. (The specified domain either does not exist or could not be contacted. ). Group Policy processing aborted. 
Windows cannot copy file \\Green\profiles\gebruiker\Templates to location C:\Documents and Settings\llxxxxyyyy\Templates. Possible causes of this error include network problems or insufficient security rights. If this problem persists, contact your network administrator.

DETAIL - The specified network name is no longer available. 
^ Naturally the exact file/directory differs; it just seems the DC has suddenly 'gone away'.

I have Icinga running, which connects to this server by IP (\\192.168.2.X\sharename) every two minutes; it has never raised an alarm. The same goes for the LDAP check. I also have a DNS check (although it only resolves google.com), which has never raised an alarm either. So the problem seems to be related to 'finding the DC'.

Is this a known issue or do you have any suggestions as to how to debug (or even fix) this?

How can I emulate this 'finding the DC' process? Is that done by WINS? DNS?
Is there a Samba command-line tool to do this, so I can add a script that checks for it, e.g. every 30 seconds?
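
If the XP clients locate the domain controller through NetBIOS name resolution (an assumption; nothing in this report establishes whether WINS or DNS is involved), the lookup is a name query for the domain name with suffix 0x1C, sent to UDP port 137. The sketch below builds such a query by hand following RFC 1001/1002 and reports whether a positive reply came back. The function names, the fixed transaction ID and the 2-second timeout are illustrative choices, not part of Samba.

```python
import socket
import struct


def encode_netbios_name(name, suffix):
    """First-level encoding (RFC 1001): pad the name to 15 bytes with spaces,
    append the suffix byte, then map each nibble to a letter starting at 'A'."""
    raw = name.upper().encode("ascii")[:15].ljust(15, b" ") + bytes([suffix])
    return bytes(c for byte in raw for c in (0x41 + (byte >> 4), 0x41 + (byte & 0x0F)))


def build_query(name, suffix=0x1C, txid=0x1234, broadcast=False):
    """Build a NetBIOS name query packet (question type NB, class IN)."""
    flags = 0x0110 if broadcast else 0x0100  # RD bit, plus B bit for broadcast
    header = struct.pack(">HHHHHH", txid, flags, 1, 0, 0, 0)
    question = b"\x20" + encode_netbios_name(name, suffix) + b"\x00"
    return header + question + struct.pack(">HH", 0x0020, 0x0001)


def is_positive_response(data, txid=0x1234):
    """True if the reply matches our transaction ID, is a response, and has RCODE 0."""
    if len(data) < 12:
        return False
    reply_id, flags = struct.unpack(">HH", data[:4])
    return reply_id == txid and bool(flags & 0x8000) and (flags & 0x000F) == 0


def query(server, name, suffix=0x1C, timeout=2.0):
    """Send one unicast query to `server`; False on timeout or socket error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, suffix), (server, 137))
            data, _ = sock.recvfrom(1024)
        except OSError:  # includes socket.timeout
            return False
    return is_positive_response(data)
```

Calling `query("192.168.2.X", "EXAMPLE")` in a loop with a timestamped log line on every `False` would give a rough alert; both arguments are placeholders. Suffix 0x1B (the domain master browser name) can be checked the same way.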

I have loglevel 3 logs available.

The load when all these users try to log in isn't very high. I currently have two CPUs and neither seems to reach 100% usage. The only thing I noticed in the Munin graphs is a spike of UDP connections when this error happens, which may be related to the problem (as either a cause or a result).

Any help would be appreciated, as this is naturally a major problem for us.
Comment 1 Bram Matthys 2014-05-17 09:01:58 UTC
Created attachment 9947

Samba configuration file attached (smb.conf + included file).

I couldn't find a way to make attachments 'private' here, so I did not attach my loglevel 3 logs. I can e-mail the files on request (ask here, or mail syzop@vulnscan.org).
Comment 2 Bram Matthys 2014-05-20 15:28:41 UTC
My co-worker said that on a computer where he couldn't log in due to this problem, he could browse the network (files) just fine after logging in locally as administrator.

I asked previously for a (Linux) command-line tool to emulate finding the PDC so I could create an alert system and see when it happens. I have searched in 'net' myself but couldn't find one.

I just launched the following two commands in a batch script that loops every 5 seconds, and will check the results in 24 hours or so:
netdom query pdc /DOMAIN:XXXXXXX
netdom query pdc /DOMAIN:jnet.xxxxxxxxxxxx.nl

I also fired up a script on the Linux side to do "host -t srv _ldap._tcp.dc._msdcs.jnet.hermanjordan.nl" every X number of seconds.

I'll let you know the results.

Additionally, I started a thread on the samba mailing list called "Samba 4 + Windows XP very slow - especially noticeable with many files". http://marc.info/?l=samba&m=140059905627497&w=2
That thread describes slow performance on XP when I copy/read 1000 files of 10 KB each, which takes about half a minute on XP but only 3 seconds on Win7.
So far I've been dealing with it as a separate issue, but it may also very well be related.
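
A minimal sketch of the kind of small-file test described above, for anyone wanting to reproduce the timing. It assumes the target directory is a path on the mounted share, and the helper names are made up for illustration. It only times reads, and client-side caching can skew the result, so treat the numbers as rough.

```python
import os
import tempfile
import time


def make_files(directory, count=1000, size=10 * 1024):
    """Create `count` files of `size` bytes each and return their paths."""
    payload = os.urandom(size)
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"bench_{i:04d}.bin")
        with open(path, "wb") as fh:
            fh.write(payload)
        paths.append(path)
    return paths


def time_read(paths):
    """Read every file completely; return (elapsed seconds, total bytes read)."""
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as fh:
            total += len(fh.read())
    return time.perf_counter() - start, total


def run_in_temp_dir(count=1000, size=10 * 1024):
    """Local dry run; replace the temporary directory with a share path to test Samba."""
    with tempfile.TemporaryDirectory() as directory:
        return time_read(make_files(directory, count, size))
```

Running `make_files` once against the share and then calling `time_read` from each client type would give comparable figures for XP and Win7.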
Comment 3 Bram Matthys 2014-05-21 09:36:22 UTC
Still seeing many of these on clients: "No Domain Controller is available for domain JORDANET due to the following:  There are currently no logon servers available to service the logon request"

On my two machines the logs show no problem finding the domain controller. These scripts run every 5 seconds, so they should have caught something if it were a general problem.

Script 1: NETDOM QUERY PDC (every 5s). All is good:
Primary domain controller for the domain:
The command completed successfully.
Primary domain controller for the domain:
The command completed successfully.

Script 2: Similarly for DNS:
host -t srv _ldap._tcp.dc._msdcs.jnet.xxxxxxxxxxx.nl
host -t srv _kerberos._tcp.dc._msdcs.jnet.xxxxxxxxxxx.nl
host -t srv _gc._tcp.jnet.xxxxxxxxxxx.nl
host -t srv _kerberos._tcp.jnet.xxxxxxxxxxx.nl
host -t srv _kpasswd._tcp.jnet.xxxxxxxxxxx.nl
host -t srv _ldap._tcp.jnet.xxxxxxxxxxx.nl
All returned correct results (ran every 5 seconds for the past XX hours)
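
The DNS polling above could be wrapped in a small monitor that only logs failures with a timestamp, which makes it easier to line them up with the times clients report errors. This is a sketch, not the script actually used here: it assumes the `host` binary is on the PATH, and the function names and the `lookup` parameter (there so the check can be exercised without real DNS) are illustrative.

```python
import subprocess
import time
from datetime import datetime, timezone

# The six SRV record prefixes queried in Script 2.
SRV_PREFIXES = [
    "_ldap._tcp.dc._msdcs",
    "_kerberos._tcp.dc._msdcs",
    "_gc._tcp",
    "_kerberos._tcp",
    "_kpasswd._tcp",
    "_ldap._tcp",
]


def srv_names(domain):
    """Full SRV record names for the given DNS domain."""
    return [f"{prefix}.{domain}" for prefix in SRV_PREFIXES]


def run_host(name, timeout=5):
    """Run `host -t srv NAME`; return (ok, output). A non-zero exit counts as failure."""
    try:
        proc = subprocess.run(
            ["host", "-t", "srv", name],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError) as exc:
        return False, str(exc)
    return proc.returncode == 0, proc.stdout.strip()


def check_once(domain, lookup=run_host):
    """Look up every SRV name once; return a list of (name, output) for failures."""
    failures = []
    for name in srv_names(domain):
        ok, output = lookup(name)
        if not ok:
            failures.append((name, output))
    return failures


def monitor(domain, interval=5, lookup=run_host):
    """Loop forever, printing one timestamped line per failed lookup."""
    while True:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        for name, output in check_once(domain, lookup):
            print(f"{stamp} FAIL {name}: {output}", flush=True)
        time.sleep(interval)
```

Calling `monitor("jnet.example.nl")` (a placeholder domain) and redirecting the output to a file leaves a log that stays empty as long as every lookup succeeds.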

I'm out of ideas now.