Created attachment 12603 [details]
Reverses bug 11520 against 4.5.1 src
I had 4 samba 4.5.0 ADS DCs. I could connect via SMB to two of them and not to the other two. I'd get an error "The request is not supported". I'd also get an "RPC server is unavailable" error when trying to connect ADUC to the two DCs that I couldn't reach via SMB.
I also intermittently got an "Access Denied" message when trying to RDP to a member Windows 2008 R2 server, but nothing in the Windows event log on the member server nor in the samba logs. I don't have many member Windows servers, but only had issues with the one.
I also got errors when trying to join Linux (winbind) or Windows 2008 R2 members both indicating a SID structure issue.
/usr/bin/net join -w MEDITURE -S dc01.mediture.dom -U Administrator
Enter Administrator's password:
Failed to join domain: failed to lookup DC info for domain 'MEDITURE.DOM' over rpc: Indicates the SID structure is not valid.
ADS join did not work, falling back to RPC...
After downgrading to 4.4.6 I had the same problems. I downgraded again to 4.4.5 and the issues were resolved. Prior to upgrading to 4.5.0, I was stable on 4.4.4. I upgraded to 4.5.0 to resolve the security vulnerability and get the old password fix.
I applied the patch for bug 11520 to 4.4.5 and then could reproduce the problem, so I discovered the issue is related to the fix for that bug.
I had the same issue with 4.5.1 vanilla. I was able to reverse the fixes from 11520 against 4.5.1 (see attached). A 4.5.1 build with that patch applied is working fine for me.
Another user on the samba mailing list had the same issue with 4.5.0 on a freshly built domain. There may be two others. None have tried my patch, but I believe this will be reproducible with the following steps.
1. Provision a samba 4.5.1 ADS domain
2. Join two samba 4.5.1 DCs to the domain (3 DCs total)
3. Attempt to join a Linux member
4. Attempt to join a Windows 2008 R2 member
You can reproduce other issues with the following steps.
1. Provision a samba 4.4.5 ADS domain
2. Join two samba 4.4.5 DCs to the domain (3 DCs total)
3. Join a Linux member
4. Join a Windows 2008 R2 member
5. Upgrade DCs to 4.5.1
6. Run samba-tool dbcheck --cross-ncs --fix on all DCs, saying yes to all replPropertyMetaData issues
7. Attempt to join a Linux member
8. Attempt to join a Windows 2008 R2 member
9. Attempt to login to the existing Windows 2008 R2 member
10. Attempt to connect via Windows 7 x64 or Windows 2008 R2 client to CIFS shares on DCs
11. Attempt to connect via ADUC to all DCs
Logs to follow.
It is even easier to reproduce. You can reproduce the SMB access issue with the following steps. I've found the SMB access issues to be the best indicator of this issue.
1. Provision a 4.5.1 ADS domain
/usr/local/samba/bin/samba-tool domain provision --interactive --use-rfc2307 --dns-backend=BIND9_DLZ --domain=MEDITURE --realm=MEDITURE.DOM
Server Role (dc, member, standalone) [dc]:
DNS backend (SAMBA_INTERNAL, BIND9_FLATFILE, BIND9_DLZ, NONE) [SAMBA_INTERNAL]: BIND9_DLZ
2. Try to access SMB shares from a Windows 2008 R2 or Windows 7 x64 client. It will say "\\test-dc.mediture.dom is not accessible. You might not have permission to use this network resource. Contact the administrator of this server to find out if you have access permissions.
The request is not supported." An analysis with Process Monitor reveals that the request to the NETLOGON named pipe is failing.
Created attachment 12604 [details]
Logs w/ debug = 100
Comment on attachment 12604 [details]
Logs w/ debug = 100
[2016/10/27 09:40:16.744426, 0] ../source4/winbind/winbindd.c:47(winbindd_done)
winbindd daemon died with exit status 1
Any idea why winbindd died? what is in the winbind logs?
I included the winbind logs in the zip, but those logs don't start until after 9:40. I'm not sure.
Well, after about a week of running fine with the patched 4.5.1, the SID problem emerged again. I reverted to 4.4.5 and I'm OK for now.
(In reply to Arthur Ramsey from comment #5)
The logs indicate something pretty serious is wrong with the secrets.ldb or secrets.tdb files, or our handling of them.
[2016/10/27 09:40:17.061301, 0] ../lib/util/util_runcmd.c:316(samba_runcmd_io_handler)
[2016/10/27 09:40:17.067716, 0] ../lib/util/util_runcmd.c:316(samba_runcmd_io_handler)
/usr/local/samba/sbin/samba_dnsupdate: ERROR(runtime): uncaught exception - (-1073741606, 'Configuration information could not be read from the domain controller, either because the machine is unavailable or access has been denied.')
The actual error string isn't important, but this shows something went pretty wrong.
If you can get me the logs even at level 1 (this is only a level 0 log, certainly not level 100), this will give much more information.
In particular, I'm looking for a line starting:
Could not find machine account in secrets database:
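A quick way to look for that line across a set of log files is sketched below. The function name is hypothetical, and the log directory is an assumption (a source build under /usr/local/samba usually writes its logs to /usr/local/samba/var); adjust it to wherever your logs actually live.

```shell
# Hypothetical helper: list log lines that mention the missing machine account.
# $1 is a directory containing Samba log files (assumed location, see above).
find_secrets_errors() {
    log_dir=$1
    # -r: recurse into the directory, -n: show line numbers,
    # -F: treat the pattern as literal text rather than a regex
    grep -rnF 'Could not find machine account in secrets database:' "$log_dir"
}
```

Calling `find_secrets_errors /usr/local/samba/var` prints each match as file:line:text and prints nothing if the message never occurred.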
Now that you mention it, I remember seeing that error and thinking it was very promising. I'm 100% sure I also saw "Could not find machine account in secrets database:". I forgot about it at some point when testing started pointing me in the direction of the fix for 11520. I'll see if I can reproduce that error, but I'm pretty sure you've found it.
I don't think it is file corruption, given that it is intermittent and doesn't happen with 4.4.5.
Created attachment 12608 [details]
Logs w/ debug >= 1
Attached logs from when the issues were happening, with debug level set to 1 or greater.
Created attachment 12609 [details]
Logs from vsc-dc02
(In reply to Arthur Ramsey from comment #9)
[2016/10/08 23:08:11.425009, 0] ../source3/lib/util.c:478(reinit_after_fork)
messaging_reinit() failed: NT_STATUS_DISK_FULL
This is pretty suspect. May I ask if the disk is full?
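For anyone wanting to check this quickly, here is a minimal sketch that reports how full the filesystem holding a given path is. The function name is hypothetical, and using /usr/local/samba as the path to check is an assumption based on the install prefix seen in the logs.

```shell
# Hypothetical helper: print the used-space percentage (without the "%")
# of the filesystem that holds the path given as $1.
disk_used_percent() {
    # -P forces the POSIX one-line-per-filesystem format,
    # so the percentage is always the fifth column of line 2
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `disk_used_percent /usr/local/samba` prints a single number; a value at or near 100 would fit the NT_STATUS_DISK_FULL message.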
[2016/10/21 12:57:49.910200, 0] ../source3/lib/util.c:478(reinit_after_fork)
messaging_reinit() failed: NT_STATUS_DISK_FULL
ldb: unable to dlopen /usr/local/samba/lib/ldb/dsdb_notification.so : /usr/local/samba/lib/private/libdsdb-module-samba4.so: version `SAMBA_4.5.0' not found (required by /usr/local/samba/lib/ldb/dsdb_notification.so)
ldb: unable to dlopen /usr/local/samba/lib/ldb/vlv.so : /usr/local/samba/lib/private/libsamdb-common-samba4.so: version `SAMBA_4.5.0' not found (required by /usr/local/samba/lib/ldb/vlv.so)
This indicates that this server (vsc-dc02) is not running Samba 4.5.0 but 4.5.1, yet we are finding ldb modules that require 4.5.0, or that it is running some kind of mix.
Please clean out the /usr/local/samba/lib, /usr/local/samba/bin and /usr/local/samba/sbin directories and do a clean re-install of Samba 4.5.1, as some of these errors just come from mixing up different binary versions.
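To see whether a log still contains this kind of version-mismatch error after a re-install, something like the following sketch may help. The function name is hypothetical; it only pattern-matches the dlopen error text quoted above and knows nothing else about Samba.

```shell
# Hypothetical helper: read log text on stdin and summarise the
# "version `SAMBA_x.y.z' not found" errors, printing a count next to
# each library version that a module asked for but could not find.
missing_samba_versions() {
    grep -o "version \`SAMBA_[0-9.]*' not found" |
        sed "s/^version \`//; s/' not found\$//" |
        sort | uniq -c
}
```

Feeding it a log file, for example `missing_samba_versions < log.samba`, prints nothing when no mismatched modules were loaded.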
Can you please post your smb.conf file?
Do you have any idmap statements in it?
If so, can you remove any other than idmap_ldb:use rfc2307 (if in use)?
Also, please run 'net cache flush' before you re-test.
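As a small aid for the smb.conf check, this sketch lists any remaining "idmap config" lines. The function name is hypothetical and the smb.conf path in the usage note is an assumption for a build under /usr/local/samba.

```shell
# Hypothetical helper: print every active "idmap config" line in an
# smb.conf, with its line number. $1 is the path to the file.
# Commented-out lines (starting with # or ;) do not match, and neither
# does "idmap_ldb:use rfc2307", because the pattern is anchored on
# the words "idmap config" at the start of the line.
list_idmap_config() {
    grep -inE '^[[:space:]]*idmap[[:space:]]+config' "$1"
}
```

For example, `list_idmap_config /usr/local/samba/etc/smb.conf` should print nothing on a DC once the lines are removed, after which 'net cache flush' can be run before re-testing.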
I have had issues with running out of disk and mixed version binaries throughout my testing, but have been aware of it and clearing those conditions. All issues reported have occurred without those conditions present. The mailing list had already suggested removing any idmap config, but hadn't mentioned doing a net cache flush. I am aware of net cache flush and do it regularly when troubleshooting though. I'll make sure I do all these things and see if it happens again, but I'm quite sure it will.
Created attachment 12610 [details]
smb.conf used while logs were collected
Does it make sense that it is intermittent if the issue is out-of-range UIDs?
I only had one account out of range: Administrator. That could explain why, for example, the join fails. Many of the failures, including a join, have also occurred with my named account though. My named account is in range.
[root@dc01 sam.ldb.d]# ldbsearch -H DC\=MEDITURE\,DC\=DOM.ldb uidNumber=* | grep uidNumber | perl -pe 's/.*: //g' | sort -n | head
[root@dc01 sam.ldb.d]# ldbsearch -H DC\=MEDITURE\,DC\=DOM.ldb uidNumber=* | grep uidNumber | perl -pe 's/.*: //g' | sort -n | tail -n 1
I have since removed the following section per your recommendation.
idmap config *: backend = tdb
idmap config *: range = 90000001-100000000
idmap config MEDITURE: backend = ad
idmap config MEDITURE: range = 10000-49999
idmap config MEDITURE: schema mode = rfc2307
I had tried removing that before per Rowland's recommendation, but don't think I issued a net cache flush.
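The range check done by eye above can also be scripted. This is only a sketch: the function name is hypothetical, and it assumes the input is ldbsearch-style output containing lines of the form "uidNumber: N".

```shell
# Hypothetical helper: read ldbsearch-style output on stdin and print
# every uidNumber value that falls outside the inclusive range $1..$2.
uids_out_of_range() {
    awk -v lo="$1" -v hi="$2" '
        # "+ 0" forces a numeric comparison rather than a string one
        $1 == "uidNumber:" && ($2 + 0 < lo + 0 || $2 + 0 > hi + 0) { print $2 }
    '
}
```

With the MEDITURE range from the idmap config above, the earlier search could be piped through it as `ldbsearch -H DC\=MEDITURE\,DC\=DOM.ldb uidNumber=* | uids_out_of_range 10000 49999`; no output means every uidNumber is in range.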
(In reply to Arthur Ramsey from comment #16)
Just looking at this and I noticed something: you have given the result of a search for uidNumber attributes, and the top one is '500'. Does this belong to Administrator?
If so, can I suggest you remove it and allow Administrator to be mapped to root again.
I don't want Administrator to be effectively root or even be allowed for terminal login. I just want to use it as a shared account for authenticating to Active Directory, mostly for joins, if at all. I assigned an in-range ID to Administrator, but set the shell to a non-existent one.
Marking as a duplicate of bug 12410 where we got a clear and clean reproduction of the issue and confirmation of the regression.
*** This bug has been marked as a duplicate of bug 12410 ***
I can confirm that removing the idmap config and performing a net cache flush resolves my issue. I don't think that idmap config should be causing an issue given all UIDs are now in-range. I agree I have the same issue as 12410. Thanks Andrew.
(In reply to Arthur Ramsey from comment #18)
Administrator needs to be mapped to root, just like you shouldn't have had the 'idmap config' lines in a DC smb.conf. If you want to fully fix your Samba AD setup, remove the uidNumber attribute from Administrator.