Bug 12393 - Various services respond SID structure is not valid
Summary: Various services respond SID structure is not valid
Status: RESOLVED DUPLICATE of bug 12410
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: DCE-RPCs and pipes
Version: 4.5.0
Hardware: All Linux
Importance: P5 regression
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-27 13:55 UTC by Arthur Ramsey
Modified: 2016-11-12 20:47 UTC

Attachments
Reverses bug 11520 against 4.5.1 src (14.40 KB, patch), 2016-10-27 13:55 UTC, Arthur Ramsey
Logs w/ debug = 100 (46.36 KB, text/plain), 2016-10-27 14:56 UTC, Arthur Ramsey
Logs w/ debug >= 1 (4.75 MB, application/zip), 2016-10-29 00:45 UTC, Arthur Ramsey
Logs from vsc-dc02 (100.58 KB, application/zip), 2016-10-29 00:50 UTC, Arthur Ramsey
smb.conf used while logs were collected (1.29 KB, text/plain), 2016-10-29 19:06 UTC, Arthur Ramsey

Description Arthur Ramsey 2016-10-27 13:55:40 UTC
Created attachment 12603
Reverses bug 11520 against 4.5.1 src

I had four Samba 4.5.0 ADS DCs.  I could connect via SMB to two of them but not to the other two; on those I got the error "The request is not supported".  I also got "RPC server is unavailable" when trying to connect ADUC to the two DCs that I couldn't reach via SMB.

I also intermittently got an "Access Denied" message when trying to RDP to a member Windows 2008 R2 server, but found nothing in the Windows event log on the member server or in the Samba logs.  I don't have many member Windows servers, and only had issues with this one.

I also got errors when trying to join Linux (winbind) or Windows 2008 R2 members both indicating a SID structure issue.

/usr/bin/net join -w MEDITURE -S dc01.mediture.dom -U Administrator
Enter Administrator's password:
Failed to join domain: failed to lookup DC info for domain 'MEDITURE.DOM' over rpc: Indicates the SID structure is not valid.
ADS join did not work, falling back to RPC...

After downgrading to 4.4.6 I had the same problems.  I downgraded again to 4.4.5 and the issues were resolved.  Prior to upgrading to 4.5.0, I was stable on 4.4.4.  I upgraded to 4.5.0 to resolve the security vulnerability and get the old password fix.

I applied the patch for bug 11520 to 4.4.5 and then could reproduce the problem, so I discovered the issue is related to the fix for that bug.

I had the same issue with 4.5.1 vanilla.  I was able to reverse the fixes from 11520 against 4.5.1 (see attached).  A 4.5.1 build with that patch applied is working fine for me. 

Another user on the samba mailing list had the same issue with 4.5.0 on a freshly built domain.  There may be two others.  None have tried my patch, but I believe this will be reproducible with the following steps.

1. Provision a samba 4.5.1 ADS domain
2. Join two samba 4.5.1 DCs to the domain (3 DCs total)
3. Attempt to join a Linux member
4. Attempt to join a Windows 2008 R2 member

You can reproduce other issues with the following steps.

1. Provision a samba 4.4.5 ADS domain
2. Join two samba 4.4.5 DCs to the domain (3 DCs total)
3. Join a Linux member
4. Join a Windows 2008 R2 member
5. Upgrade DCs to 4.5.1
6. Run samba-tool dbcheck --cross-ncs --fix on all DCs, answering yes to all replPropertyMetaData issues
7. Attempt to join a Linux member
8. Attempt to join a Windows 2008 R2 member
9. Attempt to login to the existing Windows 2008 R2 member
10. Attempt to connect via Windows 7 x64 or Windows 2008 R2 client to CIFS shares on DCs
11. Attempt to connect via ADUC to all DCs

Logs to follow.
Comment 1 Arthur Ramsey 2016-10-27 14:55:29 UTC
It is even easier to reproduce.  You can reproduce the SMB access issue with the following steps.  I've found the SMB access failure to be the best indicator of this issue.

1. Provision a 4.5.1 ADS domain
/usr/local/samba/bin/samba-tool domain provision --interactive --use-rfc2307 --dns-backend=BIND9_DLZ --domain=MEDITURE  --realm=MEDITURE.DOM
Realm [MEDITURE.DOM]:
 Domain [MEDITURE]:
 Server Role (dc, member, standalone) [dc]:
 DNS backend (SAMBA_INTERNAL, BIND9_FLATFILE, BIND9_DLZ, NONE) [SAMBA_INTERNAL]: BIND9_DLZ

2. Try to access SMB shares from a Windows 2008 R2 or Windows 7 x64 client.  It will say "\\test-dc.mediture.dom is not accessible.  You might not have permission to use this network resource.  Contact the administrator of this server to find out if you have access permissions.

The request is not supported."  An analysis with Process Monitor reveals that the request to the NETLOGON named pipe is failing.
Comment 2 Arthur Ramsey 2016-10-27 14:56:57 UTC
Created attachment 12604
Logs w/ debug = 100

Logs attached
Comment 3 Andrew Bartlett 2016-10-28 09:47:01 UTC
Comment on attachment 12604
Logs w/ debug = 100

[2016/10/27 09:40:16.744426,  0] ../source4/winbind/winbindd.c:47(winbindd_done)
  winbindd daemon died with exit status 1

Any idea why winbindd died?  What is in the winbind logs?
Comment 4 Arthur Ramsey 2016-10-28 13:47:38 UTC
I included the winbind logs in the zip, but those logs don't start until after 9:40.  I'm not sure.
Comment 5 Arthur Ramsey 2016-10-28 17:20:10 UTC
Well, after about a week of running OK on the patched 4.5.1, the SID problem emerged again.  I reverted to 4.4.5 and I'm OK for now.
Comment 6 Andrew Bartlett 2016-10-28 17:48:47 UTC
(In reply to Arthur Ramsey from comment #5)

The logs indicate something pretty serious is wrong with the secrets.ldb or secrets.tdb files, or our handling of them.

[2016/10/27 09:40:17.061301,  0] ../lib/util/util_runcmd.c:316(samba_runcmd_io_handler)
  /usr/local/samba/sbin/samba_dnsupdate:     self.creds.set_machine_account(lp)
[2016/10/27 09:40:17.067716,  0] ../lib/util/util_runcmd.c:316(samba_runcmd_io_handler)
  /usr/local/samba/sbin/samba_dnsupdate: ERROR(runtime): uncaught exception - (-1073741606, 'Configuration information could not be read from the domain controller, either because the machine is unavailable or access has been denied.')

The actual error string isn't important, but this shows something went pretty wrong.

If you can get me the logs even at level 1 (this is only a level 0 log, certainly not level 100), this will give much more information. 

In particular, I'm looking for a line starting:

 Could not find machine account in secrets database: 

Thanks!
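
A minimal sketch of how the collected logs might be checked for both points above: which debug levels they actually contain, and whether the secrets database message appears. The function names `count_debug_levels` and `find_secrets_errors` are hypothetical helpers, not part of Samba, and the header pattern assumes the `[date time,  level]` prefix seen in the excerpts quoted in this bug.

```shell
#!/bin/sh

# Hypothetical helper: tally how many log headers were written at each
# debug level. Assumes headers shaped like the ones quoted in this bug:
#   [2016/10/27 09:40:16.744426,  0] ../source4/winbind/winbindd.c:47(winbindd_done)
# Reads the named files, or standard input when none are given.
# Output: one "level count" pair per line, sorted by level.
count_debug_levels() {
    sed -n 's|^\[[0-9/]* [0-9:.]*, *\([0-9][0-9]*\)[],].*|\1|p' "$@" |
        awk '{ n[$1]++ } END { for (level in n) print level, n[level] }' |
        sort -n
}

# Hypothetical helper: print every line containing the message of interest,
# prefixed with file name and line number.
#   -H  always print the file name
#   -n  print the line number
#   -F  treat the pattern as a fixed string, not a regular expression
find_secrets_errors() {
    grep -HnF 'Could not find machine account in secrets database:' "$@"
}
```

For example, if `count_debug_levels log.samba` prints only a row for level 0, nothing above level 0 was captured; `find_secrets_errors log.*` lists any occurrences of the message. The file names here are placeholders for wherever the logs were saved.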
Comment 7 Arthur Ramsey 2016-10-28 18:53:29 UTC
Now that you mention it, I remember seeing that error and thinking it was very promising.  I'm 100% sure I also saw "Could not find machine account in secrets database:".  I forgot about it at some point when testing started pointing me in the direction of the fix for 11520.  I'll see if I can reproduce that error, but I'm pretty sure you've found it.
Comment 8 Arthur Ramsey 2016-10-28 19:29:19 UTC
I don't think it is file corruption, given that it is intermittent and doesn't happen with 4.4.5.
Comment 9 Arthur Ramsey 2016-10-29 00:45:19 UTC
Created attachment 12608
Logs w/ debug >= 1

Attached are logs from when the issues were happening, with the debug level set to 1 or greater.
Comment 10 Arthur Ramsey 2016-10-29 00:50:02 UTC
Created attachment 12609
Logs from vsc-dc02
Comment 11 Andrew Bartlett 2016-10-29 08:44:43 UTC
(In reply to Arthur Ramsey from comment #9)
[2016/10/08 23:08:11.425009,  0] ../source3/lib/util.c:478(reinit_after_fork)
  messaging_reinit() failed: NT_STATUS_DISK_FULL

This is pretty suspect.  May I ask if the disk is full?
Comment 12 Andrew Bartlett 2016-10-29 08:56:47 UTC
Furthermore:  

[2016/10/21 12:57:49.910200,  0] ../source3/lib/util.c:478(reinit_after_fork)
  messaging_reinit() failed: NT_STATUS_DISK_FULL
ldb: unable to dlopen /usr/local/samba/lib/ldb/dsdb_notification.so : /usr/local/samba/lib/private/libdsdb-module-samba4.so: version `SAMBA_4.5.0' not found (required by /usr/local/samba/lib/ldb/dsdb_notification.so)
ldb: unable to dlopen /usr/local/samba/lib/ldb/vlv.so : /usr/local/samba/lib/private/libsamdb-common-samba4.so: version `SAMBA_4.5.0' not found (required by /usr/local/samba/lib/ldb/vlv.so)

This indicates that this server (vsc-dc02) is running Samba 4.5.1 rather than 4.5.0, yet it is finding ldb modules that require 4.5.0; in other words, it is running some kind of mix of versions.

Please clean out the /usr/local/samba/lib, /usr/local/samba/bin and /usr/local/samba/sbin directories and do a clean re-install of Samba 4.5.1, as some of these errors just come from mixing up different binary versions.
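
A minimal sketch of one way to spot such a mix from the logs: it extracts the `version ... not found (required by ...)` errors of the kind quoted above and lists each required version next to the module that needs it. The function name `list_version_mismatches` is hypothetical, and the pattern is based only on the two error lines shown in this comment.

```shell
#!/bin/sh

# Hypothetical helper: from dlopen errors such as
#   ... version `SAMBA_4.5.0' not found (required by /path/to/module.so)
# print "SAMBA_x.y.z /path/to/module.so", one unique pair per line.
# Reads the named files, or standard input when none are given.
# The "." after the version matches the closing quote character.
list_version_mismatches() {
    sed -n 's|.*version `\(SAMBA_[0-9.]*\). not found (required by \(.*\))$|\1 \2|p' "$@" |
        sort -u
}
```

Any module listed with a version other than the one that was just installed is a candidate leftover from an earlier build.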
Comment 13 Andrew Bartlett 2016-10-29 09:27:33 UTC
Can you please post your smb.conf file?

Do you have any idmap statements in it?

If so, can you remove all of them except idmap_ldb:use rfc2307 (if in use)?

Thanks!
Comment 14 Andrew Bartlett 2016-10-29 09:44:11 UTC
Also, please run 'net cache flush' before you re-test.
Comment 15 Arthur Ramsey 2016-10-29 15:52:38 UTC
I have had issues with running out of disk and mixed version binaries throughout my testing, but have been aware of it and clearing those conditions.  All issues reported have occurred without those conditions present.  The mailing list had already suggested removing any idmap config, but hadn't mentioned doing a net cache flush.  I am aware of net cache flush and do it regularly when troubleshooting though.  I'll make sure I do all these things and see if it happens again, but I'm quite sure it will.
Comment 16 Arthur Ramsey 2016-10-29 19:06:22 UTC
Created attachment 12610
smb.conf used while logs were collected

Does it make sense that it is intermittent if the issue is out-of-range UIDs?

I only had one account out of range: Administrator.  That could explain why, for example, the join fails.  However, many of the failures, including a join, have also occurred with my named account, which is in range.

[root@dc01 sam.ldb.d]# ldbsearch -H DC\=MEDITURE\,DC\=DOM.ldb uidNumber=* | grep uidNumber | perl -pe 's/.*: //g' | sort -n | head
500
11105
11106
11107
11108
11112
11113
11117
11125
11126

[root@dc01 sam.ldb.d]# ldbsearch -H DC\=MEDITURE\,DC\=DOM.ldb uidNumber=* | grep uidNumber | perl -pe 's/.*: //g' | sort -n | tail -n 1
21204

I have since removed the following section per your recommendation.

idmap config *: backend = tdb
idmap config *: range = 90000001-100000000

idmap config MEDITURE: backend = ad
idmap config MEDITURE: range = 10000-49999
idmap config MEDITURE: schema mode = rfc2307

I had tried removing that before per Rowland's recommendation, but don't think I issued a net cache flush.
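
A minimal sketch of how the range check described above could be automated, assuming the uidNumber values arrive one per line on standard input (as the ldbsearch pipeline above produces). The function name `uids_out_of_range` is hypothetical; the bounds are passed as arguments rather than read from smb.conf.

```shell
#!/bin/sh

# Hypothetical helper: read one numeric ID per line from standard input and
# print those that fall outside the inclusive range LOW..HIGH.
# Usage: uids_out_of_range LOW HIGH
uids_out_of_range() {
    awk -v low="$1" -v high="$2" '
        # Skip blank lines; adding 0 forces a numeric comparison.
        NF > 0 && ($1 + 0 < low + 0 || $1 + 0 > high + 0) { print $1 }
    '
}
```

With the range that was configured for MEDITURE, the check would be run as:

ldbsearch -H DC\=MEDITURE\,DC\=DOM.ldb uidNumber=* | grep uidNumber | perl -pe 's/.*: //g' | uids_out_of_range 10000 49999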
Comment 17 Rowland Penny 2016-11-01 22:13:05 UTC
(In reply to Arthur Ramsey from comment #16)

Looking at this, I noticed something: in the results of your search for uidNumber attributes, the lowest value is '500'. Does this belong to Administrator?

If so, can I suggest you remove it and allow Administrator to be mapped to root again.
Comment 18 Arthur Ramsey 2016-11-01 22:35:47 UTC
I don't want Administrator to be effectively root or even be allowed for terminal login.  I just want to use it as a shared account for authenticating to Active Directory, mostly for joins, if at all.  I assigned an in-range ID to Administrator, but set the shell to a non-existent one.
Comment 19 Andrew Bartlett 2016-11-08 17:52:02 UTC
Marking as a duplicate of bug 12410 where we got a clear and clean reproduction of the issue and confirmation of the regression. 

Thanks!

*** This bug has been marked as a duplicate of bug 12410 ***
Comment 20 Arthur Ramsey 2016-11-08 18:17:59 UTC
I can confirm that removing the idmap config and performing a net cache flush resolves my issue.  I don't think that idmap config should be causing an issue given all UIDs are now in-range.  I agree I have the same issue as 12410.  Thanks Andrew.
Comment 21 Rowland Penny 2016-11-12 20:47:45 UTC
(In reply to Arthur Ramsey from comment #18)
Administrator needs to be mapped to root, and likewise you shouldn't have had the 'idmap config' lines in a DC smb.conf. If you want to fully fix your Samba AD setup, remove the uidNumber attribute from Administrator.