Bug 12204 - Samba fails to replicate schema 69
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.5.0rc3
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: 4.5
Assigned To: Andrew Bartlett
QA Contact: Samba QA Contact
Depends on:
Blocks: 10999
Reported: 2016-09-03 00:05 UTC by Marc Muehlfeld
Modified: 2016-09-05 23:23 UTC
CC List: 3 users

See Also:


Attachments
Level 10 debug log snippet (1.76 MB, application/x-gzip)
2016-09-03 00:05 UTC, Marc Muehlfeld
Proposed WIP patch for master (1.21 KB, patch)
2016-09-03 10:00 UTC, Andrew Bartlett
Level 10 debug log. Master with patch (4.07 MB, application/x-gzip)
2016-09-03 12:55 UTC, Marc Muehlfeld
WIP patch for master (improved) (1.36 KB, patch)
2016-09-03 20:12 UTC, Andrew Bartlett
WIP patch for master (further improved) (4.96 KB, patch)
2016-09-03 20:39 UTC, Andrew Bartlett
Level 10 debug log from run with improved patch (4.23 MB, application/x-gzip)
2016-09-03 21:35 UTC, Marc Muehlfeld
ldbsearch result (3.41 KB, application/x-bzip2)
2016-09-03 21:50 UTC, Marc Muehlfeld
WIP patch for master (further improved #2) (4.85 KB, patch)
2016-09-04 00:24 UTC, Andrew Bartlett

Description Marc Muehlfeld 2016-09-03 00:05:08 UTC
Created attachment 12426
Level 10 debug log snippet

Description:
Samba fails to replicate the Windows Server 2012 R2 directory schema (69) from a Windows 2008 R2 DC.



Steps to reproduce:
- Set up a Samba AD DC
- Join a 2008 R2 DC to the Samba AD
- Move the Schema Master and Infrastructure Master FSMO roles to the 2008 R2 DC
  (This is necessary for the 2012 R2 join, because the directory schema is
  updated using the WMI protocol, which Samba does not support yet)
- Join a 2012 R2 DC to the AD (select the 2008 R2 DC as the replication source
  during dcpromo)



Actual results:
While the 2012 R2 DC updates the schema, Samba can no longer replicate the schema partition from the 2008 R2 DC:

CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com
	Default-First-Site-Name\WIN2008R2 via RPC
		DSA object GUID: f53c97d9-601c-4e36-94c0-6a91f20d7ddb
		Last attempt @ Sat Sep  3 01:29:34 2016 CEST failed, result 1359 (WERR_INTERNAL_ERROR)
		62 consecutive failure(s).
		Last success @ Sat Sep  3 01:05:50 2016 CEST

The log shows:
[2016/09/03 01:23:37.645846,  0, pid=1135, effective(0, 0), real(0, 0)] ../source4/dsdb/repl/replicated_objects.c:358(dsdb_repl_make_working_schema)
  ../source4/dsdb/repl/replicated_objects.c:358: dsdb_repl_resolve_working_schema() failed: WERR_INTERNAL_ERRORFailed to create working schema: WERR_INTERNAL_ERROR
[2016/09/03 01:23:37.648327,  4, pid=1135, effective(0, 0), real(0, 0)] ../source4/dsdb/repl/drepl_out_pull.c:178(dreplsrv_pending_op_callback)
  dreplsrv_op_pull_source(WERR_INTERNAL_ERROR) for CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com



Expected results:
Samba should be able to receive the schema update from the 2008 R2 schema master.



Additional information:
* Windows Server 2012 R2 DCs and schema 69 support in general work with Samba 4.5: if I have a Windows 2012 R2 AD and join a Samba 4.5 DC to the domain (see Wiki), replication works and Samba receives the version 69 directory schema. Only the reverse direction fails.

* Please let me know if you need the full level 10 debug log capture from the join process. It's ~250MB and I can't upload it to the ticket.
Comment 1 Andrew Bartlett 2016-09-03 06:51:32 UTC
The issue is with:

CN=Organizational-Person,CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com

Can you send me that object?

(I'll then compare it with the version in the schema Samba ships.)
Comment 2 Andrew Bartlett 2016-09-03 10:00:18 UTC
Created attachment 12427
Proposed WIP patch for master

This patch should fix the issue shown in the logs.  Let me know if it resolves the issue!

(We should also bump up the schema in our provision to this version, but that is for another bug).
Comment 3 Marc Muehlfeld 2016-09-03 12:55:42 UTC
Created attachment 12428
Level 10 debug log. Master with patch

(In reply to Andrew Bartlett from comment #2)

I updated the Samba DCs to master and applied your patch.

When the 2012 R2 DC now joins the domain, replication ends in:
CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com
	Default-First-Site-Name\WIN2008R2 via RPC
		DSA object GUID: 14af0be5-f7a5-491e-979f-69cd32698576
		Last attempt @ Sat Sep  3 14:50:44 2016 CEST failed, result 58 (WERR_BAD_NET_RESP)
		13 consecutive failure(s).
		Last success @ Sat Sep  3 14:48:08 2016 CEST


The level 10 debug log captured on one Samba DC is now small enough to attach to the ticket.
Comment 4 Andrew Bartlett 2016-09-03 20:12:56 UTC
Created attachment 12435
WIP patch for master (improved)

This patch lowers the noise a bit further, but the fundamental issue is:

[2016/09/03 14:49:29.683087,  0, pid=927, effective(0, 0), real(0, 0)] ../source4/dsdb/repl/replicated_objects.c:737(dsdb_replicated_objects_convert)
  Failed to convert object CN=Organizational-Person,CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com: WERR_DS_ATT_NOT_DEF_IN_SCHEMA
[2016/09/03 14:49:29.683309,  0, pid=927, effective(0, 0), real(0, 0)] ../source4/dsdb/repl/drepl_out_helpers.c:908(dreplsrv_op_pull_source_apply_changes_trigger)
  Failed to convert objects: WERR_DS_ATT_NOT_DEF_IN_SCHEMA/NT_STATUS_INVALID_NETWORK_RESPONSE

The issue here is that the Windows 2012 R2 schema is larger than the DRS replication page size, so it arrives in several chunks. That means one of the new attributes used by the Organizational-Person object has not been sent to us yet when we try to replicate that object.

Fixing this will be harder, but to be sure, can you provide me an ldbsearch of that object on all servers?

Also include the replPropertyMetaData exploded with --show-binary.

Thanks!
Comment 5 Andrew Bartlett 2016-09-03 20:39:21 UTC
Created attachment 12436
WIP patch for master (further improved)

I think this should address the issue, by increasing the replication page size for schema.
Comment 6 Marc Muehlfeld 2016-09-03 21:35:05 UTC
Created attachment 12437
Level 10 debug log from run with improved patch

(In reply to Andrew Bartlett from comment #5)
I reset my environment, applied your improved patch, and retried the join:

CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com
	Default-First-Site-Name\WIN2008R2 via RPC
		DSA object GUID: 14af0be5-f7a5-491e-979f-69cd32698576
		Last attempt @ Sat Sep  3 23:31:10 2016 CEST failed, result 1359 (WERR_INTERNAL_ERROR)
		13 consecutive failure(s).
		Last success @ Sat Sep  3 23:26:41 2016 CEST


New level 10 debug log of the process attached.
Comment 7 Marc Muehlfeld 2016-09-03 21:50:41 UTC
Created attachment 12438
ldbsearch result

(In reply to Andrew Bartlett from comment #4)
> Fixing this will be harder, but to be sure can you provide the ldbsearch of me 
> that object on all servers?
> 
> Also include the replPropertyMetaData exploded with --show-binary.

I'm not sure if the attached is what you requested. If not, can you please tell me the command(s) to run?
Comment 8 Marc Muehlfeld 2016-09-03 21:57:10 UTC
Can we set this bug as a blocker for 4.5? This release brings a lot of schema 69 support, and it would be great if we could announce with 4.5.0 that joining a 2012 DC works (at least as an experimental feature). I have already written a detailed guide for the Wiki (not published yet) about the process and all the things to take care of.
Comment 9 Andrew Bartlett 2016-09-04 00:23:44 UTC
(In reply to Marc Muehlfeld from comment #8)
Our rules don't allow it to be a blocker, but if the wins keep coming as quickly as they have with these patches so far, it might get in anyway.

The last patch had a regression; try this one.
Comment 10 Andrew Bartlett 2016-09-04 00:24:37 UTC
Created attachment 12439
WIP patch for master (further improved #2)
Comment 11 Marc Muehlfeld 2016-09-04 09:59:29 UTC
I made a first try and it seems to work now:


CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com
	Default-First-Site-Name\DC2 via RPC
		DSA object GUID: c14a774f-9732-4ec2-b9fa-2156c95c4e48
		Last attempt @ Sun Sep  4 11:51:33 2016 CEST was successful
		0 consecutive failure(s).
		Last success @ Sun Sep  4 11:51:33 2016 CEST


# ldbsearch -H /usr/local/samba/private/sam.ldb -b 'cn=Schema,cn=Configuration,dc=samdom,dc=example,dc=com' -s base objectVersion
# record 1
dn: CN=Schema,CN=Configuration,DC=samdom,DC=example,DC=com
objectVersion: 69




Let me do some further checks...
Comment 12 Andrew Bartlett 2016-09-04 19:39:59 UTC
We can't push this to master, as it seems to cause a failure in another test:

Schema-DN[CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com] objects[1197/1556] linked_values[0/0]
Schema-DN[CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com] objects[1330/1556] linked_values[0/0]
Schema-DN[CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com] objects[1463/1556] linked_values[0/0]
Schema-DN[CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com] objects[1556/1556] linked_values[0/0]
Analyze and apply schema objects
Failed to add prefixMap: operations error at ../source4/dsdb/samdb/ldb_modules/objectclass_attrs.c:355
Delete of machine account smbtorturedc was successful.
A transaction is still active in ldb context [0x2ab2d3a61700] on /home/ubuntu/autobuild/b26068/samba/bin/ab/tmp/smbtortureiynlTQ/libnet_BecomeDC.m2mJ2e/private/sam.ldb
UNEXPECTED(failure): samba4.net.api.become.dc.api.become.dc(ad_dc_ntvfs)
REASON: Exception: Exception: ../source4/torture/libnet/libnet_BecomeDC.c:114: status was NT_STATUS_UNSUCCESSFUL, expected NT_STATUS_OK: libnet_BecomeDC() failed - NT_STATUS_UNSUCCESSFUL (null)

However, we are close, and this should not be too hard to sort out.
Comment 13 Stefan Metzmacher 2016-09-05 10:41:52 UTC
(In reply to Andrew Bartlett from comment #12)

Instead of using more objects per chunk, we should cache the results
similar to libnet_vampire_cb_schema_chunk():

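        if (!s->schema_part.first_object) {
                /* first chunk: start the cached object list */
                s->schema_part.object_count = object_count;
                s->schema_part.first_object = talloc_steal(s, first_object);
        } else {
                /* later chunks: append this chunk to the end of the cached list */
                s->schema_part.object_count             += object_count;
                s->schema_part.last_object->next_object = talloc_steal(s->schema_part.last_object,
                                                                       first_object);
        }
        /* walk to the tail of this chunk so the next chunk can be appended */
        for (cur = first_object; cur->next_object; cur = cur->next_object) {}
        s->schema_part.last_object = cur;

        /* only once the server reports no more data is the whole schema applied */
        if (!c->partition->more_data) {
                return libnet_vampire_cb_apply_schema(s, c);
        }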

So we only look at the replicated objects once we have collected all of them.
Comment 14 Marc Muehlfeld 2016-09-05 19:03:50 UTC
(In reply to Marc Muehlfeld from comment #11)
> Let me do some further checks...

From the user's perspective it looks good. The schema is updated. Good work.

I noticed a few things about replication, but I'm not sure whether they are related to this bug report, so I first sent them to Andrew by email for discussion.
Comment 15 Andrew Bartlett 2016-09-05 23:23:27 UTC
(In reply to Stefan Metzmacher from comment #13)
My concern with that approach is that I'm not confident making that change for 4.5 at this point.

I do agree it would be cleaner for master, and may not be that difficult to implement, once we find the right place to hang the object list on.