My production DC is running Samba 3.6.0pre1 just fine (CentOS 5.3, OpenLDAP backend). I can join any Windows XP or 7 workstation. When I compile and use a newer Samba version (3.6.0pre2, pre3, rc1, rc2, final), I can't join new workstations to the domain. I compile it using the same command on the same server. I get no information in any logfile, and it doesn't create a log.hostname file. When trying to join the domain, Windows XP asks for the domain administrator user and password. Then, after a minute, I get a message saying that it couldn't find a domain controller for that domain. Thanks for your help. Guillaume
Please provide a network trace of the successful and the failed attempt to join the domain, taken on the DC. See http://wiki.samba.org/index.php/Capture_Packets. Volker
Created attachment 6783 [details] clientside_domain_join_failure_3.6.0.pcap
Created attachment 6784 [details] clientside_domain_join_success_3.6.0pre1.pcap
Created attachment 6785 [details] serverside_domain_join_failure_3.6.0.pcap
Created attachment 6786 [details] serverside_domain_join_success_3.6.0pre1.pcap
Attached 4 captures (both client and server side, success on 3.6.0pre1 and failure on 3.6.0). Client name = tg1320442 Client IP = 172.20.33.100 domain=TG DC = 172.20.0.2 TRANSGENE (transgene.transgene.fr) tg1320442$ account exists in LDAP. Guillaume
(In reply to comment #1) > Please provide a network trace of the successful and the failed attempt to join > the domain, taken on the DC. See > http://wiki.samba.org/index.php/Capture_Packets. > > Volker Trace posted. Thx. Guillaume
Still not working in 3.6.1.
Guillaume, can you please test if this fix https://attachments.samba.org/attachment.cgi?id=7117 from https://bugzilla.samba.org/show_bug.cgi?id=8371 fixes your problem?
(In reply to comment #9) > Guillaume, can you please test if this fix > https://attachments.samba.org/attachment.cgi?id=7117 > from > https://bugzilla.samba.org/show_bug.cgi?id=8371 > fixes your problem? Hi, Unfortunately it doesn't. I also tried this one in addition and did not get better results: https://attachments.samba.org/attachment.cgi?id=7118
To me it seems that in the serverside domain join failure frame 70 is not correctly being responded to. This would be the task of nmbd. Can you do a new network trace and post a debug level 10 log of nmbd for this failure? Thanks, Volker Lendecke
Created attachment 7232 [details] nmbd logging of error See line 711 for the first occurrence of the error.
Created attachment 7233 [details] wireshark logging of domain join error This is the wireshark capture for the logfile just submitted.
Hi, I hope my current problem fits this bug. But I guess it does. Have a look at the added log.nmbd and the wireshark capture that I have uploaded a few moments ago. I hope this problem can be solved soon... Max
I'm lost, sorry. Looked at the traces, I do see the error. I installed a fresh XP and a 3.6.1 based domain with your data (domain name and computer name) and it just works. Do you have weird locale settings on your DC? Volker
Hi. What is your definition of weird? ;) If I understand the error message "Conversion error: Illegal multibyte sequence()" correctly, there should be *some* character between the brackets?! But there is nothing? Max
Hi again ... I don't know if this is also connected to this bug, but there is a problem when trying to log in to *another* XP machine which is still registered inside the domain: I get a message box indicating that the domain controller is not available or the computer account cannot be found. There is no indication of any type of error inside log.nmbd, but inside syslog there are the following entries:

Jan 6 23:27:28 server smbd[31419]: [2012/01/06 23:27:28.514480, 0] rpc_server/netlogon/srv_netlog_nt.c:976(_netr_ServerAuthenticate3)
Jan 6 23:27:28 server smbd[31419]: _netr_ServerAuthenticate3: netlogon_creds_server_check failed. Rejecting auth request from client XPDATEV machine account XPDATEV$
Jan 6 23:27:28 server smbd[31419]: [2012/01/06 23:27:28.535395, 0] rpc_server/netlogon/srv_netlog_nt.c:976(_netr_ServerAuthenticate3)
Jan 6 23:27:28 server smbd[31419]: _netr_ServerAuthenticate3: netlogon_creds_server_check failed. Rejecting auth request from client XPDATEV machine account XPDATEV$

log.xpdatev is also full of entries for this timestamp. Do you think this might help? Max
I get the same error ("Illegal multibyte sequence") with locale reporting en_US.utf8 for all variables on the samba server machine. smb.conf has "dos charset = 850" and "unix charset = ISO8859-1", but I don't know if those matter in this context. Downgraded to 3.5.11 and it works fine.
(In reply to comment #11) > To me it seems that in the serverside domain join failure frame 70 is not > correctly being responded to. This would be the task of nmbd. Can you do a new > network trace and post a debug level 10 log of nmbd for this failure? > > Thanks, > > Volker Lendecke 3.6.2 doesn't fix the problem. Posting traces...
Created attachment 7304 [details] Server side 3.6.2 PDC domain join failure
Created attachment 7305 [details] Client side Windows XP domain join failure on 3.6.2 PDC
(In reply to comment #16) > Hi. What is your definition of weird? ;) As I understand the error message > "Conversion error: Illegal multibyte sequence()" correct, there should be *any* > character between the brackets?! But there is nothing? > > Max Hello, I have come across this problem with a W2008 Server. In fact I was unable to change my user password. This would also explain why I could not (re)join the domain: the password gets changed during joining, and if this fails, everything fails. After digging for a while (as just this one server shows the problem; all other Windows computers I tried worked fine) I have tried to create a patch / workaround for charcnv.c. I have not tested it very much so far, but my Vista is working again. I will attach a diff file with my code. -- Torsten.
Created attachment 7584 [details] patch for a workaround
Making this one a blocker for 3.6.6 as several people hit this issue.
Created attachment 7585 [details] Test patch to understand the issue. Can you test the attached patch please ? Note that you'll have to apply this *AFTER* the autogenerated pidl step, as it modifies an autogenerated file. This will not be a final patch I'm just trying to understand the problem, and get more data on what is wrong. Please report back to me asap as this is a blocker for the next 3.6.x. Thanks ! Jeremy.
It looks like a bug in parsing the NBT netlogon packet, inside the function ndr_pull_nbt_netlogon_packet().

I looked closely, and I found an interesting thing. These functions are auto-generated in 3.6.x via pidl, but have been removed from auto-generation in master with a note that:

    /* These responses are all handled manually, as they cannot be encoded in IDL fully
       See push_nbt_netlogon_response() */

Which was commit b782b5ed by Andrew Bartlett. Curiouser and curiouser :-). This fix wasn't back-ported into v3-6-test btw.

(Actually this fix was a simple comment addition; the actual fix was 2f5a1d2b1cfdbfc3d4c7c1e96d1ed061e7970f88, "Manually handle the NETLOGON_SAM_LOGON_REQUEST too. With the sid structure being both optional and aligned, it was too hard to do this in just IDL.", which was also not back-ported into 3.6.x.)

Looking at bug #8373 *really* closely, what it looks like is that when parsing nbt_netlogon_query_for_pdc, whose idl looks like this:

    /* query for pdc request */
    typedef struct {
        astring computer_name;
        astring mailslot_name;
        [flag(NDR_ALIGN2)] DATA_BLOB _pad;
        nstring unicode_name;
        netlogon_nt_version_flags nt_version;
        uint16 lmnt_token;
        uint16 lm20_token;
    } nbt_netlogon_query_for_pdc;

we mess up on parsing out the:

    nstring unicode_name;

field, as the associated capture file from the bug shows that the mailslot_name ends on an odd boundary and is then aligned with a zero byte (to match the [flag(NDR_ALIGN2)] DATA_BLOB _pad; field). We get an iconv error which I believe is due to the offset being stuck at the padding zero.

So, why isn't the:

    [flag(NDR_ALIGN2)] DATA_BLOB _pad;

having the desired effect ?
The chain of IDL looks like:

nbt_netlogon_query_for_pdc is enclosed by:

    typedef [nodiscriminant] union {
        [case(LOGON_REQUEST)] NETLOGON_LOGON_REQUEST logon0;
        [case(LOGON_SAM_LOGON_REQUEST)] NETLOGON_SAM_LOGON_REQUEST logon;
        [case(LOGON_PRIMARY_QUERY)] nbt_netlogon_query_for_pdc pdc;
        [case(NETLOGON_ANNOUNCE_UAS)] NETLOGON_DB_CHANGE uas;
    } nbt_netlogon_request;

which itself is enclosed by:

    typedef [flag(NDR_NOALIGN),public] struct {
        netlogon_command command;
        [switch_is(command)] nbt_netlogon_request req;
    } nbt_netlogon_packet;

Note the flag(NDR_NOALIGN) assignment to the nbt_netlogon_packet struct. It turns out that setting flag(NDR_NOALIGN) on a structure affects *all* enclosed sub-marshalling/unmarshalling calls when called from code marshalling/unmarshalling this struct.

Looking carefully into our NBT functions I found code to hand-marshall similar structures such as ndr_push_NETLOGON_SAM_LOGON_REQUEST(), where we have:

    uint32_t _flags_save_DATA_BLOB = ndr->flags;
    ndr->flags &= ~LIBNDR_FLAG_NOALIGN;
    ndr_set_flags(&ndr->flags, LIBNDR_FLAG_ALIGN4);
    NDR_CHECK(ndr_push_DATA_BLOB(ndr, NDR_SCALARS, r->_pad));
    ndr->flags = _flags_save_DATA_BLOB;

We're hand-unsetting LIBNDR_FLAG_NOALIGN here, as it turns out that ndr_set_flags() only ever OR's given flags into the ndr->flags field (with some complex rules). So why do we have to do this ?

Right now it turns out that when we set flag(NDR_NOALIGN) in the definition of the nbt_netlogon_packet struct, this is recursive and means that we don't align all the way down when marshalling or unmarshalling - which is what we want. Until, that is, we hit the flag(NDR_ALIGN2) on the DATA_BLOB _pad definition. The NDR_ALIGN2 bit is set in the generated code via ndr_set_flags(), but as the LIBNDR_FLAG_NOALIGN is already set from the calling code, and setting this bit does not reset the LIBNDR_FLAG_NOALIGN, it means it is completely ignored when evaluating the alignment.
Thus the _pad blob alignment generation has no effect, and we end up being stuck on the offset of the padding zero. Sorry for this being so long, but the upshot of all this is I think that the flags LIBNDR_FLAG_ALIGN2|LIBNDR_FLAG_ALIGN4|LIBNDR_FLAG_ALIGN8 which are collectively defined as LIBNDR_ALIGN_FLAGS, should be mutually exclusive with LIBNDR_FLAG_NOALIGN, in that when you set LIBNDR_FLAG_NOALIGN, the bits of LIBNDR_ALIGN_FLAGS should be removed, and when you set any of the LIBNDR_ALIGN_FLAGS bits, LIBNDR_FLAG_NOALIGN should be removed. If we do this correctly I think it then allows the nbt_netlogon_query_for_pdc struct to be correctly marshalled/unmarshalled by the gen_ndr generated code. I'm wondering if it also may remove the need for some of the hand-generation of this code that got put into master ? I will attach the *VERY PRELIMINARY* patch for evaluation, not that I think it's a valid one - yet !
Created attachment 7588 [details] WARNING ! Experimental patch - also to investigate the problem..
Created attachment 7589 [details] WARNING ! Experimental code ! - More elegant experimental patch.
(In reply to comment #28) > Created attachment 7589 [details] > WARNING ! Experimental code ! - More elegant experimental patch. See comment # 10, https://attachments.samba.org/attachment.cgi?id=7118 doesn't seem to fix it. (maybe the tester didn't regenerate the pidl output...) The only difference to your patch is that your patch resets LIBNDR_FLAG_NOALIGN when LIBNDR_FLAG_REMAINING is set.

    if (new_flags & LIBNDR_FLAG_REMAINING) {
        (*pflags) &= ~LIBNDR_ALIGN_FLAGS;
    }

I don't know which one is the better patch (maybe yours), but the important thing is that we may need to review a lot of code.
(In reply to comment #28) > Created attachment 7589 [details] > WARNING ! Experimental code ! - More elegant experimental patch. Hello, I have tested the patch, and it seems to fix the problem. I have also tested the patch for comment #25, and this worked, too. -- Torsten
(In reply to comment #30) > (In reply to comment #28) > > Created attachment 7589 [details] [details] > > WARNING ! Experimental code ! - More elegant experimental patch. > > Hello, I have tested the patch, and it seems to fix the problem. > > I have also tested the patch for comment #25, and this worked, too. Good, could you also check if https://attachments.samba.org/attachment.cgi?id=7117 and/or https://attachments.samba.org/attachment.cgi?id=7118 also fix this (which would mean they were not tested correctly). Thanks!
(In reply to comment #31) > Good, could you also check if > https://attachments.samba.org/attachment.cgi?id=7117 > and/or > https://attachments.samba.org/attachment.cgi?id=7118 > also fix this (which would mean they were not tested correctly). > > Thanks! Hi again, I tried attachment 7117 [details], but could not apply it (I am actually using the 3.6.5 release source). The other one, 7118, did fix the problem. -- Torsten.
I'm not sure if this helps at all, but I ran into this problem too and found that if the NetBIOS name exceeds 8 characters you are unable to join the domain. Btw: this is without any patch applied. Currently I'm only using the precompiled Debian testing amd64 packages (3.6.5).
(In reply to comment #28) > Created attachment 7589 [details] > WARNING ! Experimental code ! - More elegant experimental patch. Hi, This patch is working on our installation and fixes the problem, both for Windows XP & 7 joining the domain. Hope it will be slotted into the next production release. Thanks ! Guillaume
(In reply to comment #32) > (In reply to comment #31) > > > Good, could you also check if > > https://attachments.samba.org/attachment.cgi?id=7117 > > and/or > > https://attachments.samba.org/attachment.cgi?id=7118 > > also fix this (which would mean they were not tested correctly). > > > > Thanks! > > Hi again, > > I was trying attachment 7117 [details], but could not apply this (actually I am using > 3.6.5 relase source). > The other one 7118 did help for the problem. Ok, thanks for testing!
Created attachment 7596 [details] Experimental patch for master and 3.6 I've discussed the problem with Günther. It seems the solution is to make all alignment-related flags mutually exclusive (also the NDR_REMAINING flag). This needs QA testing to make sure that it doesn't break any unrelated code paths.
Comment on attachment 7596 [details] Experimental patch for master and 3.6 Also applies to v3-6-test
I've been doing a lot of investigation of this (I spent the entire day yesterday on it) and I can't see any breakage. I'm going to modify your change to add my comment and then push to master and re-upload here for 3.6.next. (I think my comment is needed - look how long it took to track this down - I don't want to have to do that again :-). Jeremy.
Created attachment 7599 [details] Fix for 3.6.next Patch that went into master. Applies cleanly to 3.6.x.
Comment on attachment 7599 [details] Fix for 3.6.next Andrew, can you run wintest with the current master?
Comment on attachment 7599 [details] Fix for 3.6.next The wintest on master run was successful for test-s3.py and ran as usual (failure in DNS update) for test-s4-howto.py
Pushed to v3-6-test. Closing out bug report. Thanks!