The Samba-Bugzilla – Bug 4820
User Manager: Unable to rechange the domain
Last modified: 2008-10-24 14:48:06 UTC
When I try to change the domain in User Manager several times, I get "a device attached to your system is not functioning". When I start SAMBA at debug level 5, I see some SMB requests before the message is displayed on the client.
In my tests the bug isn't reproducible with SAMBA 3, so the domain change works correctly there.
I'm going to have to close this as a duplicate of the stacktrace bug, because without more info, I think it's just the same thing. If it still occurs, I'll need a network trace (pcap from wireshark).
*** This bug has been marked as a duplicate of 4821 ***
No, the bug is still reproducible. I'll try to produce a wireshark log.
Created attachment 2846 [details]
The capture of the network traffic
Here is the wireshark pcap file. 192.168.1.9 is the address of the win2k client (vmware-win2k2), 192.168.1.10 of the samba server (vmware-samba4).
Maybe also interesting: I recently tested the NT Server Manager for Domains, which seems to run fine. Some RPCs are missing in SAMBA 4, but that isn't so tragic. And there the change domain dialog works without problems!
Your latest work on it seems to have broken the whole User Manager. I wasn't even able to see the user list!
Works fine for me...
Previously it worked (better):
- When I start the User Manager for Domains while logged in as the domain administrator, I get a message box: "Wrong parameter. Do you want to select another domain to administer?"
- Then, when I click on "Yes", I see my domain in the list, but when I try to confirm, I get "No true selection"
Created attachment 2898 [details]
When running SAMBA under valgrind I noticed some messages in the logfile.
Interesting: I also got the same valgrind error when clicking on "Replication" in the properties of the SAMBA machine in the "Server Manager for Domains".
Comment on attachment 2898 [details]
Ok, it was quite a mess! In my SVN working directory I had patches against the registry backend for the regtree bug. I reverted the changes and am now able to see the user list again, like you, Andrew! But the bug described above, which appears when you try to change the domain with the dialog several times, *remains*!
So, after long testing I think I have found the real issue. It is caused by the samr pipe, which sometimes doesn't seem to be closed properly. For example:
- You change the domain for the first time with the "Select Domain" command
- If you subsequently try to change it several more times, you get the error message "unconnected device"
- But when you double-click a user object and then click "OK", the samr pipe is handled correctly: you can change the domain and the error no longer appears
Could you please also have a look at this bug sometime? I've now described the cause in the text above.
But *why* is the User Manager often unable to close the samr pipe, so that it then tries to reconnect and fails? I think the best approach would be to compare the results with a Windows Server!
I've reproduced this, but I can't figure out what is going wrong.
In particular, the error occurs without an immediately preceding network packet - the 'change domain' list is created, apparently successfully, and the client fails. This makes it much harder to track down - perhaps we get some expected value wrong?
I know, the bug is very tricky. I spent some hours trying to find a way to fix it and went through many SAMBA 4 source files to analyze the cause. It is caused by the samr pipe. I'm quite sure of that. Maybe there is a small error in the idl definition. I can only say that I tested it with SAMBA 3 and there it *works* like it should. Maybe we could do some kind of code comparison?
Samba3 and Samba4 are very different in this area, but comparing outputs might help. Let me know if you find anything more!
My theory now is that the problem could also be caused by the dcesrv_samr_QueryDisplayInfo call. Maybe it doesn't return exactly the expected values, and then the User Manager doesn't close the samr pipe. What do you think?
The problem has gotten even worse now! I'm no longer even able to switch to my domain.
Error message "Element not found!".
The problem "Element not found!" is caused by a mistake in the registry. The key "SYSTEM" is written there as "System" rather than in all uppercase. Since LDB seems to be case-sensitive, the ldb_search routine couldn't find the right key and the operations through the WINREG pipe failed.
Created attachment 3085 [details]
A corrected provision.reg file for the provisioning should be enough to solve this issue and similar ones.
But it seems to me that the Windows registry is case-insensitive, while LDB doesn't seem to be.
Please note, the rechange domain problem persists!
Any comment on the registry part of this bug? Have we had a regression regarding case sensitivity in the registry?
Yes, this is probably a bug in the LDB registry backend. I doubt it's a regression though, it's always been like this :-)
Not sure what the easiest way is to fix this. Is it possible to do case-insensitive searches in LDB?
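For illustration only, here is a minimal Python sketch of one possible approach: fold key paths to a canonical case before storing and looking them up, so that "System" and "SYSTEM" resolve to the same entry. It does not use the real LDB API; `normalize_key_path`, `CaseInsensitiveKeyStore` and the example key path are hypothetical names made up for this sketch.

```python
def normalize_key_path(path: str) -> str:
    """Fold a registry key path to a canonical form used only for lookups."""
    # Accept either separator and drop empty components (hypothetical policy).
    parts = [p for p in path.replace("/", "\\").split("\\") if p]
    # Uppercasing every component makes comparisons case-insensitive.
    return "\\".join(p.upper() for p in parts)


class CaseInsensitiveKeyStore:
    """Toy stand-in for a registry backend; not the LDB registry backend."""

    def __init__(self):
        # Maps normalized path -> (path as originally written, value).
        self._keys = {}

    def add(self, path, value):
        self._keys[normalize_key_path(path)] = (path, value)

    def get(self, path):
        entry = self._keys.get(normalize_key_path(path))
        return None if entry is None else entry[1]


store = CaseInsensitiveKeyStore()
# Stored with the mixed-case spelling, as in the faulty provision file.
store.add("HKEY_LOCAL_MACHINE\\System\\CurrentControlSet", "example")
# Looked up with the all-uppercase spelling a client might send.
found = store.get("HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet")
```

The original spelling is kept alongside the value so a backend built this way could still return key names as they were written.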
I think I've now found the real reason why this fails:
When opening a pipe, we don't set the "OpenNoRecall" flag, whereas SAMBA 3 does.
I investigated the case a bit more and discovered that some parameters aren't set in the response when an NTCreateX request is handled by SAMBA 4. SAMBA 3 does it right.
- "Create options": in my case "OpenNoRecall" was not set even though the client requested it. According to Wireshark, the 0x400000 bit under NTCREATEX_OPTIONS in libcli/raw/smb.h should be "NTCREATEX_OPTIONS_OPEN_NO_RECALL" rather than "UNKNOWN"
- "Create action" (in my example SAMBA 3 returned 1 - "File existed and was opened")
- Response "File attributes" (in my example SAMBA 3 returned 0x80 - "Normal file")
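To make the fields above concrete, here is a small Python sketch that checks the 0x400000 create option bit and names the two response values quoted from the SAMBA 3 trace. Only the values mentioned in this comment are included; the function names and lookup tables are assumptions for illustration, not Samba code.

```python
# Bit value and name as given in the comment above.
NTCREATEX_OPTIONS_OPEN_NO_RECALL = 0x00400000

# Only the values observed in the SAMBA 3 trace; everything else is "unknown" here.
CREATE_ACTIONS = {1: "File existed and was opened"}
FILE_ATTRIBUTES = {0x80: "Normal file"}


def has_open_no_recall(create_options: int) -> bool:
    """Return True if the OpenNoRecall bit is set in the create options."""
    return bool(create_options & NTCREATEX_OPTIONS_OPEN_NO_RECALL)


def describe_response(create_action: int, file_attributes: int) -> tuple:
    """Translate the two response fields into the labels Wireshark showed."""
    action = CREATE_ACTIONS.get(create_action, f"unknown action ({create_action})")
    attrs = FILE_ATTRIBUTES.get(
        file_attributes, f"unknown attributes (0x{file_attributes:x})"
    )
    return action, attrs


# Hypothetical request value with the OpenNoRecall bit plus one other bit set.
requested = 0x00400040
flag_set = has_open_no_recall(requested)
summary = describe_response(1, 0x80)
```

Comparing `flag_set` and `summary` for the same request against both servers would be one way to spot the differences listed above.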
Andrew, have you looked into this? Or who could be the right person for this pipe problem?
My only problem is the lack of time to work on the tests to prove the problem.
Good that you have now started on NTCREATEX_OPTIONS_OPEN_NO_RECALL! Hopefully handling it correctly fixes the issue (I think it has something to do with locking).
The NT Create & X create options are not even referenced in the IPC case, so my changes will have no effect here.
Metze's latest commits finally fixed the problem. Well done!
Sadly, I have to reopen the bug, because the same problem has reappeared in newer GIT releases.
Now I know the real reason of the problem - the WINREG server. My fixes are now in the main GIT repo, so the problem is now *really* fixed.