Bug 5553 - wbinfo -t fails after upgrade from 3.0.24 to 3.0.30
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.30
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
Reported: 2008-06-20 08:54 UTC by mark.cave-ayland (dead mail address)
Modified: 2008-06-26 11:54 UTC (History)

Attachments
smb.conf file for the server (2.37 KB, text/plain)
2008-06-20 09:05 UTC, mark.cave-ayland (dead mail address)
Log file from 3.0.24 (pre-upgrade, working) for wbinfo -t (11.65 KB, text/plain)
2008-06-20 09:13 UTC, mark.cave-ayland (dead mail address)
Log file from 3.0.30 (post-upgrade, fails) for wbinfo -t (14.14 KB, text/plain)
2008-06-20 09:19 UTC, mark.cave-ayland (dead mail address)

Description mark.cave-ayland (dead mail address) 2008-06-20 08:54:11 UTC
Hi there,

I've just upgraded a server from 3.0.24 to 3.0.30 in order to try and resolve some outstanding idmap issues we are having, and after the upgrade, winbind ceases to function correctly.

Running 3.0.30, "wbinfo -p" works fine, however "wbinfo -t" fails with the following message:

uk01:/var/lib/samba# wbinfo -t
checking the trust secret via RPC calls failed
error code was NT_STATUS_INVALID_HANDLE (0xc0000008)
Could not check secret

Interestingly enough, if I downgrade back to 3.0.24 then "wbinfo -t" succeeds once again, but we still experience problems with intermittent idmap lookup failures which is the reason for attempting the upgrade.

The server in question is a member of an Active Directory domain, and I can use "net ads" to enumerate users & groups, which implies the basic trust is working. So I think that the issue lies somewhere within winbind.

An obfuscated smb.conf, and debug level 10 log files for 3.0.24 & 3.0.30 are included with this bug report.


Many thanks,

Mark.
Comment 1 mark.cave-ayland (dead mail address) 2008-06-20 09:05:22 UTC
Created attachment 3351
smb.conf file for the server
Comment 2 mark.cave-ayland (dead mail address) 2008-06-20 09:13:12 UTC
Created attachment 3352
Log file from 3.0.24 (pre-upgrade, working) for wbinfo -t
Comment 3 mark.cave-ayland (dead mail address) 2008-06-20 09:19:41 UTC
Created attachment 3353
Log file from 3.0.30 (post-upgrade, fails) for wbinfo -t
Comment 4 Jeremy Allison 2008-06-20 12:36:34 UTC
This is a bug I've fixed for 3.0.31. Patch is available in the linked-to bug report. Sorry for the problem, if you can't apply the patch please stay with 3.0.24 until we release 3.0.31 (soon).
Jeremy.


*** This bug has been marked as a duplicate of 5504 ***
Comment 5 mark.cave-ayland (dead mail address) 2008-06-23 07:02:58 UTC
Hi Jeremy,

Thank you for the prompt response with regard to the bug report. I have since applied the patch to 3.0.30 from bug #5504, but it doesn't resolve the issue, i.e. I still receive the NT_STATUS_INVALID_HANDLE error from "wbinfo -t", and I am unable to resolve users and groups after the upgrade.

Is there anything else I can do to provide more debugging information?


Many thanks,

Mark.
Comment 6 mark.cave-ayland (dead mail address) 2008-06-24 11:17:25 UTC
Hi Jeremy,

I've spent a while stepping through both 3.0.24 and 3.0.30 using gdb and I think the issue is something to do with the fact that find_our_domain() thinks the workgroup to which the server belongs is local, when in fact it is not. I see the following trace from 3.0.30 using gdb:


find_our_domain () at nsswitch/winbindd_util.c:629
629                     if (domain->primary)
(gdb) print domain->internal
$1 = 1
(gdb) print domain->primary
$2 = 0
(gdb) print domain->name
$3 = "BUILTIN", '\0' <repeats 248 times>
(gdb) step
628             for (domain = domain_list(); domain != NULL; domain = domain->next) {
(gdb)
629                     if (domain->primary)
(gdb) print domain->name
$4 = "EU-COMPANY", '\0' <repeats 246 times>
(gdb) print domain->internal
$5 = 1
(gdb) print domain->primary
$6 = 1


Now the Samba server is simply a member of the EU-COMPANY domain rather than being the domain controller, so shouldn't domain->internal be set to false in this case?

If I manually do a "set var domain->internal = 0" at this point in gdb and then continue, "wbinfo -t" works as intended returning the message "checking the trust secret via RPC calls succeeded" instead of failing as before. However based on this, I am not sure what the final fix for this problem should be...


Many thanks,

Mark.
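
Editorial sketch (not part of the original comment): the gdb session in comment 6 steps through a loop that walks the domain list and stops at the entry flagged as primary. The C below models only that walk and the flag combination seen in the trace. The names demo_domain, demo_find_our_domain and demo_primary_looks_wrong are hypothetical and are not Samba's; the real structure in nsswitch/winbindd_util.c has many more fields, and treating "primary and internal" as suspicious on a member server is an assumption taken from the question raised in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for winbindd's per-domain record.
 * Only the three fields printed in the gdb trace are modelled. */
struct demo_domain {
    char name[256];            /* e.g. "BUILTIN", "EU-COMPANY" */
    bool internal;             /* domain is treated as local to this server */
    bool primary;              /* domain this server belongs to */
    struct demo_domain *next;  /* singly linked list, NULL-terminated */
};

/* Walk the list and return the first entry flagged as primary,
 * or NULL if there is none. Mirrors the shape of the loop in the trace. */
static struct demo_domain *demo_find_our_domain(struct demo_domain *head)
{
    struct demo_domain *d;

    for (d = head; d != NULL; d = d->next) {
        if (d->primary) {
            return d;
        }
    }
    return NULL;
}

/* Assumption: on a plain domain member (not a DC), the primary domain
 * should not also be marked internal. Returns true for the state seen
 * in the trace (primary = 1, internal = 1) on a member server. */
static bool demo_primary_looks_wrong(const struct demo_domain *d,
                                     bool is_member_server)
{
    assert(d != NULL);
    return is_member_server && d->primary && d->internal;
}
```

With a two-entry list of BUILTIN (internal, not primary) followed by EU-COMPANY (internal and primary), the walk returns the EU-COMPANY entry and the check flags it, which matches the state where "set var domain->internal = 0" made "wbinfo -t" succeed.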
Comment 7 mark.cave-ayland (dead mail address) 2008-06-25 06:03:37 UTC
Hi Jeremy,

I've added you as a CC, as any changes to this bug seem to be going to /dev/null (at least they aren't getting emailed back to me - maybe it is because I am the reporter?), and I am about to post a fix.


ATB,

Mark.
Comment 8 Jeremy Allison 2008-06-25 18:05:50 UTC
Please check out the git-repository of samba_3_0_maint for the latest winbindd source (what will be 3.0.31). I've been doing a lot of work in there since 3.0.30 and you might want to check it's still a problem in 3.0.31. If you have a change, please attach to this bug asap as we're getting ready for 3.0.31 (soon) and 3.2.0 final (July 1st).
Thanks,
Jeremy.
Comment 9 mark.cave-ayland (dead mail address) 2008-06-26 04:56:04 UTC
Hi Jeremy,

After a lot more hacking around with gdb, I have since discovered that my patch is no longer required. Originally I had some extra code in "is_internal_domain" to work around this, but as I started working through the code to justify the patch, I realised that something still wasn't quite right.

To cut a long story short, I eventually traced the problem to secrets.tdb where I found that the EU-COMPANY SID and the UK01 SID were both exactly the same! And it was this in combination with the new logic based on the recent fixes for winbind running on a PDC which was causing winbind to fail :(

I know that this server has been upgraded from an ancient version of Debian/Samba to Debian etch so perhaps this is where the problem lies? So in the end, I blew away secrets.tdb, let Samba regenerate new secrets, re-joined the domain and winbind started to work once again :)

BTW I still have 1 remaining issue where "wbinfo -u" will hang the winbind process if one of its trusted domains is unreachable - I have to kill -9 and then restart the main winbind process. Should I re-file this as a separate bug?


Many thanks,

Mark.
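
Editorial sketch (not part of the original comment): the root cause described in comment 9 is that the machine SID (UK01) and the domain SID (EU-COMPANY) stored in secrets.tdb were identical. The C below shows, under stated assumptions, what such an equality check looks like. The demo_sid layout is a simplified, hypothetical model of a Windows SID (revision, identifier authority, sub-authorities), not Samba's own type, and nothing here reads secrets.tdb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_SUBAUTHS 15

/* Hypothetical simplified SID: S-<revision>-<id_auth>-<sub_auths...> */
struct demo_sid {
    uint8_t  revision;
    uint8_t  num_auths;                     /* sub-authorities in use */
    uint8_t  id_auth[6];                    /* identifier authority */
    uint32_t sub_auths[DEMO_MAX_SUBAUTHS];
};

/* Field-by-field comparison; only the first num_auths sub-authorities
 * are significant. A malformed count is never treated as equal. */
static bool demo_sid_equal(const struct demo_sid *a, const struct demo_sid *b)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (a->revision != b->revision || a->num_auths != b->num_auths) {
        return false;
    }
    if (a->num_auths > DEMO_MAX_SUBAUTHS) {
        return false;
    }
    if (memcmp(a->id_auth, b->id_auth, sizeof a->id_auth) != 0) {
        return false;
    }
    for (i = 0; i < a->num_auths; i++) {
        if (a->sub_auths[i] != b->sub_auths[i]) {
            return false;
        }
    }
    return true;
}

/* Assumption: on a domain member the machine's own SID and the domain
 * SID are expected to differ, so equality signals the state reported
 * in comment 9. */
static bool demo_member_sids_collide(const struct demo_sid *machine,
                                     const struct demo_sid *domain)
{
    return demo_sid_equal(machine, domain);
}
```

A check of this kind would have flagged the secrets.tdb state before the upgrade; the fix applied in the report was to remove secrets.tdb, let Samba regenerate it, and re-join the domain.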
Comment 10 Jeremy Allison 2008-06-26 11:54:47 UTC
Yes, please log a separate bug for this. Closing this one out.
Jeremy.