Bug 5551 - winbind on PDC fails, even with patch for bug 5489
Summary: winbind on PDC fails, even with patch for bug 5489
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: winbind (show other bugs)
Version: 3.0.30
Hardware: x86 Linux
: P3 major
Target Milestone: none
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL: http://leo.kloburg.at/tmp/winbind/
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-20 05:32 UTC by Alexander 'Leo' Bergolth
Modified: 2008-06-26 11:51 UTC (History)
0 users

See Also:


Attachments
Patch (3.95 KB, patch)
2008-06-20 19:19 UTC, Jeremy Allison
no flags Details
Replacement patch (4.61 KB, patch)
2008-06-23 18:06 UTC, Jeremy Allison
no flags Details
Patch - different approach to solving the problem. (4.17 KB, patch)
2008-06-24 13:06 UTC, Jeremy Allison
no flags Details
Patch for smbldap-tools-0.9.5 (2.19 KB, patch)
2008-06-25 13:39 UTC, Jeremy Allison
no flags Details
Final ? patch (2.84 KB, patch)
2008-06-25 14:42 UTC, Jeremy Allison
no flags Details
Additional patch (2.46 KB, patch)
2008-06-25 17:10 UTC, Jeremy Allison
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander 'Leo' Bergolth 2008-06-20 05:32:44 UTC
On my PDC, wbinfo -g fails, even after having applied the patches
http://git.samba.org/?p=samba.git;a=commitdiff;h=0e407e77ad5af4bb78b1e6da247398f2677618ee
and
http://git.samba.org/?p=samba.git;a=commitdiff;h=92d067497c60c5eab43bae11e6c1236e6cf760bb

wbinfo -u, wbinfo -t and wbinfo -D succeeds, but wbinfo -g still fails with
# wbinfo -g
Error looking up domain groups

However, wbinfo -g on a domain member works without problems.


On startup with debuglevel 1, winbind logs the following errors:
==> /var/log/samba/log.winbindd <==
[2008/06/20 10:26:11, 1] nsswitch/winbindd.c:main(1001)
  winbindd version 3.0.30-2leo.fc5 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2008
[2008/06/20 10:26:11, 0] nsswitch/winbindd_cache.c:initialize_winbindd_cache(2229)
  initialize_winbindd_cache: clearing cache and re-creating with version number 1
[2008/06/20 10:26:24, 0] libsmb/clientgen.c:cli_receive_smb(111)
  Receiving SMB: Server stopped responding
[2008/06/20 10:26:37, 0] libsmb/clientgen.c:cli_receive_smb(111)
  Receiving SMB: Server stopped responding
[2008/06/20 10:26:49, 0] libsmb/clientgen.c:cli_receive_smb(111)
  Receiving SMB: Server stopped responding
[2008/06/20 10:31:49, 1] rpc_client/cli_pipe.c:cli_pipe_validate_current_pdu(625)
  cli_pipe_validate_current_pdu: RPC fault code DCERPC_FAULT_OP_RNG_ERROR received from remote machine SAMBA pipe \lsarpc fnum 0x76a1!

==> /var/log/samba/log.winbindd-idmap <==
[2008/06/20 00:31:48, 0] nsswitch/winbindd_dual.c:async_request_timeout_handler(181)
  async_request_timeout_handler: child pid 2319 is not responding. Closing connection to it.
[2008/06/20 00:31:48, 1] nsswitch/winbindd_util.c:trustdom_recv(236)
  Could not receive trustdoms
[2008/06/20 00:32:24, 1] nsswitch/idmap.c:idmap_init(377)
  Initializing idmap domains
[2008/06/20 00:37:24, 0] nsswitch/winbindd_dual.c:async_request_timeout_handler(181)
  async_request_timeout_handler: child pid 2392 is not responding. Closing connection to it.
[2008/06/20 00:37:24, 1] nsswitch/winbindd_util.c:trustdom_recv(236)
  Could not receive trustdoms
[2008/06/20 10:26:14, 1] nsswitch/idmap.c:idmap_init(377)
  Initializing idmap domains
[2008/06/20 10:31:12, 0] nsswitch/winbindd_dual.c:async_request_timeout_handler(181)
  async_request_timeout_handler: child pid 7188 is not responding. Closing connection to it.
[2008/06/20 10:31:12, 1] nsswitch/winbindd_util.c:trustdom_recv(236)
  Could not receive trustdoms


The output of winbind startup and wbinfo -g at loglevel 10 and smb.conf are available at
  http://leo.kloburg.at/tmp/winbind/

Additional info:
# wbinfo --own-domain
RK_KLBG

# wbinfo -t
checking the trust secret via RPC calls succeeded

# wbinfo -D RK_KLBG
Name              : RK_KLBG
Alt_Name          :
SID               : S-1-5-21-181998944-1107627502-2274996074
Active Directory  : No
Native            : No
Primary           : Yes
Sequence          : -1

Any hints?
--leo
Comment 1 Jeremy Allison 2008-06-20 13:13:27 UTC
Ok, you're using an idmap backend with LDAP. I also need to see the debug level 10 logs from smbd at the same time as the wbinfo call hangs. I'm trying to see if there's a recursion issue with smbd calling back into winbindd. It doesn't look like it, as the winbindd -> smbd call from the parent winbindd is hanging on authentication, and the call from the RK_KLBG winbindd child to another smbd works perfecty.

I need to see the smbd logs split out per connecting process please.

Jeremy.
Comment 2 Alexander 'Leo' Bergolth 2008-06-20 16:42:50 UTC
Thanks for your quick reply!

Do you need the smbd logd for winbindd startup or when doing a "wbinfo -g"?
(These authentication error messages appear on startup, before the "wbinfo -g" call is done.)

I've added a log of the smbd processes during a start of winbind ("winbindd -d10 -n -F -Y") with "wbinfo -g" called after the startup-messages had settled down:

  http://leo.kloburg.at/tmp/winbind/2/

If that is not informative enough, please give me a hint on how I can split the smbd-logs "per connecting process". I've only splitted smbd output per process-id now. Sorry about my ignorance but I don't know the smbd internals very well...

P.S.: It is the first time I am trying to use winbindd on a PDC. (It has never worked before on the PDC.)

Thanks,
--leo
Comment 3 Jeremy Allison 2008-06-20 19:19:22 UTC
Created attachment 3355 [details]
Patch

Can you try this patch please ? Jerry and I think this should fix the problem.
If you can confirm asap we'll get this into 3.0.31.

Thanks !

Jeremy.
Comment 4 Alexander 'Leo' Bergolth 2008-06-21 03:52:30 UTC
(In reply to comment #3)
> Can you try this patch please ? Jerry and I think this should fix the problem.

Thanks!
Unfortunately it didn't fix the problem. :(
As before, wbinfo -u works but -g doesn't:
Error looking up domain groups

I've added another set of log files at
  http://leo.kloburg.at/tmp/winbind/3/

... and included a tarball (logs.tar.gz) for easier downloading and grepping.

Thanks,
--leo
Comment 5 Alexander 'Leo' Bergolth 2008-06-21 11:43:30 UTC
Some additional info, maybe it helps:
After winbindd startup, wbinfo commands like "-u" or "-t" block until about 35 seconds after winbindd's startup.

After that, wbinfo -u and -t work (-g still doesn't) but I noticed that after winbindd ran for some time (some hours) wbinfo -t stopped working:
# wbinfo -t
checking the trust secret via RPC calls failed
error code was  (0x0)
Could not check secret

Cheers,
--leo
Comment 6 Jeremy Allison 2008-06-23 18:05:39 UTC
There was a problem with the patch. It wasn't correctly allowing the machine account to log in. So winbindd is falling back to attempting to log in with the smbadmin user. The smbadmin user is in several supplementary groups and so tries to look up the gid 500 (one of the groups that smbadmin is in). See here in the logs :

2008/06/21 10:43:19, 3] smbd/sec_ctx.c:pop_sec_ctx(356)
  pop_sec_ctx (0, 0) - sec_ctx_stack_ndx = 1
[2008/06/21 10:43:54, 5] passdb/lookup_sid.c:gid_to_sid(1406)
  gid_to_sid: winbind failed to find a sid for gid 500

Note the 35 second delay. That's where it's trying to call back into winbindd.
I'm going to post an updated version of the patch. Can you try replace the other patch with this one and let me know ?

Jeremy.
Comment 7 Jeremy Allison 2008-06-23 18:06:09 UTC
Created attachment 3363 [details]
Replacement patch
Comment 8 Jeremy Allison 2008-06-23 19:41:24 UTC
I have a further idea on how to address this generically. I need to check with Volker and Jerry and get back to you. Hopefully tomorrow.
Jeremy.
Comment 9 Alexander 'Leo' Bergolth 2008-06-24 04:04:55 UTC
(In reply to comment #6)
> Note the 35 second delay. That's where it's trying to call back into winbindd.
> I'm going to post an updated version of the patch. Can you try replace the
> other patch with this one and let me know ?

Sorry, it still doesn't work.
The delay is still there and wbinfo -g still fails.

I'll try to produce a new set of log-files in the evening.

Cheers,
--leo
Comment 10 Jeremy Allison 2008-06-24 13:06:43 UTC
Created attachment 3367 [details]
Patch - different approach to solving the problem.

Ok, this patch should prevent smbd from contacting winbindd *at all* if it's a successful machine account login from our local machine account.
Please test this one and give me logs back if it fails. I don't have the LDAP set up you have so I'm depending on you to give this a good workout, thanks !
Jeremy.
Comment 11 Alexander 'Leo' Bergolth 2008-06-24 16:26:18 UTC
(In reply to comment #10)
> Created an attachment (id=3367) [edit]
> Patch - different approach to solving the problem.
> 
> Ok, this patch should prevent smbd from contacting winbindd *at all* if it's a
> successful machine account login from our local machine account.

Thanks! Looks better now - the delay is gone. wbinfo -u or -t work immediately after winbindd startup now.

However, wbinfo -g still produces
Error looking up domain groups

... but maybe that's a different problem now?

The new logs are at
  http://leo.kloburg.at/tmp/winbind/4/

(winbindd startup was at 23:16:19, wbinfo -g was executed at 23:16:42.)

Thanks for your efforts!
--leo
Comment 12 Jeremy Allison 2008-06-24 18:13:56 UTC
This is definately a different problem from winbindd hanging contacting smbd now.

In your setup winbindd is trying to contact the LDAP server to enumerate "local groups" for the BUILTIN domain. The BUILTIN domain has a sid-prefix of S-1-5-32.

In the log file log.winbindd-22708 I can see this by the following entry:

  smbldap_search_ext: base => [ou=Groups,dc=rk-klbg,dc=at], filter => [(&(objectclass=sambaGroupMapping)(sambaGroupType=4)(sambaSID=S-1-5-32*))], scope => [2]

My guess is your LDAP server is not returning any BUILTIN groups here. There should be at least two (there are in the tdb sam case). These are :

BUILTIN\administrators -> S-1-5-32-544
BUILTIN\users -> S-1-5-32-545

as you're using an LDAP store you'll have to create these by hand I think. winbindd is failing the call as it finds zero results from the above query.

If you add these the call should complete. I think this patch is now good, and will commit to the 3.0.x and 3.2 trees to fix this bug, but I'll keep the bug open to ensure it works once you've created these entries.

Jeremy.

Comment 13 Alexander 'Leo' Bergolth 2008-06-25 05:51:55 UTC
(In reply to comment #12)
>   smbldap_search_ext: base => [ou=Groups,dc=rk-klbg,dc=at], filter =>
> [(&(objectclass=sambaGroupMapping)(sambaGroupType=4)(sambaSID=S-1-5-32*))],
>
> My guess is your LDAP server is not returning any BUILTIN groups here. There
> should be at least two (there are in the tdb sam case). These are :
> 
> BUILTIN\administrators -> S-1-5-32-544
> BUILTIN\users -> S-1-5-32-545

You are right, the ldap search above doesn't return any results.
That's because the two builtin groups do exist in LDAP but they have a different 
sambaGroupType of 5. (SID_NAME_WKN_GRP = 5, /* well-known group */) smbldap-populate (part of smbldap-tools) creates them that way.

Do you have any pointers on how to correctly create those builtin groups?
Should they all have a sambaGroupType of 4 instead of 5? Should they have a mapping to a local unix group? Are there any templates for creating those groups? Sorry for bothering you with that stuff but the only advice on creating an initial LDAP setup that I found in the Samba-Guide was to use smbldap-populate...

And why does wbinfo -g work on a domain-member? (Most of the time, see below..)

Another interesting problem:
I've just tried "wbinfo -g" on a Domain Member (running 3.0.30):
# wbinfo -g
BUILTIN\administrators
BUILTIN\users
[ called several times, returns only the two builtin groups]

# wbinfo -u
[returns all domain users]

# wbinfo -g
[now returns all domain groups too]

Maybe this is a reconnect-issue that is only handled properly when doing user enumeration?

Unfortunately I only have logs of debug-level 1:
------------------------------------------------
[2008/06/25 08:57:29, 0] libsmb/clientgen.c:cli_receive_smb(111)
  Receiving SMB: Server stopped responding
[2008/06/25 08:57:29, 0] rpc_client/cli_pipe.c:rpc_api_pipe(790)
  rpc_api_pipe: Remote machine SAMBA pipe \samr fnum 0x741areturned critical error. Error was Call returned zero bytes (EOF)
[2008/06/25 08:58:28, 1] libsmb/clientgen.c:cli_rpc_pipe_close(386)
  cli_rpc_pipe_close: cli_close failed on pipe \samr, fnum 0x741a to machine SAMBA.  Error was Call returned zero bytes (EOF)
[2008/06/25 08:58:28, 1] libsmb/clientgen.c:cli_rpc_pipe_close(386)
  cli_rpc_pipe_close: cli_close failed on pipe \NETLOGON, fnum 0x7419 to machine SAMBA.  Error was Call returned zero bytes (EOF)
------------------------------------------------

Many thanks for your help!
Cheers,
--leo
Comment 14 Alexander 'Leo' Bergolth 2008-06-25 07:50:01 UTC
Some additional notes:

(In reply to comment #13)
> You are right, the ldap search above doesn't return any results.
> That's because the two builtin groups do exist in LDAP but they have a
> different 
> sambaGroupType of 5. (SID_NAME_WKN_GRP = 5, /* well-known group */)
> smbldap-populate (part of smbldap-tools) creates them that way.

After changing the sambaGroupType of all S-1-5-32-* groups to 4, wbinfo -g returns only the BUILTIN\* groups. :(
On a domain member, I can see the domain groups as well. (They all have sambaGroupType: 2.)

The logs are at
  http://leo.kloburg.at/tmp/winbind/5/

Thanks,
--leo
Comment 15 Jeremy Allison 2008-06-25 12:27:41 UTC
The mis-numbered group type is a bug in smbldap-tools, they should be created as local groups (alias) - type 5, not type 5.

I'm still looking into the other problem.

Jeremy.
Comment 16 Jeremy Allison 2008-06-25 13:39:23 UTC
Arg - typo - meant "as type 4, not type 5".
Jeremy.
Comment 17 Jeremy Allison 2008-06-25 13:39:54 UTC
Created attachment 3369 [details]
Patch for smbldap-tools-0.9.5
Comment 18 Jeremy Allison 2008-06-25 14:42:41 UTC
Created attachment 3370 [details]
Final ? patch

This patch should fix the problem. winbindd wasn't searching it's own domain group map, I've added that now. Please test asap as I'd like to get this into Samba 3.2.0 due out July 1st.
Jeremy.
Comment 19 Alexander 'Leo' Bergolth 2008-06-25 16:48:22 UTC
Looks much better now. But still not perfect... ;-)

The domain groups are shown now but unfortunately each domain group is additionally also shown as BUILTIN now:

# wbinfo -g
BUILTIN\domain admins
BUILTIN\domain users
BUILTIN\domain computers
BUILTIN\rk
BUILTIN\zivi
[...]
BUILTIN\administrators
BUILTIN\users
[...]
domain admins
domain users
domain computers
rk
zivi
[...]

Cheers,
--leo
Comment 20 Jeremy Allison 2008-06-25 16:56:47 UTC
Yes, I figured that might be the case on my drive home. I have an idea on a quick fix...
Jeremy.
Comment 21 Jeremy Allison 2008-06-25 17:10:08 UTC
Created attachment 3371 [details]
Additional patch

This should do it. Apply this on top of the "final ? patch" and it should be good to go. I forgot to split out the functions for enumerating groups for BUILTIN and DOMAIN into separate functions so the damn thing was being called twice :-).
Thanks for your help.
Jeremy.
Comment 22 Alexander 'Leo' Bergolth 2008-06-25 17:39:28 UTC
ACK! Now everything looks fine! Thank you very much!

Do you have any hint what might have caused that strange behavior on my domaim member, described in comment #13 after "Another interesting problem":
https://bugzilla.samba.org/show_bug.cgi?id=5551#c13

(I couldn't reproduce this behavior since then.)

--leo
Comment 23 Jeremy Allison 2008-06-25 17:57:30 UTC
Looks like it could have been a transient failure in talking to the LDAP server, but without better logs it's impossible to be sure, sorry.
Thanks for so rapidly testing what ended up being a long series of patches, sorry for the trouble. These fixes are now in git and will be in 3.0.31 and also 3.2.0.
Cheers,
Jeremy.
Comment 24 Jeremy Allison 2008-06-26 11:51:47 UTC
Fixed with latest patch.
Jeremy.