Bug 6387 - idmap_ldap_unixids_to_sids() makes winbind panic when multiple mappings are found
Summary: idmap_ldap_unixids_to_sids() makes winbind panic when multiple mappings are f...
Status: ASSIGNED
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: Winbind (show other bugs)
Version: 3.2.11
Hardware: Other Linux
: P3 normal
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-22 04:49 UTC by Michael Adam
Modified: 2009-07-22 02:56 UTC (History)
4 users (show)

See Also:


Attachments
fix for the crashbug for v3-3-test and v3-2-test (1.70 KB, patch)
2009-05-22 05:41 UTC, Michael Adam
no flags Details
add Debug output for duplicate mappings (2.77 KB, patch)
2009-05-25 17:16 UTC, Michael Adam
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Adam 2009-05-22 04:49:20 UTC
when using idmap_ldap, and the ldap-query for sambaIdmapEntry objects produces more than one result in idmap_ldap_unixids_to_sids(), then a panic is provoked
by a failed assert in ldap_next_entry().

This is due to the fact that the "LDAPmessage *entry = NULL;" is inside the for loop over the returned entries in the search result. The easy fix is to just move the definition and initialization of entry out of the for loop. (Fix coming next.) 

That this has problem been hit very rarely is probably due to the fact that the unixids_to_sids method is only ever called via idmap_uid_to_sid() and idmap_gid_to_sid(). So the problem only occurs when the search for a mapping for a single unix-id gives multiple matching mappings as a result. The panic fix uncovers a logic error in idmap_ldap.c: there is no sufficient logic to cope with the uniqueness of the mappings returned: It just places the last mapping returned for the given ID in the result.

The corresponding crashfix had been previously applied to the idmap_ldap_sids_to_unixids_methods() in b066668b74768d9ed547f16bf7b6ba6aea5df20a by Jerry, which has uncovered the same logic bug (failing to check for uniqueness in the mappings found).

So after fixing the crash, we need to fix the logic...

This bug applies to Samba versions 3.2 and newer.

Michael
Comment 1 Michael Adam 2009-05-22 05:41:31 UTC
Created attachment 4181 [details]
fix for the crashbug for v3-3-test and v3-2-test

Fix for the crashbug pushed to master and v3-4-test.

attached: git-formatted patch that applies to v3-3-test and v3-2-test

Michael
Comment 2 Simo Sorce 2009-05-22 17:52:49 UTC
+1 it's obviously correct
Comment 3 Michael Adam 2009-05-25 17:16:00 UTC
Created attachment 4205 [details]
add Debug output for duplicate mappings

This patch adds debug statements to idmap_ldap_unixids_to_sids() and idmap_ldap_sids_to_unixids() that log warnings for duplicate mappings at level 1.

It does not change the behaviour of the functions.
Comment 4 Michael Adam 2009-05-25 17:18:34 UTC
(In reply to comment #3)
> Created an attachment (id=4205) [details]
> add Debug output for duplicate mappings
> 
> This patch adds debug statements to idmap_ldap_unixids_to_sids() and
> idmap_ldap_sids_to_unixids() that log warnings for duplicate mappings at level
> 1.
> 
> It does not change the behaviour of the functions.

Btw: the two commits of this patch have been pushed to master and v3-4-test.
I am including them here for review for inclusion in 3.3 and 3.2

Michael

Comment 5 Michael Adam 2009-05-26 03:40:52 UTC
Now for the possible change in behaviour:

The unixids_to_sids() and sids_to_unixids() operations are always called from the wrappers in idmap.c / idmap_util.c, in idmap_backends_sids_to_unixids() for the 3.2 code and in idmap_sid_to_uid() and idmap_sid_to_gid() in 3.3 and newer code.

* The unixids_to_sids() operation is always read only.
  (No new mapping called).
  So here we could as well return an error when a duplicate mapping is found,
  instead of returning a rather arbitrary choice of the available mappings.

* The sids_to_unixids() operation tries to create mappings for sids 
  that could not be looked up (return code != ID_MAPPED). 
  In the case of duplicate sid->xid mappings, we can not simply return
  ID_UNMAPPED or another status, since this would trigger the creation of
  yet another mapping for the given sid (which would usually fail, though).
  If we would like to react to a broken LDAP tree containg duplicate
  sid->xid mappings by return error instead of returning one of the mappings
  found, we could introduce an additional status, eg ID_MULTI or ID_DUPLICATE
  that would make the callers skip the creation of a new mapping.

Any opinions on this approach?

Michael
Comment 6 Volker Lendecke 2009-05-26 06:11:42 UTC
(In reply to comment #4)
> > It does not change the behaviour of the functions.
> 
> Btw: the two commits of this patch have been pushed to master and v3-4-test.
> I am including them here for review for inclusion in 3.3 and 3.2

They look right to me, although I haven't tested them. But as I can't see how they affect sane behaviour in any way, I would like to see them in 3.2 and 3.3. An LDAP tree with duplicate mappings is broken enough to justify a debug level 1 message IMO.

Volker
Comment 7 Volker Lendecke 2009-05-26 06:14:07 UTC
(In reply to comment #5)

> * The sids_to_unixids() operation tries to create mappings for sids 
>   that could not be looked up (return code != ID_MAPPED). 
>   In the case of duplicate sid->xid mappings, we can not simply return
>   ID_UNMAPPED or another status, since this would trigger the creation of
>   yet another mapping for the given sid (which would usually fail, though).
>   If we would like to react to a broken LDAP tree containg duplicate
>   sid->xid mappings by return error instead of returning one of the mappings
>   found, we could introduce an additional status, eg ID_MULTI or ID_DUPLICATE
>   that would make the callers skip the creation of a new mapping.

Is it really worth to be so specific about that particular error? Isn't it more appropriate to return something like DB_CORRUPTION or so? We might have more conditions where we can't figure out a mapping. Is it required at this level to invent specific error codes?

Just asking, returning ID_MULTI or ID_DUPLICATE certainly makes perfect sense.

Volker
Comment 8 Michael Adam 2009-05-26 06:39:40 UTC
(In reply to comment #7)
> (In reply to comment #5)
> 
> > * The sids_to_unixids() operation tries to create mappings for sids 
> >   that could not be looked up (return code != ID_MAPPED). 
> >   In the case of duplicate sid->xid mappings, we can not simply return
> >   ID_UNMAPPED or another status, since this would trigger the creation of
> >   yet another mapping for the given sid (which would usually fail, though).
> >   If we would like to react to a broken LDAP tree containg duplicate
> >   sid->xid mappings by return error instead of returning one of the mappings
> >   found, we could introduce an additional status, eg ID_MULTI or ID_DUPLICATE
> >   that would make the callers skip the creation of a new mapping.
> 
> Is it really worth to be so specific about that particular error? Isn't it more
> appropriate to return something like DB_CORRUPTION or so? We might have more
> conditions where we can't figure out a mapping. Is it required at this level to
> invent specific error codes?
> 
> Just asking, returning ID_MULTI or ID_DUPLICATE certainly makes perfect sense.

Sure, a more generric Error like ID_DB_CORRUPTION is better.

So I am going to change this for master/3.4 first.

Michael
Comment 9 Michael Adam 2009-05-28 16:25:19 UTC
I have implemented this error handling (not pushed yet),
but we have a problem with this approach:

1. Assume sids_to_unixids() finds two mappings:
   SID --> uid1 and SID --> uid2
   this sets an error and fills the cache with
   a negative cache entry.

2. Assume later uid1 is looked up.
   Furhter assume, uid1 is mapped to only one sid.
   Then the mapping uid1 --> SID is found and
   the mapping is stored in the cache in
   both directions, i.e. SID2UID and UID2SID.

3. When the same lookup for SID is done again (as in 1),
   then the cache lookup succeeds and successfully
   returns uid1.

4. After a while, the cache entries will expire, and
   a new lookup for SID will again produce a negative
   cache entry.

So there is the possibility of inconsistent behaviour
even if we introduce the error code.

Michael
Comment 10 Simo Sorce 2009-05-28 16:38:15 UTC
(In reply to comment #9)
> I have implemented this error handling (not pushed yet),
> but we have a problem with this approach:
> 
> 1. Assume sids_to_unixids() finds two mappings:
>    SID --> uid1 and SID --> uid2
>    this sets an error and fills the cache with
>    a negative cache entry.
> 
> 2. Assume later uid1 is looked up.
>    Furhter assume, uid1 is mapped to only one sid.
>    Then the mapping uid1 --> SID is found and
>    the mapping is stored in the cache in
>    both directions, i.e. SID2UID and UID2SID.
> 
> 3. When the same lookup for SID is done again (as in 1),
>    then the cache lookup succeeds and successfully
>    returns uid1.
> 
> 4. After a while, the cache entries will expire, and
>    a new lookup for SID will again produce a negative
>    cache entry.
> 
> So there is the possibility of inconsistent behaviour
> even if we introduce the error code.

I think the only sensible solution in a case like this is to delete the second mapping and log at level 0 that we have done that.

If deleting fails (AD, read only LDAP backend) then we should scream LOUDLY in the log file at level 0 IMO.

Simo.
Comment 11 Michael Adam 2009-05-29 04:33:26 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > I have implemented this error handling (not pushed yet),
> > but we have a problem with this approach:
> > 
> > 1. Assume sids_to_unixids() finds two mappings:
> >    SID --> uid1 and SID --> uid2
> >    this sets an error and fills the cache with
> >    a negative cache entry.
> > 
> > 2. Assume later uid1 is looked up.
> >    Furhter assume, uid1 is mapped to only one sid.
> >    Then the mapping uid1 --> SID is found and
> >    the mapping is stored in the cache in
> >    both directions, i.e. SID2UID and UID2SID.
> > 
> > 3. When the same lookup for SID is done again (as in 1),
> >    then the cache lookup succeeds and successfully
> >    returns uid1.
> > 
> > 4. After a while, the cache entries will expire, and
> >    a new lookup for SID will again produce a negative
> >    cache entry.
> > 
> > So there is the possibility of inconsistent behaviour
> > even if we introduce the error code.
> 
> I think the only sensible solution in a case like this is to 
> delete the second mapping and log at level 0 that we have done that.

Hmmm, but given that the order in which the results are delivert
in the answer to the ldap search request is somwhat random - what
is the right mapping to remove when more than one is found?

> If deleting fails (AD, read only LDAP backend) then we should scream
> LOUDLY in the log file at level 0 IMO.

Are you really saying that we should do the removal in the top
level, not in the backend code itself? This would mean to introduce
even another API call to the id mapping interface - one to 
modify / delete a mapping.

(As you know, I am currently more into removing the R/W methods
from the surface altogehter, putting them back under the hood
into the backends. :-)

Cheers - Michael
Comment 12 Simo Sorce 2009-05-29 10:57:29 UTC
(In reply to comment #11)
> Hmmm, but given that the order in which the results are delivert
> in the answer to the ldap search request is somwhat random - what
> is the right mapping to remove when more than one is found?

The backend should decide, for ldap for example, it should be careful to remove something in ou=idmap if it conflicts with a real user than the opposite.

> > If deleting fails (AD, read only LDAP backend) then we should scream
> > LOUDLY in the log file at level 0 IMO.
> 
> Are you really saying that we should do the removal in the top
> level, not in the backend code itself?

Not at all.

 This would mean to introduce
> even another API call to the id mapping interface - one to 
> modify / delete a mapping.

More importantly it wouldn't make sense. Upper layers have no way to determine what is the priority in trying to remove an entry.
 
> (As you know, I am currently more into removing the R/W methods
> from the surface altogehter, putting them back under the hood
> into the backends. :-)

I know :)

Comment 13 Michael Adam 2009-06-26 19:12:14 UTC
I am still brooding over how to get this right...