Repeatable. Run Samba with "security=ads". With samba running, Shutdown the AD. Then, on the Samba system, do various commands that attempt to access the AD ("id -a <global_user>" works). This results in a winbind core dump. Related to 1185. Various methods, including name_to_sid, sid_to_name, and enum_dom_groups call ads_cached_connection [nsswitch/winbindd_ads.c] to get the current ads structure. If the AD has gone down, the domain->private can contain a pointer to an incomplete ADS_STRUCT when a reconnection attempt it made. The domain->private->ld value is set, domain->private->server->realm is an empty string, and all other values are NULL. The return value from ads_cached_connection is checked to ensure its value is non-null, which it is not since domain->private points to a valid structure. However, since all the values of the ads structure (ie: ads->config.realm) are null, winbindd core dumps later when it attempts to access these values (strdup, strlen, etc).
Created attachment 444 [details] Check validity of domain->private This patch adds a check to nsswitch/winbindd_ads.c to verify that domain->private->config.realm is non-NULL before returning domain->private as the cached ads connection. The patch does not free domain->private, as several methods within winbindd_ads.c set it to NULL with an explicit comment not to free the memory. I'll look into the memory-freeing a bit more, as well as how an uninitialized ADS_STRUCT is being put into domain->private. I suspect the add_trusted_domain is doing it, but haven't verified.
John, Can you take a qucik look and see if this still exists in the lates 3.0 cvs tree (or even 3.0.2a). Thanks.
Guenther reproduced this on 3.0.2a.
Looks good John. Thanks again. Applying to the 3.0 and HEAD branches.
actually, the problem is a little more severe. ads_cached_connection() passes back a pointer and the caller can call ads_destroy() to void a connection. But domain->private still points to the free'd() structure. I need to figure out how to res4et domain->private when we call ads_destroy() so we don't have to examine its contents to figure out if it is valid or not. Temporiarily reopening....
Agreed with your previous comment. The domain->private is being set by a successful ads_cached_connection. Later, when the ads connection fails, the ads is freed in various calls via a call to ads_destroy. The ads memory is zeroed, hence the NULL values in domain->private. Since the memory pointed to by domain- >private is now unallocated, the above patch is not a viable solution. A possible, minimal impact, solution is to have ads_destroy call a new function in winbindd_util.c, passing it the ads as a parameter. The new function would checks known domains for that ads memory pointer, zeroing the domain->private if found. This should work without impact to other backends.
Created attachment 450 [details] prevent the ADS_STRUCT from being freed() unless we own it.
Seems to fix the problem for me.
domain->private must be set to NULL in your patch in file nsswitch/winbindd_ads.c around the new SMB_ASSERT, else the ads->config.realm test can access the old memory, and the ads->is_mine = True can give a SEGFAULT. Also, and I'm well aware this is a nit, it would be clearer if the "is_mine" attribute was named something else like "free_on_destroy" with comments to match.
I'll ignore the nit but you're right about the ASSERT. Fixing that and closing this bug.
Verified; fixes stated problem. Good job.
sorry for the same, cleaning up the database to prevent unecessary reopens of bugs.