The bug was originally found when working with IPA on RHEL 6.5. IPA uses winbindd on the server to resolve users and groups against Active Directory and sometimes attempt to open a private pipe to winbindd fails:
Thread 1 (Thread 0x7f7f7dbf5700 (LWP 24268)):
#0 __libc_free (mem=0x30332058554e494c) at malloc.c:3714
#1 0x00007f7fa45e13b5 in winbind_open_pipe_sock (recursing=0, need_priv=0) at ../nsswitch/wb_common.c:361
#2 0x00007f7fa45e143b in winbind_write_sock (buffer=0x7f7f7dbf41a0, count=2096, recursing=0, need_priv=0) at ../nsswitch/wb_common.c:380
#3 0x00007f7fa45e15f3 in winbindd_send_request (req_type=22, need_priv=0, request=0x7f7f7dbf41a0) at ../nsswitch/wb_common.c:580
#4 0x00007f7fa45e18f2 in winbindd_request_response (req_type=22, request=0x7f7f7dbf41a0, response=0x7f7f7dbf33f0) at ../nsswitch/wb_common.c:651
#5 0x00007f7fa49e8ad9 in wbcRequestResponseInt (cmd=<value optimized out>, request=<value optimized out>, response=<value optimized out>) at ../nsswitch/libwbclient/wbclient.c:63
#6 wbcRequestResponse (cmd=<value optimized out>, request=<value optimized out>, response=<value optimized out>) at ../nsswitch/libwbclient/wbclient.c:96
#7 0x00007f7fa49ece1c in wbcLookupName (domain=<value optimized out>, name=0x7f7f0c33f1b0 "", sid=0x7f7f7dbf4aa0, name_type=0x7f7f7dbf4a9c) at ../nsswitch/libwbclient/wbc_sid.c:203
#8 0x00007f7fa4bf3d2f in ?? ()
#9 0x0000000000000000 in ?? ()
It tries to free mem=0x30332058554e494c, but this is not an address, but a string: "LINUX 30". So this was overwritten or reused because this is a double free.
Upon reviewing the code, we need to zero response struct prior to reusing it in wb_common.c:winbind_open_pipe_sock(). If winbindd_request_response() fails, the only fields that could be read out of response struct are length and result.
The change is to ZERO_STRUCT(response) prior to the second winbindd_request_response() call, this should be enough.
Created attachment 9916 [details]
Proposed patch for master
Proposed patch for master attached.
This patch is certainly okay. However, I don't see how we end up with a string in the extra data pointer. .extra_data is different from .data in the response, those are two unions that should never overlap. Do you have a hint for me how this happens?
As far as I can see, it could happen if ZERO_STRUCT was optimized out. The whole situation here looks fishy but in the expanded stacktrace with local variables I can see some old data from stack, which means it wasn't cleared up.
I'm looking at the compiled code currently to see if memset() was indeed optimized out for the response struct.
After discussion with Volker we tend to think it is manifestation of multi-threading issues as this winbind client is running as a plugin of 389-ds and could be instantiated in several threads.
I'm going to look into libwbclient to see if we can use an easy way to serialize winbind requests to a single socket in multi-threaded apps, given that the protocol is synchronous anyway. The other part is to make sure IPA plugin is using 389-ds locking primitives when accessing winbind client API.
Just a thought: If this turns out to be a threading issue that is fixed by proper mutexing, would it be worthwhile to remove that ZERO_STRUCT again? It certainly does not hurt, but it might cost future developers some hair while trying to figure out why it is required at this place. I could not see a single-threaded code path yet where it really matters.
I agree that in the way 'struct winbindd_response' is done, it might be unnecessary to ZERO_STRUCT() it again in this particular case.
However, I think it would be a good move to actually ensure clear pattern of initialized response structure for each call to winbindd. Relying on the fact that a sequence of requests (like interface version and the pipe path) always write to different parts of the response struct is a fragile practice.
If it happens to be a multi-threading issue, I'm going to rework libwbclient code as we discussed with Volker to at least serialize through the single socket and then the code could be removed.
(Jeremy was a bit fast with the commit ;)
Hey, in my defense I saw *no downside* to that patch :-). I can't see any way it would hurt the code path.
(In reply to comment #7)
> Hey, in my defense I saw *no downside* to that patch :-). I can't see any way
> it would hurt the code path.
Sure, it does not hurt the functionality. If you feel it clarifies the code, it is perfectly fine to leave it in. I am just not able to see the code path where this really makes a difference. Have you investigated that closer?
IMHO it makes the code safer, as we do a SAFE_FREE below. Now I know there's no way to get to the code if the request/response fails, but there's always a possibility of someone adding a 'goto fail' jump later.
Personally, if a stack pointer gets SAFE_FREE'd or TALLOC_FREE'd I *always* initialize to NULL when going into scope - protects me from mistakes later.