While researching winbindd core dumps, I came across some minor things that valgrind warned about. Hopefully these (small) patches are useful.
First problem: complaints about uninitialised value. These occur in the event timeout handling. ==22053== Conditional jump or move depends on uninitialised value(s) ==22053== at 0x11A0DB: timeval_min (in /usr/sbin/winbindd) ==22053== by 0x152503: event_add_to_select_args (in /usr/sbin/winbindd) ==22053== by 0xB5810: schedule_async_request (in /usr/sbin/winbindd) ==22053== by 0x8A05D: init_child_connection (in /usr/sbin/winbindd) ==22053== by 0xB6FD4: async_domain_request (in /usr/sbin/winbindd) ==22053== by 0x8A2E2: add_trusted_domains (in /usr/sbin/winbindd) ==22053== by 0x8A864: rescan_trusted_domains (in /usr/sbin/winbindd) ==22053== by 0x7F2A8: main (in /usr/sbin/winbindd) ==22053== Conditional jump or move depends on uninitialised value(s) ==22053== at 0x11A0F3: timeval_min (in /usr/sbin/winbindd) ==22053== by 0x152503: event_add_to_select_args (in /usr/sbin/winbindd) ==22053== by 0xB5810: schedule_async_request (in /usr/sbin/winbindd) ==22053== by 0x8A05D: init_child_connection (in /usr/sbin/winbindd) ==22053== by 0xB6FD4: async_domain_request (in /usr/sbin/winbindd) ==22053== by 0x8A2E2: add_trusted_domains (in /usr/sbin/winbindd) ==22053== by 0x8A864: rescan_trusted_domains (in /usr/sbin/winbindd) ==22053== by 0x7F2A8: main (in /usr/sbin/winbindd) ==22068== Conditional jump or move depends on uninitialised value(s) ==22068== at 0x11A0DB: timeval_min (in /usr/sbin/winbindd) ==22068== by 0x152503: event_add_to_select_args (in /usr/sbin/winbindd) ==22068== by 0xB5810: schedule_async_request (in /usr/sbin/winbindd) ==22068== by 0xBA20F: do_async (in /usr/sbin/winbindd) ==22068== by 0xBEE10: winbindd_sid2gid_async (in /usr/sbin/winbindd) ==22068== by 0x9F74F: sid2gid_lookupsid_recv (in /usr/sbin/winbindd) ==22068== by 0xB9C91: lookupsid_recv (in /usr/sbin/winbindd) ==22068== by 0xB76EB: do_async_recv (in /usr/sbin/winbindd) ==22068== by 0xB68E4: async_reply_recv (in /usr/sbin/winbindd) ==22068== by 0x7DF4C: rw_callback (in /usr/sbin/winbindd) ==22068== by 0x7F508: main (in /usr/sbin/winbindd)
Created attachment 4851 [details] Initialize event timeout
Created attachment 4852 [details] Free password before feeding it to smb_xstrdup Patch fixes this warning: ==25936== 27 bytes in 1 blocks are possibly lost in loss record 77 of 338 ==25936== at 0x48237D8: malloc (vg_replace_malloc.c:195) ==25936== by 0x4AB8B8F: strdup (in /lib/libc-2.7.so) ==25936== by 0x225A1C: smb_xstrdup (in /usr/sbin/winbindd) ==25936== by 0x2AA911: fetch_ldap_pw (in /usr/sbin/winbindd) ==25936== by 0x53B60D: ??? (in /usr/sbin/winbindd) ==25936== by 0x53C489: ??? (in /usr/sbin/winbindd) ==25936== by 0x53CC3A: smbldap_search_suffix (in /usr/sbin/winbindd) ==25936== by 0x53D844: smbldap_search_domain_info (in /usr/sbin/winbindd) ==25936== by 0x1E69EA: pdb_init_ldapsam (in /usr/sbin/winbindd) ==25936== by 0x1D3C58: make_pdb_method_name (in /usr/sbin/winbindd) ==25936== by 0x1D40DD: ??? (in /usr/sbin/winbindd) ==25936== by 0x1D43DA: pdb_sid_to_id (in /usr/sbin/winbindd)
Created attachment 4853 [details] Free smbldap_state when freeing idmap_alloc_ldap When idmap_ldap_alloc_init() TALLOC_FREE's idmap_alloc_ldap, it should also free idmap_alloc_ldap->smbldap_state, shouldn't it? I'm still unsure whether or not smbldap_free_struct() should be called even when NT_STATUS_IS_OK( ret ) is true here. Or does that happen from idmap_ldap_close_destructor()?
The first patch is obviously correct. I'm not 100% sure about the second and third ones. The second one: Shouldn't we do the SAFE_FREE at the beginning of the if-block to also catch all the error returns? The third one: I think it is not valid for the first "goto done;" error condition (smbldap_init failed) to do the free. I think that belongs into the get_credentials() failure branch only. Comments? Volker
The second patch is not correct. The actual issue is that the "bind_dn" and the "bind_secret" elements are not freed when smbldap_close() is called, and are allocated in fetch_ldap_pw().. I'll fix this for master and 3.5.0. Jeremy.
Created attachment 4859 [details] Correct fix for "Free password before feeding it to smb_xstrdup" This is the fix I'm going to add to master and 3.5.x for the memory leak. ==25936== 27 bytes in 1 blocks are possibly lost in loss record 77 of 338 ==25936== at 0x48237D8: malloc (vg_replace_malloc.c:195) ==25936== by 0x4AB8B8F: strdup (in /lib/libc-2.7.so) ==25936== by 0x225A1C: smb_xstrdup (in /usr/sbin/winbindd) ==25936== by 0x2AA911: fetch_ldap_pw (in /usr/sbin/winbindd) ==25936== by 0x53B60D: ??? (in /usr/sbin/winbindd) ==25936== by 0x53C489: ??? (in /usr/sbin/winbindd) ==25936== by 0x53CC3A: smbldap_search_suffix (in /usr/sbin/winbindd) ==25936== by 0x53D844: smbldap_search_domain_info (in /usr/sbin/winbindd) ==25936== by 0x1E69EA: pdb_init_ldapsam (in /usr/sbin/winbindd) ==25936== by 0x1D3C58: make_pdb_method_name (in /usr/sbin/winbindd) ==25936== by 0x1D40DD: ??? (in /usr/sbin/winbindd) ==25936== by 0x1D43DA: pdb_sid_to_id (in /usr/sbin/winbindd) I'd appreciate it if you also test it. Thanks ! Jeremy.
(In reply to comment #7) > Correct fix for "Free password before feeding it to smb_xstrdup" > I'd appreciate it if you also test it. I've tested it with 3.4.2: the leak seems gone. There still are some others though; I'll see if I can provide some more fixes/hints, if you're interested. Regards, roel
Created attachment 4861 [details] Valgrind output of winbindd -i after single connect with smbclient -L Valgrind output produced by running winbindd -i under valgrind and connecting to the samba server once, with "smbclient -L localhost -U user%pass".
Thanks for testing. I'll take a look at these. I'm guessing it's not shutting down correctly from the look of things. Jeremy.
Ok, I'm extremely confused by these valgrind errors: ==22205== 11 bytes in 1 blocks are possibly lost in loss record 39 of 339 ==22205== at 0x48237D8: malloc (vg_replace_malloc.c:195) ==22205== by 0x4AB8B8F: strdup (in /lib/libc-2.7.so) ==22205== by 0x539F13: smbldap_set_creds (smbldap.c:1855) ==22205== by 0x55F6FC: get_credentials (idmap_ldap.c:142) ==22205== by 0x55FEC8: idmap_ldap_db_init (idmap_ldap.c:879) ==22205== by 0x559E9F: idmap_init_domain (idmap.c:318) ==22205== by 0x55A789: idmap_find_domain (idmap.c:361) ==22205== by 0x55BC06: idmap_new_mapping (idmap.c:668) ==22205== by 0x55C302: idmap_sid_to_gid (idmap_util.c:289) ==22205== by 0x1C6C45: winbindd_dual_sid2gid (winbindd_idmap.c:374) ==22205== by 0x1BDD89: schedule_async_request (winbindd_dual.c:453) ==22205== by 0x1C221F: do_async (winbindd_async.c:83) which seem to be when the secret is being copied inside smbldap_set_creds(). After my patch smbldap_set_creds() should look like this: 1840 bool smbldap_set_creds(struct smbldap_state *ldap_state, bool anon, const char *dn, const char *secret) 1841 { 1842 ldap_state->anonymous = anon; 1843 1844 /* free any previously set credential */ 1845 1846 SAFE_FREE(ldap_state->bind_dn); 1847 if (ldap_state->bind_secret) { 1848 /* make sure secrets are zeroed out of memory */ 1849 memset(ldap_state->bind_secret, '\0', strlen(ldap_state->bind_secret)); 1850 SAFE_FREE(ldap_state->bind_secret); 1851 } 1852 1853 if ( ! anon) { 1854 ldap_state->bind_dn = SMB_STRDUP(dn); 1855 ldap_state->bind_secret = SMB_STRDUP(secret); 1856 } 1857 1858 return True; 1859 } When ldap_state->bind_secret is assigned to in line 1855 IT SHOULD HAVE BEEN FREED at line 1850 (which is only triggered if ldap_state->bind_secret contains a non-NULL value). I can't see how these memory losses are occurring. Can you check if your smbldap_set_creds() looks identical to this one please ? Jeremy.
(In reply to comment #11) > I can't see how these memory losses are occurring. Can you check if your > smbldap_set_creds() looks identical to this one please ? I'm on a week's holiday now; I'll get back to this next week. Thanks, roel
(In reply to comment #11) > I can't see how these memory losses are occurring. Can you check if your > smbldap_set_creds() looks identical to this one please ? Yes, it does. Maybe valgrind is wrong here? (It does say "possibly lost" ..) I'm still looking if I can provide you with a proper testcase with debugging, but so far I haven't found it.
Created attachment 4888 [details] smbldap_state is not freed when verify_idpool() fails
(In reply to comment #5) > The third one: I think it is not valid for the first "goto done;" error > condition (smbldap_init failed) to do the free. I think that belongs into the > get_credentials() failure branch only. You are right. I posted a new patch which calls smbldap_free_struct() only when verify_idpool() fails. Regards, roel
reassigning to Jeremy, since he was looking at this bug...
I have some more information about what happens here, related to the original crashes I have been trying to debug. The following is what I believe is happening: A client does an idmap-related request to winbindd (for example: wbinfo --allocate-gid) Because I haven't configured an idmap, the idmap initialization fails. That causes idmap_ldap_alloc_init() to free idmap_alloc_ldap. However, there still is an event for smbldap_idle_fn(). When this event is run, I get either a crash or an infinite loop, continually calling run_events() with an event for smbldap_idle_fn().
Created attachment 4993 [details] level 11 winbind debug log of crash This is a part of the debug log of winbindd, at the time of the crash. There are some additional debug statements in there to help clarify things for myself, I hope they're not confusing for you.
(In reply to comment #17) > I have some more information about what happens here, related to the original > crashes I have been trying to debug. > The following is what I believe is happening: > > A client does an idmap-related request to winbindd (for example: wbinfo > --allocate-gid) > Because I haven't configured an idmap, the idmap initialization fails. That > causes idmap_ldap_alloc_init() to free idmap_alloc_ldap. However, there still > is an event for smbldap_idle_fn(). When this event is run, I get either a crash > or an infinite loop, continually calling run_events() with an event for > smbldap_idle_fn(). infinite loops in smbldap_idle_fn sound familiar, see bug #6894 >
Created attachment 5424 [details] Initialize event timeout, 3.5.0rc3 3.5.0 still has the uninitialized timer.
(In reply to comment #11) > Ok, I'm extremely confused by these valgrind errors: > [snip] > which seem to be when the secret is being copied inside smbldap_set_creds(). I think valgrind gives this error, not because ldap_state->bind_dn and bind_secret are assigned to while still in use, but because ldap_state->bind_dn and bind_secret are not free'd when ldap_state becomes unaddressable. With 3.5.0rc3 I can still reproduce it, not identically, but with a similar code path: ==11147== 77 bytes in 3 blocks are definitely lost in loss record 7 of 32 ==11147== at 0x482281D: malloc (vg_replace_malloc.c:207) ==11147== by 0x4AB4B8F: strdup (in /lib/libc-2.7.so) ==11147== by 0x510D95: smbldap_set_creds (smbldap.c:1947) ==11147== by 0x5129FD: another_ldap_try (smbldap.c:1110) ==11147== by 0x513699: smbldap_search_ext (smbldap.c:1406) ==11147== by 0x513E4A: smbldap_search_suffix (smbldap.c:1726) ==11147== by 0x514CF5: smbldap_search_domain_info (smbldap_util.c:280) ==11147== by 0x11A492: pdb_init_ldapsam (pdb_ldap.c:6617) ==11147== by 0x106EE8: make_pdb_method_name (pdb_interface.c:156) ==11147== by 0x10733D: pdb_get_methods_reload (pdb_interface.c:193) ==11147== by 0x1073EE: pdb_enum_trusteddoms (pdb_interface.c:1931) ==11147== by 0xC9565: sam_trusted_domains (winbindd_passdb.c:653) With 3.5.0rc3 I can trigger this warning by running winbind under valgrind and at some point signaling it with ctrl-C. In that case, pdb_init_ldapsam, which has allocated ldap_state, returns without having free'd ldap_state->bind_dn and bind_secret. During normal operation I haven't seen it so far, so it might be just something that occurs when shutting down - e.g. not something to worry about. Since Jeremy's patch is already included, the only possibly useful thing in this bug report is the tiny event-timeout patch. Regards, roel
Created attachment 5429 [details] git-am fix for 3.5.0 final. I think this needs to be in 3.5.0 final. Volker & metze please review. Jeremy.
Created attachment 5430 [details] git-am fix for 3.4.7. Similar patch for 3.4.7. Jeremy.
Karo, please pull into the release branches. Zero risk from my point of view, your call if it goes into 3.5.0 or 3.5.1. Thanks, Volker
Pushed to v3-4-test and v3-5-test. Closing out bug report. Thanks!