Bug 12736 - Memory leak in ntlm_auth
Summary: Memory leak in ntlm_auth
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Tools (show other bugs)
Version: 4.6.2
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Stefan Metzmacher
QA Contact: Samba QA Contact
URL: https://lists.samba.org/archive/samba...
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-07 09:26 UTC by Stefan Metzmacher
Modified: 2019-01-03 19:40 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Metzmacher 2017-04-07 09:26:14 UTC
From Rebecca Gellman rebecca at starfleet-net.co.uk:

Hi, 

I wish to tell you a tale of investigation and suspicion of a memory
leak in the ntlm_auth program (as compiled from
source3/utils/ntlm_auth.c). 

So in our system, we use ntlm_auth to provide both Kerberos and NTLM
authentication for transparent (and not-so transparent) web proxy
connections. In this particular instance, Kerberos is the auth of
choice, but I believe this is relevant to NTLM too. 

We spawn up a pool of ntlm_auth processes on demand. If a process is not
available when a connection comes in, we create a new one up to a
maximum. 

The base software version is 4.2.10, as shipped with Debian 8 (Jessie). 

Now that the particulars of the environment are out of the way..... 

So we have a customer. And they phoned in a support call that their
system was falling over. We dive in remotely, and see each ntlm_auth
process is running at an impressive 1.7GB of memory in the "VIRT" column
of top. EEP! 

Suspecting a memory leak, we compiled up version 4.2.10 with a call to
talloc_enable_leak_reporting_full() in source3/utils/ntlm_auth.c (why
isn't this a command line option at the least?), along with a mod to our
own processes to send CTRL+D to ntlm_auth prior to terminating the
process, and gathering the report on STDERR. 

Left it running for a day and examined the report. Each process had on
average over 19,000 DATA_BLOBs of around 1.5K in size, reportedly
created at lib/util/base64.c line 35, the function
base64_decode_data_blob_talloc(). 

After digging through the code, this function is called on (pseudo-code)
&line_of_STDIN[3] to decode the base64 blob input to a binary blob.
There's then a load of logic surrounding the 2-letter command code and
processing gensec, etc, etc, etc. 

Now... the crux of the matter. If I compare with an earlier (3.x.x)
branch, we see that data_blob_free() is called on this returned blob
whenever the command handler returns. In 4.x.x this code has been
refactored, and the function manage_gensec_request() doesn't always call
data_blob_free(). In fact, it rarely calls it at all. 

There's also a couple of returns that omit talloc_free(mem_ctx) that
seems to precede every other return in this function. 

So, on this hunch, I added in the missing data_blob_free() calls and the
two talloc_free(mem_ctx) calls, and rebuilt. Now the same customer
system is sitting happily at a low memory usage, not showing so much of
a hint of a memory leak. 

I've attached the patch we used to "correct" ntlm_auth.c. I'd be
interested to here some technical commentary from knowledge Samba bods
:) 

Regards 

Rebecca Gellman (On behalf of SmoothWall)
Comment 1 Björn Jacke 2019-01-03 17:01:03 UTC
this fix went to master with this commit:

commit e999b798c6484de3cddad988406f97fc4cc7af79
Author: Stefan Metzmacher <metze@samba.org>
Date:   Tue Apr 4 11:52:56 2017 +0200

That means this is fixed in 4.7 and later

Please make sure to close bugs when they are fixed and/or assign them to Karolin to include them in the release branches.