Bug 8724 - Memory leak in parent smbd on connection; CVE-2012-0817
Summary: Memory leak in parent smbd on connection; CVE-2012-0817
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: File services
Version: unspecified
Hardware: All All
Importance: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-26 18:40 UTC by Jeremy Allison
Modified: 2012-03-16 23:38 UTC
CC List: 1 user

See Also:


Attachments
Fix for 3.6.x (1.70 KB, patch)
2012-01-26 22:42 UTC, Jeremy Allison
vl: review+

Description Jeremy Allison 2012-01-26 18:40:32 UTC
Reported by Ira (site details elided).

Repeated connect/disconnect will degrade service to an unacceptable level, and
cause the host to max out CPU utilization.

The first part of this problem is shown in dos.patch enclosed in this mail.  I
am not fully convinced that this patch is right.  But the basic thing I
discovered is that each time you connect to the main smbd, it "leaks" into the
smb_server_conn a string copy of the connecting host, and of 0.0.0.0.  Over
hundreds and thousands of connections, this yields a DoS.  The machine spends
so much time freeing memory as sessions just connect and disconnect that it
slows to a crawl.
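
The leak pattern described above can be sketched in plain C. This is only an
illustration under stated assumptions, not Samba's code and not the attached
patch: Samba manages memory with talloc, whereas this sketch uses malloc, and
struct server_conn, conn_strdup, conn_release_all, accept_leaky and
accept_fixed are all hypothetical names. It shows a long-lived parent context
gaining two strings per accepted connection, and one possible way to release
them once the connection has been handed off.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parent daemon's long-lived connection
 * state. This is NOT Samba's actual structure. */
struct server_conn {
    char  **names;     /* strings owned by this context */
    size_t  count;     /* number of strings currently held */
    size_t  capacity;  /* allocated slots in names */
};

/* Copy s and attach the copy to the long-lived context.
 * Returns the copy, or NULL on allocation failure. */
static char *conn_strdup(struct server_conn *c, const char *s)
{
    if (c->count == c->capacity) {
        size_t ncap = c->capacity ? c->capacity * 2 : 8;
        char **n = realloc(c->names, ncap * sizeof(*n));
        if (n == NULL) {
            return NULL;
        }
        c->names = n;
        c->capacity = ncap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len);
    c->names[c->count++] = copy;
    return copy;
}

/* Free every string still attached to the context. */
static void conn_release_all(struct server_conn *c)
{
    while (c->count > 0) {
        free(c->names[--c->count]);
    }
}

/* Leaky variant: every accepted connection leaves two strings behind
 * in the parent, so the context grows without bound. */
static int accept_leaky(struct server_conn *parent, const char *client)
{
    if (conn_strdup(parent, client) == NULL ||
        conn_strdup(parent, "0.0.0.0") == NULL) {
        return -1;
    }
    /* ... hand the connection to a child process here ... */
    return 0;
}

/* Fixed variant: remember how many strings the parent held before the
 * connection and release anything added once the hand-off is done. */
static int accept_fixed(struct server_conn *parent, const char *client)
{
    size_t mark = parent->count;
    int rc = 0;

    if (conn_strdup(parent, client) == NULL ||
        conn_strdup(parent, "0.0.0.0") == NULL) {
        rc = -1;
    }
    /* ... hand the connection to a child process here ... */
    while (parent->count > mark) {
        free(parent->names[--parent->count]);
    }
    return rc;
}
```

In this sketch, calling accept_leaky 1000 times on a zero-initialized
struct server_conn leaves count at 2000, while the same number of
accept_fixed calls leaves it at 0.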

The patch represents what I have so far to mitigate it.  As you'll see I don't
have a full mitigation yet.

Each set of graphs from here on comes in threes: one for CPU, one for the
number of connections, and one for the number of system calls vs. lock
acquisition misses.

Case 1: This shows our older version of Samba; it is a cut off master at
23ad6919a1e5f16d02e22adcf36ea7f039a9eeea with local patching.  The CPU use
actually most closely tracks the op rate for that machine.  Connections appear
to have little to no bearing on CPU use.

Case 2: This shows a version of Samba taken about 2 weeks before the freeze of
3.6.2.  You will note that compared to the graphs for #1, the CPU is now
pegged, the number of connections per second is now quite low, and the contention over
mutexes is now higher than the actual number of system calls.  (Very rare on
our systems. In fact this is the first time I've ever seen it.)

Case 3: (note this is on a 12 hr timeline, the others are 24 hrs.) This shows
the effect of the patch I wrote.  You'll see that the issue is clearly not
fixed, but that it is far better than in Case 2.  The crossing of syscall and
mutex is still occurring at the end, but the connections per second and CPU
use are much improved.
Comment 1 Jeremy Allison 2012-01-26 22:42:05 UTC
Created attachment 7259
Fix for 3.6.x

Found by Ira. Fixes by him and me.

Jeremy.
Comment 2 Jeremy Allison 2012-01-26 22:42:25 UTC
Opening up...
Comment 3 Jeremy Allison 2012-01-27 18:08:19 UTC
Re-assigning to Karolin for 3.6.3. Karolin, we need to get a CVE number first.

Jeremy.
Comment 4 Jeremy Allison 2012-01-28 05:03:23 UTC
Karolin - number is CVE-2012-0817
Comment 5 Karolin Seeger 2012-01-29 20:18:38 UTC
Pushed.
Patch is included in 3.6.3.
Closing out bug report.

Thanks!