Reported by Ira (site details elided).
Repeatedly connecting and disconnecting degrades service to an unacceptable level
and drives the host's CPU utilization to its maximum.
The first part of this problem is addressed by dos.patch, enclosed in this mail.
I am not fully convinced that this patch is right, but the basic thing I
discovered is that each connection to the main smbd "leaks" two strings into
smb_server_conn: a copy of the connecting host's address, and "0.0.0.0". Over
hundreds and thousands of connections, this adds up to a DoS. As sessions
connect and immediately disconnect, the machine spends so much time freeing
memory that it slows to a crawl.
The patch is what I have so far to mitigate this. As you'll see, it is not yet
a full mitigation.
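To make the leak pattern concrete, here is a minimal C sketch of the general mechanism. It is not smbd's code and not dos.patch. It assumes a hierarchical allocator in the spirit of talloc, modelled by a hypothetical `struct ctx` that owns strings until the context is freed; `handle_conn_leaky`, `handle_conn_scoped` and `simulate_connections` are illustrative names. Hanging per-connection strings off a long-lived server context makes that context grow by two strings per connection, whereas a per-connection context released at disconnect keeps the server context flat.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hierarchical allocator such as talloc:
 * a context owns a list of strings that are released when it is freed. */
struct owned_str {
    char *s;
    struct owned_str *next;
};

struct ctx {
    struct owned_str *head;
    size_t count; /* number of strings currently owned by this context */
};

/* Copy src and make the copy owned by context c. */
static char *ctx_strdup(struct ctx *c, const char *src)
{
    struct owned_str *node = malloc(sizeof(*node));
    if (node == NULL)
        return NULL;
    node->s = malloc(strlen(src) + 1);
    if (node->s == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->s, src);
    node->next = c->head;
    c->head = node;
    c->count++;
    return node->s;
}

/* Release every string owned by context c. */
static void ctx_free_all(struct ctx *c)
{
    while (c->head != NULL) {
        struct owned_str *next = c->head->next;
        free(c->head->s);
        free(c->head);
        c->head = next;
    }
    c->count = 0;
}

/* Leaky pattern: per-connection strings are parented on the long-lived
 * server context, so they outlive the disconnect and accumulate. */
static void handle_conn_leaky(struct ctx *server, const char *peer)
{
    ctx_strdup(server, peer);      /* copy of the connecting host */
    ctx_strdup(server, "0.0.0.0"); /* copy of the wildcard address */
}

/* Bounded pattern: the same strings live in a per-connection context
 * that is released when the connection ends. */
static void handle_conn_scoped(struct ctx *server, const char *peer)
{
    struct ctx conn = { NULL, 0 };
    (void)server; /* deliberately not used as the parent for per-conn data */
    ctx_strdup(&conn, peer);
    ctx_strdup(&conn, "0.0.0.0");
    /* ... serve the connection ... */
    ctx_free_all(&conn);
    assert(conn.count == 0);
}

/* Run n connect/disconnect cycles against each pattern and report how many
 * strings each server context still owns afterwards. */
void simulate_connections(int n, size_t *leaky_owned, size_t *scoped_owned)
{
    struct ctx leaky = { NULL, 0 };
    struct ctx scoped = { NULL, 0 };
    for (int i = 0; i < n; i++) {
        handle_conn_leaky(&leaky, "192.0.2.10");
        handle_conn_scoped(&scoped, "192.0.2.10");
    }
    *leaky_owned = leaky.count;
    *scoped_owned = scoped.count;
    ctx_free_all(&leaky);
}
```

Calling `simulate_connections(1000, &leaky, &scoped)` leaves 2000 strings owned by the leaky server context and none by the scoped one; the leaky count keeps growing linearly with the number of connections.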
Each set of graphs from here on comes in threes: one for CPU, one for the number
of connections, and one comparing the number of system calls with lock acquisitions.
Case 1: This shows our older version of Samba, a cut of master at
23ad6919a1e5f16d02e22adcf36ea7f039a9eeea with local patches. CPU use most
closely tracks the op rate for that machine. Connections appear to have little
or no bearing on CPU use.
Case 2: This shows a version of Samba taken about two weeks before the 3.6.2
freeze. Compared with the graphs for case 1, the CPU is now pegged, the number
of connections per second is much lower, and contention on mutexes now exceeds
the actual number of system calls. (This is very rare on our systems; in fact,
it is the first time I've ever seen it.)
Case 3 (note: this is on a 12-hour timeline; the others are 24 hours): This
shows the effect of the patch I wrote. The issue is clearly not fixed, but it is
far better than before. The syscall and mutex curves still cross toward the end,
but connections per second and CPU use are much improved.
Created attachment 7259
Fix for 3.6.x
Found by Ira. Fixes by him and me.
Re-assigning to Karolin for 3.6.3. Karolin, we need to get a CVE number first.
Karolin, the number is CVE-2012-0817.
Patch is included in 3.6.3.
Closing out bug report.