Reported by Ira (site details elided).

Repeated connect/disconnect degrades service to an unacceptable level and causes the host to max out CPU utilization.

The first part of this problem is shown in dos.patch, enclosed in this mail. I am not fully convinced that this patch is right. The basic thing I discovered is that each time you connect to the main smbd, it "leaks" into the smb_server_conn a string copy of the connecting host and of 0.0.0.0. Over hundreds and thousands of connections, this yields a DoS: the machine spends so much time freeing memory as sessions connect and disconnect that it slows to a crawl. The patch represents what I have so far to mitigate it. As you'll see, I don't have a full mitigation yet.

The graphs from here on come in sets of three: one for CPU, one for the number of connections, and one for the number of system calls vs. lock acquisition misses.

Case 1: This shows our older version of Samba; it is a cut of master at 23ad6919a1e5f16d02e22adcf36ea7f039a9eeea with local patching. CPU use most closely tracks the op rate for that machine. Connections appear to have little to no bearing on CPU use.

Case 2: This shows a version of Samba taken about 2 weeks before the freeze of 3.6.2. Compared to the graphs for case 1, the CPU is now pegged, connections per second are now quite low, and contention over mutexes is now higher than the actual number of system calls. (Very rare on our systems. In fact, this is the first time I've ever seen it.)

Case 3: (Note this is on a 12 hr timeline; the others are 24 hrs.) This shows the effect of the patch I wrote. You'll see that the issue is clearly not fixed, but that it is far better than in the previous cases. The crossing of syscall and mutex is still occurring at the end, but connections per second and CPU use are much improved.
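To make the leak pattern described above concrete, here is a minimal Python model of it. This is not Samba code and not the contents of dos.patch or the later attachment; `ServerContext`, `connect_leaky`, `connect_fixed`, and `simulate` are hypothetical names used only for illustration. It assumes the problem behaves like a long-lived parent object that receives two string copies per connection (the client host and 0.0.0.0) and never releases them at disconnect.

```python
class ServerContext:
    """Hypothetical stand-in for a long-lived server-side allocation parent."""

    def __init__(self):
        self.allocations = []

    def strdup(self, text):
        # Each call creates a distinct allocation owned by the context.
        handle = [text]
        self.allocations.append(handle)
        return handle

    def free(self, handle):
        # Release exactly the allocation identified by this handle.
        self.allocations = [a for a in self.allocations if a is not handle]


def connect_leaky(ctx, client_host):
    # Two copies are attached to the long-lived context and never released.
    ctx.strdup(client_host)
    ctx.strdup("0.0.0.0")


def connect_fixed(ctx, client_host):
    # Same copies, but released when the session ends.
    name = ctx.strdup(client_host)
    addr = ctx.strdup("0.0.0.0")
    try:
        pass  # the session would be served here
    finally:
        ctx.free(addr)
        ctx.free(name)


def simulate(connections, handler):
    """Run connect/disconnect cycles; return allocations left on the context."""
    ctx = ServerContext()
    for i in range(connections):
        handler(ctx, "client-%d" % i)
    return len(ctx.allocations)
```

In this model, `simulate(n, connect_leaky)` leaves two allocations behind per connection, while `simulate(n, connect_fixed)` leaves none. The real cost reported above (time spent freeing memory, mutex contention) depends on the allocator and is not reproduced by this sketch.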
Created attachment 7259: Fix for 3.6.x
Found by Ira. Fixes by him and me.
Jeremy.
Opening up...
Re-assigning to Karolin for 3.6.3. Karolin, we need to get a CVE number first. Jeremy.
Karolin - number is CVE-2012-0817
Pushed. Patch is included in 3.6.3. Closing out bug report. Thanks!