Bug 8827 - Increasing memory consumption of samba drepl and rpc server
Status: RESOLVED FIXED
Product: Samba 4.0
Classification: Unclassified
Component: Other
Version: 4.0 alpha 18
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assigned To: Andrew Bartlett
QA Contact: samba4-qa@samba.org
:
Depends on:
Blocks:
 
Reported: 2012-03-26 16:07 UTC by Arvid Requate
Modified: 2012-08-19 05:50 UTC (History)

See Also:


Attachments
example output of --leak-report (24.49 KB, text/plain)
2012-03-26 16:07 UTC, Arvid Requate
leak report of drepl and rpc server (23.80 KB, text/plain)
2012-04-10 10:12 UTC, Arvid Requate
leak report with one of the processes owning 2GB of memory (drepl?) (29.17 KB, application/octet-stream)
2012-05-11 06:39 UTC, Arvid Requate
leak report with two of the processes owning 1.5GB of memory (rpc_server and drepl?) (44.03 KB, application/octet-stream)
2012-05-11 06:43 UTC, Arvid Requate
My proposed fix (currently under test) (22.96 KB, patch)
2012-08-14 06:35 UTC, Andrew Bartlett

Description Arvid Requate 2012-03-26 16:07:57 UTC
Created attachment 7406 [details]
example output of --leak-report

The memory usage of the drepl and rpc server processes increases over time, which may indicate memory leaks. Output of --leak-report is attached. Is there any other useful information I could extract to help track this down?
Comment 1 Arvid Requate 2012-04-10 10:12:51 UTC
Created attachment 7446 [details]
leak report of drepl and rpc server
Comment 2 Arvid Requate 2012-05-11 06:39:32 UTC
Created attachment 7551 [details]
leak report with one of the processes owning 2GB of memory (drepl?)
Comment 3 Arvid Requate 2012-05-11 06:43:06 UTC
Created attachment 7552 [details]
leak report with two of the processes owning 1.5GB of memory (rpc_server and drepl?)

As I understand it, this might be caused by long-running memory contexts? The full leak report has not given me any insight yet. Is there any other information that could be useful to narrow this down?
Comment 4 Arvid Requate 2012-05-15 16:11:14 UTC
As Metze explained to me, similar behavior has been seen before:

https://lists.samba.org/archive/samba-technical/2010-November/074397.html

Maybe the dynamic mmap thresholding in glibc/eglibc doesn't play well here: it can raise the mmap threshold on free() *and* sets the trim threshold (MALLOC_TRIM_THRESHOLD_) to twice the adjusted mmap threshold (MALLOC_MMAP_THRESHOLD_).

See e.g. http://sources.redhat.com/ml/libc-alpha/2006-03/msg00033.html

As an experiment, it might be useful to disable this dynamic behavior by setting the environment variable MALLOC_TRIM_THRESHOLD_, e.g. to the static default of 128*1024, in the samba process environment.
Comment 5 Arvid Requate 2012-06-05 15:37:24 UTC
Tests with Metze's malloc-reclaim.c example code show that adjusting the malloc TRIM and/or MMAP thresholds does not reduce the problem.
 
shell# gcc -o malloc-reclaim malloc-reclaim.c 
shell# MALLOC_TRIM_THRESHOLD_=1 MALLOC_MMAP_THRESHOLD_=1 ./malloc-reclaim

On the contrary, while using mmap more or less exclusively makes free() actually return the freed memory immediately, malloc then allocates memory only in 4k page granularity, which is inefficient in the first place and results in a much larger memory footprint.

Fixing only the TRIM threshold might help in some cases, but not for the malloc-reclaim.c example, because that code is explicitly designed to create a large and lasting hole in the heap, while TRIM only acts on unused chunks at the top of the heap. So I guess Tridge might unfortunately be right in advising that the allocation strategy be reordered. In that case drepl and rpc_server would need some scrutiny.
Comment 6 Andrew Bartlett 2012-08-09 06:57:30 UTC
Can we get that with --leak-report-full rather than just --leak-report?

Thanks,
Comment 7 Andrew Bartlett 2012-08-14 06:35:33 UTC
Created attachment 7765 [details]
My proposed fix (currently under test)

Based on a clear log provided by Ricky Nace, I was easily able to debug this this afternoon.
Comment 8 Stefan Metzmacher 2012-08-14 06:40:44 UTC
Comment on attachment 7765 [details]
My proposed fix (currently under test)

Hi Andrew, could we use ldb_dn_copy() instead of talloc_reference() please?
Comment 9 Andrew Bartlett 2012-08-14 08:00:40 UTC
A reasonable suggestion.  I'll include this in my next autobuild.
Comment 10 Andrew Bartlett 2012-08-19 05:50:15 UTC
With additional patches now in master, this is now fixed.

See:

2e1ab13f6ebb2c2cf746457d4783fe9bc5e86de0
a7b8e9f562780dc6a3487644623decd1cff226e2
3c8d8f206b79280604cb79f263e74aa2b681726e