Bug 13362 - Possible memory leak in the Samba process
Summary: Possible memory leak in the Samba process
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.7.7
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-29 10:57 UTC by Szombathelyi György
Modified: 2018-10-09 07:26 UTC

See Also:


Attachments
smbcontrol pool-usage (736.72 KB, application/x-bzip)
2018-05-07 12:58 UTC, Szombathelyi György
no flags Details
patch (4.68 KB, patch)
2018-05-07 15:00 UTC, Volker Lendecke
no flags Details
Patch (3.85 KB, patch)
2018-05-09 06:05 UTC, Volker Lendecke
no flags Details
Patch with correct tags (3.96 KB, patch)
2018-05-09 06:08 UTC, Volker Lendecke
metze: review+
Details
git log -p -2 (3.68 KB, text/plain)
2018-05-09 12:51 UTC, Volker Lendecke
no flags Details
Pool dump with the patch (1.38 MB, application/x-bzip)
2018-05-09 18:12 UTC, Szombathelyi György
no flags Details
patch with cherry-pick info (4.30 KB, patch)
2018-05-15 09:01 UTC, Volker Lendecke
vl: review? (metze)
slow: review? (jra)
abartlet: review+
Details

Description Szombathelyi György 2018-03-29 10:57:05 UTC
Some samba processes grow constantly when Samba runs in the AD DC role.

ps -eF

.
.
.
root      1178  1032  0 133105 28108  2 Mar20 ?        00:31:34  \_ /usr/sbin/samba --foreground --no-process-group
root      1201  1178  0 358267 935100 3 Mar20 ?        01:00:40  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1206  1178  0 361063 946304 1 Mar20 ?        01:01:08  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1229  1178  0 360594 944484 4 Mar20 ?        01:01:07  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1230  1178  0 359761 941164 6 Mar20 ?        01:00:49  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1447  1178  0 358211 934964 6 Mar20 ?        01:00:50  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12119  1178  0 133210 37052  3 12:34 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12310  1178  0 133210 37180  7 12:35 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12433  1178  0 133210 37052  3 12:36 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12709  1178  0 133210 37180  6 12:38 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12779  1178  0 133210 37180  3 12:38 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12873  1178  0 133210 37180  3 12:39 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     12962  1178  0 133210 37180  2 12:39 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13047  1178  0 133210 37052  1 12:40 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13227  1178  0 133210 37180  6 12:41 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13253  1178  0 133210 37180  3 12:41 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13256  1178  0 133210 37984  3 12:41 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13399  1178  0 133210 37180  0 12:41 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13413  1178  0 133210 37180  3 12:42 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13769  1178  0 133210 37180  7 12:44 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13781  1178  0 133210 37052  6 12:44 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13805  1178  0 133210 37244  7 12:44 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
root     13901  1178  1 133210 38012  6 12:44 ?        00:00:00  |   \_ /usr/sbin/samba --foreground --no-process-group
.
.
.
The 5 samba processes at the top (PIDs 1201, 1206, 1229, 1230, 1447), which started 9 days ago, are now each using nearly 1 GB of RAM.
Comment 1 Szombathelyi György 2018-04-04 14:05:28 UTC
After 2 workdays, they have each grown to about 1.3 GB:

root      1201  1178  0 453647 1312524 3 Mar20 ?       01:25:08  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1206  1178  0 455710 1320944 0 Mar20 ?       01:25:20  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1229  1178  0 456662 1324728 3 Mar20 ?       01:25:44  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1230  1178  0 454144 1314644 1 Mar20 ?       01:25:05  |   \_ /usr/sbin/samba --foreground --no-process-group
root      1447  1178  0 452970 1309832 1 Mar20 ?       01:25:18  |   \_ /usr/sbin/samba --foreground --no-process-group


However, it seems these processes exist because of nslcd: when I restarted it, the samba processes above disappeared as well. Still, it is strange that a persistent LDAP connection causes samba to grow without bound.
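The growth between the two `ps -eF` snapshots can be put into numbers with a small parser. This is only a sketch: the helper names `parse_ps_rss` and `rss_growth` are made up for illustration, and it assumes the usual `ps -eF` column order (UID PID PPID C SZ RSS ...), with RSS reported in KiB.

```python
def parse_ps_rss(ps_output):
    """Return {pid: rss_kib} parsed from `ps -eF` process rows."""
    rss_by_pid = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # `ps -eF` columns: UID PID PPID C SZ RSS PSR STIME TTY TIME CMD.
        # Skip the header, "." filler lines and anything else that is not a row.
        if len(fields) < 6 or not (fields[1].isdigit() and fields[5].isdigit()):
            continue
        rss_by_pid[int(fields[1])] = int(fields[5])
    return rss_by_pid


def rss_growth(before, after):
    """Return {pid: growth_kib} for PIDs present in both snapshots."""
    return {pid: after[pid] - before[pid] for pid in after if pid in before}


# Two rows from each snapshot in this report (command column shortened).
first = parse_ps_rss(
    "root 1201 1178 0 358267 935100 3 Mar20 ? 01:00:40 /usr/sbin/samba\n"
    "root 1447 1178 0 358211 934964 6 Mar20 ? 01:00:50 /usr/sbin/samba\n"
)
second = parse_ps_rss(
    "root 1201 1178 0 453647 1312524 3 Mar20 ? 01:25:08 /usr/sbin/samba\n"
    "root 1447 1178 0 452970 1309832 1 Mar20 ? 01:25:18 /usr/sbin/samba\n"
)
growth = rss_growth(first, second)
print(growth)  # growth in KiB per PID
```

For the two PIDs shown, the resident set grew by roughly 370 MiB each between the two snapshots.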
Comment 2 Szombathelyi György 2018-05-07 10:53:48 UTC
I've managed to reproduce it deterministically:

1. set up samba AD, use nslcd on the machine to resolve users and groups from Samba AD (I know it is not a recommended setup).
2. # while true; do groups someuser > /dev/null; done
3. after some minutes, watch Samba LDAP processes grow. Restart nslcd to reclaim the memory of Samba.
Comment 3 Volker Lendecke 2018-05-07 11:50:54 UTC
Can you try running

smbcontrol <pid> pool-usage > /tmp/pool.txt

with <pid> being the large process, and attach the output? Please be aware that the output might be large.

Thanks
Comment 4 Szombathelyi György 2018-05-07 12:30:01 UTC
Yepp, it's huge (20M), but mostly filled with this:
                       ldb_module: paged_results      contains 6420400 bytes in 149571 blocks (ref 0) 0x55e3fc8ff590
                            struct private_data            contains 6420334 bytes in 149569 blocks (ref 0) 0x55e3fd93a7d0
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e4027675f0
                                    74784                          contains      6 bytes in   1 blocks (ref 0) 0x55e40282e520
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e4027bcf20
                                    74783                          contains      6 bytes in   1 blocks (ref 0) 0x55e4027bafa0
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e4027c0630
                                    74782                          contains      6 bytes in   1 blocks (ref 0) 0x55e4027eed80
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e402827f80
                                    74781                          contains      6 bytes in   1 blocks (ref 0) 0x55e402836800
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e402827a30
                                    74780                          contains      6 bytes in   1 blocks (ref 0) 0x55e40281fe90
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e402809f30
                                    74779                          contains      6 bytes in   1 blocks (ref 0) 0x55e4027f74a0
                                struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e4027df1b0
                                    74778                          contains      6 bytes in   1 blocks (ref 0) 0x55e4027a8620

(The largest sequence no.(?) is now 74784 and still counting.)
Comment 5 Szombathelyi György 2018-05-07 12:32:07 UTC
First dump:
ldb_module: paged_results      contains 6420400 bytes in 149571 blocks (ref 0) 0x55e3fc8ff590

After 4 minutes:
ldb_module: paged_results      contains 8184776 bytes in 190603 blocks (ref 0) 0x55e3fc8ff590
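A pool dump of this size is easier to judge from a summary than by eye. The sketch below counts the `struct results_store` entries and extracts the size of the `ldb_module: paged_results` context. It assumes every line has the `<name> contains N bytes in M blocks` shape seen in the excerpts above; the function name `summarize_pool` is made up for illustration.

```python
import re

# Matches lines such as:
#   struct results_store   contains   86 bytes in   2 blocks (ref 0) 0x55e4...
POOL_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s+contains\s+(?P<bytes>\d+) bytes in\s+(?P<blocks>\d+) blocks"
)


def summarize_pool(dump_text):
    """Return (results_store_count, paged_results_bytes or None)."""
    store_count = 0
    paged_bytes = None
    for line in dump_text.splitlines():
        match = POOL_LINE.match(line)
        if match is None:
            continue
        name = match.group("name")
        if name == "struct results_store":
            store_count += 1
        elif name == "ldb_module: paged_results":
            paged_bytes = int(match.group("bytes"))
    return store_count, paged_bytes


sample = """\
ldb_module: paged_results      contains 6420400 bytes in 149571 blocks (ref 0) 0x55e3fc8ff590
    struct private_data            contains 6420334 bytes in 149569 blocks (ref 0) 0x55e3fd93a7d0
        struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e4027675f0
            74784                          contains      6 bytes in   1 blocks (ref 0) 0x55e40282e520
        struct results_store           contains     86 bytes in   2 blocks (ref 0) 0x55e4027bcf20
            74783                          contains      6 bytes in   1 blocks (ref 0) 0x55e4027bafa0
"""
stores, paged = summarize_pool(sample)

# Growth rate between the two dumps quoted above, taken 4 minutes apart.
bytes_per_minute = (8184776 - 6420400) / 4
```

Between the two dumps quoted above, the `paged_results` context grew by about 441 KB per minute.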
Comment 6 Volker Lendecke 2018-05-07 12:51:45 UTC
Can you please compress (bzip2 -9) such an output and attach it to this bug?
Comment 7 Szombathelyi György 2018-05-07 12:58:28 UTC
Created attachment 14180 [details]
smbcontrol pool-usage
Comment 8 Volker Lendecke 2018-05-07 15:00:47 UTC
Created attachment 14181 [details]
patch

Completely untested patch (I'm still working on it). Uploading just for my own reference. Stay tuned, this will be fixed.
Comment 9 Szombathelyi György 2018-05-07 19:51:16 UTC
(In reply to Volker Lendecke from comment #8)
Thanks!
I don't really compile Samba myself, but I will report back once a fix appears in a released version.
Comment 10 Volker Lendecke 2018-05-09 06:05:43 UTC
Created attachment 14184 [details]
Patch

The attached patch survived all our tests. Can you try that?
Comment 11 Volker Lendecke 2018-05-09 06:08:56 UTC
Created attachment 14185 [details]
Patch with correct tags

Now with correct Bug: tagging
Comment 12 Szombathelyi György 2018-05-09 09:37:22 UTC
I tried to apply it, but it does not seem to apply cleanly to the 4.7 series.
Comment 13 Volker Lendecke 2018-05-09 09:40:06 UTC
What exactly did you try?

git am -3 patch.txt

works fine on a 4.7 branch. What would be the most convenient way for you to apply this patch?
Comment 14 Szombathelyi György 2018-05-09 10:07:35 UTC
I just tried to add it to the source RPM build in Fedora, which has 4.7.7. rpmbuild tries to apply it with the patch utility.
Comment 15 Volker Lendecke 2018-05-09 10:38:57 UTC
Probably you need to add "-p1" to the %patch line somewhere
Comment 16 Szombathelyi György 2018-05-09 10:45:39 UTC
Unfortunately, -p1 was already there:
+ /usr/bin/patch -p1 -s --fuzz=2 --no-backup-if-mismatch
2 out of 4 hunks FAILED -- saving rejects to file lib/ldb/modules/paged_results.c.rej
5 out of 6 hunks FAILED -- saving rejects to file lib/ldb/modules/paged_results.c.rej

Maybe I'll try it with git; I just wanted to avoid checking out the whole tree.
Comment 17 Volker Lendecke 2018-05-09 11:31:35 UTC
That's weird. Sorry, out of ideas.
Comment 18 Szombathelyi György 2018-05-09 12:37:42 UTC
(In reply to Volker Lendecke from comment #17)
If you have already applied it to the 4.7 branch via git am -3, can you post git log -p -2 please?
Comment 19 Volker Lendecke 2018-05-09 12:51:59 UTC
Created attachment 14188 [details]
git log -p -2
Comment 20 Szombathelyi György 2018-05-09 13:04:21 UTC
(In reply to Volker Lendecke from comment #19)
Just had to reverse the order of the two commits, and now it compiles! Will install it after office hours, thanks!
Comment 21 Szombathelyi György 2018-05-09 18:10:25 UTC
Tested the patch, still not perfect:

                      ldb_module: paged_results      contains 13566682 bytes in 314433 blocks (ref 0) 0x560768c31a50
                            struct private_data            contains 13566616 bytes in 314431 blocks (ref 0) 0x56076982c3e0
                                struct results_store           contains     87 bytes in   2 blocks (ref 0) 0x56076bdfaa40
                                    157215                         contains      7 bytes in   1 blocks (ref 0) 0x56076bdff520
                                struct results_store           contains     87 bytes in   2 blocks (ref 0) 0x56076bdd8810
                                    157214                         contains      7 bytes in   1 blocks (ref 0) 0x56076be18080
                                struct results_store           contains     87 bytes in   2 blocks (ref 0) 0x56076bdedd00
                                    157213                         contains      7 bytes in   1 blocks (ref 0) 0x56076bd919a0
                                struct results_store           contains     87 bytes in   2 blocks (ref 0) 0x56076bdc1fb0
                                    157212                         contains      7 bytes in   1 blocks (ref 0) 0x56076bc8a0b0
                                struct results_store           contains     87 bytes in   2 blocks (ref 0) 0x56076be18120
                                    157211                         contains      7 bytes in   1 blocks (ref 0) 0x56076be01b20
                                struct results_store           contains     87 bytes in   2 blocks (ref 0) 0x56076bd896e0
                                    157210                         contains      7 bytes in   1 blocks (ref 0) 0x56076be1b2d0


.
.
.
lots of results_store
Comment 22 Szombathelyi György 2018-05-09 18:12:46 UTC
Created attachment 14190 [details]
Pool dump with the patch
Comment 23 Volker Lendecke 2018-05-11 08:12:11 UTC
Strange. I have just written a little reproducing torture test and without the patch got exactly the pool dump you sent. Tons and tons of results_store entries. With the patch that number was limited to 10 in a rolling fashion. Can you double-check that you really did run with the patch applied?
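The behaviour described here, a store whose entry count is capped at 10 with the oldest entries dropped first, can be modelled in a few lines. This is a Python illustration of the idea only, not the actual C patch to the paged_results module. The class name, the `max_stores` parameter, and the cookie scheme are hypothetical.

```python
from collections import OrderedDict


class BoundedResultsStore:
    """Keep at most `max_stores` result sets, evicting the oldest first."""

    def __init__(self, max_stores=10):
        self.max_stores = max_stores
        self._stores = OrderedDict()  # cookie -> stored results, oldest first
        self._next_cookie = 1

    def add(self, results):
        """Store a result set and return the cookie that identifies it."""
        cookie = str(self._next_cookie)
        self._next_cookie += 1
        self._stores[cookie] = results
        # Rolling limit: drop the oldest entries once the cap is exceeded.
        while len(self._stores) > self.max_stores:
            self._stores.popitem(last=False)
        return cookie

    def get(self, cookie):
        """Return the stored results, or None if they were already evicted."""
        return self._stores.get(cookie)

    def __len__(self):
        return len(self._stores)


store = BoundedResultsStore(max_stores=10)
cookies = [store.add(["entry %d" % i]) for i in range(25)]
print(len(store))  # the count stays at the cap, however many searches ran
```

Without such a cap, every abandoned paged search leaves its store behind, which matches the ever-growing list of `results_store` entries in the unpatched dumps.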
Comment 24 Szombathelyi György 2018-05-11 21:16:59 UTC
(In reply to Volker Lendecke from comment #23)
Just realized that this RPM uses an external libldb, so I have to rebuild the libldb package, sorry for the misinformation.
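Since the patched file is lib/ldb/modules/paged_results.c, a rebuilt samba package has no effect while the process still loads an unpatched system libldb. One way to see which ldb files a running process has mapped is to read `/proc/<pid>/maps` on Linux. The sketch below assumes the standard six-column maps format; the function name and the sample paths are hypothetical.

```python
import os


def mapped_ldb_files(maps_text):
    """Return the sorted ldb-related file paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if "ldb" in os.path.basename(path) or "/ldb/" in path:
            paths.add(path)
    return sorted(paths)


# Hypothetical excerpt; real paths and versions depend on the distribution.
sample_maps = (
    "7f00-7f01 r-xp 00000000 fd:01 11 /usr/lib64/libldb.so.1.2.3\n"
    "7f01-7f02 rw-p 00000000 00:00 0\n"
    "7f02-7f03 r-xp 00000000 fd:01 12 /usr/lib64/libc.so.6\n"
)
found = mapped_ldb_files(sample_maps)
```

On a live system the input would come from something like `open("/proc/%d/maps" % pid).read()`, and the resulting paths can then be checked against the package that owns them.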
Comment 25 Volker Lendecke 2018-05-13 18:44:23 UTC
Can you update the defect when you've been able to rebuild libldb? Thanks!
Comment 26 Volker Lendecke 2018-05-13 19:02:24 UTC
Comment on attachment 14185 [details]
Patch with correct tags

Stefan, do we want to push this even without the confirmation? It survived an autobuild and a manual test by me.
Comment 27 Szombathelyi György 2018-05-14 08:12:21 UTC
(In reply to Volker Lendecke from comment #25)
Just built the new libldb rpm, will try it tonight.
Comment 28 Szombathelyi György 2018-05-14 18:36:01 UTC
I have been running the tests for a while now, and the memory usage seems stable. The pool dump shows no more than 10 results_store entries. Thanks!
Comment 29 Stefan Metzmacher 2018-05-15 07:56:15 UTC
Comment on attachment 14185 [details]
Patch with correct tags

Looks good, thanks! It's already in master...
Comment 30 Volker Lendecke 2018-05-15 09:01:32 UTC
Created attachment 14201 [details]
patch with cherry-pick info
Comment 31 Volker Lendecke 2018-05-15 09:01:50 UTC
Comment on attachment 14201 [details]
patch with cherry-pick info

builds on 4.7 and 4.8 btw
Comment 32 Karolin Seeger 2018-09-26 09:05:34 UTC
Anyone volunteering for review?
We should add the patch to the 4.8 release branch.
Comment 33 Andrew Bartlett 2018-09-27 00:49:58 UTC
Comment on attachment 14201 [details]
patch with cherry-pick info

Sorry I missed this.  Reviewed and agreed for 4.7 and 4.8
Comment 34 Karolin Seeger 2018-09-27 09:12:27 UTC
(In reply to Andrew Bartlett from comment #33)
Pushed to autobuild-v4-{8,7}-test.
Comment 35 Karolin Seeger 2018-10-09 07:26:20 UTC
(In reply to Karolin Seeger from comment #34)
Pushed to both branches.
Closing out bug report.

Thanks!