Bug 10469 - Memory Leak In Main Process
Summary: Memory Leak In Main Process
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other
Version: 4.1.5
Hardware: x64 Linux
Importance: P5 critical
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Duplicates: 10519
Depends on:
Blocks:
Reported: 2014-02-25 12:55 UTC by Mike Scholes
Modified: 2016-07-31 18:50 UTC
CC List: 1 user

See Also:


Attachments
Results from memory script (187.67 KB, text/plain)
2014-05-29 07:11 UTC, Mike Scholes
no flags
Results From Work (40.45 KB, text/plain)
2014-05-30 11:48 UTC, Mike Scholes
no flags
Patch for v4-1-test (1.58 KB, patch)
2014-07-17 07:57 UTC, Stefan Metzmacher
vl: review+
Patch for v4-0-test (1.58 KB, patch)
2014-07-17 07:58 UTC, Stefan Metzmacher
vl: review+
Description Mike Scholes 2014-02-25 12:55:06 UTC
I have two 64-bit Samba 4.1.5-7 installations on different sites (not linked), running in AD mode using the packages from SerNet. Both are installed on CentOS 6.5, fully updated as of 24/2/2014. The servers don't do any file transfers and have only one machine logging into the domain, for admin purposes. Their sole purpose at the moment is authentication for our Zarafa mail servers. Memory usage increases throughout the day; on one server (8GB RAM) it grows by as much as 4GB per day, and I have to run killall samba each night. There seems to be a memory leak somewhere, but I am not an expert in diagnosing this problem.
Comment 1 Mike Scholes 2014-04-24 14:09:40 UTC
Upgraded to 4.1.7 and the problem persists.
Comment 2 Volker Lendecke 2014-04-24 14:14:01 UTC
A first try might be the output of

smbcontrol <pid> pool-usage

for such a big process. The output might be large, so you may want to compress it before sending it to us.
Comment 3 Mike Scholes 2014-04-24 14:35:42 UTC
I get "No replies received"
Comment 4 Volker Lendecke 2014-04-24 14:41:11 UTC
Is your target process busy in the meantime? If so, you might want to increase the timeout with -t <something>
Comment 5 Mike Scholes 2014-04-24 14:45:23 UTC
Increased the log level to 3 but nothing helpful...

smbcontrol 18357 pool-usage -t 10
Registered MSG_REQ_POOL_USAGE
Registered MSG_REQ_DMALLOC_MARK and LOG_CHANGED
No replies received

smbcontrol 18357 pool-usage -t 100
Registered MSG_REQ_POOL_USAGE
Registered MSG_REQ_DMALLOC_MARK and LOG_CHANGED
No replies received
Comment 6 Mike Scholes 2014-04-24 14:48:15 UTC
The server is virtually idle

Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3829144k total,  3503692k used,   325452k free,   342760k buffers
Swap:  3964920k total,    85752k used,  3879168k free,  1465664k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18357 root      20   0  816m 302m 3508 S  0.0  8.1  14:03.88 samba
Comment 7 Volker Lendecke 2014-04-24 15:06:18 UTC
Crap, sorry, my fault. smbcontrol is using the wrong messaging mechanism for "samba". Need to find out whether this is possible for the AD DC.
Comment 8 Volker Lendecke 2014-04-24 15:09:42 UTC
Andrew, do you have an idea how to get the talloc hierarchy out of the AD controller at runtime?
Comment 9 Mike Scholes 2014-04-29 09:11:43 UTC
Any more on this?

Thanks.
Comment 10 Andrew Bartlett 2014-04-29 21:12:12 UTC
If you run samba with --leak-report-full, it will print out the full remaining talloc tree on exit, i.e. when you kill it. This can be a very, very large amount of potentially sensitive data (depending on what is leaked).

The best way we have found to chase this down is to use XZ compression:
tar --xz -cf logs.tar.xz samba.log

You can then attach that as a private attachment here or, for greater security, gpg-encrypt it to my GPG key below:

sec   4096R/C8021865 2012-07-04 [expires: 2018-07-03]
      Key fingerprint = 8160 9BF8 5375 BA5E 510C  CEA1 FE00 1D44 C802 1865
uid                  Andrew Bartlett <abartlet@abartlet.net>
uid                  Andrew Bartlett <abartlet@samba.org>
uid                  Andrew Bartlett <abartlet@ozlabs.org>
uid                  Andrew Bartlett <abartlet@catalyst.net.nz>
ssb   4096R/D899268D 2012-07-04
Comment 11 Mike Scholes 2014-05-02 10:14:56 UTC
Sent you the log via email.
Comment 12 Andrew Bartlett 2014-05-02 19:01:18 UTC
Please fix this, or I can't help you.

Sorry,

                   The mail system

<mike@scholes-software.com>: host mail.scholes-software.com[87.81.241.169]
    said: 550-This mail has been classified as spam and has not been delivered,
    if you 550-believe this is an error please seek an alternative method of
    communication 550 and inform the recipient of the problem. (in reply to end
    of DATA command)
Comment 13 Mike Scholes 2014-05-05 19:14:48 UTC
OK I've whitelisted you.

Thanks.

Mike.
Comment 14 Mike Scholes 2014-05-08 10:02:06 UTC
Hi, any luck? I haven't received any emails yet.
Comment 15 Andrew Bartlett 2014-05-08 10:25:11 UTC
Honestly, after you bounced mail from me and from Ricky, whom I asked to help you, we both gave up.

I'll try and look again after SambaXP, in between I need to pull some rabbits out of the hat.
Comment 16 Mike Scholes 2014-05-08 14:49:38 UTC
That's OK, restarting every night is working and we don't have any immediate plan for 24 hour working.
Comment 17 Mike Scholes 2014-05-23 10:54:20 UTC
Hi

Any chance for another look at this?
Comment 18 Volker Lendecke 2014-05-27 11:24:00 UTC
(In reply to comment #17)
> Hi
> 
> Any chance for another look at this?

Feel free to send it to me.
Comment 19 Mike Scholes 2014-05-29 07:11:27 UTC
Created attachment 9993 [details]
Results from memory script
Comment 20 Mike Scholes 2014-05-29 07:11:41 UTC
I've attached the results of the script that Ricky sent me. This has been running on my home server with only one user (me) using it to authenticate my Zarafa mail server.

I have also joined another Samba AD to our main server in work, same OS and it doesn't suffer from the memory leak. No authentication is directed at that machine yet. So I'm still thinking it's a winbind leak.
Comment 21 Mike Scholes 2014-05-30 11:48:58 UTC
Created attachment 9995 [details]
Results From Work
Comment 22 Mike Scholes 2014-05-30 11:50:13 UTC
I have also added results for a day and a half from our work server. It has 60 users authenticating from our Zarafa server and, again, no file transfers.
Comment 23 Mike Scholes 2014-06-02 11:42:46 UTC
Thinking about this again the only authentication we are doing is using ldap/AD methods so perhaps nothing to do with winbind.
Comment 24 Mike Scholes 2014-06-06 13:02:02 UTC
Any thoughts?
Comment 25 Mike Scholes 2014-07-14 13:57:12 UTC
Has this one been abandoned now because the bug remains?
Comment 26 Volker Lendecke 2014-07-14 14:13:52 UTC
(In reply to comment #25)
> Has this one been abandoned now because the bug remains?

Well, you might want to talk to whoever sent you the script. Its output contains no useful information about where the memory leak might be. And since, according to the history of this bug, it is really difficult to get decent logs, yes, this has been pretty much abandoned.

If you get us the logs that Andrew requested, we might say more. Alternatively, you might contact Andrew Bartlett directly for help.
Comment 27 Mike Scholes 2014-07-14 14:21:03 UTC
Andrew was sent the logs.
Comment 28 Volker Lendecke 2014-07-14 14:34:44 UTC
(In reply to comment #27)
> Andrew was sent the logs.

Then it's pretty much upon Andrew I believe. Sorry for the confusion.
Comment 29 Andrew Bartlett 2014-07-15 08:57:54 UTC
The logs as sent sadly don't have any useful information, except to confirm the process size (we need to know *where* it is).

Try getting the leaky process under gdb and running:

p talloc_report_full(0, stderr)

Andrew Bartlett
Comment 30 Volker Lendecke 2014-07-15 09:00:02 UTC
(In reply to comment #29)
> The logs as sent sadly don't have any useful information, except to confirm the
> process size (we need to know *where* it is).
> 
> Try getting the leaky process under gdb and running:
> 
> p talloc_report_full(0, stderr)
> 
> Andrew Bartlett

Andrew, maybe this is the time to add the smbcontrol pool-usage thingy to the "samba" program? That's much less intrusive and easier to use for users.

Volker
Comment 31 Mike Scholes 2014-07-15 10:15:10 UTC
Thanks.

Volker, I think you are right, I wouldn't know how to do what Andrew asked for. I'm just a regular IT Admin.

Mike
Comment 32 Stefan Metzmacher 2014-07-16 14:46:46 UTC
*** Bug 10519 has been marked as a duplicate of this bug. ***
Comment 33 Stefan Metzmacher 2014-07-16 14:47:45 UTC
(In reply to comment #32)
> *** Bug 10519 has been marked as a duplicate of this bug. ***

I think I've found the bug, the fix is on its way to master.
Comment 34 Stefan Metzmacher 2014-07-17 07:57:47 UTC
Created attachment 10117 [details]
Patch for v4-1-test
Comment 35 Stefan Metzmacher 2014-07-17 07:58:13 UTC
Created attachment 10118 [details]
Patch for v4-0-test
Comment 36 Karolin Seeger 2014-07-17 18:23:47 UTC
Pushed to autobuild-v4-[0|1]-test.
Comment 37 Karolin Seeger 2014-07-27 10:08:15 UTC
(In reply to comment #36)
> Pushed to autobuild-v4-[0|1]-test.

Pushed to both branches.
Closing out bug report.

Thanks!
Comment 38 Mike Scholes 2014-08-11 12:13:59 UTC
Just installed 4.1.11 and it has made no difference, the memory leak is still present.
Comment 39 Mike Scholes 2014-08-15 07:26:10 UTC
Is there anything in the latest version I can do to help diagnose this problem?
Comment 40 Volker Lendecke 2014-08-15 08:49:17 UTC
Well, you might want to find someone who can log into your machine and do the required analysis by printing the full talloc hierarchy with gdb.
Comment 41 Mike Scholes 2014-08-15 09:11:52 UTC
Unfortunately I don't know anyone that can do this. Anyone there can point to instructions?
Comment 42 Volker Lendecke 2014-08-15 09:41:47 UTC
(In reply to comment #41)
> Unfortunately I don't know anyone that can do this. Anyone there can point to
> instructions?

Andrew sent instructions in comment 29.

Volker
Comment 43 Mike Scholes 2014-08-15 09:50:48 UTC
p talloc_report_full(0, stderr)

No symbol "talloc_report_full" in current context.
Comment 44 Andrew Bartlett 2016-07-30 01:35:49 UTC
Thanks to Volker's messaging work

smbcontrol <pid> pool-usage 

now works against the samba process.

If this still happens with Samba 4.5.0rc1 or any Samba version since 4.3, then please run that command and give us the output.

In the meantime, I'm going to mark this as NEEDINFO, because we can't really go any further with what we have.

Sorry,
Comment 45 Mike Scholes 2016-07-31 11:40:37 UTC
I'm using 4.2.13-SerNet-RedHat-22.el7, as 4.3 became a paid-for product. I am not working for the company any more.

smbcontrol <pid> pool-usage produces no output regardless of the pid I enter. 

I have recently migrated all my servers to xenserver and am now running samba on CentOS 7. I am not seeing any memory leak on this system even though I have the same systems authenticating against Samba.

For me, the problem has gone away.
Comment 46 Andrew Bartlett 2016-07-31 18:50:31 UTC
Thanks. Closing as fixed (for want of a better tag), as you are no longer able to reproduce it.