Bug 16065 - Memory leak in DRS when replication fails
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.23.6
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
Reported: 2026-04-30 17:25 UTC by Andreas Hasenack
Modified: 2026-05-15 10:21 UTC

Description Andreas Hasenack 2026-04-30 17:25:57 UTC
There seems to be a memory leak in the replication component of Samba AD. A bug was filed with us[1] against Samba 4.19.5, and I also reproduced the leak with 4.23.6.

In comments #5[2], #6[3], and #7[4] a user goes into more detail, showing talloc structures and narrowing down where the leak is thought to be. Comment #7[4] also includes an experimental patch[5]:

Index: samba-4.22.3+dfsg/source4/librpc/rpc/dcerpc_connect.c
===================================================================
--- samba-4.22.3+dfsg.orig/source4/librpc/rpc/dcerpc_connect.c
+++ samba-4.22.3+dfsg/source4/librpc/rpc/dcerpc_connect.c
@@ -824,6 +824,8 @@ static void continue_connect(struct comp
 	pc.interface    = s->table;
 	pc.creds        = s->credentials;
 	pc.resolve_ctx  = lpcfg_resolve_context(s->lp_ctx);
+	/* cspert - link resolve_ctx to context c so that it doesn't leak */
+	talloc_steal(c, pc.resolve_ctx);
 
 	transport = dcerpc_binding_get_transport(s->binding);
 

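The idea behind the experimental patch can be illustrated with a toy model of hierarchical ownership. The sketch below is plain C and does not use talloc; struct node, node_new, node_steal, node_free, node_count and simulate_attempts are hypothetical names invented for illustration. It assumes, as the patch comment suggests, that every connection attempt allocates a new object under a long-lived parent, so the object survives the attempt unless it is re-parented to the short-lived per-attempt context.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for an allocation with a parent (NOT talloc itself). */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
};

static void unlink_from_parent(struct node *n)
{
    struct node **pp;

    if (n->parent == NULL) {
        return;
    }
    for (pp = &n->parent->first_child; *pp != NULL; pp = &(*pp)->next_sibling) {
        if (*pp == n) {
            *pp = n->next_sibling;
            break;
        }
    }
    n->parent = NULL;
    n->next_sibling = NULL;
}

static void link_to_parent(struct node *n, struct node *parent)
{
    n->parent = parent;
    if (parent != NULL) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
}

/* Allocate a node owned by `parent` (or by nobody if parent is NULL). */
struct node *node_new(struct node *parent)
{
    struct node *n = calloc(1, sizeof(*n));

    if (n != NULL) {
        link_to_parent(n, parent);
    }
    return n;
}

/* Move ownership of `n` to `new_parent`, analogous in spirit to talloc_steal(). */
void node_steal(struct node *new_parent, struct node *n)
{
    unlink_from_parent(n);
    link_to_parent(n, new_parent);
}

/* Free a node together with everything it owns. */
void node_free(struct node *n)
{
    if (n == NULL) {
        return;
    }
    while (n->first_child != NULL) {
        node_free(n->first_child); /* each child unlinks itself from n */
    }
    unlink_from_parent(n);
    free(n);
}

/* Count a node plus all of its descendants. */
size_t node_count(const struct node *n)
{
    size_t total = 1;
    const struct node *c;

    for (c = n->first_child; c != NULL; c = c->next_sibling) {
        total += node_count(c);
    }
    return total;
}

/* Simulate failed attempts; return how many nodes the long-lived parent still owns. */
size_t simulate_attempts(int attempts, int steal)
{
    struct node *long_lived = node_new(NULL);
    size_t remaining;
    int i;

    if (long_lived == NULL) {
        return 0;
    }
    for (i = 0; i < attempts; i++) {
        struct node *request = node_new(NULL);          /* per-attempt context */
        struct node *resolve_ctx = node_new(long_lived); /* allocated on the long-lived parent */

        if (steal && request != NULL && resolve_ctx != NULL) {
            node_steal(request, resolve_ctx);
        }
        node_free(request); /* the attempt fails and its context is released */
    }
    remaining = node_count(long_lived);
    node_free(long_lived);
    return remaining;
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("without steal: %zu nodes\n", simulate_attempts(338, 0));
    printf("with steal:    %zu nodes\n", simulate_attempts(338, 1));
    return 0;
}
#endif
```

In this model, simulate_attempts(338, 0) leaves 339 nodes (the parent plus one per attempt), while simulate_attempts(338, 1) leaves only the parent. Whether re-parenting is safe in the real code depends on what else references the resolve context, which this model does not capture.
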
To reproduce it, I deployed a DC as usual (samba-tool domain provision ...), and later provisioned a second DC, joining it to the first one with just "samba-tool domain join example.com DC".

After confirming that replication was working (by creating users, removing them, and checking them on the second DC), I stopped the second DC to trigger the leak. This caused repeated replication errors on the first one:

==== INBOUND NEIGHBORS ====

CN=Configuration,DC=example,DC=internal
        Default-First-Site-Name\R-DC2 via RPC
                DSA object GUID: 0f87281a-a969-42de-af80-dc39faf7d9da
                Last attempt @ Thu Apr 30 17:14:18 2026 UTC failed, result 1225 (WERR_CONNECTION_REFUSED)
                338 consecutive failure(s).
                Last success @ Wed Apr 29 13:04:15 2026 UTC

DC=DomainDnsZones,DC=example,DC=internal
        Default-First-Site-Name\R-DC2 via RPC
                DSA object GUID: 0f87281a-a969-42de-af80-dc39faf7d9da
                Last attempt @ Thu Apr 30 17:14:18 2026 UTC failed, result 1225 (WERR_CONNECTION_REFUSED)
                338 consecutive failure(s).
                Last success @ Wed Apr 29 13:04:15 2026 UTC

CN=Schema,CN=Configuration,DC=example,DC=internal
        Default-First-Site-Name\R-DC2 via RPC
                DSA object GUID: 0f87281a-a969-42de-af80-dc39faf7d9da
                Last attempt @ Thu Apr 30 17:14:18 2026 UTC failed, result 1225 (WERR_CONNECTION_REFUSED)
                338 consecutive failure(s).
                Last success @ Wed Apr 29 13:04:15 2026 UTC

DC=ForestDnsZones,DC=example,DC=internal
        Default-First-Site-Name\R-DC2 via RPC
                DSA object GUID: 0f87281a-a969-42de-af80-dc39faf7d9da
                Last attempt @ Thu Apr 30 17:14:18 2026 UTC failed, result 1225 (WERR_CONNECTION_REFUSED)
                338 consecutive failure(s).
                Last success @ Wed Apr 29 13:04:15 2026 UTC

DC=example,DC=internal
        Default-First-Site-Name\R-DC2 via RPC
                DSA object GUID: 0f87281a-a969-42de-af80-dc39faf7d9da
                Last attempt @ Thu Apr 30 17:14:18 2026 UTC failed, result 1225 (WERR_CONNECTION_REFUSED)
                338 consecutive failure(s).
                Last success @ Wed Apr 29 13:04:15 2026 UTC

Sure enough, the amount of memory consumed by the drepl process slowly started to increase.

It went from 175880[VSZ] 69212[RSS] to 237552[VSZ] 130084[RSS] after a day, and on an otherwise idle container it is currently the process consuming the most CPU and memory.
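
Figures like the VSZ and RSS values above can be sampled programmatically on Linux to track the growth over time. This is a minimal sketch, assuming the /proc/<pid>/status format with "VmSize:" and "VmRSS:" lines reported in kB; status_field_kib and read_process_memory are hypothetical helper names, not part of Samba.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find a "Key:   12345 kB" line in /proc/<pid>/status text.
 * Returns the numeric value, or -1 if the key is missing or malformed. */
long status_field_kib(const char *status_text, const char *key)
{
    size_t key_len = strlen(key);
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            long value;

            if (sscanf(line + key_len + 1, "%ld", &value) == 1) {
                return value;
            }
            return -1;
        }
        line = strchr(line, '\n'); /* advance to the start of the next line */
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}

/* Read VmSize and VmRSS for a process. Returns 0 on success, -1 on error. */
int read_process_memory(int pid, long *vsz_kib, long *rss_kib)
{
    char path[64];
    char buf[8192];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", pid);
    f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    *vsz_kib = status_field_kib(buf, "VmSize");
    *rss_kib = status_field_kib(buf, "VmRSS");
    return (*vsz_kib >= 0 && *rss_kib >= 0) ? 0 : -1;
}

#ifdef DEMO_MAIN
int main(int argc, char **argv)
{
    long vsz, rss;

    if (argc != 2 || read_process_memory(atoi(argv[1]), &vsz, &rss) != 0) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    printf("%ld[VSZ] %ld[RSS]\n", vsz, rss);
    return 0;
}
#endif
```

Built with -DDEMO_MAIN and run periodically against the PID of the drepl process, this prints one sample per invocation in the same layout as the numbers quoted above.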

The bug report shows that the problem went unnoticed until the OOM killer terminated the process, so even though the leak is slow, it eventually becomes a problem.

I checked the Samba Bugzilla for similar issues, both open and closed, but didn't spot any, so I'm filing this bug.


1. https://bugs.launchpad.net/ubuntu/+source/samba/+bug/2121024
2. https://bugs.launchpad.net/ubuntu/+source/samba/+bug/2121024/comments/5
3. https://bugs.launchpad.net/ubuntu/+source/samba/+bug/2121024/comments/6
4. https://bugs.launchpad.net/ubuntu/+source/samba/+bug/2121024/comments/7
5. https://launchpadlibrarian.net/819267562/fix-dcerpc-connect-memleak.patch