Bug 15982 - Exhaustion of memory on one node causes all VIPs to be dropped
Summary: Exhaustion of memory on one node causes all VIPs to be dropped
Status: NEEDINFO
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.12.7
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Martin Schwenke
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2026-01-23 09:41 UTC by Peng Sun
Modified: 2026-01-25 04:01 UTC
CC List: 0 users

See Also:


Attachments

Description Peng Sun 2026-01-23 09:41:59 UTC
Other services deployed on the CTDB cluster nodes had a memory leak, which exhausted the memory on one node. After that, all nodes dropped their CTDB VIPs.

Reproduction steps:
1. Simulate memory exhaustion on one node with the stress-ng testing tool.
2. Periodically check the output of the command 'ctdb status'.
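Step 2 can be automated so that the moment the VIPs disappear can be matched against the logs. The following is a minimal Python sketch, not part of the original report: it runs a command at a fixed interval and records timestamped output. The helper name `poll_command`, the interval and the sample count are illustrative assumptions, and the memory pressure from step 1 is assumed to be generated separately with stress-ng.

```python
import subprocess
import time
from datetime import datetime, timezone


def poll_command(cmd, interval_s, count):
    """Run cmd `count` times, `interval_s` seconds apart (hypothetical helper).

    Returns a list of (UTC timestamp, exit code, combined stdout/stderr) tuples.
    """
    samples = []
    for i in range(count):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error output too, e.g. if ctdbd is unreachable
            text=True,
        )
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        samples.append((stamp, result.returncode, result.stdout))
        if i < count - 1:
            time.sleep(interval_s)  # no sleep after the last sample
    return samples
```

For example, `poll_command(["ctdb", "status"], 5, 12)` would sample the cluster state roughly every 5 seconds for about a minute, assuming the `ctdb` CLI is on the PATH of the node running the script.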
Comment 1 Martin Schwenke 2026-01-25 04:01:20 UTC
Samba 4.12.7 is no longer supported.  Please only report bugs against supported versions.

However...

What would you like CTDB to do if a node runs out of memory?

If you check https://ctdb.samba.org/manpages/ctdb-script.options.5.html, you can see that CTDB has a script option to control what a node should do if it runs out of memory (or disk space).  The default for memory usage is:

  CTDB_MONITOR_MEMORY_USAGE=80

This means a warning will be logged if memory usage reaches 80%.

With this setting, the behaviour is somewhat unpredictable when memory is exhausted. If CTDB gets stuck in recovery for a long time because it has run out of memory, then it will drop all IPs. Nodes may also crash due to unchecked memory allocations; if you find one of these, please report it as a bug. Can you please check your logs to see why all IPs were dropped?

If you set:

  CTDB_MONITOR_MEMORY_USAGE=80:99

Then a warning will be logged at 80% memory usage and the node will be marked unhealthy if memory usage reaches 99%.

With this configuration, you have more control over what happens when a single node runs out of memory (or comes close). To ensure that recovery/failover can occur, you could set the unhealthy limit lower than 99%.
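The threshold behaviour described above can be modelled in a few lines. This is a hypothetical Python sketch, not CTDB's actual code: it parses a value of the form warning[:unhealthy] and classifies a memory usage percentage. The function names are illustrative, and computing usage as 100 * (MemTotal - MemAvailable) / MemTotal from /proc/meminfo is an assumption made for the example.

```python
def parse_thresholds(value):
    """Parse 'WARN' or 'WARN:UNHEALTHY' (percentages) into a (warn, unhealthy) tuple.

    A missing part is returned as None.
    """
    warn_s, _, unhealthy_s = value.partition(":")
    warn = int(warn_s) if warn_s else None
    unhealthy = int(unhealthy_s) if unhealthy_s else None
    return warn, unhealthy


def classify(usage_pct, value):
    """Return 'unhealthy', 'warn' or 'ok' for a memory usage percentage."""
    warn, unhealthy = parse_thresholds(value)
    if unhealthy is not None and usage_pct >= unhealthy:
        return "unhealthy"  # node marked unhealthy, so recovery/failover can occur
    if warn is not None and usage_pct >= warn:
        return "warn"  # only a warning is logged
    return "ok"


def memory_usage_pct(meminfo_text):
    """Approximate memory usage from /proc/meminfo contents (assumed formula)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total, available = fields["MemTotal"], fields["MemAvailable"]
    return 100 * (total - available) // total
```

With the default value `80`, even 100% usage only yields `"warn"`, which matches the behaviour described above. Lowering the second value (for example `80:90`, an arbitrary illustration) makes the node unhealthy earlier.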

Please try different values of this variable with a supported version of Samba and let me know if the behaviour is improved.

Thanks...