Other services were deployed on the CTDB cluster nodes, and one of them had a memory leak that exhausted the memory on one node. After that, all nodes dropped their CTDB VIPs.

Reproduction steps:
1. Simulate memory exhaustion with the stress-ng testing tool.
2. Periodically observe the output of the command 'ctdb status'.
Samba 4.12.7 is no longer supported. Please only report bugs against supported versions.

However... What would you like CTDB to do if a node runs out of memory?

If you check https://ctdb.samba.org/manpages/ctdb-script.options.5.html, you can see that CTDB has a script option to control what a node should do if it runs out of memory (or disk space). The default for memory usage is:

    CTDB_MONITOR_MEMORY_USAGE=80

This means a warning will be logged if memory usage reaches 80%. With this setting, the behaviour is somewhat random when memory is exhausted. If CTDB gets stuck in recovery for a long time because there is no memory, then it will drop all IPs. Nodes may crash due to unchecked memory allocations - if you find one of these then please report it as a bug.

Can you please check your logs to see why all IPs are dropped?

If you set:

    CTDB_MONITOR_MEMORY_USAGE=80:99

then a warning will be logged at 80% memory usage and the node will be marked unhealthy if memory usage reaches 99%. With this configuration, you have more control over what happens when a single node runs out of memory (or comes close). To ensure that recovery/failover can occur, you could set the unhealthy limit lower than 99%.

Please try different values of this variable with a supported version of Samba and let me know if the behaviour is improved.

Thanks...
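
To make the "warning:unhealthy" form of CTDB_MONITOR_MEMORY_USAGE described above more concrete, here is a minimal shell sketch of how such a value could be interpreted. It is only an illustration, not CTDB's actual event-script code: the function name classify_mem_usage is hypothetical, and treating "reaches" as "greater than or equal to" is an assumption.

```shell
#!/bin/sh
# Illustrative sketch only; NOT taken from CTDB's event scripts.
# classify_mem_usage is a hypothetical helper name.
#
# Usage: classify_mem_usage USAGE_PERCENT "WARN[:UNHEALTHY]"
# Prints OK, WARNING or UNHEALTHY.
classify_mem_usage() {
    usage=$1
    limits=$2

    # Everything before the first ':' is the warning threshold.
    warn=${limits%%:*}

    # Everything after the first ':' (if present) is the unhealthy threshold.
    case "$limits" in
        *:*) unhealthy=${limits#*:} ;;
        *)   unhealthy="" ;;
    esac

    # Assumption: a threshold is "reached" when usage >= threshold.
    if [ -n "$unhealthy" ] && [ "$usage" -ge "$unhealthy" ]; then
        echo "UNHEALTHY"
    elif [ -n "$warn" ] && [ "$usage" -ge "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

classify_mem_usage 85 "80"      # prints WARNING (no unhealthy limit set)
classify_mem_usage 85 "80:99"   # prints WARNING
classify_mem_usage 99 "80:99"   # prints UNHEALTHY
```

With a value such as 80:90 instead of 80:99, the same usage of 90% would already be classified as unhealthy, which is the idea behind lowering the unhealthy limit so that failover can happen before memory is completely exhausted.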