Bug 15935 - Crash in ctdbd on failed updateip
Summary: Crash in ctdbd on failed updateip
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.21.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Jule Anger
QA Contact: Samba QA Contact
Reported: 2025-10-15 21:16 UTC by Martin Schwenke
Modified: 2025-11-07 13:24 UTC

Attachments
Patch for v4-23-test (7.63 KB, patch)
2025-10-17 08:25 UTC, Martin Schwenke
anoopcs: review+
Patch for v4-22-test (7.63 KB, patch)
2025-10-17 08:26 UTC, Martin Schwenke
anoopcs: review+
Patch for v4-21-test (12.63 KB, patch)
2025-10-17 08:28 UTC, Martin Schwenke
anoopcs: review+

Description Martin Schwenke 2025-10-15 21:16:42 UTC
> 2025-10-13T18:41:44.211218-03:00 adm-gw1 ctdbd[479406]: Node became HEALTHY. Ask recovery master to reallocate IPs
> 2025-10-13T18:41:44.732792-03:00 adm-gw1 ctdb-recoverd[479490]: Unassigned IP 192.168.45.235 can be served by this node
> 2025-10-13T18:41:44.732964-03:00 adm-gw1 ctdb-recoverd[479490]: IP 192.168.45.235 incorrectly on an interface    

The IP address isn't assigned to this node.  However, ctdbd uses bind(2) to check whether an IP address is local, on the assumption that ip_nonlocal_bind=0.  Here the bind succeeds so, given that assumption, ctdbd concludes that the address must be local.
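
For illustration, a bind(2)-based locality check can be sketched roughly as follows.  This is a minimal IPv4-only sketch, not the actual ctdbd code: the helper name addr_is_bindable() is hypothetical, and the result only means "the address is local" if ip_nonlocal_bind=0.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the CTDB implementation: report whether
 * bind(2) accepts an IPv4 address.  With ip_nonlocal_bind=0 the kernel
 * is expected to reject addresses that are not configured locally, so
 * success is taken as "address is local".  If that assumption does not
 * hold, success says nothing about where the address really lives.
 */
static bool addr_is_bindable(const char *ip)
{
	struct sockaddr_in sin;
	int fd;
	bool ok;

	assert(ip != NULL);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = 0; /* let the kernel choose a port */
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1) {
		return false; /* not a valid IPv4 address */
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1) {
		return false;
	}

	/* The only question asked here is whether bind(2) succeeds */
	ok = (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0);
	close(fd);
	return ok;
}
```

A check like this cannot tell which interface, if any, actually carries the address, which is consistent with the "from interface __none__" message in the log.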

> 2025-10-13T18:41:44.732987-03:00 adm-gw1 ctdb-recoverd[479490]: Trigger takeoverrun
> 2025-10-13T18:41:44.733160-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover run starting
> 2025-10-13T18:41:44.769369-03:00 adm-gw1 ctdbd[479406]: ../../ctdb/server/ctdb_takeover.c:797 Doing updateip for IP 192.168.45.235 already on an interface
> 2025-10-13T18:41:44.769448-03:00 adm-gw1 ctdbd[479406]: Update of IP 192.168.45.235/16 from interface __none__ to ens18    

ctdbd decides that since the address is local, it has to do an "updateip" instead of a "takeip" to make the intended change.

> 2025-10-13T18:41:44.788619-03:00 adm-gw1 ctdb-eventd[479407]: 10.interface: ERROR: Unable to determine interface for IP 192.168.45.235
> 2025-10-13T18:41:44.788689-03:00 adm-gw1 ctdb-eventd[479407]: updateip event failed
> 2025-10-13T18:41:44.788847-03:00 adm-gw1 ctdbd[479406]: Failed update of IP 192.168.45.235 from interface __none__ to ens18    

However, the 10.interface event script can't find an interface with the IP address assigned, so it fails.

> 2025-10-13T18:41:44.788945-03:00 adm-gw1 ctdbd[479406]: ===============================================================
> 2025-10-13T18:41:44.788966-03:00 adm-gw1 ctdbd[479406]: INTERNAL ERROR: Signal 11: Segmentation fault in  () () pid 479406 (4.21.3)
> 2025-10-13T18:41:44.788985-03:00 adm-gw1 ctdbd[479406]: If you are running a recent Samba version, and if you think this problem is not yet 
> fixed in the latest versions, please consider reporting this bug, see 
> https://wiki.samba.org/index.php/Bug_Reporting
> 2025-10-13T18:41:44.789003-03:00 adm-gw1 ctdbd[479406]: ===============================================================
> 2025-10-13T18:41:44.789016-03:00 adm-gw1 ctdbd[479406]: PANIC (pid 479406): Signal 11: Segmentation fault in 4.21.3
> 2025-10-13T18:41:44.789489-03:00 adm-gw1 ctdbd[479406]: BACKTRACE: 21 stack frames:    

The stack trace isn't useful but, at a guess, it crashes here:

		/*
		 * All we can do is reset the old interface
		 * and let the next run fix it
		 */
		ctdb_vnn_unassign_iface(ctdb, state->vnn);
		state->vnn->iface = state->old;
		state->vnn->iface->references++;

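This is because state->old is NULL (the log above shows the update going "from interface __none__"), so the final line dereferences a NULL pointer.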
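
A NULL guard for this kind of rollback could look something like the sketch below.  It is not the committed fix: the struct and function names are simplified, hypothetical stand-ins for the real CTDB types, and it only shows the idea of not touching the reference count when there was no old interface.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; not the real CTDB structures */
struct iface_sketch {
	unsigned int references;
};

struct vnn_sketch {
	struct iface_sketch *iface;
};

/*
 * Hypothetical helper: restore the previous interface after a failed
 * updateip, tolerating the case where there was no previous interface.
 */
static void restore_old_iface(struct vnn_sketch *vnn,
			      struct iface_sketch *old)
{
	assert(vnn != NULL);

	vnn->iface = old;
	if (vnn->iface != NULL) {
		/* Only take a reference if there is something to reference */
		vnn->iface->references++;
	}
}
```

With old == NULL the address is simply left without an interface instead of crashing the daemon.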

That bug is still present in subsequent versions.  However, it should no longer be triggered on Linux (and possibly other platforms) in CTDB >= 4.22, because the check for an IP address no longer depends solely on bind(2).

A fix is required in ctdb_do_updateip_callback().  However, on its own, that fix would almost certainly result in ctdbd repeatedly retrying the failed updateip and banning the node involved.  So, a workaround should also be applied to 10.interface.script.
Comment 1 Samba QA Contact 2025-10-17 06:29:03 UTC
This bug was referenced in samba master:

d08f9ebd2755671d30c73a4e979029d353848828
a98ffb96efc4a9ea2110c654860a4ba3896ab3d5
01d3d25c0139a3dd49a2322a9416698d08733377
0e73781bf84a1e8e596d8be3f55eeb5f8f927990
Comment 2 Martin Schwenke 2025-10-17 08:25:14 UTC
Created attachment 18760
Patch for v4-23-test

The original 4 commits cherry-pick cleanly.  Compiles cleanly.  Relevant tests pass.
Comment 3 Martin Schwenke 2025-10-17 08:26:39 UTC
Created attachment 18761
Patch for v4-22-test

The original 4 commits cherry-pick cleanly.  Compiles cleanly.  Relevant tests pass.

Functionally identical to the patch for v4-23-test.
Comment 4 Martin Schwenke 2025-10-17 08:28:12 UTC
Created attachment 18762
Patch for v4-21-test

Requires an additional reformatting commit from master so the 4 commits of interest cherry-pick cleanly.  Compiles cleanly.  Relevant tests pass.
Comment 5 Anoop C S 2025-10-17 09:07:24 UTC
Re-assigning to Jule for inclusion in 4.23, 4.22 and 4.21, thanks.
Comment 6 Jule Anger 2025-10-22 09:51:41 UTC
Pushed to autobuild-v4-{23,22,21}-test.
Comment 7 Samba QA Contact 2025-10-22 11:17:03 UTC
This bug was referenced in samba v4-21-test:

cb080ee6277137ffed14da3dc42228eb8f4ee084
605972c5dd7c1133b76b6c45bbe8fe8f72503ee5
604e1ab09c6a187eabf016a877dc73f1f948ccab
93152dcbc7d1ada4323c535cbf5b05d9bb5f1064
Comment 8 Samba QA Contact 2025-10-27 14:32:03 UTC
This bug was referenced in samba v4-22-test:

36b489ce2ac321f9bbddd8150bb94d663e4d44cb
0af32c6b70a59314e04b2e9a7668d3b2f1d3a65c
38938918715ef77f70b469a55e3217c592ec478f
c78caf6c40e6aab31c32198904e87b386060cdb1
Comment 9 Samba QA Contact 2025-11-03 14:57:11 UTC
This bug was referenced in samba v4-23-test:

1b084f149a06a7c9cee651c43ed7bfdb9111f8b6
c49aa8718a35f313254728f8301ef53e6bc0ebfd
8f1032fb959d903290e120f25c8abcd1a9165fec
b6cb1d223db25886d3c4357e0b29ca461996a89d
Comment 10 Jule Anger 2025-11-04 08:16:21 UTC
Closing out bug report.

Thanks!
Comment 11 Samba QA Contact 2025-11-07 13:24:24 UTC
This bug was referenced in samba v4-23-stable (Release samba-4.23.3):

1b084f149a06a7c9cee651c43ed7bfdb9111f8b6
c49aa8718a35f313254728f8301ef53e6bc0ebfd
8f1032fb959d903290e120f25c8abcd1a9165fec
b6cb1d223db25886d3c4357e0b29ca461996a89d