CTDB seems to trip Cisco's Endpoint Protection, which prevents an IP address from showing up on multiple ports.
During an IP failover, ctdb sends gARPs and "tickles". This causes the Cisco Endpoint Protection to lock out the NIC.
The issue seems to be that while the gARPs come out of the NICs involved in the IP failover, the tickles come out of the management NIC (default route) and seem to be the trigger for Cisco. (We have confirmed this with tcpdump.)
Is this a known behaviour? Is there any way to adjust the tickle behaviour?
Can you please check out the POLICY ROUTING section in the ctdb(7) manual page, and also the 13.per_ip_routing section in the ctdb-script.options(5) manual page? I hope that this lets you configure sending the tickle ACKs via the correct interfaces. The main thing will be to set CTDB_PER_IP_ROUTING_CONF to point to a configuration file and to populate that file. If you already have custom routing in place then you may need to tweak some of the other settings to avoid collisions.
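As a minimal sketch of populating that file, the shell function below prints the two lines a per-IP routing configuration would typically need for one public address: a route for the address's own network and a default route via a gateway on that network. The line format (public address, network, optional gateway) is what I understand the POLICY ROUTING section of ctdb(7) to describe, so please verify it against your version. The function name per_ip_routing_lines, the output path and the addresses in the usage comment are illustrative assumptions, not part of CTDB.

```shell
#!/bin/sh
# Hypothetical helper (not part of CTDB): print per-IP routing
# configuration lines for one public address.
#   $1 = public IP address
#   $2 = network the address lives on (CIDR)
#   $3 = gateway on that network
per_ip_routing_lines() {
    ip="$1" net="$2" gw="$3"
    # Route for the public address's own network (no gateway)
    printf '%s %s\n' "$ip" "$net"
    # Default route for traffic sourced from the public address
    printf '%s 0.0.0.0/0 %s\n' "$ip" "$gw"
}

# Example (the path is an assumption; use whatever file
# CTDB_PER_IP_ROUTING_CONF points to):
# per_ip_routing_lines 10.42.3.1 10.42.3.0/24 10.42.3.250 >> /etc/ctdb/policy_routing
```

Generating the lines with a function keeps the file consistent when there are many public addresses, but writing the file by hand works just as well.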
Sending gARPs involves constructing a much lower level packet, including details like the interface name. However, the tickle ACKs are higher level and are routed according to the system's IP routing configuration.
Please let me know if you continue to have problems or if the above makes things work as expected.
Note that I have been waiting for the day that a clever enough network security device is seen to block these packets. One person's fail-over feature is another person's denial of service tool... ;-)
What version of Samba/CTDB are you using?
I've configured a 2 node test environment to reproduce this issue.
The networking is configured as follows:
[root@pxc_cluster1_node1 /]# ip addr
1171: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 02:42:a9:fe:64:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.100.1/24 scope global eth0
valid_lft forever preferred_lft forever
1172: eth1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 02:42:0a:80:09:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.122.200/24 scope global eth1
valid_lft forever preferred_lft forever
1173: eth2@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 02:42:0a:80:07:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.42.3.1/24 brd 10.42.3.255 scope global eth2
valid_lft forever preferred_lft forever
[root@pxc_cluster1_node1 /]# ip route
default via 192.168.122.1 dev eth1
10.42.3.0/24 dev eth2 proto kernel scope link src 10.42.3.1
169.254.100.0/24 dev eth0 proto kernel scope link src 169.254.100.1
192.168.122.0/24 dev eth1 proto kernel scope link src 192.168.122.200
[root@pxc_cluster1_node1 /]# ip rule
0: from all lookup local
10000: from 10.42.3.1 lookup ctdb.10.42.3.1
32766: from all lookup main
32767: from all lookup default
[root@pxc_cluster1_node1 /]# ip route show table ctdb.10.42.3.1
default via 10.42.3.250 dev eth2
10.42.3.0/24 dev eth2
[root@pxc_cluster1_node1 /]# ctdb ip -v
Public IPs on node 0
10.42.3.1 node active[eth2] available[eth2] configured[eth2]
10.42.3.2 node active available[eth2] configured[eth2]
When running a packet capture during a failover, I see the TCP tickles on interface eth1, instead of the desired eth2.
Is it possible that the TCP tickle gets sent before the policy routing rule is built, so that there is no rule for that source IP and the packet is routed via the default table?
Where could I look to confirm this theory?
We're running Samba/CTDB 4.10.14.
The tickle ACK should always be sent after the routing is set up.
The routing is set up by the "takeip" event in 13.per_ip_routing.script. After all of the "takeip" events complete for enabled scripts, control returns to ctdbd and ctdb_do_takeip_callback() (from ctdb/server/ctdb_takeover.c) is called. The call chain goes:
You could try to check the timing by adding (say):
echo "Adding route"
just before the "ip route add" command in add_routing_for_ip() in 13.per_ip_routing.script. You would need high-resolution timestamps in your logs to be able to compare times. The timing may not be completely accurate because I think all of the output from an event script is gathered and logged on completion. However, you would still expect this to happen before the tickle ACK is sent (you'd check the time of this from your packet capture).
The routing looks sane. You could check that it behaves as expected via something like:
ip route get <some-client-IP> from 10.42.3.1
You could even add a variation of that instead of the above echo to the event script, but after the "ip route add" command.
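To make that check scriptable, here is a rough sketch that extracts the outgoing interface from "ip route get" output and compares it with the expected one. route_dev and check_route_dev are hypothetical helper names, not part of CTDB, and the parsing simply assumes the interface name follows the word "dev" in the output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of CTDB.

# Print the word following "dev" in "ip route get" output read from stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Succeed if traffic from source address $1 to client $2 would leave via
# interface $3 according to the kernel's routing lookup.
check_route_dev() {
    src="$1" client="$2" want="$3"
    got="$(ip route get "$client" from "$src" | route_dev)"
    [ "$got" = "$want" ]
}
```

For example, "check_route_dev 10.42.3.1 <some-client-IP> eth2 || echo unexpected interface" could be added to the event script after the "ip route add" command.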
There are potential corner cases where the route is not created in "takeip" but is created later in "ipreallocated". This really shouldn't come into play in this case. I also don't think it should happen anymore, since we set promote_secondaries on relevant interfaces before deleting IP addresses.
I assume the tickle ACK has the desired source IP address in it?
Please let us know what you find...
Created attachment 16162: TCP Tickle packet capture 1
(In reply to Martin Schwenke from comment #4)
Using your suggested approach, it does look like the routing is successfully being built before the TCP tickle is sent:
2020/08/03 14:09:18.326341 ctdb-eventd: 13.per_ip_routing: Adding route
2020/08/03 14:09:18.326383 ctdb-eventd: 13.per_ip_routing: 10.42.2.129 from 10.42.3.1 via 10.42.3.250 dev eth2
2020/08/03 14:09:18.326391 ctdb-eventd: 13.per_ip_routing: cache
I'm not sure if the start of the TCP tickle would be the "TCP Window Update" (14:09:18.349696983) or the "TCP Dup ACK" (14:09:19.454640091) - either way, both those timestamps are after the timestamp in the log. tcp_tickle_timing_1.txt is a filtered export of the interesting frames.
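The comparison above can be made explicit with a small calculation. This is only a sketch: ts_delta is a hypothetical helper that takes two HH:MM:SS.fraction timestamps from the same day and prints the second minus the first, in seconds. It assumes the log and the packet capture were timestamped by the same clock.

```shell
#!/bin/sh
# Hypothetical helper: print (t2 - t1) in seconds for two
# HH:MM:SS.fraction timestamps taken on the same day.
ts_delta() {
    LC_ALL=C awk -v t1="$1" -v t2="$2" '
        # Convert HH:MM:SS.fraction to seconds since midnight
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        BEGIN { printf "%.6f\n", secs(t2) - secs(t1) }'
}
```

Running "ts_delta 14:09:18.326341 14:09:18.349696983" compares the "Adding route" log line with the "TCP Window Update" frame. A positive result means the route was logged before the frame was captured.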
I can confirm the desired source IP is in the TCP tickle.
I notice that takeip event causes "$CTDB gratarp ..." to be called. But the updateip event calls the tickle_tcp_connections() shell function. I make the leap in assumption that the gratarp sub-command calls ctdb_control_send_arp() (in ctdb/server/ctdb_takeover.c). This function appears to be responsible for sending TCP tickles as well as the gARPs.
What's the difference between the TCP tickles in the ctdb_control_send_arp() C function and the tickle_tcp_connections() shell function?
(In reply to Dan Foster from comment #6)
> I notice that takeip event causes "$CTDB gratarp ..." to be called. But the
> updateip event calls the tickle_tcp_connections() shell function. I make the
> leap in assumption that the gratarp sub-command calls ctdb_control_send_arp()
> (in ctdb/server/ctdb_takeover.c). This function appears to be responsible for
> sending TCP tickles as well as the gARPs.
"ctdb gratarp" causes ctdb_control_send_gratious_arp() from ctdb_takeover.c to be called. This just sends a gratuitous ARP (but no tickle ACK).
The "updateip" event is only used for failover between interfaces on the same node. For example, you can have something like this in the public_addresses file:
> What's the difference between the TCP tickles in the ctdb_control_send_arp() C
> function and the tickle_tcp_connections() shell function?
tickle_tcp_connections() is only used for "updateip". It gathers the current TCP connections to the address being used from the current node and sends tickle ACKs for them. These tickle ACKs are only sent once.
ctdb_control_send_arp() uses connection information gathered by ctdbd for SMB connections and by the monitor event in 60.nfs.script for NFS connections. The connection data is transferred between nodes so that a takeover node can send tickle ACKs even when the original node has gone away (but clients still think the connections are active... until TCP timeout occurs). In this case tickle ACKs are sent 3 times.
I have an experimental branch where I have reimplemented all of this logic to use a connection tracking daemon, which can trigger gARPs and tickle ACKs from event scripts. It requires some fundamental infrastructure changes in CTDB before it can be integrated.
However, none of that explains the problem. That is explained in the next comment... ;-)
I tested this using the ctdb_killtcp executable (found in .../libexec/ctdb/ctdb_killtcp - note that the interface argument is ignored on Linux and is only used for packet capture on some other platforms). This uses the same packet construction and sending code as ctdb_control_send_arp() in the daemon. I confirmed that source-based policy routing does not seem to be taken into account when routing the packet. :-(
The Linux raw(7) manual page says:
If IP_HDRINCL is specified and the IP header has a nonzero destination
address, then the destination address of the socket is used to route
the packet. When MSG_DONTROUTE is specified, the destination address
should refer to a local interface, otherwise a routing table lookup is
done anyway but gatewayed routes are ignored.
I think this is trying to say that only the destination address is used to route the packet and the source address is not taken into account.
I thought we had tested this but perhaps a slight variant was tested. The Linux kernel's handling of routing for packets sent via raw sockets doesn't look to have changed in this regard.
I think that after all these years you are the first person to clearly identify this as a bug. I have seen previous reports of tickle ACKs not being delivered but have never had clear data to show that they might not be routed as expected. So thanks for chasing this!
The bad news is that I can't think of a fix. When I get time I will take a closer look at how the Linux kernel routes packets sent from raw sockets (net/ipv4/raw.c:raw_sendmsg() - the calls to flowi4_init_output() and ip_route_output_flow() look most relevant) to see if this confirms what raw(7) says.
Sorry for sending you off into the policy routing stuff. I thought that would make it "just work".
Hmmm... dumb question? Can you work around this by using some network routes to tell the nodes how to route to the client networks? I guess this depends on how many client networks you have! I just tested this using a host route for a destination IP address and the tickle ACK packets went out via the designated interface.
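A rough sketch of that workaround: the function below prints (rather than runs) one "ip route add" command per client network or host, so the result can be reviewed before being applied as root. The gateway 10.42.3.250 and interface eth2 are taken from the test environment earlier in this report. The function name client_route_cmds and the client network 10.42.2.0/24 in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of CTDB: print the commands that would add
# a route to each client network via the public-address interface.
#   $1 = gateway, $2 = interface, remaining args = client networks or hosts
client_route_cmds() {
    gw="$1" dev="$2"
    shift 2
    for dest in "$@"; do
        printf 'ip route add %s via %s dev %s\n' "$dest" "$gw" "$dev"
    done
}
```

For example, "client_route_cmds 10.42.3.250 eth2 10.42.2.0/24" prints a single command for that network. Routes added this way live in the main table, so they apply to raw-socket packets that are routed by destination only, and they would need to be made persistent through the distribution's network configuration.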
(In reply to Martin Schwenke from comment #8)
Thanks for taking the time to narrow down the problem and show it is a bug.
Unfortunately, I suspect the problem is outside my skill-set to help drive this forward. Let me know if you find any time to look at it.
Thanks for the suggested workaround. I will have to understand the client network topology better to see if this is possible.
(In reply to Dan Foster from comment #9)
I am clearly unaware of your network topology but I wonder if you could flip the workaround. That is, could you change the default route so that it points to the client networks (and perhaps other infrastructure)? I'm not sure what is beyond 192.168.122.1 on eth1, but I wonder if that could be handled by network routes.
I'll put this back into NEEDINFO state until I find out how you progressed with a workaround. Once I know more I'll document the limitation where tickle ACKs use the default route and suggest workarounds. It is always worth basing workarounds on real use cases rather than whatever a developer can dream up... :-)