Bug 11211 - [CTDB] Public Address not responding to ARP requests
Summary: [CTDB] Public Address not responding to ARP requests
Status: RESOLVED INVALID
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: 2.5.1
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Amitay Isaacs
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-11 10:13 UTC by Ben Alexander
Modified: 2016-09-12 09:07 UTC
CC List: 2 users

See Also:



Description Ben Alexander 2015-04-11 10:13:17 UTC
A strange bug is preventing one of my nodes (10.0.20.21) from reaching the public IP (10.0.20.30) of the CTDB cluster. Its ARP requests for the public address appear to go unanswered. The other two nodes have the correct ARP entry.
The three nodes are identical and were built following the instructions at http://community.redhat.com/blog/2014/11/up-and-running-with-ovirt-3-5-part-two/ (with minor changes to test oVirt 3.5.2 on CentOS 7.1).

_ISSUE_
# ip -s neighbour list
10.0.20.22 dev ovirtmgmt lladdr c0:3f:d5:63:83:fa ref 1 used 4025/0/4025 probes 4 REACHABLE
10.0.20.23 dev ovirtmgmt lladdr c0:3f:d5:64:0a:0b ref 1 used 4025/0/30 probes 4 REACHABLE
10.0.20.30 dev ovirtmgmt  used 3206/3245/3203 probes 6 FAILED
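The FAILED state on the last line is the symptom: the kernel sent ARP probes for 10.0.20.30 and never got a reply. A minimal sketch for pulling such entries out of the neighbour table, assuming the `ip -s neighbour list` line layout shown above (address first, state last); `failed_neighbours` is a hypothetical helper name, not part of iproute2 or CTDB:

```shell
#!/bin/sh
# Hypothetical helper: print the address of every neighbour entry whose
# state is FAILED (ARP resolution exhausted its probes without a reply).
# Reads `ip -s neighbour list` output on stdin; assumes the address is the
# first field and the state is the last field of each line.
failed_neighbours() {
  awk '$NF == "FAILED" { print $1 }'
}
```

For example: `ip -s neighbour list dev ovirtmgmt | failed_neighbours` would print 10.0.20.30 for the table above.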

_ENVIRONMENT_
# ctdb status
Number of nodes:3
pnn:0 10.0.20.21       OK (THIS NODE)
pnn:1 10.0.20.22       OK
pnn:2 10.0.20.23       OK
...

# cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core) 

# ctdb version
CTDB version: 2.5.1

_TROUBLESHOOTING_
- SELinux and firewalld have been disabled on all three nodes during testing
- I have moved the public address across all three nodes with the same result, even when it is hosted on this troublesome node.
- # tcpdump -i ovirtmgmt -vv -nn arp
tcpdump: listening on ovirtmgmt, link-type EN10MB (Ethernet), capture size 65535 bytes
19:51:43.225065 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.20.30 tell 10.0.20.21, length 28
19:51:44.227060 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.20.30 tell 10.0.20.21, length 28
19:51:52.223422 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.20.30 tell 10.0.20.21, length 28
19:51:53.225062 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.20.30 tell 10.0.20.21, length 28
19:51:54.227080 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.20.30 tell 10.0.20.21, length 28
- # ip -s neighbour flush dev ovirtmgmt
- # ip neighbour delete 10.0.20.30 dev ovirtmgmt
- # shutdown -r now
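The capture above shows only requests from 10.0.20.21 and no replies. A rough sketch for tallying both in saved tcpdump text, assuming the formats tcpdump prints for ARP ("Request who-has <ip> tell ..." and "Reply <ip> is-at ..."); `arp_summary` is a hypothetical helper, not part of CTDB or tcpdump:

```shell
#!/bin/sh
# Hypothetical helper: count ARP requests for address $1 and replies
# announcing it, in `tcpdump -nn arp` text output read from stdin.
# Assumes tcpdump's "who-has <ip>" and "Reply <ip> is-at" wording.
arp_summary() {
  awk -v ip="$1" '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "who-has" && $(i + 1) == ip) req++   # request for ip
        if ($i == "Reply" && $(i + 1) == ip) rep++     # reply for ip
      }
    }
    END { printf "requests=%d replies=%d\n", req, rep }
  '
}
```

For example, `tcpdump -i ovirtmgmt -nn -c 20 arp > capture.txt` followed by `arp_summary 10.0.20.30 < capture.txt`. A non-zero request count with zero replies matches the situation reported here.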

Any advice would be great.
Comment 1 Amitay Isaacs 2015-04-21 06:09:35 UTC
Looks like you are using the same subnet for CTDB's management IP addresses (10.0.20.21/22/23) and public IP addresses (10.0.20.30).

Usually the management network should be separate from the public IP network. Are you using the same Ethernet interface or different Ethernet interfaces?
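Whether a public address shares a subnet with the node addresses comes down to masking both with the prefix length and comparing. The report does not give the netmask, so the /24 used below is an assumption, and `ip_to_int`/`same_subnet` are hypothetical helpers written for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  saved_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$saved_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if $1 and $2 are in the same subnet
# for prefix length $3 (0..32).
same_subnet() {
  ss_a=$(ip_to_int "$1")
  ss_b=$(ip_to_int "$2")
  # Build the netmask, e.g. prefix 24 -> 0xFFFFFF00.
  ss_mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ss_a & ss_mask )) -eq $(( ss_b & ss_mask )) ]
}

# Example (the /24 prefix is an assumption; the report does not state it):
# same_subnet 10.0.20.21 10.0.20.30 24 && echo "same subnet"
```

With a /24 mask, 10.0.20.21 and 10.0.20.30 land in the same subnet, which is the overlap pointed out above.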

Some more information about your CTDB configuration would be useful:
 - /etc/ctdb/nodes
 - public addresses file
 - CTDB configuration
Comment 2 Martin Schwenke 2016-09-12 09:07:24 UTC
Required information not received from submitter, so closing as invalid.

The method for resolving this would be:

1. Use "ctdb ip" to confirm that CTDB thinks it is hosting the
   address.

2. Use "ip addr show" to confirm that the address is actually on the
   expected network interface.

If both of these are as expected then there's nothing CTDB can do.
The network stack should respond to ARPs.

If one or both are not as expected, then you would need to look at the
logs and do other debugging to determine why.
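The two checks above could be scripted roughly as follows. This is a sketch, not a CTDB tool: it assumes `ctdb ip` prints one "<address> <pnn>" pair per line after a header line, and that `ip -o addr show` puts "inet" in the third field and the address/prefix in the fourth; all function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the PNN that `ctdb ip` output (stdin) reports
# for address $1. Assumes "<address> <pnn>" lines.
ctdb_hosting_pnn() {
  awk -v ip="$1" '$1 == ip { print $2 }'
}

# Hypothetical helper: succeed if address $1 appears in `ip -o addr show`
# output (stdin). Assumes field 3 is "inet" and field 4 is "<addr>/<prefix>".
addr_present() {
  awk -v ip="$1" '
    $3 == "inet" { split($4, a, "/"); if (a[1] == ip) found = 1 }
    END { exit !found }
  '
}

# Run both checks on the live node.
# $1 = public address, $2 = interface, $3 = this node's PNN (from `ctdb status`).
check_public_ip() {
  hosting=$(ctdb ip | ctdb_hosting_pnn "$1")
  if [ "$hosting" = "$3" ]; then
    echo "ctdb: node $3 is hosting $1"
  else
    echo "ctdb: $1 is hosted by node '${hosting:-none}', not node $3"
  fi
  if ip -o addr show dev "$2" | addr_present "$1"; then
    echo "kernel: $1 is configured on $2"
  else
    echo "kernel: $1 is NOT configured on $2"
  fi
}
```

For the node in this report it would be run as `check_public_ip 10.0.20.30 ovirtmgmt 0`, since `ctdb status` lists 10.0.20.21 as pnn 0.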