Bug 6716 - No IP takeover to non-recmaster nodes
Status: RESOLVED INVALID
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: unspecified
Hardware: x64 Linux
Importance: P3 normal
Target Milestone: ---
Assigned To: Michael Adam
Reported: 2009-09-13 04:38 UTC by Christoph Schmidt
Modified: 2016-08-10 09:11 UTC
Description Christoph Schmidt 2009-09-13 04:38:32 UTC
Hello,

I may have found a problem: only the recovery master seems to be able to take over the public IP from other nodes.

Details:
I forced the recmaster node to change to state "unhealthy" by
 a) killing winbindd
 b) unplugging the network cable
In neither of these two cases was the public IP taken over by the second (non-recmaster) node. When I did the same on the second node, the IP was taken over by the first node as expected. I then switched the recmaster to the second node and repeated the tests, with the same results. I could reproduce this behaviour in two very different environments.

Test environment 1:
- 2 nodes x86_64 based
  + node 1 with RHEL 5.3
  + node 2 with SLES 10 SP2
- IBM GPFS 3.2.1.11
- Samba 3.2.14 (from Sernet, but recompiled with clustering support)
- CTDB 1.0.88/1.0.86/1.0.84

Test environment 2 (near production):
- 2 nodes IBM p5 based (RHEL 5.3 ppc64)
- IBM GPFS 3.2.1.12
- Samba 3.2.14 (from Sernet, but recompiled with clustering support)
- CTDB 1.0.88

/etc/sysconfig/ctdb:
CTDB_RECOVERY_LOCK="/gpfs/fs1/ctdb-x86.lck"
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=yes
CTDB_NODES=/etc/ctdb/nodes
CTDB_DBDIR=/var/lib/ctdb
CTDB_DBDIR_PERSISTENT=/var/lib/ctdb/persistent
CTDB_EVENT_SCRIPT_DIR=/etc/ctdb/events.d
CTDB_LOGFILE=/var/log/ctdb/log.ctdb
CTDB_DEBUGLEVEL=2

/etc/ctdb/nodes:
192.168.136.4
192.168.136.7

/etc/ctdb/public_addresses:
192.168.136.152/24 bond1
192.168.136.153/24 bond1

The configuration files are identical on all cluster nodes. 

Please let me know if I can provide further information (logs, traces, etc.).

Christoph
Comment 1 Christoph Schmidt 2009-09-22 14:29:42 UTC
Changed state to "invalid" because of an improper combination of CTDB (1.0.88) and Samba (3.2.14).
Comment 2 Michael Adam 2009-09-22 14:41:41 UTC
Hi - Thanks for taking the time to report this.

This bug report is not invalid:
The action (b) taken - unplugging a network cable - does not depend on samba.
ctdb should take care of failing over the IPs in that case.

Cheers - Michael
Comment 3 Martin Schwenke 2016-08-10 09:11:42 UTC
I think the problem here was "unplugging *the* network cable".  The recommended networking configuration is to have a private network, which CTDB uses to communicate between nodes, and a separate public network.  In this case it looks like only one network was used.  If this network is unplugged then the recovery master can't communicate with other nodes...

So, I'm going to mark this old bug as invalid...  :-)
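
For anyone checking their own setup for the single-network situation described above, here is a rough sketch in Python. It is not part of CTDB; the function names are made up for illustration, and the addresses are simply copied from the nodes and public_addresses files in the description. It only tests whether the node addresses fall inside the public subnets. Sharing a subnet suggests, but does not prove, that the same physical network carries both kinds of traffic.

```python
import ipaddress

# Values copied from /etc/ctdb/nodes and /etc/ctdb/public_addresses in the report.
NODES = ["192.168.136.4", "192.168.136.7"]
PUBLIC_ADDRESSES = ["192.168.136.152/24 bond1", "192.168.136.153/24 bond1"]


def public_networks(lines):
    """Return the set of IP networks named in public_addresses-style lines.

    Hypothetical helper: assumes each non-blank line starts with "address/prefix".
    """
    networks = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # strict=False accepts a host address combined with a prefix length
        networks.add(ipaddress.ip_network(fields[0], strict=False))
    return networks


def nodes_on_public_networks(nodes, public_lines):
    """Return the node addresses that fall inside any public subnet."""
    networks = public_networks(public_lines)
    return [
        node for node in nodes
        if any(ipaddress.ip_address(node) in net for net in networks)
    ]


if __name__ == "__main__":
    overlapping = nodes_on_public_networks(NODES, PUBLIC_ADDRESSES)
    if overlapping:
        print("Node addresses on a public subnet:", ", ".join(overlapping))
    else:
        print("Node addresses are separate from the public subnets.")
```

With the files from this report, both node addresses sit in the same /24 as the public addresses, which matches the guess that only one network was in use.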