I have configured ocfs2 on a 2-node sles10-sp2 cluster.
The recovery lock file is on the ocfs2-mounted /data-prod. It is /data-prod/ctdb/reclock.lock, which is accessible from both nodes.
Starting ctdbd on the first node "leenderts" goes fine.
But when starting it on the second node "nodtvedt", I get this error in log.ctdb on "leenderts":

2009/10/02 23:04:13.073230 [27137]: Taking out recovery lock from recovery daemon
2009/10/02 23:04:13.073278 [27137]: Take the recovery lock
2009/10/02 23:04:13.073319 [27137]: Recovery lock taken successfully
2009/10/02 23:04:13.073360 [27137]: Recovery lock taken successfully by recovery daemon
2009/10/02 23:04:13.181695 [27137]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2009/10/02 23:04:13.181728 [27137]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2009/10/02 23:04:13.181741 [27137]: Async operation failed with ret=-1 res=-1 opcode=16
2009/10/02 23:04:13.181752 [27137]: Async wait failed - fail_count=1
2009/10/02 23:04:13.181801 [27137]: server/ctdb_recoverd.c:282 Unable to set recovery mode. Recovery failed.
2009/10/02 23:04:13.181813 [27137]: server/ctdb_recoverd.c:1396 Unable to set recovery mode to normal on cluster

These messages keep repeating about once a second.

On the other node, nodtvedt, I get:

2009/10/02 22:45:15.650092 [20267]: ERROR: recovery lock file /data-prod/ctdb/reclock.lock not locked when recovering!

in log.ctdb, repeating about once per second.

Shouldn't ocfs2-1.4 support having the recovery lock file?
I have not yet configured samba, as I want ctdb to work first.
I found out it's version 1.0.89 of ctdb, not 1.0.71 as I first reported.
Hi Erik,

thanks for your bug report!
Up front, I have to say that I have not yet run ocfs2 myself.

(In reply to comment #0)
> I have configured ocfs2 on a 2-node sles10-sp2-cluster.
> The recover-lock-file is on the ocfs2-mounted /data-prod. It's
> /data-prod/ctdb/reclock.lock, which is accessible from both nodes.
> Startin ctdbd on the first node "leenderts" goes fine.
> But when starting it on the second node "nodtvedt", I get this error in
> log.ctdb on "leenderts":
>
> 2009/10/02 23:04:13.073230 [27137]: Taking out recovery lock from recovery
> daemon
> 2009/10/02 23:04:13.073278 [27137]: Take the recovery lock
> 2009/10/02 23:04:13.073319 [27137]: Recovery lock taken successfully
> 2009/10/02 23:04:13.073360 [27137]: Recovery lock taken successfully by
> recovery daemon
> 2009/10/02 23:04:13.181695 [27137]: ctdb_control error: 'managed to lock
> reclock file from inside daemon'
> 2009/10/02 23:04:13.181728 [27137]: ctdb_control error: 'managed to lock
> reclock file from inside daemon'
> 2009/10/02 23:04:13.181741 [27137]: Async operation failed with ret=-1 res=-1
> opcode=16
> 2009/10/02 23:04:13.181752 [27137]: Async wait failed - fail_count=1
> 2009/10/02 23:04:13.181801 [27137]: server/ctdb_recoverd.c:282 Unable to set
> recovery mode. Recovery failed.
> 2009/10/02 23:04:13.181813 [27137]: server/ctdb_recoverd.c:1396 Unable to set
> recovery mode to normal on cluster
>
>
> They keep repeeting about once a second or so.
>
> On the other node, nodtvedt, I get:
>
> 2009/10/02 22:45:15.650092 [20267]: ERROR: recovery lock file
> /data-prod/ctdb/reclock.lock not locked when recovering!
>
> in log.ctdb, repeating about once per second.
>
> Shouldn't ocfs2-1.4 support having the recovery-lock-file ?
> I have not yet configured samba, as I won't ctdb to work first.

Ok, you have a problem with the posix fcntl byte range lock support on your file system.

> I have not yet configured samba, as I won't ctdb to work first.

Right, this is best.
Did you run the ping_pong test? See: http://wiki.samba.org/index.php/Ping_pong
The ping_pong program is included in the ctdb package in newer versions.

What is your kernel version? I think the kernel version is critical for telling whether sufficient support for posix locks on ocfs2 is available on your system.

Jim: Can you give details on that?

Cheers - Michael
The sles10-sp2 level of ocfs2 does not support cluster-aware posix locking. This was added later in the 1.4 series. On sles, it's not available until sles11 with the High Availability Extension (HAE).
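For anyone wanting a quick sanity check before running ping_pong: the thread turns on whether a second process is refused a posix fcntl byte-range lock that another process already holds on the reclock file. Below is a minimal sketch of that check in Python. It is not ctdb's code and not the ping_pong program; the function name second_process_can_lock is made up for illustration, and it only exercises two processes on one node. Coherence across nodes still needs ping_pong run on both nodes at the same time.

```python
import fcntl
import os
import sys


def second_process_can_lock(path):
    """Hypothetical helper: hold an exclusive lock on byte 0 of `path`,
    then report whether a forked child process manages to take the same
    lock. On a file system with working posix locks this returns False."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Parent takes a non-blocking exclusive lock on the first byte.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks are not inherited across fork, so this is
            # a genuine second locker competing with the parent.
            status = 1
            try:
                cfd = os.open(path, os.O_RDWR)
                try:
                    fcntl.lockf(cfd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                                1, 0, os.SEEK_SET)
                    status = 0  # lock granted: the conflict was NOT detected
                except OSError:
                    status = 1  # lock refused: expected behaviour
            finally:
                os._exit(status)

        _, wstatus = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wstatus) == 0
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Path is taken from the command line; nothing is assumed about it.
    if second_process_can_lock(sys.argv[1]):
        print("second process got the lock: byte-range locking looks broken")
    else:
        print("second process was refused: local byte-range locking works")
```

Run it with the lock file path as the only argument while ctdbd is stopped, so that nothing else holds the lock. A "refused" result only says that local locking behaves; it says nothing about whether the lock is visible from the other node.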