Bug 6777 - unable to handle recovery-lock-file properly on ocfs2-1.4
Status: RESOLVED INVALID
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: unspecified
Hardware: x64 Linux
Importance: P3 regression
Target Milestone: ---
Assigned To: Michael Adam
Reported: 2009-10-02 16:15 UTC by Erik Sørnes
Modified: 2009-11-24 07:37 UTC

Description Erik Sørnes 2009-10-02 16:15:04 UTC
I have configured ocfs2 on a 2-node sles10-sp2 cluster.
The recovery lock file is on the ocfs2-mounted /data-prod. It's /data-prod/ctdb/reclock.lock, which is accessible from both nodes.
Starting ctdbd on the first node "leenderts" goes fine.
But when starting it on the second node "nodtvedt", I get this error in log.ctdb on "leenderts":

2009/10/02 23:04:13.073230 [27137]: Taking out recovery lock from recovery daemon
2009/10/02 23:04:13.073278 [27137]: Take the recovery lock
2009/10/02 23:04:13.073319 [27137]: Recovery lock taken successfully
2009/10/02 23:04:13.073360 [27137]: Recovery lock taken successfully by recovery daemon
2009/10/02 23:04:13.181695 [27137]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2009/10/02 23:04:13.181728 [27137]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2009/10/02 23:04:13.181741 [27137]: Async operation failed with ret=-1 res=-1 opcode=16
2009/10/02 23:04:13.181752 [27137]: Async wait failed - fail_count=1
2009/10/02 23:04:13.181801 [27137]: server/ctdb_recoverd.c:282 Unable to set recovery mode. Recovery failed.
2009/10/02 23:04:13.181813 [27137]: server/ctdb_recoverd.c:1396 Unable to set recovery mode to normal on cluster


These messages keep repeating about once a second.

On the other node, nodtvedt, I get: 

2009/10/02 22:45:15.650092 [20267]: ERROR: recovery lock file /data-prod/ctdb/reclock.lock not locked when recovering!

in log.ctdb, repeating about once per second.

Shouldn't ocfs2-1.4 support having the recovery lock file?
I have not yet configured samba, as I want ctdb to work first.
Comment 1 Erik Sørnes 2009-10-02 16:36:02 UTC
I found out it's version 1.0.89 of ctdb, not 1.0.71 as I first reported.
Comment 2 Michael Adam 2009-11-20 18:22:51 UTC
Hi Erik,

thanks for your bug report!

Up front, I have to say that I have not yet run ocfs2 myself.

(In reply to comment #0)

OK, you have a problem with the POSIX fcntl byte-range lock support on your file system.

> I have not yet configured samba, as I want ctdb to work first.

Right, this is best.
Did you run the ping_pong test?
See: http://wiki.samba.org/index.php/Ping_pong
The ping_pong program is included in the ctdb package in newer versions.
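
For illustration only, the kind of lock involved can be probed with a few lines of Python. This is a minimal sketch, not the ctdb or ping_pong code: the function names are made up for this example, and it only checks that an exclusive fcntl lock on the first byte of a file, held by one process, is refused to another process. On a local file system the second attempt should be refused; on a cluster file system without cluster-aware POSIX locking, an attempt from a different node would presumably succeed, which matches the "managed to lock reclock file" symptom in the logs.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try a non-blocking exclusive fcntl lock on the first byte of path.

    Returns the open file descriptor if the lock was granted, else None.
    The lock is held until the descriptor is closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Lock 1 byte starting at offset 0; LOCK_NB makes a conflict raise OSError.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    except OSError:
        os.close(fd)
        return None
    return fd


def probe_from_other_process(path):
    """Fork a child that attempts the same lock; return True if the child got it.

    fcntl locks are owned per process, so the probe has to run in a
    different process from the one that holds the lock.
    """
    pid = os.fork()
    if pid == 0:
        got_it = try_exclusive_lock(path) is not None
        os._exit(0 if got_it else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

To approximate the two-node case, one could call try_exclusive_lock() on the reclock path on one node, keep that process running, and call it again on the other node: the second call should return None if the file system coordinates POSIX locks across nodes.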

What is your kernel version?

I think the kernel version is critical for telling whether sufficient support for POSIX locks on ocfs2 is available on your system.
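
A small sketch for collecting that information on a node, assuming Linux with a readable /proc/mounts. The helper names are illustrative, and the script reports only the running kernel release and the ocfs2 mount points; it does not detect whether the ocfs2 build supports cluster-aware locking.

```python
import platform


def ocfs2_mounts(mounts_text):
    """Return (device, mountpoint) pairs for ocfs2 entries in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "ocfs2":
            found.append((fields[0], fields[1]))
    return found


def report(mounts_path="/proc/mounts"):
    """Gather the kernel release and the ocfs2 mounts of the local node."""
    with open(mounts_path) as f:
        mounts = ocfs2_mounts(f.read())
    return {"kernel": platform.release(), "ocfs2_mounts": mounts}
```

Calling report() on each node and attaching the output to the bug gives the kernel release alongside the mount that holds the reclock file.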

Jim: Can you give details on that?

Cheers - Michael
Comment 3 Jim McDonough 2009-11-24 07:37:42 UTC
The sles10-sp2 level of ocfs2 does not support cluster-aware POSIX locking. This was added later in the 1.4 series. On sles, it's not available until sles11...with HAE.