Created attachment 6401 [details] client-handle-transient-connection-errors.patch The CTDB Linux-HA/Pacemaker resource agent uses the ctdb client utility to monitor the status of ctdbd on a given node, it does this using the "ctdb ping" command. Under heavy load connections to the ctdbd unix domain socket may fail with EAGAIN, subsequently the error is immediately returned to the calling shell. The following changes since commit b4cb9a7163251828a1fd69a6ebb3ee370d7643e8: vacuum: close fd leak (2010-08-05 10:43:48 +1000) are available in the git repository at: git://oss.sgi.com/ddiss/ctdb 112b-client-conn-retry-rb1 David Disseldorp (1): client: handle transient connection errors client/ctdb_client.c | 32 ++++++++++++++++++++++++++++---- 1 files changed, 28 insertions(+), 4 deletions(-)
Hi Ronnie, this looks reasonable to me. Could you review it? Cheers - Michael
Hi David, sorry for the long delay... I am taking back the bug, since Ronnies seems to be unaware of bugzilla.samba.org ... :-P The patch looks reasonable, but a couple of comments after looking over it together with Metze (mostly stylistic / cosmetic): * could you rename "nsecs" ? (I just sounds like nano secons ...) e.g. to seconds * could you use "seconds *= 2" or "seconds = seconds * 2" instead of "seconds <<= 1" for clarity?... * add braces: if ((ret == 0) || (errno != EAGAIN)) { break; } * maybe spare an indentation by using if (seconds > CONNECT_MAXSLEE/2) { break; } * don't throw away the errno information from the debug message further down (this is not cosmetic!) * close the socket _after_ the debug statement (to log the correct errno) (not cosmetic either!) If you adapt the patch this way, I'll be happy to push it! :-) Cheers - Michael
(In reply to comment #2) > Hi David, > > sorry for the long delay... No problem, thanks for the review :) ... > If you adapt the patch this way, I'll be happy to push it! :-) I've pushed a commit with the above suggestions (all except the spared indentation) to: The following changes since commit 14a1bf4fff4ce4a48704d09324e4757fbf043f92: Version 1.0.114.2: Start a new series of versions 1.0.114.X based on 1.0.114 (2011-05-20 16:21:04 +0200) are available in the git repository at: git://oss.sgi.com/ddiss/ctdb 114-client-conn-retry-rb2 David Disseldorp (1): client: handle transient connection errors client/ctdb_client.c | 35 ++++++++++++++++++++++++++++++----- 1 files changed, 30 insertions(+), 5 deletions(-)
Created attachment 6609 [details] ctdb-client-connect-retry.patch
Pushed to master, thanks! I will add it to the 1.0.114 branch. I think Ronnie should also add it to the 1.2 branch...