ctdb-1.0.114.2 samba-3.4.3 (with On rare occasions while the cluster is under heavy load we are seeing the recovery daemon shutdown as a result of encountering "Invalid packet of length 0" errors. Adding Rusty's "Report client for queue errors" patch and dissecting the packet buffers we observe the following: Bad packet data block 1: line number 129 ========================================= 20:37:19: client/ctdb_client.c:789 control timed out. reqid:52007 opcode:9 dstnode:4026531841 20:37:19: client/ctdb_client.c:900 ctdb_control_recv failed 20:37:19: client/ctdb_client.c:1214 ctdb_control for getdbmap failed ret:-1 res:-1 20:37:19: server/ctdb_recoverd.c:818 Unable to get dbids from local node 20:37:19: to-ctdbd: Invalid packet of length 0 (nread = 13800, totread = 21776, partlen = 15920) # packet A, byte offset 0, char number 48 58 5d 00 00 length; = 23896 /* bogus - next header is at offset 15872 */ 42 44 54 43 ctdb_magic; = CTDB 01 00 00 00 ctdb_version; = 1 06 f8 3a 12 generation; = 0x123af806 05 00 00 00 operation; = CTDB_REQ_MESSAGE 00 00 00 00 destnode; = 0 01 00 00 00 srcnode; = 1 00 00 00 00 reqid; = 0 # packet B, byte offset 15872 != (packetA + packetA.length), char number 47664 30 00 00 00 length; = 48 42 44 54 43 ctdb_magic; = CTDB 01 00 00 00 ctdb_version; = 1 06 f8 3a 12 generation; = 0x123af806 08 00 00 00 operation; = CTDB_REPLY_CONTROL 00 00 00 00 destnode; = 0 01 00 00 00 srcnode; = 1 28 cb 00 00 reqid; = 0xcb28 # packet C, byte offset 15920 == (packetB + packetB.length), char number 47808 10 55 00 00 length; = 21776 42 44 54 43 ctdb_magic; = CTDB 01 00 00 00 ctdb_version; = 1 06 f8 3a 12 generation; = 0x123af806 05 00 00 00 operation; = CTDB_REQ_MESSAGE 00 00 00 00 destnode; = 0 01 00 00 00 srcnode; = 1 00 00 00 00 reqid; = 0 # END, byte offset 37696 == (packetC + packetC.length), char number 113136 Bad packet data block 2: line number 259 ========================================= 20:42:34: to-ctdbd: Invalid packet of length 0 (nread = 2192, totread = 2192, partlen = 0) 00 00 00 00 00 00 49 44 2f 30 3a 32 34 31 36 33 2f 31 30 30 00 1f 00 00 00 00 00 00 00 0f 00 00 ... Buffer is Completely bogus, no CTDB magic to be seen Bad packet data block 3: line number 375 ========================================= 21:02:43: client/ctdb_client.c:789 control timed out. reqid:96097 opcode:9 dstnode:4026531841 21:02:43: client/ctdb_client.c:900 ctdb_control_recv failed 21:02:43: client/ctdb_client.c:1214 ctdb_control for getdbmap failed ret:-1 res:-1 21:02:43: server/ctdb_recoverd.c:818 Unable to get dbids from local node 21:02:43: to-ctdbd: Invalid packet of length 0 (nread = 13360, totread = 22696, partlen = 15920) # packet A, byte offset 0, char number 48 a8 62 00 00 length; = 25256 42 44 54 43 ctdb_magic; = CTDB 01 00 00 00 ctdb_version; = 1 a6 b7 2c 54 generation; = 0x542cb7a6 05 00 00 00 operation; = CTDB_REQ_MESSAGE 00 00 00 00 destnode; = 0 01 00 00 00 srcnode; = 1 00 00 00 00 reqid; = 0 # packet B, byte offset 15872 != (packetA + packetA.length), char number 47664 30 00 00 00 length; = 48 42 44 54 43 ctdb_magic; = CTDB 01 00 00 00 ctdb_version; = 1 a6 b7 2c 54 generation; = 0x542cb7a6 08 00 00 00 operation; = CTDB_REPLY_CONTROL 00 00 00 00 destnode; = 0 01 00 00 00 srcnode; = 1 62 77 01 00 reqid; = 0x00017762 # packet C, byte offset 15920 == (packetB + packetB.length), char number 47808 a8 58 00 00 length; = 22696 42 44 54 43 ctdb_magic; = CTDB 01 00 00 00 ctdb_version; = 1 a6 b7 2c 54 generation; = 0x542cb7a6 05 00 00 00 operation; = CTDB_REQ_MESSAGE 00 00 00 00 destnode; = 0 01 00 00 00 srcnode; = 1 00 00 00 00 reqid; = 0 # END, byte offset 38616 == (packetC + packetC.length), char number 115896 Bad packet data block 4: line number 480 ========================================= 03:47:26: to-ctdbd: Invalid packet of length 0 (nread = 2288, totread = 2288, partlen = 0) 00 00 00 00 00 00 49 44 2f 30 3a 32 33 30 38 31 2f 31 30 30 00 1f 00 00 00 00 00 00 00 0f 00 00 ... Buffer is Completely bogus, no CTDB magic to be seen
Created attachment 6715 [details] ctdb log with dumped buffers for bad packets
Following discussion on IRC; Rusy and Ronnie have encountered the same issue however report it fixed with the vacuuming rewrite, more specifically commit 3930c7796b72bbf275bbca8aaeceec3e705a964b. This commit is included in ctdb-1.0.114.2.
The underlying problem appears to be the core ctdb IO layer, and AFAICT it's been there for a long time - queue_io_read() is not safe for reentry via the queue callback. Consider the following scenario: - ctdbd sends 2 CTDB_SRVID_VACUUM_FETCH messages to the recovery daemon (recoverd), the payload of both messages is large, resulting in a multiple write() attempts in queue_io_write(). - 1.5 (not yet the full 2) packets worth of data arrives at the other end of the pipe for processing by recoverd. - recoverd queue_io_read() pulls the full 1.5 packets worth of data off the pipe. - the first packet is pulled into a buffer and handed to queue->callback(). - In the meantime the remaining 0.5 of the second CTDB_SRVID_VACUUM_FETCH message arrives. - the recoverd vacuum_fetch_handler() calls ctdb_ctrl_getdbname() from within the queue->callback(). - ctdb_ctrl_getdbname() sends a CTDB_CONTROL_GET_DBNAME message and *reenters* queue_io_read() awaiting the response. - On reentry to queue_io_read() the 0.5 CTDB_SRVID_VACUUM_FETCH packet already pulled off the wire is unaccounted for. The remaining 0.5 packet available is processed as if it were a new header, with the first four bytes assumed to be the packet length. - At this point the packet stream is wrongly aligned and IPC comes to a grinding halt with "Invalid packet of length 0" errors.
Created attachment 6737 [details] Patch to make queue_io_read() safe for reentry The following changes since commit 14a1bf4fff4ce4a48704d09324e4757fbf043f92: Version 1.0.114.2: Start a new series of versions 1.0.114.X based on 1.0.114 (2011-05-20 16:21:04 +0200) are available in the git repository at: git://oss.sgi.com/ddiss/ctdb 114.2-bso8319-queue_io_read David Disseldorp (1): io: Make queue_io_read() safe for reentry common/ctdb_io.c | 121 +++++++++++++++++++++++------------------------------ 1 files changed, 53 insertions(+), 68 deletions(-)
(In reply to comment #4) > Created attachment 6737 [details] > Patch to make queue_io_read() safe for reentry Oops, looks like I got a bit carried away and forgot to add checks for packet buffer allocation failures. New patch to come...
Created attachment 6738 [details] Patch to make queue_io_read() safe for reentry The following changes since commit 14a1bf4fff4ce4a48704d09324e4757fbf043f92: Version 1.0.114.2: Start a new series of versions 1.0.114.X based on 1.0.114 (2011-05-20 16:21:04 +0200) are available in the git repository at: git://oss.sgi.com/ddiss/ctdb 114.2-bso8319-queue_io_read_rb1 David Disseldorp (1): io: Make queue_io_read() safe for reentry common/ctdb_io.c | 130 ++++++++++++++++++++++++++---------------------------- 1 files changed, 62 insertions(+), 68 deletions(-)
Created attachment 6742 [details] smbclient workload used to trip the bug
Created attachment 6743 [details] Patch to make queue_io_read() safe for reentry - rb2 The previous patch incorrectly allowed for a 0 byte read if only the packet length bytes were available - fixed in this patch version (rb2). -- The following changes since commit 14a1bf4fff4ce4a48704d09324e4757fbf043f92: Version 1.0.114.2: Start a new series of versions 1.0.114.X based on 1.0.114 (2011-05-20 16:21:04 +0200) are available in the git repository at: git://oss.sgi.com/ddiss/ctdb 114.2-bso8319-queue_io_read_rb2 David Disseldorp (1): io: Make queue_io_read() safe for reentry common/ctdb_io.c | 129 ++++++++++++++++++++++++++---------------------------- 1 files changed, 62 insertions(+), 67 deletions(-)
Awaiting review, adding Rusty and Ronnie to cc.
Thanks. Patch has been merged to master and 1.0.114 as well as my 1.2 branch
(In reply to comment #10) > Thanks. > > Patch has been merged to master and 1.0.114 as well as my 1.2 branch Thanks Ronnie, closing.