Bug 13168 - ctdb reported transactions lock held error
Summary: ctdb reported transactions lock held error
Status: RESOLVED INVALID
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: 4.5.0rc
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Amitay Isaacs
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-27 01:43 UTC by wangzhe
Modified: 2019-03-20 00:24 UTC
Description wangzhe 2017-11-27 01:43:28 UTC
Hi,

Recently, while running some Samba experiments with ctdb (version 4.5), I came across a weird error. Could you please help check this error and fix it?

The cluster has 4 nodes, with the 1st one (pnn:0) deleted. The 4th node (pnn:3) is the cluster recovery master. According to the log.ctdb files from all nodes, this is what happened:
    First, the master node (pnn:3) started a recovery. During the data recovery, the "recoverd" process on the 3rd node (pnn:2) was restarted because of a timeout problem. At the same time, the 2nd node (pnn:1) reported some database lock errors and finished, but from then on it kept reporting the tdb error "cannot start a transaction with locks held". It looks as if the transaction lock was still held by the tdb handle after the recovery failed?
    The whole cluster could not recover to a normal state until I restarted the ctdb service on the 2nd node (pnn:1).
    The related logs follow; I hope they are helpful.


node2 (pnn:1):
2017/11/07 16:55:23.651379 [ 7810]: ../ctdb/server/ctdb_recover.c:931 Recovery mode set to ACTIVE

2017/11/07 16:55:24.290731 [ 7810]: Recovery has started

2017/11/07 16:55:24.421339 [ 7810]: Freeze db: smbXsrv_client_global.tdb

2017/11/07 16:55:24.422196 [ 7810]: Freeze db: printer_list.tdb

2017/11/07 16:55:24.422881 [ 7810]: Freeze db: smbXsrv_open_global.tdb

2017/11/07 16:55:24.423463 [ 7810]: Freeze db: leases.tdb

2017/11/07 16:55:24.424132 [ 7810]: Freeze db: locking.tdb

2017/11/07 16:55:24.424797 [ 7810]: Freeze db: brlock.tdb

2017/11/07 16:55:24.425352 [ 7810]: Freeze db: smbXsrv_tcon_global.tdb

2017/11/07 16:55:24.425969 [ 7810]: Freeze db: smbXsrv_version_global.tdb

2017/11/07 16:55:24.426539 [ 7810]: Freeze db: g_lock.tdb

2017/11/07 16:55:24.427103 [ 7810]: Freeze db: serverid.tdb

2017/11/07 16:55:24.427722 [ 7810]: Freeze db: smbXsrv_session_global.tdb

2017/11/07 16:55:24.428304 [ 7810]: Freeze db: ctdb.tdb

2017/11/07 16:55:24.429260 [ 7810]: Freeze db: registry.tdb

2017/11/07 16:55:24.429836 [ 7810]: Freeze db: passdb.tdb

2017/11/07 16:55:24.430404 [ 7810]: Freeze db: account_policy.tdb

2017/11/07 16:55:24.430976 [ 7810]: Freeze db: secrets.tdb

2017/11/07 16:55:24.431578 [ 7810]: Freeze db: share_info.tdb

2017/11/07 16:55:24.433290 [ 7810]: Freeze db: group_mapping.tdb

2017/11/07 16:55:24.682885 [ 7810]: Starting traverse on DB smbXsrv_session_global.tdb (id 276926)

2017/11/07 16:55:24.991773 [ 7810]: Ending traverse on DB smbXsrv_session_global.tdb (id 276926), records 6

2017/11/07 16:55:25.048252 [ 7810]: Thaw db: ctdb.tdb generation 56210978

2017/11/07 16:55:25.048299 [ 7810]: Release freeze handle for db ctdb.tdb

2017/11/07 16:55:25.048348 [ 7810]: Incorrect transaction commit id 0x4136f701 for smbXsrv_version_global.tdb

2017/11/07 16:55:25.048395 [ 7810]: Thaw db: secrets.tdb generation 1731632864

2017/11/07 16:55:25.048408 [ 7810]: Release freeze handle for db secrets.tdb

2017/11/07 16:55:25.048840 [ 7810]: Transaction not started on registry.tdb

2017/11/07 16:55:25.048911 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for brlock.tdb

2017/11/07 16:55:25.048953 [ 7810]: Thaw db: smbXsrv_tcon_global.tdb generation 1731632864

2017/11/07 16:55:25.048964 [ 7810]: Release freeze handle for db smbXsrv_tcon_global.tdb

2017/11/07 16:55:25.057919 [ 7810]: Thaw db: account_policy.tdb generation 1731632864

2017/11/07 16:55:25.057934 [ 7810]: Release freeze handle for db account_policy.tdb

2017/11/07 16:55:25.058253 [ 7810]: ../ctdb/server/ctdb_recover.c:759 DB push not started

2017/11/07 16:55:25.058269 [ 7810]: Thaw db: passdb.tdb generation 1731632864

2017/11/07 16:55:25.058290 [ 7810]: Release freeze handle for db passdb.tdb

2017/11/07 16:55:25.058754 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for locking.tdb

2017/11/07 16:55:25.058818 [ 7810]: Thaw db: group_mapping.tdb generation 1731632864

2017/11/07 16:55:25.058826 [ 7810]: Release freeze handle for db group_mapping.tdb

2017/11/07 16:55:25.059187 [ 7810]: Incorrect transaction commit id 0x4136f701 for smbXsrv_open_global.tdb

2017/11/07 16:55:25.059207 [ 7810]: ../ctdb/server/ctdb_recover.c:759 DB push not started

2017/11/07 16:55:25.059231 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for leases.tdb

2017/11/07 16:55:25.059266 [ 7810]: Failed to mark (all lock) database smbXsrv_client_global.tdb

2017/11/07 16:55:25.059274 [ 7810]: ../ctdb/server/ctdb_recover.c:725 Failed to get lock on entire db - failing

2017/11/07 16:55:25.059289 [ 7810]: Thaw db: share_info.tdb generation 1731632864

2017/11/07 16:55:25.059295 [ 7810]: Release freeze handle for db share_info.tdb

2017/11/07 16:55:25.059643 [ 7810]: pnn 1 Invalid reqid 273729 in ctdb_reply_control

2017/11/07 16:55:25.157762 [ 7810]: pnn 1 Invalid reqid 273422 in ctdb_reply_control

2017/11/07 16:55:30.539163 [ 7810]: Starting traverse on DB smbXsrv_session_global.tdb (id 277031)

2017/11/07 16:55:30.541076 [ 7810]: Ending traverse on DB smbXsrv_session_global.tdb (id 277031), records 0

2017/11/07 16:55:36.060158 [ 7810]: Starting traverse on DB smbXsrv_session_global.tdb (id 277134)

2017/11/07 16:55:36.062713 [ 7810]: Ending traverse on DB smbXsrv_session_global.tdb (id 277134), records 0

2017/11/07 16:55:41.564586 [ 7810]: Starting traverse on DB smbXsrv_session_global.tdb (id 277225)

2017/11/07 16:55:41.566581 [ 7810]: Ending traverse on DB smbXsrv_session_global.tdb (id 277225), records 0

2017/11/07 16:55:47.032900 [ 7810]: Starting traverse on DB smbXsrv_session_global.tdb (id 277328)

2017/11/07 16:55:47.034860 [ 7810]: Ending traverse on DB smbXsrv_session_global.tdb (id 277328), records 0

2017/11/07 16:55:52.515324 [ 7810]: Starting traverse on DB smbXsrv_session_global.tdb (id 277419)

2017/11/07 16:55:52.516872 [ 7810]: Ending traverse on DB smbXsrv_session_global.tdb (id 277419), records 0

2017/11/07 16:55:54.450854 [ 7810]: Freeze db: smbXsrv_session_global.tdb frozen

2017/11/07 16:55:54.450854 [ 7810]: Freeze db: smbXsrv_session_global.tdb frozen

2017/11/07 16:55:54.450892 [ 7810]: Freeze db: serverid.tdb frozen

2017/11/07 16:55:54.450922 [ 7810]: Freeze db: registry.tdb frozen

2017/11/07 16:55:54.450975 [ 7810]: Freeze db: group_mapping.tdb

2017/11/07 16:55:54.451827 [ 7810]: Freeze db: share_info.tdb

2017/11/07 16:55:54.452477 [ 7810]: Freeze db: secrets.tdb

2017/11/07 16:55:54.453193 [ 7810]: Freeze db: smbXsrv_client_global.tdb frozen

2017/11/07 16:55:54.453218 [ 7810]: Freeze db: account_policy.tdb

2017/11/07 16:55:54.453931 [ 7810]: Freeze db: smbXsrv_open_global.tdb frozen

2017/11/07 16:55:54.453970 [ 7810]: Freeze db: smbXsrv_tcon_global.tdb

2017/11/07 16:55:54.454619 [ 7810]: Freeze db: brlock.tdb frozen

2017/11/07 16:55:54.454642 [ 7810]: Freeze db: printer_list.tdb frozen

2017/11/07 16:55:54.454656 [ 7810]: Freeze db: passdb.tdb

2017/11/07 16:55:54.455324 [ 7810]: Freeze db: locking.tdb frozen

2017/11/07 16:55:54.455354 [ 7810]: Freeze db: leases.tdb frozen

2017/11/07 16:55:54.455372 [ 7810]: Freeze db: g_lock.tdb frozen

2017/11/07 16:55:54.455407 [ 7810]: Freeze db: smbXsrv_version_global.tdb frozen

2017/11/07 16:55:54.455422 [ 7810]: Freeze db: ctdb.tdb

2017/11/07 16:55:54.456134 [ 7810]: Freeze db: smbXsrv_session_global.tdb frozen

2017/11/07 16:55:54.456159 [ 7810]: Freeze db: serverid.tdb frozen

2017/11/07 16:55:54.456176 [ 7810]: Freeze db: registry.tdb frozen

2017/11/07 16:55:54.456193 [ 7810]: Freeze db: smbXsrv_client_global.tdb frozen

2017/11/07 16:55:54.456222 [ 7810]: Freeze db: smbXsrv_open_global.tdb frozen

2017/11/07 16:55:54.456242 [ 7810]: Freeze db: brlock.tdb frozen

2017/11/07 16:55:54.456262 [ 7810]: Freeze db: printer_list.tdb frozen

2017/11/07 16:55:54.456381 [ 7810]: Freeze db: locking.tdb frozen

2017/11/07 16:55:54.456406 [ 7810]: Freeze db: leases.tdb frozen

2017/11/07 16:55:54.456418 [ 7810]: Freeze db: g_lock.tdb frozen

2017/11/07 16:55:54.456428 [ 7810]: Freeze db: smbXsrv_version_global.tdb frozen

2017/11/07 16:55:54.456458 [ 7810]: Freeze db: group_mapping.tdb frozen

2017/11/07 16:55:54.457044 [ 7810]: Freeze db: share_info.tdb frozen

2017/11/07 16:55:54.457113 [ 7810]: Freeze db: secrets.tdb frozen

2017/11/07 16:55:54.457223 [ 7810]: Freeze db: account_policy.tdb frozen

2017/11/07 16:55:54.457262 [ 7810]: Freeze db: smbXsrv_tcon_global.tdb frozen

2017/11/07 16:55:54.457279 [ 7810]: Freeze db: passdb.tdb frozen

2017/11/07 16:55:54.457340 [ 7810]: Freeze db: ctdb.tdb frozen

2017/11/07 16:55:54.463038 [ 7810]: Recovery has started

2017/11/07 16:55:54.611802 [ 7810]: Freeze db: smbXsrv_client_global.tdb frozen

2017/11/07 16:55:54.611839 [ 7810]: Freeze db: printer_list.tdb frozen

2017/11/07 16:55:54.611882 [ 7810]: Freeze db: smbXsrv_open_global.tdb frozen

2017/11/07 16:55:54.611901 [ 7810]: Freeze db: leases.tdb frozen

2017/11/07 16:55:54.612001 [ 7810]: Freeze db: locking.tdb frozen

2017/11/07 16:55:54.612020 [ 7810]: Freeze db: brlock.tdb frozen

2017/11/07 16:55:54.612032 [ 7810]: Freeze db: smbXsrv_tcon_global.tdb frozen

2017/11/07 16:55:54.612043 [ 7810]: Freeze db: smbXsrv_version_global.tdb frozen

2017/11/07 16:55:54.612060 [ 7810]: Freeze db: g_lock.tdb frozen

2017/11/07 16:55:54.612072 [ 7810]: Freeze db: serverid.tdb frozen

2017/11/07 16:55:54.612091 [ 7810]: Freeze db: smbXsrv_session_global.tdb frozen

2017/11/07 16:55:54.612228 [ 7810]: Freeze db: ctdb.tdb frozen

2017/11/07 16:55:54.612243 [ 7810]: Freeze db: registry.tdb frozen

2017/11/07 16:55:54.612261 [ 7810]: Freeze db: passdb.tdb frozen

2017/11/07 16:55:54.612274 [ 7810]: Freeze db: account_policy.tdb frozen

2017/11/07 16:55:54.612284 [ 7810]: Freeze db: secrets.tdb frozen

2017/11/07 16:55:54.612309 [ 7810]: Freeze db: share_info.tdb frozen

2017/11/07 16:55:54.612325 [ 7810]: Freeze db: group_mapping.tdb frozen

2017/11/07 16:55:54.612503 [ 7810]: tdb(/var/lib/ctdb/smbXsrv_client_global.tdb.1): tdb_transaction_start: cannot start a transaction with locks held

2017/11/07 16:55:54.612513 [ 7810]: Failed to start transaction for db smbXsrv_client_global.tdb

2017/11/07 16:55:54.628082 [ 7810]: Freeze db: smbXsrv_client_global.tdb frozen

2017/11/07 16:55:54.649846 [ 7810]: tdb(/var/lib/ctdb/smbXsrv_client_global.tdb.1): tdb_transaction_start: cannot start a transaction with locks held

2017/11/07 16:55:54.649868 [ 7810]: Failed to start transaction for db smbXsrv_client_global.tdb

2017/11/07 16:55:54.671991 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for leases.tdb

2017/11/07 16:55:54.672029 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for locking.tdb

2017/11/07 16:55:54.705465 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for brlock.tdb

2017/11/07 16:55:54.705496 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for smbXsrv_tcon_global.tdb

2017/11/07 16:55:54.707799 [ 7810]: ../ctdb/server/ctdb_recover.c:692 DB push already started for g_lock.tdb

2017/11/07 16:55:54.712518 [ 7810]: Freeze db: smbXsrv_client_global.tdb frozen

2017/11/07 16:55:55.025606 [ 7810]: tdb(/var/lib/ctdb/smbXsrv_client_global.tdb.1): tdb_transaction_start: cannot start a transaction with locks held

2017/11/07 16:55:55.025628 [ 7810]: Failed to start transaction for db smbXsrv_client_global.tdb

2017/11/07 16:55:55.025677 [ 7810]: Thaw db: printer_list.tdb generation 2095075608

2017/11/07 16:55:55.025688 [ 7810]: Release freeze handle for db printer_list.tdb

2017/11/07 16:55:55.026385 [ 7810]: Thaw db: smbXsrv_open_global.tdb generation 2095075608

2017/11/07 16:55:55.026401 [ 7810]: Release freeze handle for db smbXsrv_open_global.tdb

2017/11/07 16:55:55.026450 [ 7810]: Thaw db: leases.tdb generation 2095075608

2017/11/07 16:55:55.026459 [ 7810]: Release freeze handle for db leases.tdb

2017/11/07 16:55:55.027301 [ 7810]: Thaw db: locking.tdb generation 2095075608

2017/11/07 16:55:55.027320 [ 7810]: Release freeze handle for db locking.tdb

2017/11/07 16:55:55.386982 [ 7810]: Thaw db: ctdb.tdb generation 2095075608                                                                                 
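
The node2 excerpt above can be reduced to two questions: which databases failed to start a transaction, and which databases were frozen without a later thaw. The following is only a minimal sketch for reading such a log, not part of CTDB. The function name `summarize_node_log` and the regular expressions are assumptions inferred from the message formats visible in this excerpt.

```python
import re
from collections import Counter

# These patterns are assumptions based only on the log lines quoted above.
TXN_FAIL = re.compile(r"Failed to start transaction for db (\S+)")
THAW = re.compile(r"Thaw db: (\S+) generation")
FREEZE = re.compile(r"Freeze db: (\S+)")


def summarize_node_log(lines):
    """Return (transaction-start failures per db, dbs whose last event was a freeze)."""
    txn_failures = Counter()
    frozen = set()
    for line in lines:
        m = TXN_FAIL.search(line)
        if m:
            txn_failures[m.group(1)] += 1
            continue
        m = THAW.search(line)
        if m:
            frozen.discard(m.group(1))  # a thaw clears the frozen state
            continue
        m = FREEZE.search(line)
        if m:
            frozen.add(m.group(1))  # covers both "Freeze db: X" and "Freeze db: X frozen"
    return txn_failures, frozen


if __name__ == "__main__":
    sample = [
        "2017/11/07 16:55:54.611802 [ 7810]: Freeze db: smbXsrv_client_global.tdb frozen",
        "2017/11/07 16:55:54.611839 [ 7810]: Freeze db: printer_list.tdb frozen",
        "2017/11/07 16:55:54.612513 [ 7810]: Failed to start transaction for db smbXsrv_client_global.tdb",
        "2017/11/07 16:55:55.025677 [ 7810]: Thaw db: printer_list.tdb generation 2095075608",
    ]
    failures, still_frozen = summarize_node_log(sample)
    print(dict(failures))
    print(sorted(still_frozen))
```

Run over the full node2 log, a summary like this would show whether smbXsrv_client_global.tdb is the only database that keeps failing, which is what the excerpt suggests.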







node3 (pnn:2):
2017/11/07 16:55:23.660873 [18278]: ../ctdb/server/ctdb_recover.c:931 Recovery mode set to ACTIVE

2017/11/07 16:55:24.300223 [18278]: Recovery has started

2017/11/07 16:55:24.430801 [18278]: Freeze db: smbXsrv_client_global.tdb

2017/11/07 16:55:24.431699 [18278]: Freeze db: printer_list.tdb

2017/11/07 16:55:24.432382 [18278]: Freeze db: smbXsrv_open_global.tdb

2017/11/07 16:55:24.432919 [18278]: Freeze db: leases.tdb

2017/11/07 16:55:24.433483 [18278]: Freeze db: locking.tdb

2017/11/07 16:55:24.434052 [18278]: Freeze db: brlock.tdb

2017/11/07 16:55:24.435375 [18278]: Freeze db: smbXsrv_tcon_global.tdb

2017/11/07 16:55:24.436203 [18278]: Freeze db: smbXsrv_version_global.tdb

2017/11/07 16:55:24.436777 [18278]: Freeze db: g_lock.tdb

2017/11/07 16:55:24.437360 [18278]: Freeze db: serverid.tdb

2017/11/07 16:55:24.437941 [18278]: Freeze db: smbXsrv_session_global.tdb

2017/11/07 16:55:24.438496 [18278]: Freeze db: ctdb.tdb

2017/11/07 16:55:24.439094 [18278]: Freeze db: registry.tdb

2017/11/07 16:55:24.439717 [18278]: Freeze db: passdb.tdb

2017/11/07 16:55:24.440329 [18278]: Freeze db: account_policy.tdb

2017/11/07 16:55:24.441271 [18278]: Freeze db: secrets.tdb

2017/11/07 16:55:24.441975 [18278]: Freeze db: share_info.tdb

2017/11/07 16:55:24.442941 [18278]: Freeze db: group_mapping.tdb

2017/11/07 16:55:55.485578 [recoverd:10848]: ../ctdb/client/ctdb_client.c:1070 control timed out. reqid:305820 opcode:303 dstnode:4026531841

2017/11/07 16:55:55.485637 [recoverd:10848]: ../ctdb/client/ctdb_client.c:1272 ctdb_control_recv failed

2017/11/07 16:55:55.485663 [recoverd:10848]: ../ctdb/client/ctdb_client.c:2550 ctdb_control for get revovery daemon pid failed

2017/11/07 16:55:55.485677 [recoverd:10848]: failed to get recovery daemon pid from paraent

2017/11/07 16:55:55.485689 [recoverd:10848]: Recovery daemon 10848 is invalid ,we will exit

2017/11/07 16:55:55.485704 [recoverd:10848]: CTDB recoverd: shutting down

2017/11/07 16:56:35.692001 [18278]: client call timeout, client pid: 19740, client reqid: 381, state reqid: 329157

2017/11/07 16:56:35.692014 [18278]: Recovery daemon (pid:10848) is no longer running. Trying to restart recovery daemon.

2017/11/07 16:56:35.692023 [18278]: Restarting recovery daemon

2017/11/07 16:56:35.692029 [18278]: Shutting down recovery daemon

2017/11/07 16:56:35.692035 [18278]: ctdb_kill: trying to kill(10848, 15) a process that does not exist

2017/11/07 16:56:35.692674 [18278]: Handling event took 71 seconds! 

2017/11/07 16:56:35.694770 [recoverd:32371]: monitor_cluster starting

2017/11/07 16:56:36.101274 [18278]: Starting traverse on DB smbXsrv_session_global.tdb (id 329493)

2017/11/07 16:56:40.692506 [18278]: 192.168.50.229:4379: connected to 192.168.50.226:4379 - 2 connected

2017/11/07 16:56:46.251652 [recoverd:32371]: Initial recovery master set - forcing election

2017/11/07 16:56:46.251944 [18278]: This node (2) is now the recovery master

2017/11/07 16:56:46.754262 [18278]: Remote node (3) is now the recovery master

2017/11/07 16:56:49.754345 [recoverd:32371]: Election period ended

2017/11/07 16:56:55.692268 [18278]: client call timeout, client pid: 19740, client reqid: 381, state reqid: 329157

2017/11/07 16:56:56.101757 [18278]: ../ctdb/server/ctdb_traverse.c:313 Traverse all timeout on database:smbXsrv_session_global.tdb

2017/11/07 16:56:56.101817 [18278]: Ending traverse on DB smbXsrv_session_global.tdb (id 329493), records 0

2017/11/07 16:56:56.101836 [18278]: ../ctdb/server/ctdb_traverse.c:641 Traverse cancelled by client disconnect for database:0x6b06a26d

2017/11/07 16:57:01.856741 [18278]: Starting traverse on DB smbXsrv_session_global.tdb (id 329584)

2017/11/07 16:57:15.692971 [18278]: client call timeout, client pid: 19740, client reqid: 381, state reqid: 329157

2017/11/07 16:57:15.696210 [18278]: dead count reached for node 3

2017/11/07 16:57:15.696218 [18278]: 192.168.50.229:4379: node 192.168.50.230:4379 is dead: 1 connected

2017/11/07 16:57:15.696226 [18278]: Tearing down connection to dead node :3
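
The node3 excerpt above contains long silent periods, for example between the last "Freeze db" line at 16:55:24 and the recoverd timeout at 16:55:55, which fits the later "Handling event took 71 seconds!" message. A rough way to spot such stalls is to look for large jumps between consecutive timestamps. This is a sketch under assumptions: the helper name `find_gaps` and the 10-second threshold are hypothetical, and the timestamp format is taken from the lines quoted here. It compares consecutive lines regardless of which process wrote them.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the log lines above (assumed stable).
TS = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ")


def find_gaps(lines, threshold_seconds=10.0):
    """Return a list of (previous_time, current_time, gap_seconds) above the threshold."""
    gaps = []
    previous = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip blank lines and anything without a timestamp
        current = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


if __name__ == "__main__":
    sample = [
        "2017/11/07 16:55:24.442941 [18278]: Freeze db: group_mapping.tdb",
        "",
        "2017/11/07 16:55:55.485578 [recoverd:10848]: ../ctdb/client/ctdb_client.c:1070 control timed out. reqid:305820 opcode:303 dstnode:4026531841",
        "2017/11/07 16:55:55.485637 [recoverd:10848]: ../ctdb/client/ctdb_client.c:1272 ctdb_control_recv failed",
    ]
    for start, end, seconds in find_gaps(sample):
        print(start.time(), "->", end.time(), round(seconds, 3), "s")
```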



node4 (pnn:3), recovery master:
2017/11/07 16:55:24.293438 [recoverd:18166]: recovery: set recovery mode to ACTIVE

2017/11/07 16:55:24.293482 [17979]: Recovery has started

2017/11/07 16:55:24.423344 [recoverd:18166]: recovery: start_recovery event finished

2017/11/07 16:55:24.423496 [recoverd:18166]: recovery: updated VNNMAP

2017/11/07 16:55:24.423513 [recoverd:18166]: recovery: recover database 0x477d2e20

2017/11/07 16:55:24.423522 [recoverd:18166]: recovery: recover database 0x5bcfcbd7

2017/11/07 16:55:24.423544 [recoverd:18166]: recovery: recover database 0x66f71b8c

2017/11/07 16:55:24.423552 [recoverd:18166]: recovery: recover database 0x06916e77

2017/11/07 16:55:24.423560 [recoverd:18166]: recovery: recover database 0x7a19d84d

2017/11/07 16:55:24.423568 [recoverd:18166]: recovery: recover database 0x4e66c2b2

2017/11/07 16:55:24.423576 [recoverd:18166]: recovery: recover database 0x68c12c2c

2017/11/07 16:55:24.423583 [recoverd:18166]: recovery: recover database 0x521b7544

2017/11/07 16:55:24.423592 [recoverd:18166]: recovery: recover database 0x4d2a432b

2017/11/07 16:55:24.423599 [recoverd:18166]: recovery: recover database 0x9ec2a880

2017/11/07 16:55:24.423607 [recoverd:18166]: recovery: recover database 0x6b06a26d

2017/11/07 16:55:24.423618 [recoverd:18166]: recovery: recover database 0x6645c6c4

2017/11/07 16:55:24.423626 [recoverd:18166]: recovery: recover database 0x6cf2837d

2017/11/07 16:55:24.423634 [recoverd:18166]: recovery: recover database 0x3ef19640

2017/11/07 16:55:24.423642 [recoverd:18166]: recovery: recover database 0x2ca251cf

2017/11/07 16:55:24.423649 [recoverd:18166]: recovery: recover database 0x7132c184

2017/11/07 16:55:24.423657 [recoverd:18166]: recovery: recover database 0xc3078fba

2017/11/07 16:55:24.423665 [recoverd:18166]: recovery: recover database 0xa1413774

2017/11/07 16:55:24.424044 [17979]: Freeze db: smbXsrv_client_global.tdb

2017/11/07 16:55:24.425078 [17979]: Freeze db: printer_list.tdb

2017/11/07 16:55:24.425912 [17979]: Freeze db: smbXsrv_open_global.tdb

2017/11/07 16:55:24.426670 [17979]: Freeze db: leases.tdb

2017/11/07 16:55:24.427401 [17979]: Freeze db: locking.tdb

2017/11/07 16:55:24.427969 [17979]: Freeze db: brlock.tdb

2017/11/07 16:55:24.428702 [17979]: Freeze db: smbXsrv_tcon_global.tdb

2017/11/07 16:55:24.429390 [17979]: Freeze db: smbXsrv_version_global.tdb

2017/11/07 16:55:24.429950 [17979]: Freeze db: g_lock.tdb

2017/11/07 16:55:24.430552 [17979]: Freeze db: serverid.tdb

2017/11/07 16:55:24.431205 [17979]: Freeze db: smbXsrv_session_global.tdb

2017/11/07 16:55:24.432050 [17979]: Freeze db: ctdb.tdb

2017/11/07 16:55:24.432803 [17979]: Freeze db: registry.tdb

2017/11/07 16:55:24.433429 [17979]: Freeze db: passdb.tdb

2017/11/07 16:55:24.434020 [17979]: Freeze db: account_policy.tdb

2017/11/07 16:55:24.434582 [17979]: Freeze db: secrets.tdb

2017/11/07 16:55:24.435274 [17979]: Freeze db: share_info.tdb

2017/11/07 16:55:24.436050 [17979]: Freeze db: group_mapping.tdb

2017/11/07 16:55:24.447420 [recoverd:18166]: recovery: Pulled 0 records for db smbXsrv_client_global.tdb from node 1

2017/11/07 16:55:24.448877 [recoverd:18166]: recovery: Pulled 1 records for db printer_list.tdb from node 1

2017/11/07 16:55:24.448905 [recoverd:18166]: recovery: Pull persistent db ctdb.tdb from node 1 with seqnum 0x0

2017/11/07 16:55:24.448943 [recoverd:18166]: recovery: Pull persistent db registry.tdb from node 1 with seqnum 0x43

2017/11/07 16:55:24.448971 [recoverd:18166]: recovery: Pull persistent db passdb.tdb from node 1 with seqnum 0x1

2017/11/07 16:55:24.448998 [recoverd:18166]: recovery: Pulled 0 records for db smbXsrv_open_global.tdb from node 1

2017/11/07 16:55:24.449025 [recoverd:18166]: recovery: Pulled 0 records for db leases.tdb from node 1

2017/11/07 16:55:24.449040 [recoverd:18166]: recovery: Pulled 0 records for db locking.tdb from node 1

2017/11/07 16:55:24.449054 [recoverd:18166]: recovery: Pull persistent db account_policy.tdb from node 1 with seqnum 0x1

2017/11/07 16:55:24.449096 [recoverd:18166]: recovery: Pull persistent db secrets.tdb from node 1 with seqnum 0x1

2017/11/07 16:55:24.449112 [recoverd:18166]: recovery: Pulled 0 records for db brlock.tdb from node 1

2017/11/07 16:55:24.449138 [recoverd:18166]: recovery: Pull persistent db share_info.tdb from node 1 with seqnum 0x1

2017/11/07 16:55:24.449337 [recoverd:18166]: recovery: Pulled 4 records for db smbXsrv_tcon_global.tdb from node 1

2017/11/07 16:55:24.449367 [recoverd:18166]: recovery: Pull persistent db group_mapping.tdb from node 1 with seqnum 0x0

2017/11/07 16:55:24.449690 [recoverd:18166]: recovery: Pulled 1 records for db smbXsrv_version_global.tdb from node 1

2017/11/07 16:55:24.449789 [recoverd:18166]: recovery: Pulled 0 records for db g_lock.tdb from node 1

2017/11/07 16:55:24.450218 [recoverd:18166]: recovery: Pulled 22 records for db serverid.tdb from node 1

2017/11/07 16:55:24.450301 [recoverd:18166]: recovery: Pulled 0 records for db smbXsrv_client_global.tdb from node 2

2017/11/07 16:55:24.450528 [recoverd:18166]: recovery: Pulled 6 records for db smbXsrv_session_global.tdb from node 1

2017/11/07 16:55:24.450629 [recoverd:18166]: recovery: Pulled 1 records for db printer_list.tdb from node 2

2017/11/07 16:55:24.450821 [recoverd:18166]: recovery: Pulled 1 records for db ctdb.tdb from node 1

2017/11/07 16:55:24.450925 [recoverd:18166]: recovery: Pulled 0 records for db smbXsrv_open_global.tdb from node 2

2017/11/07 16:55:24.451397 [recoverd:18166]: recovery: Pulled 0 records for db smbXsrv_client_global.tdb from node 3

2017/11/07 16:55:24.457436 [recoverd:18166]: recovery: Pulled 0 records for db leases.tdb from node 2

2017/11/07 16:55:24.457454 [recoverd:18166]: recovery: Pulled 0 records for db locking.tdb from node 2

2017/11/07 16:55:24.457467 [recoverd:18166]: recovery: Pulled 0 records for db brlock.tdb from node 2

2017/11/07 16:55:24.457503 [recoverd:18166]: recovery: Pulled 4 records for db smbXsrv_tcon_global.tdb from node 2

2017/11/07 16:55:24.457518 [recoverd:18166]: recovery: Pulled 76 records for db registry.tdb from node 1

2017/11/07 16:55:24.457716 [recoverd:18166]: recovery: Pulled 4 records for db passdb.tdb from node 1

2017/11/07 16:55:24.457971 [recoverd:18166]: recovery: Pulled 1 records for db printer_list.tdb from node 3

2017/11/07 16:55:24.458008 [recoverd:18166]: recovery: Pulled 1 records for db smbXsrv_version_global.tdb from node 2

2017/11/07 16:55:24.458021 [recoverd:18166]: recovery: Pulled 0 records for db g_lock.tdb from node 2

2017/11/07 16:55:24.468453 [recoverd:18166]: recovery: Pulled 0 records for db smbXsrv_open_global.tdb from node 3

2017/11/07 16:55:24.468512 [recoverd:18166]: recovery: Pulled 18 records for db account_policy.tdb from node 1

2017/11/07 16:55:24.468762 [recoverd:18166]: recovery: Pulled 2 records for db secrets.tdb from node 1

2017/11/07 16:55:24.468926 [recoverd:18166]: recovery: Pulled 2 records for db share_info.tdb from node 1

2017/11/07 16:55:24.469084 [recoverd:18166]: recovery: Pulled 1 records for db group_mapping.tdb from node 1

2017/11/07 16:55:24.469195 [recoverd:18166]: recovery: Pulled 22 records for db serverid.tdb from node 2

2017/11/07 16:55:24.470119 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for ctdb.tdb

2017/11/07 16:55:24.488502 [recoverd:18166]: recovery: Pulled 0 records for db leases.tdb from node 3

2017/11/07 16:55:24.488521 [recoverd:18166]: recovery: Pulled 0 records for db locking.tdb from node 3

2017/11/07 16:55:24.488528 [recoverd:18166]: recovery: Pulled 0 records for db brlock.tdb from node 3

2017/11/07 16:55:24.488549 [recoverd:18166]: recovery: Pulled 4 records for db smbXsrv_tcon_global.tdb from node 3

2017/11/07 16:55:24.488574 [recoverd:18166]: recovery: Pulled 6 records for db smbXsrv_session_global.tdb from node 2

2017/11/07 16:55:24.489428 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for smbXsrv_client_global.tdb

2017/11/07 16:55:24.490312 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for registry.tdb

2017/11/07 16:55:24.491101 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for passdb.tdb

2017/11/07 16:55:24.491977 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for printer_list.tdb

2017/11/07 16:55:24.491997 [recoverd:18166]: recovery: Pushing buffer 0 with 1 records for ctdb.tdb

2017/11/07 16:55:24.503242 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for smbXsrv_open_global.tdb

2017/11/07 16:55:24.503986 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for account_policy.tdb

2017/11/07 16:55:24.504661 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for secrets.tdb

2017/11/07 16:55:24.516749 [recoverd:18166]: recovery: Pulled 1 records for db smbXsrv_version_global.tdb from node 3

2017/11/07 16:55:24.516788 [recoverd:18166]: recovery: Pulled 0 records for db g_lock.tdb from node 3

2017/11/07 16:55:24.517710 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for share_info.tdb

2017/11/07 16:55:24.538280 [17979]: Starting traverse on DB smbXsrv_session_global.tdb (id 690952)

2017/11/07 16:55:24.559654 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for group_mapping.tdb

2017/11/07 16:55:24.559679 [recoverd:18166]: recovery: Pushing buffer 0 with 0 records for smbXsrv_client_global.tdb

2017/11/07 16:55:24.559687 [recoverd:18166]: recovery: Pushing buffer 0 with 76 records for registry.tdb

2017/11/07 16:55:24.559693 [recoverd:18166]: recovery: Pushing buffer 0 with 4 records for passdb.tdb

2017/11/07 16:55:24.559698 [recoverd:18166]: recovery: Pushing buffer 0 with 1 records for printer_list.tdb

2017/11/07 16:55:24.559704 [recoverd:18166]: recovery: Pushed 1 records for db ctdb.tdb

2017/11/07 16:55:24.559709 [recoverd:18166]: recovery: Pushing buffer 0 with 0 records for smbXsrv_open_global.tdb

2017/11/07 16:55:24.559715 [recoverd:18166]: recovery: Pushing buffer 0 with 18 records for account_policy.tdb

2017/11/07 16:55:24.559732 [recoverd:18166]: recovery: Pushing buffer 0 with 2 records for secrets.tdb

2017/11/07 16:55:24.559738 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for leases.tdb

2017/11/07 16:55:24.559751 [recoverd:18166]: recovery: Pulled 22 records for db serverid.tdb from node 3

2017/11/07 16:55:24.559757 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for locking.tdb

2017/11/07 16:55:24.559763 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for brlock.tdb

2017/11/07 16:55:24.559768 [recoverd:18166]: recovery: Wrote 1 buffers of recovery records for smbXsrv_tcon_global.tdb

2017/11/07 16:55:24.559774 [recoverd:18166]: recovery: Pushing buffer 0 with 2 records for share_info.tdb

2017/11/07 16:55:24.559779 [recoverd:18166]: recovery: Pushing buffer 0 with 1 records for group_mapping.tdb

2017/11/07 16:55:24.559785 [recoverd:18166]: recovery: Pushed 0 records for db smbXsrv_client_global.tdb

2017/11/07 16:55:24.854408 [recoverd:18166]: recovery: Pushed 76 records for db registry.tdb

2017/11/07 16:55:24.868833 [17979]: Ending traverse on DB smbXsrv_session_global.tdb (id 690952), records 6

2017/11/07 16:55:24.868899 [recoverd:18166]: recovery: Pulled 6 records for db smbXsrv_session_global.tdb from node 3

2017/11/07 16:55:25.050368 [17979]: pnn 3 Invalid reqid 676817 in ctdb_reply_control

2017/11/07 16:55:25.050953 [17979]: pnn 3 Invalid reqid 676851 in ctdb_reply_control

2017/11/07 16:55:25.051108 [17979]: pnn 3 Invalid reqid 677800 in ctdb_reply_control

2017/11/07 16:55:25.051158 [17979]: pnn 3 Invalid reqid 677803 in ctdb_reply_control

2017/11/07 16:55:25.051629 [17979]: pnn 3 Invalid reqid 677806 in ctdb_reply_control

2017/11/07 16:55:25.051668 [17979]: pnn 3 Invalid reqid 677809 in ctdb_reply_control

2017/11/07 16:55:25.051729 [17979]: pnn 3 Invalid reqid 677812 in ctdb_reply_control

2017/11/07 16:55:25.052103 [17979]: pnn 3 Invalid reqid 677815 in ctdb_reply_control

2017/11/07 16:55:25.061086 [17979]: pnn 3 Invalid reqid 677818 in ctdb_reply_control

2017/11/07 16:55:25.061123 [17979]: pnn 3 Invalid reqid 677821 in ctdb_reply_control

2017/11/07 16:55:25.061551 [17979]: pnn 3 Invalid reqid 677824 in ctdb_reply_control

2017/11/07 16:55:25.061578 [17979]: pnn 3 Invalid reqid 677827 in ctdb_reply_control

2017/11/07 16:55:25.061890 [17979]: pnn 3 Invalid reqid 677838 in ctdb_reply_control

2017/11/07 16:55:25.061968 [17979]: pnn 3 Invalid reqid 677841 in ctdb_reply_control

2017/11/07 16:55:25.061988 [17979]: pnn 3 Invalid reqid 677844 in ctdb_reply_control

2017/11/07 16:55:25.062019 [17979]: pnn 3 Invalid reqid 677847 in ctdb_reply_control

2017/11/07 16:55:25.062032 [17979]: pnn 3 Invalid reqid 677850 in ctdb_reply_control

2017/11/07 16:55:25.062045 [17979]: pnn 3 Invalid reqid 677853 in ctdb_reply_control

2017/11/07 16:55:25.062409 [17979]: pnn 3 Invalid reqid 677856 in ctdb_reply_control

2017/11/07 16:55:54.451637 [17979]: dead count reached for node 2

2017/11/07 16:55:54.451736 [17979]: 192.168.50.230:4379: node 192.168.50.229:4379 is dead: 1 connected

2017/11/07 16:55:54.451877 [17979]: Tearing down connection to dead node :2

2017/11/07 16:55:54.451892 [recoverd:18166]: recovery: control WIPEDB failed for db smbXsrv_session_global.tdb on node 2, ret=-1

2017/11/07 16:55:54.451942 [recoverd:18166]: recovery: recover database 0x6b06a26d, attempt 2

2017/11/07 16:55:54.451964 [recoverd:18166]: recovery: control WIPEDB failed for db serverid.tdb on node 2, ret=-1

2017/11/07 16:55:54.451996 [recoverd:18166]: recovery: recover database 0x9ec2a880, attempt 2

2017/11/07 16:55:54.452019 [recoverd:18166]: recovery: control DB_TRANSACTION_COMMIT failed for db registry.tdb on node 2, ret=-1

2017/11/07 16:55:54.452082 [recoverd:18166]: recovery: recover database 0x6cf2837d, attempt 2

2017/11/07 16:55:54.452101 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for group_mapping.tdb on node 2, ret=-1

2017/11/07 16:55:54.452173 [recoverd:18166]: recovery: recover database 0xa1413774, attempt 2

2017/11/07 16:55:54.452197 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for share_info.tdb on node 2, ret=-1

2017/11/07 16:55:54.452289 [recoverd:18166]: recovery: recover database 0xc3078fba, attempt 2

2017/11/07 16:55:54.452310 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for secrets.tdb on node 2, ret=-1

2017/11/07 16:55:54.452362 [recoverd:18166]: recovery: recover database 0x7132c184, attempt 2

2017/11/07 16:55:54.452387 [recoverd:18166]: recovery: control DB_TRANSACTION_COMMIT failed for db smbXsrv_client_global.tdb on node 2, ret=-1

2017/11/07 16:55:54.452443 [recoverd:18166]: recovery: recover database 0x477d2e20, attempt 2

2017/11/07 16:55:54.452463 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for account_policy.tdb on node 2, ret=-1

2017/11/07 16:55:54.452526 [recoverd:18166]: recovery: recover database 0x2ca251cf, attempt 2

2017/11/07 16:55:54.452561 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for smbXsrv_open_global.tdb on node 2, ret=-1

2017/11/07 16:55:54.452631 [recoverd:18166]: recovery: recover database 0x66f71b8c, attempt 2

2017/11/07 16:55:54.452654 [recoverd:18166]: recovery: control DB_PUSH_START failed for db smbXsrv_tcon_global.tdb on node 2, ret=-1

2017/11/07 16:55:54.452722 [recoverd:18166]: recovery: recover database 0x68c12c2c, attempt 2

2017/11/07 16:55:54.452761 [recoverd:18166]: recovery: control DB_PUSH_START failed for db brlock.tdb on node 2, ret=-1

2017/11/07 16:55:54.452826 [recoverd:18166]: recovery: recover database 0x4e66c2b2, attempt 2

2017/11/07 16:55:54.452846 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for printer_list.tdb on node 2, ret=-1

2017/11/07 16:55:54.452928 [recoverd:18166]: recovery: recover database 0x5bcfcbd7, attempt 2

2017/11/07 16:55:54.452936 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for passdb.tdb on node 2, ret=-1

2017/11/07 16:55:54.452985 [recoverd:18166]: recovery: recover database 0x3ef19640, attempt 2

2017/11/07 16:55:54.453008 [recoverd:18166]: recovery: control DB_PUSH_START failed for db locking.tdb on node 2, ret=-1

2017/11/07 16:55:54.453070 [recoverd:18166]: recovery: recover database 0x7a19d84d, attempt 2

2017/11/07 16:55:54.453092 [recoverd:18166]: recovery: control DB_PUSH_START failed for db leases.tdb on node 2, ret=-1

2017/11/07 16:55:54.453141 [recoverd:18166]: recovery: recover database 0x06916e77, attempt 2

2017/11/07 16:55:54.453166 [recoverd:18166]: recovery: control WIPEDB failed for db g_lock.tdb on node 2, ret=-1

2017/11/07 16:55:54.453206 [recoverd:18166]: recovery: recover database 0x4d2a432b, attempt 2

2017/11/07 16:55:54.453227 [recoverd:18166]: recovery: control WIPEDB failed for db smbXsrv_version_global.tdb on node 2, ret=-1

2017/11/07 16:55:54.453271 [recoverd:18166]: recovery: recover database 0x521b7544, attempt 2

2017/11/07 16:55:54.453312 [recoverd:18166]: recovery: control DB_TRANSACTION_COMMIT failed for db ctdb.tdb on node 2, ret=-1

2017/11/07 16:55:54.453356 [recoverd:18166]: recovery: recover database 0x6645c6c4, attempt 2

2017/11/07 16:55:54.453680 [17979]: Freeze db: smbXsrv_session_global.tdb frozen

2017/11/07 16:55:54.453775 [17979]: Freeze db: serverid.tdb frozen

2017/11/07 16:55:54.453832 [17979]: Freeze db: registry.tdb frozen

2017/11/07 16:55:54.453899 [17979]: Freeze db: group_mapping.tdb frozen

2017/11/07 16:55:54.453911 [17979]: Freeze db: share_info.tdb frozen

2017/11/07 16:55:54.453956 [17979]: Freeze db: secrets.tdb frozen

2017/11/07 16:55:54.453969 [17979]: Freeze db: smbXsrv_client_global.tdb frozen

2017/11/07 16:55:54.454010 [17979]: Freeze db: account_policy.tdb frozen

2017/11/07 16:55:54.454021 [17979]: Freeze db: smbXsrv_open_global.tdb frozen
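
In the master excerpt above, every "control ... failed" line targets node 2, immediately after "dead count reached for node 2". To confirm that pattern over a longer log, the failures can be tallied per node and control type. This is a minimal sketch, not CTDB code: the name `count_failed_controls` and the regular expression are assumptions based on the two message variants seen here (with and without the word "db" before the database name).

```python
import re
from collections import Counter

# Matches both "failed for db X on node N" and "failed for X on node N",
# as seen in the recoverd lines above (pattern is an assumption).
FAILED = re.compile(r"control (\S+) failed for (?:db )?(\S+) on node (\d+), ret=(-?\d+)")


def count_failed_controls(lines):
    """Return a Counter keyed by (node, control name)."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            control, _db, node, _ret = m.groups()
            counts[(int(node), control)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2017/11/07 16:55:54.451892 [recoverd:18166]: recovery: control WIPEDB failed for db smbXsrv_session_global.tdb on node 2, ret=-1",
        "2017/11/07 16:55:54.451964 [recoverd:18166]: recovery: control WIPEDB failed for db serverid.tdb on node 2, ret=-1",
        "2017/11/07 16:55:54.452101 [recoverd:18166]: recovery: control DB_PUSH_CONFIRM failed for group_mapping.tdb on node 2, ret=-1",
    ]
    for (node, control), n in sorted(count_failed_controls(sample).items()):
        print("node", node, control, n)
```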



Looking forward to your reply. Thanks!


Best regards,


spencer
Comment 1 Amitay Isaacs 2019-03-20 00:23:49 UTC
Samba/CTDB 4.5 is out of support.  Please use a more recent version of CTDB.  If you notice a similar problem, open a bug against the appropriate version.

Thanks.
Comment 2 Amitay Isaacs 2019-03-20 00:24:48 UTC
Closing this bug report.