The Samba server stops responding while CTDB is vacuuming its databases. This causes any active transaction with the server to be cancelled. The problem seems to occur only when ctdbd is vacuuming locking.tdb. The vacuuming takes 30 seconds, during which one of the ctdb processes causes up to 100% load on one CPU.
2009/09/14 12:24:25.485562 [ 2817]: Start a vacuuming child process for db locking.tdb
2009/09/14 12:24:25.504799 : Repacking locking.tdb with 15467 freelist entries
2009/09/14 12:24:55.486642 [ 2817]: Vacuuming child process timed out for db locking.tdb
2009/09/14 12:24:55.486733 [ 2817]: Vacuuming took 30.000 seconds for database locking.tdb
2009/09/14 12:24:55.486760 [ 2817]: Start new vacuum event for locking.tdb
Error on the client side (smbclient):
Receiving SMB: Server stopped responding
Call timed out: server did not respond after 20000 milliseconds listing \*
Error in dskattr: Call timed out: server did not respond after 20000 milliseconds
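To make the timing relationship explicit, here is a minimal sketch that computes how long the vacuuming child ran from the two log lines above and compares it with the 20000 ms timeout reported by smbclient. The helper names (`parse_timestamp`, `vacuum_seconds`) are hypothetical and exist only for this illustration; that the stall outlasting the client timeout explains the error is my reading of the logs, not something confirmed.

```python
from datetime import datetime

# Log lines copied from the ctdb log excerpt in this report.
LOG_LINES = [
    "2009/09/14 12:24:25.485562 [ 2817]: Start a vacuuming child process for db locking.tdb",
    "2009/09/14 12:24:55.486642 [ 2817]: Vacuuming child process timed out for db locking.tdb",
]

# Timeout taken from the smbclient error messages in this report.
CLIENT_TIMEOUT_MS = 20000


def parse_timestamp(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS.ffffff' timestamp (26 characters)."""
    return datetime.strptime(line[:26], "%Y/%m/%d %H:%M:%S.%f")


def vacuum_seconds(lines):
    """Return the seconds between the vacuum start and its timeout message."""
    start = end = None
    for line in lines:
        if "Start a vacuuming child process" in line:
            start = parse_timestamp(line)
        elif "Vacuuming child process timed out" in line:
            end = parse_timestamp(line)
    if start is None or end is None:
        raise ValueError("start or timeout line not found in log")
    return (end - start).total_seconds()


elapsed = vacuum_seconds(LOG_LINES)
print(f"vacuuming ran for {elapsed:.3f} s")
print("longer than client timeout:", elapsed * 1000 > CLIENT_TIMEOUT_MS)
```

If the server does not answer for the whole vacuuming run, a client waiting 20 seconds would give up before the run ends, which would be consistent with the smbclient errors shown.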
- 250 users (average 150-400 locked files)
- 2 nodes (IBM POWER5 4x 1.5GHz)
- CTDB 1.0.88 & Samba 3.2.14
- RHEL 5.3 (ppc64)
Please let me know if you need additional information.
(In reply to comment #0)
More precisely, Samba only stops responding while the recovery master is vacuuming locking.tdb. This also only affects sessions to the recmaster node. I could not observe this effect on the second node.
Changed state to "invalid" because of the improper combination of CTDB (1.0.88) and Samba (3.2.14).