Bug 6720 - Session timeout while CTDB is vacuuming TDBs
Summary: Session timeout while CTDB is vacuuming TDBs
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: unspecified
Hardware: PPC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Michael Adam
Depends on:
Reported: 2009-09-14 06:04 UTC by Christoph Schmidt
Modified: 2009-09-22 14:31 UTC

Description Christoph Schmidt 2009-09-14 06:04:24 UTC

The Samba server stops responding while CTDB is vacuuming its databases. This causes any active transaction with the server to be cancelled. The problem seems to occur only when ctdbd is vacuuming locking.tdb. The vacuuming takes 30 seconds, during which one of the ctdb processes causes up to 100% load on one CPU.

CTDB log:
2009/09/14 12:24:25.485562 [ 2817]: Start a vacuuming child process for db locking.tdb
2009/09/14 12:24:25.504799 [24079]: Repacking locking.tdb with 15467 freelist entries
2009/09/14 12:24:55.486642 [ 2817]: Vacuuming child process timed out for db locking.tdb
2009/09/14 12:24:55.486733 [ 2817]: Vacuuming took 30.000 seconds for database locking.tdb
2009/09/14 12:24:55.486760 [ 2817]: Start new vacuum event for locking.tdb

Error on the client side (smbclient):
Receiving SMB: Server stopped responding
Call timed out: server did not respond after 20000 milliseconds listing \*
Error in dskattr: Call timed out: server did not respond after 20000 milliseconds
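
As a rough cross-check of the timings above, the following sketch parses the CTDB log excerpt and compares the length of the vacuuming window with the 20000 ms client timeout reported by smbclient. It is only an illustration: the line format is inferred from the five log lines quoted in this report, and the helper names (parse, vacuum_window_seconds) are hypothetical and not part of CTDB or Samba.

```python
import re
from datetime import datetime

# Log excerpt copied from this report.
LOG = """\
2009/09/14 12:24:25.485562 [ 2817]: Start a vacuuming child process for db locking.tdb
2009/09/14 12:24:25.504799 [24079]: Repacking locking.tdb with 15467 freelist entries
2009/09/14 12:24:55.486642 [ 2817]: Vacuuming child process timed out for db locking.tdb
2009/09/14 12:24:55.486733 [ 2817]: Vacuuming took 30.000 seconds for database locking.tdb
2009/09/14 12:24:55.486760 [ 2817]: Start new vacuum event for locking.tdb
"""

# Assumed layout: "YYYY/MM/DD HH:MM:SS.ffffff [ pid]: message"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \[\s*(\d+)\]: (.*)$"
)


def parse(log):
    """Return a list of (timestamp, pid, message) tuples."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
            entries.append((ts, int(m.group(2)), m.group(3)))
    return entries


def vacuum_window_seconds(entries, db):
    """Seconds between the vacuuming child start and its timeout for db."""
    start = None
    for ts, _pid, msg in entries:
        if msg == f"Start a vacuuming child process for db {db}":
            start = ts
        elif msg == f"Vacuuming child process timed out for db {db}":
            if start is not None:
                return (ts - start).total_seconds()
    return None


# Client timeout taken from the smbclient error (20000 milliseconds).
CLIENT_TIMEOUT_S = 20000 / 1000

entries = parse(LOG)
window = vacuum_window_seconds(entries, "locking.tdb")
print(f"vacuum window: {window:.3f} s, client timeout: {CLIENT_TIMEOUT_S:.0f} s")
print("window exceeds client timeout:", window > CLIENT_TIMEOUT_S)
```

With the excerpt above, the vacuuming window (about 30 s) is longer than the 20 s client timeout, which is consistent with the smbclient calls timing out while the vacuuming child is running.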

Our environment:
- 250 users (average 150-400 locked files)
- 2 nodes (IBM POWER5 4x 1.5GHz)
- CTDB 1.0.88 & Samba 3.2.14
- RHEL 5.3 (ppc64)

Please let me know if you need additional information.

Comment 1 Christoph Schmidt 2009-09-14 06:17:33 UTC
(In reply to comment #0)

More precisely, Samba only stops responding while the recovery master is vacuuming locking.tdb. It also affects only sessions to the recmaster node. I could not observe this effect on the second node.

Comment 2 Christoph Schmidt 2009-09-22 14:31:16 UTC
Changed state to "invalid" because of an improper combination of CTDB (1.0.88) and Samba (3.2.14).