Bug 8404 - high unlink latency causes excessive locking.tdb fcntl lock contention
Status: NEW
Product: Samba 3.4
Classification: Unclassified
Component: Clustering
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assigned To: Volker Lendecke
QA Contact: Samba QA Contact
Reported: 2011-08-25 13:12 UTC by David Disseldorp
Modified: 2011-08-25 13:33 UTC

Description David Disseldorp 2011-08-25 13:12:17 UTC
Setup:
ctdb-1.0.114
samba-3.4.3-4 (with SCHEDULE_FOR_DELETION vacuuming enhancements)
ocfs2-kmp-default-1.4_2.6.32.29_0.3-4.15.3

We are experiencing high ctdb/smbd lock wait latencies while the cluster is under heavy load. These episodes are generally non-disruptive, but smbd can remain unresponsive for long enough to cause client-side timeouts.

After adding extra debugging information to the fetch_locked code path in Samba's dbwrap_ctdb.c (70f9338bf2e6081916ffe5bb7cddf50b4e958b24), we can observe where the lock contention occurs:

[2011/08/18 20:13:03,  0, pid=31234]
lib/dbwrap_ctdb.c:898(db_ctdb_record_destr)
  Held tdb 0x42fe72c5 lock for 10.292433 seconds

dbid:0x42fe72c5 name:locking.tdb path:/var/lib/ctdb/locking.tdb.0
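
For reference, the kind of hold-time instrumentation that produces a message like the one above can be sketched as follows. This is a minimal standalone illustration, not the actual debugging patch: struct timed_lock, timed_lock_acquire() and timed_lock_release() are hypothetical names, and a real change would hook the record fetch and destructor in dbwrap_ctdb.c instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a locked record handle; not Samba's struct db_record. */
struct timed_lock {
	unsigned int dbid;        /* database id, as printed in the log message */
	struct timespec acquired; /* monotonic timestamp taken when the lock was obtained */
};

/* Difference between two timestamps in seconds. */
static double timespec_elapsed(const struct timespec *start,
			       const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Record the moment the lock was obtained. */
static void timed_lock_acquire(struct timed_lock *l, unsigned int dbid)
{
	assert(l != NULL);
	l->dbid = dbid;
	clock_gettime(CLOCK_MONOTONIC, &l->acquired);
}

/*
 * Called where the lock is dropped. Returns the hold time in seconds
 * and logs a message if it exceeds warn_secs.
 */
static double timed_lock_release(const struct timed_lock *l, double warn_secs)
{
	struct timespec now;
	double held;

	assert(l != NULL);
	clock_gettime(CLOCK_MONOTONIC, &now);
	held = timespec_elapsed(&l->acquired, &now);
	if (held > warn_secs) {
		fprintf(stderr, "Held tdb 0x%08x lock for %f seconds\n",
			l->dbid, held);
	}
	return held;
}
```

A monotonic clock is used here so that wall-clock adjustments on a cluster node cannot distort the measured hold time.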

Here smbd holds a lock on the locking.tdb database for over 10 seconds; hold times as high as 40 seconds have been seen. Unwinding the stack from this point shows that smbd holds the locking.tdb.0 lock across the file unlink code path in close_remove_share_mode().

These delays can be directly attributed to poor OCFS2 delete performance with large data sets (http://oss.oracle.com/pipermail/ocfs2-devel/2011-July/008258.html).

As a potential solution, I'd like to avoid holding the share mode lock over the SMB_VFS_UNLINK() path.
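
The general shape of that idea, sketched outside of Samba: make the delete decision while the lock is held, drop the lock, and only then issue the slow unlink. Everything below is illustrative and assumed, not existing Samba code. A pthread mutex stands in for the share mode lock, plain unlink() stands in for SMB_VFS_UNLINK(), and struct share_entry and close_and_maybe_unlink() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a share mode entry; not Samba's struct share_mode_lock. */
struct share_entry {
	pthread_mutex_t lock; /* stands in for the locking.tdb record lock */
	int open_count;       /* number of remaining opens on the file */
	bool delete_on_close; /* whether the last close should delete the file */
};

/*
 * Close one open of the file. Returns 0 on success or when no delete
 * was needed, and -1 if the unlink was attempted and failed.
 */
static int close_and_maybe_unlink(struct share_entry *e, const char *path)
{
	bool do_unlink;

	assert(e != NULL && path != NULL);

	/* Short critical section: update state and decide, nothing slow. */
	pthread_mutex_lock(&e->lock);
	e->open_count--;
	do_unlink = (e->open_count == 0 && e->delete_on_close);
	pthread_mutex_unlock(&e->lock);

	if (!do_unlink) {
		return 0;
	}

	/* The potentially slow filesystem call runs without the lock held. */
	return unlink(path);
}
```

The sketch deliberately ignores what happens if another process opens the file between the unlock and the unlink. Any real change would have to keep the share mode semantics intact across that window.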