Bug 8404 - high unlink latency causes excessive locking.tdb fcntl lock contention
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: Clustering
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Volker Lendecke
QA Contact: Samba QA Contact
Depends on:
Reported: 2011-08-25 13:12 UTC by David Disseldorp
Modified: 2020-01-09 12:14 UTC

Description David Disseldorp 2011-08-25 13:12:17 UTC
samba-3.4.3-4 (with SCHEDULE_FOR_DELETION vacuuming enhancements)

We are experiencing high ctdb/smbd lockwait latencies while the cluster is under heavy load. Such occurrences are generally non-disruptive, but they still leave smbd unresponsive for long enough to cause client-side timeouts.

After adding extra debugging information to the fetch_locked code path in Samba's dbwrap_ctdb.c (70f9338bf2e6081916ffe5bb7cddf50b4e958b24), we can observe the lock contention as it occurs:

[2011/08/18 20:13:03,  0, pid=31234]
  Held tdb 0x42fe72c5 lock for 10.292433 seconds

dbid:0x42fe72c5 name:locking.tdb path:/var/lib/ctdb/locking.tdb.0
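
For illustration only, hold-time instrumentation of this kind could look roughly like the sketch below. This is not the actual debugging patch: struct lock_timer, lock_timer_start() and lock_timer_stop() are hypothetical names, and the warning threshold is an assumed parameter. It only shows the idea of stamping a monotonic clock when the record lock is taken and logging when the lock is released after too long.

```c
#define _POSIX_C_SOURCE 200809L

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of Samba: remembers when a lock was taken. */
struct lock_timer {
	uint32_t dbid;          /* database id, e.g. 0x42fe72c5 for locking.tdb */
	struct timespec start;  /* monotonic timestamp taken at lock time */
};

/* Seconds elapsed between two timestamps, as a double. */
static double timespec_elapsed(const struct timespec *start,
			       const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Call right after the record lock has been obtained. */
static void lock_timer_start(struct lock_timer *t, uint32_t dbid)
{
	t->dbid = dbid;
	clock_gettime(CLOCK_MONOTONIC, &t->start);
}

/*
 * Call right after the record lock has been released. Returns the hold
 * time in seconds and logs it if it exceeds warn_secs (assumed threshold).
 */
static double lock_timer_stop(const struct lock_timer *t, double warn_secs)
{
	struct timespec now;
	double held;

	clock_gettime(CLOCK_MONOTONIC, &now);
	held = timespec_elapsed(&t->start, &now);
	if (held > warn_secs) {
		fprintf(stderr, "Held tdb 0x%08" PRIx32 " lock for %f seconds\n",
			t->dbid, held);
	}
	return held;
}
```

A monotonic clock is used so that wall-clock adjustments on a cluster node cannot distort the measured hold time.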

smbd holds a lock on the locking.tdb database for over 10 seconds (this has been seen to climb as high as 40 seconds). Unwinding the stack from this point, we can see that the locking.tdb.0 lock is held by smbd across the file unlink code path in close_remove_share_mode().

These delays can be directly attributed to poor OCFS2 delete performance with large data sets (http://oss.oracle.com/pipermail/ocfs2-devel/2011-July/008258.html).

As a potential solution, I'd like to avoid holding the share mode lock over the SMB_VFS_UNLINK() path.
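
A minimal sketch of the idea, not Samba code: the delete decision is made while the record lock is held, the lock is dropped, and only then does the slow unlink run. struct share_entry, entry_lock(), entry_unlock() and close_and_maybe_unlink() are hypothetical stand-ins for the share mode entry and its lock, and the unlink is passed in as a callback in place of SMB_VFS_UNLINK(). A real change would also have to deal with a new open arriving between the unlock and the unlink, which this sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a share mode record and its lock. */
struct share_entry {
	bool locked;           /* true while the record lock is held */
	int open_count;        /* opens still referencing the file */
	bool delete_on_close;  /* file should be removed on last close */
};

/* Callback standing in for the (potentially slow) unlink operation. */
typedef int (*unlink_fn)(const char *path, void *private_data);

static void entry_lock(struct share_entry *e)   { e->locked = true; }
static void entry_unlock(struct share_entry *e) { e->locked = false; }

/*
 * Drop one open. Decide under the lock whether the file must be removed,
 * then release the lock before calling the slow unlink.
 */
static int close_and_maybe_unlink(struct share_entry *e, const char *path,
				  unlink_fn do_unlink, void *private_data)
{
	bool must_unlink;

	entry_lock(e);
	if (e->open_count > 0) {
		e->open_count--;
	}
	must_unlink = (e->open_count == 0 && e->delete_on_close);
	entry_unlock(e);

	if (!must_unlink) {
		return 0;
	}
	/* The lock is no longer held here, so other waiters can proceed. */
	return do_unlink(path, private_data);
}

/* Example callback: records whether the lock was held when it ran. */
struct unlink_probe {
	const struct share_entry *entry;
	bool lock_held_during_unlink;
	int calls;
};

static int probe_unlink(const char *path, void *private_data)
{
	struct unlink_probe *p = private_data;

	(void)path;
	p->lock_held_during_unlink = p->entry->locked;
	p->calls++;
	return 0;
}
```

With this ordering, contention on the record is bounded by the bookkeeping inside the critical section rather than by filesystem delete latency.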
Comment 1 David Disseldorp 2020-01-09 12:14:33 UTC
Closing this ticket, as it's ancient.