Bug 15125 - Performance regression on contended path based operations
Summary: Performance regression on contended path based operations
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.16.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Stefan Metzmacher
QA Contact: Samba QA Contact
URL: https://gitlab.com/samba-team/samba/-...
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-18 19:37 UTC by Stefan Metzmacher
Modified: 2022-07-26 14:33 UTC
CC List: 2 users

See Also:


Attachments

Description Stefan Metzmacher 2022-07-18 19:37:29 UTC
More later...
Comment 1 Stefan Metzmacher 2022-07-19 13:35:19 UTC
If a lot of clients (a few hundred) access (open/close) the same path at the
same time, they all try to grab a lock on the locking.tdb record for the inode.

The following (recently added) test demonstrates the behavior change between
different Samba versions. I was testing on an AMD Ryzen 9 5950X 16-Core Processor,
with all cores set to the performance governor via
  for cpu in $(seq 0 31); do cpufreq-set -c $cpu -g performance; done
and exporting /dev/shm.

smbtorture //127.0.0.1/m -Utest%test smb2.create.bench-path-contention-shared \
        --option='torture:bench_path=' \
        --option="torture:timelimit=60" \
        --option="torture:nprocs=256"


On the server I used the following branches and got these results:
- v4-6-test:
  open[num/s=4977,avslat=0.025800,minlat=0.000036,maxlat=0.383098]
  close[num/s=4975,avslat=0.025981,minlat=0.000021,maxlat=0.355635]

- v4-11-test:
  open[num/s=4515,avslat=0.028414,minlat=0.000039,maxlat=0.084519]
  close[num/s=4513,avslat=0.028810,minlat=0.000021,maxlat=0.074114]

  This is a little bit less than 4.6, but no major change.

- v4-12-test:
  open[num/s=36458,avslat=0.003613,minlat=0.000040,maxlat=0.015829]
  close[num/s=36456,avslat=0.003590,minlat=0.000023,maxlat=0.015742]

  This was a bit unexpected to me, as ~36500 ops/s is the hardware limit
  I also get by using fileid and nolock. The reason is that 4.12
  still had a separate share_entries.tdb, which means there's no real contention
  on the locking.tdb entry and we never hit the dbwrap_watch code path.

  This is also the number we get when disabling lock contention
  via vfs_fileid, so it seems to be the maximum for that test case on
  the server.

- v4-13-test - master:
  open[num/s=54,avslat=2.207252,minlat=0.006880,maxlat=26.734115]
  close[num/s=52,avslat=2.303759,minlat=0.007108,maxlat=25.945049]

  The change to g_lock re-added a thundering herd problem,
  because we always go via the dbwrap_watch code path when someone else
  already holds the g_lock. Currently any g_lock modification
  (g_lock_lock(), modifying the payload, g_lock_unlock())
  wakes all watchers, and then all but one come back only to
  see that the record is still locked.
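
To put the cost of that pattern into numbers, here is a tiny back-of-the-envelope
model (plain C, not Samba code; the 256 waiters and the 3 wakeup-generating
modifications per open/close cycle are just taken from the description above):

/*
 * Toy model of the thundering herd described above (NOT Samba code):
 * every record modification by the current lock holder wakes all
 * remaining watchers, but only the wakeup after g_lock_unlock() lets
 * exactly one of them make progress.
 */
#include <stdio.h>

int main(void)
{
        const int waiters = 256;      /* contending clients (torture:nprocs) */
        const int mods_per_cycle = 3; /* lock, payload write, unlock         */
        long wakeups = 0, useful = 0;

        /* Each open/close cycle retires one waiter from the queue. */
        for (int remaining = waiters - 1; remaining > 0; remaining--) {
                wakeups += (long)mods_per_cycle * remaining;
                useful += 1;
        }

        printf("wakeups: %ld, of which useful: %ld\n", wakeups, useful);
        return 0;
}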

With the changes in https://gitlab.com/samba-team/samba/-/merge_requests/2608
we have an optimized interaction between g_lock and dbwrap_watch and we get the
following result:

  open[num/s=10249,avslat=0.012551,minlat=0.000072,maxlat=0.051488] 
  close[num/s=10247,avslat=0.012576,minlat=0.000031,maxlat=0.051464]

This is more than twice the 4.6 result and about 200 times the 4.13-master result.


For the future there are some ideas to improve this further by avoiding the current pattern of:

1. lck = get_share_mode_lock()
    1.1 g_lock_lock()
2. do some syscalls
3. set_share_mode()
    3.1 g_lock_writev_data()
4. TALLOC_FREE(lck)
    4.1 g_lock_unlock()

In a non-conflicting case we could have a g_lock_do_locked()
that directly stores our share_mode entry if there's no existing g_lock.
Then we could have a share_mode_keep(lck) that tells the destructor to do
nothing; otherwise the destructor would remove the optimistic share_mode entry
again. With that we would only fall back to the existing pattern if there's a
conflict or an existing g_lock.
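
To make that fast path a bit more concrete, here is a toy sketch in plain C
(struct toy_record and all helpers below are made up for illustration only;
g_lock_do_locked() and share_mode_keep() are just the proposed names, not an
existing API):

/*
 * Toy illustration of the proposed optimistic open path: store the
 * entry directly if nobody holds the lock, fall back to the current
 * lock/modify/unlock pattern only on conflict.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_record {
        bool locked;        /* someone holds the g_lock            */
        int  num_entries;   /* stored share_mode entries (payload) */
};

/* Stand-in for g_lock_do_locked(): optimistically store an entry,
 * failing if the record is already locked. */
static bool toy_g_lock_do_locked(struct toy_record *rec)
{
        if (rec->locked) {
                return false;   /* caller falls back to g_lock_lock() */
        }
        rec->num_entries += 1;  /* stored without waking any watcher */
        return true;
}

static void toy_open(struct toy_record *rec, bool open_succeeds)
{
        if (toy_g_lock_do_locked(rec)) {
                /* share_mode_keep(lck) would be called on success;
                 * otherwise the lck destructor removes the optimistic
                 * entry again. */
                bool keep = open_succeeds;

                /* ... do the open syscalls here ... */

                if (!keep) {
                        rec->num_entries -= 1;
                }
                return;
        }

        /* Fallback: the existing pattern (g_lock_lock(), syscalls,
         * g_lock_writev_data(), g_lock_unlock()), only hit when there
         * is a conflict or an existing g_lock. */
        rec->locked = true;
        if (open_succeeds) {
                rec->num_entries += 1;
        }
        rec->locked = false;
}

int main(void)
{
        struct toy_record rec = { .locked = false, .num_entries = 0 };

        toy_open(&rec, true);
        printf("entries=%d locked=%d\n", rec.num_entries, (int)rec.locked);
        return 0;
}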

For the fallback we could even defer set_share_mode() and have
a g_lock_writev_data_and_unlock() function.

This would hopefully get us back to the 4.12 case where we had share_entries.tdb.

The idea of the share_entries.tdb can also come back in a way
that we split locking.tdb records based on cluster nodes
and maybe on a modulo operation on the pid. It means
the entry header (struct share_mode_data) with the share_mode_flags
will be mirrored in all entries, and only on conflict do we need to check all
individual records. (The single-opener case should still be optimized and only
work on a single record.)
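
A rough sketch of what such a per-node/per-pid split of the key could look like
(plain C; the key layout, NUM_PID_BUCKETS and all names are made up for
illustration, the real change would live inside the locking.tdb/dbwrap code):

/*
 * Toy key construction: the same inode is spread over
 * (node, pid % NUM_PID_BUCKETS) buckets, so unrelated openers rarely
 * contend on the same record.  The share_mode_data header would be
 * mirrored into every bucket; only on conflict would all buckets of
 * the inode need to be walked.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#define NUM_PID_BUCKETS 16

struct toy_file_id {
        uint64_t devid;
        uint64_t inode;
};

static int build_sharded_key(char *buf, size_t len,
                             const struct toy_file_id *id,
                             uint32_t node, pid_t pid)
{
        return snprintf(buf, len, "%016llx:%016llx:node%u:bucket%u",
                        (unsigned long long)id->devid,
                        (unsigned long long)id->inode,
                        (unsigned)node,
                        (unsigned)(pid % NUM_PID_BUCKETS));
}

int main(void)
{
        struct toy_file_id id = { .devid = 0x2a, .inode = 0x1234 };
        char key[128];

        build_sharded_key(key, sizeof(key), &id, 0, 4711);
        printf("%s\n", key);
        return 0;
}
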
Comment 2 Samba QA Contact 2022-07-26 14:33:03 UTC
This bug was referenced in samba master:

f26b22cc8e784af1dc64e0aa9a2c948385e2b667
3f88b700a9e73b2371cba3bb867c4c625bc36c7c
5af37ae6970674573bb6f32c6c0f347718b8a2a9
77db4b666f0eb6668b6e0bc3d445048980542bfd
cdf1c37a90c24cc32953865a0bb909fd71fbbf05
420a595c1baa6062fcb7f672cf032287b4521a76
9356b1701c43c9ec2a71252a31e09555a1f5a3ea
7226d0b365625ab0e5ff0ffe8b8fe3924ae1a569
6702b3b0da729b824455a5efc7b50ebc03f835e8
cb012e45c91b4b7d1572864b7e60de36c96a38af
b3f6668f93b73a5980dc418c6b1dacde330f7497
2342489f526a43fe2d56f3859b6df0e26538870d
c0febbd3e19a1c40c48d93f38f9223a7feab8f30
095fafbe0c901180b30253b9adb6b2de3bf827f2
eb89748ee4a9c58da8064b41d0253e00c30c213f
726f468ccdb432c35dadef1aa4af6a041cbf46f2
6e45da1a38d31c18e38397f354596f6c4654c8b1
5021abff88ccaeb8b7a2b274e4ddc9ea046b2c34
6b173bf1569b4c099223a090f021465eee04747e
39cdcec49c6578430d91c450645de1522fe26c2a
1c84980d7c3ca33c61c8137902af2fc7a0988824
8908af569551def4e06c9aa179cf77e0beaa2638
2129d352ae0d90867ebe8101529817a12223d5ee
908eea1202459c8ffdf8e43eb310cf8690693824
1fb9db8c994b60720ea82167b5a91808293640e0
cc9c8b8e7e0e2ab8512fb443b89780a11bc966ae
044e018e9a243466bdbf075304a2715f36c0199a
2eb6a209493e228e433a565154c7c4cd076c0a4a
50163da3096d71c08e9c3ff13dd3ecb252a60f4e
f62beaa2c25043a1b3dde408ce7d8f7327ac3e13
b865bb28abedd18d5db0ae3ea31908c75ee1129b
52720516998091206399e9b2387d8ee8265e4660
2e92267920ddee7b96f32e243eb7b21124ed554f
20f3fd021911a118228815cb32d436069b0273d1
e33143099b37a4d79a1cd45ca43a5a5d9d63b261
67af3586d989be9d6a8fe7e7789250451b03f2bb
98269bd5f31a2521b756e0a20fba82e9122582f7
6e701d02ee2d0fc304157395c451d3b972128cfc
9d9991166322477781f20372ffd7c19d1632276c
d5c7e2e27385a4730ff8674650700bd25b77f975