Bug 15125 - Performance regression on contended path based operations
Summary: Performance regression on contended path based operations
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.16.3
Hardware: All All
: P5 normal (vote)
Target Milestone: 4.17
Assignee: Jule Anger
QA Contact: Samba QA Contact
URL: https://gitlab.com/samba-team/samba/-...
Keywords:
Depends on: 15146
Blocks:
  Show dependency treegraph
 
Reported: 2022-07-18 19:37 UTC by Stefan Metzmacher
Modified: 2022-09-20 13:04 UTC (History)
5 users (show)

See Also:


Attachments
Additional patches for v4-17-test (26.14 KB, patch)
2022-08-22 06:09 UTC, Stefan Metzmacher
jra: review+
slow: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Metzmacher 2022-07-18 19:37:29 UTC
More later...
Comment 1 Stefan Metzmacher 2022-07-19 13:35:19 UTC
If a lot of clients (a few hundred) access (open/close) the same path at the
time they all try to grab a lock on the locking.tdb record for the inode.

The following (recently added) test demonstrates the behavior change between
different Samba versions. I was testing on a AMD Ryzen 9 5950X 16-Core Processor
using for cpu in $(seq 0 31); do cpufreq-set -c $cpu -g performance; done
and exporting /dev/shm.

smbtorture //127.0.0.1/m -Utest%test smb2.create.bench-path-contention-shared \
        --option='torture:bench_path=' \
        --option="torture:timelimit=60" \
        --option="torture:nprocs=256"


On the server I used:
- v4-6-test:
  open[num/s=4977,avslat=0.025800,minlat=0.000036,maxlat=0.383098]
  close[num/s=4975,avslat=0.025981,minlat=0.000021,maxlat=0.355635]

- v4-11-test:
  open[num/s=4515,avslat=0.028414,minlat=0.000039,maxlat=0.084519]
  close[num/s=4513,avslat=0.028810,minlat=0.000021,maxlat=0.074114]

  This is little bit less than 4.6, but no major change.

- v4-12-test:
  open[num/s=36458,avslat=0.003613,minlat=0.000040,maxlat=0.015829]
  close[num/s=36456,avslat=0.003590,minlat=0.000023,maxlat=0.015742]

  This was a bit unexpected to me as ~36500 ops/s is the hardware limit
  I also get by using fileid and nolock. The reason is that 4.12
  had a separate share_entries.tdb, which means there's no real contention
  on the locking.tdb entry. And we never hit the dbwrap_watch code path.

  This is also the number we get when disabling lock contention
  via vfs_fileid. It seems to be the maximum for that test case on
  the server.

- v4-13-test - master:
  open[num/s=54,avslat=2.207252,minlat=0.006880,maxlat=26.734115]
  close[num/s=52,avslat=2.303759,minlat=0.007108,maxlat=25.945049]

  The change to g_lock re-added a thundering herd problem again
  because we always go via the dbwrap_watch when someone else
  hold the g_lock already. Currently any g_lock modification
  g_lock_lock(), modify of the payload and g_lock_unlock()
  wakes all watchers and then all but one comeback and
  see the record is still locked.

With the changes in https://gitlab.com/samba-team/samba/-/merge_requests/2608
we have an optimized interaction between g_lock and dbwrap_watch and we get the
following result:

  open[num/s=10249,avslat=0.012551,minlat=0.000072,maxlat=0.051488] 
  close[num/s=10247,avslat=0.012576,minlat=0.000031,maxlat=0.051464]

This is more than twice than 4.6 and 200 times than 4.13-master.


For the future are some ideas to improve this further by avoiding the current pattern of:

1. lck = get_share_mode_lock()
         1.1 g_lock_lock()
2. do some syscalls
3. set_share_mode()
    3.1 g_lock_writev_data()
4. TALLOC_FREE(lck)
    4.1 g_lock_unlock()

In a non conflicting case we could have a g_lock_do_locked()
that directly stores our share_mode entry if there's not existing g_lock,
Then we can have a share_mode_keep(lck), that tells the destructor to do
nothing, otherwise the destructor would remove the optimistic share_mode_entry
again. With that we would only fallback to the existing pattern is there's a conflict or an existing g_lock.

For the fallback we could even defer set_share_mode() and have
a g_lock_writev_data_and_unlock() function.

This would hopefully get us back to the 4.12 case where we had share_entries.tdb.

The idea of the share_entries.tdb can also comeback in a way
that we split locking.tdb records based cluster nodes
and maybe on a modulo operation on the pid. It means
the entry header (struct share_mode_data) with share_mode_flags flags
will be mirrored in all entries and only on conflict we need to check all
individual records. (The single opener case should still be optimized and only work on a single record).
Comment 2 Samba QA Contact 2022-07-26 14:33:03 UTC
This bug was referenced in samba master:

f26b22cc8e784af1dc64e0aa9a2c948385e2b667
3f88b700a9e73b2371cba3bb867c4c625bc36c7c
5af37ae6970674573bb6f32c6c0f347718b8a2a9
77db4b666f0eb6668b6e0bc3d445048980542bfd
cdf1c37a90c24cc32953865a0bb909fd71fbbf05
420a595c1baa6062fcb7f672cf032287b4521a76
9356b1701c43c9ec2a71252a31e09555a1f5a3ea
7226d0b365625ab0e5ff0ffe8b8fe3924ae1a569
6702b3b0da729b824455a5efc7b50ebc03f835e8
cb012e45c91b4b7d1572864b7e60de36c96a38af
b3f6668f93b73a5980dc418c6b1dacde330f7497
2342489f526a43fe2d56f3859b6df0e26538870d
c0febbd3e19a1c40c48d93f38f9223a7feab8f30
095fafbe0c901180b30253b9adb6b2de3bf827f2
eb89748ee4a9c58da8064b41d0253e00c30c213f
726f468ccdb432c35dadef1aa4af6a041cbf46f2
6e45da1a38d31c18e38397f354596f6c4654c8b1
5021abff88ccaeb8b7a2b274e4ddc9ea046b2c34
6b173bf1569b4c099223a090f021465eee04747e
39cdcec49c6578430d91c450645de1522fe26c2a
1c84980d7c3ca33c61c8137902af2fc7a0988824
8908af569551def4e06c9aa179cf77e0beaa2638
2129d352ae0d90867ebe8101529817a12223d5ee
908eea1202459c8ffdf8e43eb310cf8690693824
1fb9db8c994b60720ea82167b5a91808293640e0
cc9c8b8e7e0e2ab8512fb443b89780a11bc966ae
044e018e9a243466bdbf075304a2715f36c0199a
2eb6a209493e228e433a565154c7c4cd076c0a4a
50163da3096d71c08e9c3ff13dd3ecb252a60f4e
f62beaa2c25043a1b3dde408ce7d8f7327ac3e13
b865bb28abedd18d5db0ae3ea31908c75ee1129b
52720516998091206399e9b2387d8ee8265e4660
2e92267920ddee7b96f32e243eb7b21124ed554f
20f3fd021911a118228815cb32d436069b0273d1
e33143099b37a4d79a1cd45ca43a5a5d9d63b261
67af3586d989be9d6a8fe7e7789250451b03f2bb
98269bd5f31a2521b756e0a20fba82e9122582f7
6e701d02ee2d0fc304157395c451d3b972128cfc
9d9991166322477781f20372ffd7c19d1632276c
d5c7e2e27385a4730ff8674650700bd25b77f975
Comment 3 Stefan Metzmacher 2022-08-16 09:10:52 UTC
(In reply to Stefan Metzmacher from comment #1)

https://gitlab.com/samba-team/samba/-/merge_requests/2670 will have
additional improvements:

The following test with 256 commections all looping with open/close
on the same inode (share root) is improved drastically:

  smbtorture //127.0.0.1/m -Uroot%test smb2.create.bench-path-contention-shared \
     --option='torture:bench_path=' \
     --option="torture:timelimit=60" \
     --option="torture:nprocs=256" \
     --option="torture:qdepth=1"

From some like this:

   open[num/s=11536,avslat=0.011450,minlat=0.000039,maxlat=0.052707]
   close[num/s=11534,avslat=0.010878,minlat=0.000022,maxlat=0.052342]

to:

   open[num/s=27502,avslat=0.004729,minlat=0.000040,maxlat=0.060064]
   close[num/s=27500,avslat=0.004639,minlat=0.000022,maxlat=0.061187]
Comment 4 Samba QA Contact 2022-08-16 19:38:19 UTC
This bug was referenced in samba v4-17-test:

c12a8d50837d5073d89753a400b5ffbcdf3321e7
Comment 5 Samba QA Contact 2022-08-16 20:19:00 UTC
This bug was referenced in samba v4-17-stable (Release samba-4.17.0rc2):

c12a8d50837d5073d89753a400b5ffbcdf3321e7
Comment 6 Samba QA Contact 2022-08-19 19:40:03 UTC
This bug was referenced in samba master:

bf1dd1a188c096093bedc628a14bb037e3209630
bb3dddcdf11e6c2f5319d64bf2ef20636d0ed82f
c75de325710c0fbbd50a0acd3af55404165440d6
0fbca175ae4763d82f8a414ee3d6354c95d5294e
8b3b316680221487f84a7cfe14f52e8ffd64ba85
bf8f2258497f7d2a5a5f8d1cacf1a30899ed455c
76da56aa65bb9fe7f2f8c4a2e30e278a61db1ff5
60ae7a5a2ed9a03d8693b9b455b7b3696386aeb1
d4f18f99d3a40a8df00beb006e2731959aa6fad9
Comment 7 Stefan Metzmacher 2022-08-22 06:09:24 UTC
Created attachment 17488 [details]
Additional patches for v4-17-test
Comment 8 Jule Anger 2022-08-23 06:03:23 UTC
Pushed to autobuild-v4-17-test.
Comment 9 Samba QA Contact 2022-08-23 08:58:11 UTC
This bug was referenced in samba v4-17-test:

208037a7eeab373154773056ac8ecab5f7e49de3
e4538e70cbe603ce2f26e1efa456aed9afae71b1
f207ef332244c0bc935dfbbe668d78617803994c
6bf37ba4538b309bf455c26ab0aeca6ba376d552
411af5fb48c076500be41b795b7627edf3554b8e
fa8d19056bd71d0581a5a66d2ea9bab38b8af768
e764e40ad5569b2976d7a891d868a30281e73f44
cb63afbda1b07b5fafe09a28a03aeedbd353346c
0b15ebced782382bef77877524d45197332e4488
Comment 10 Jule Anger 2022-08-23 09:30:15 UTC
Closing out bug report.

Thanks!
Comment 11 Samba QA Contact 2022-08-23 14:51:08 UTC
This bug was referenced in samba v4-17-stable (Release samba-4.17.0rc3):

208037a7eeab373154773056ac8ecab5f7e49de3
e4538e70cbe603ce2f26e1efa456aed9afae71b1
f207ef332244c0bc935dfbbe668d78617803994c
6bf37ba4538b309bf455c26ab0aeca6ba376d552
411af5fb48c076500be41b795b7627edf3554b8e
fa8d19056bd71d0581a5a66d2ea9bab38b8af768
e764e40ad5569b2976d7a891d868a30281e73f44
cb63afbda1b07b5fafe09a28a03aeedbd353346c
0b15ebced782382bef77877524d45197332e4488
Comment 12 Samba QA Contact 2022-09-20 01:35:21 UTC
This bug was referenced in samba master:

fb2776f790cc6f8a4da38339c13ebde1f56e9550
50188fb441b43d84daca607e3d74f1c2187cedae
9b815ab65ba5b28fd86797202cc5bd6236e9ab81
3ef567472e1d1c6b0c59445d6dcb792e0ded26dd
170a4812a6b05b667ea6fd7a73d417e09e24e010
bd088b4639213af8a2ee8c700e2c1105ba095674
d8de42c1558559107bbb22e0c05738621a4ac753
c61a375f14b7a3d98472eedf9873ab0973010211
3e5775084aa423d946c0264df594497f64bf2b82
e12c3b56daed806b79076ad8f0fd7c0c9bbb4ae4
b19e50634a8abd569a0170113000a37998d28f30
dd4a94ec925cf09da20c6c4f40a879131ef4be67
f0e0c0af20d93456241cc2079b7282c07ab1ad62
ca2dce3147d6d011db9526a4658271619274992a
96fe4239131f4cf7749ed25f48ba435ddee9d166
e1d1b3403e58f96947ebfaa3c633a75c3edb2cc8
5bba79d639b6498073e3917b8112bb50b4aa0c39
703a4ff525655c6f99ac24351be7eb898d4ded28
db78fe13a37d7143f622d8f2e710f2ebd2d97abc
f4e0a6fe00bc61b052ab3f1094bbce113e127d1d
65715e3431a87f519528b0daa7d877f668875a84
6ab4457b4be139c4e5a3f44ce9bf8018ad09a58b
e7cf1b07b6928f18b0147f4b6b74a97ae3fb667b
641bfc5905b0ec21ec7c65cda5139cb676d24917
f2fdeb17ec432314a81c06863f59110aad4f558c
aa7df0fb9fa62cf2f6bc774c020da26ce878e7b0
f971a4ae31fd664ecb0ab3fd2c2ab1fc2275e68d
7c6113de2b61f9a87906bb56c1b2a1f4c7974e89
0d5a96038164aef951473c94e97edd743ef0c593
7e2ec6ee5670c42bdbb88507fa82c657c035c865
b508c5a0be6565355351cb036ce6495e35d76862
b6789ff1c07202d43822c5c7cc0c51df84aba4cd
4d697278386b924441f5119dc998f06755aee915
357adc2f27e3efaeb4c38d84de06176175b6c39b
d42bb5d831ccc2e5134a2c3776546e1dda2736ac
45253acc81e613d0ab6a47d55c008f95af421bf1
7d982e8558567c6baaae888e023bde5a83ca1714
2e7ccb72a075f39ea6a406cba0ae53a1fd5314ff
c5334b0db4910a252c4eb9434385a5b70cf194d8
ee78d948129bca8650bf1729b7d15826f2e564cd
e018c6347f1e5d306f801458ab7e68434bd582c3
db0e67329297e34ade2062e73ec023a6d46efb00
faf9388e96028bb499f0ec9e34d5bb12ad26a0e8
557323ca03713c0f0f7b1b5d0f6116095ec405e0
eef0c8e25bcedb6b0028866c7540c6feeabb0dd0
f63b9b5311f3a51dcc5cfe2c1ed0adb960033933
e6ef5474009be017a3d8dc4edbad0b15d12ae877
06e2aa3cba7af00bd8f9ee92496cd6e4e94f14a1
8979311c6bde49d6210ce64b37c28b6f6fa527fb
47f1d9362e93b0d6262df6642b495afe010d8149
c5c7a377c3d395883a9d9cad3bb4256044aac0dc
432272a7c834b3ebf56d335a1f458fdc392f7e51
bb7d765663813875a5391df203038edc9747bf0d
c8458f237cd415f35dd5582876c80a9b44d0a053
42f96d29335dd1ee361f6af70460aa9ff8d33d63
1198e8c0f6c0277e0c32a6c1091b222a6d8a5f14
2474b063da95d3ef8662f9c7c27ed96e37f7a722
b9edf3c6428652671eae8a5ec8e18460c6d31196
1288989f0f5043f3a3dff9fda217f41b16d33958
b80bc6307cf4ec7d54284e1b940ce9f7430e6716
0fbd125453f7cf63e158d56b130b500e362fcbcb
ca9014d0378e1da8c30da4aae99a05005fe89b10
0b94695ebf8eb2edd6de7fc549857393461e4d50
977498d3eb821c6c30c5a991949054f796579966
a2f6f96ac74b25e99c9765bff341bfc9060a7bbf
d7f429469280676594f0c98882c5d82dc358da6e
c1ec8310496ef7355b950bcf1e4b2d882740aa94
3829acc4743db0f395ed3c3945c343a11a9e9486
6f2ce1fd34642e56a68b0997decbf50255063ea4
3c26ee84ce8bcd50e3788b1c4df5ebe2d101899c
3a5174136d6706540ace5567d4e8193534dbd13a
a75194d41b4257800595e2036772953318285d4c
d19fa657d725e887ec0cd2f19b479e827230faa6
7cac6eb5d810f78e52292964808f7e163ced105c
6bda68910e2a2d9b986d6d65c9790684b2f15e48
63291ea5c5d32e295b3638afd530e805ec59a190
01c629a409860e5cccfc64e78fa63c90303a4b7a
37c9600ff1babbb3e1fc2db3d29dfb12ca0c707a
17e496c6f91ec464766fc562f31381e057cebe65
b971a21aa3494d1ac57bfebdb9e1d224e5d79c9c
cba169252ea270bb725ec06aff71d841492099f5
775dc007d211b8153fbc5741f52dc77f92d9c314
150308d1d0d891bd86e46da0134ad15c251dc5f9
0796c5de6f382b96cfb65bc659a0c34ab0b0af58
aae504cdaa086735150ce1108075a9d926d65bec
4d06aa1550bbc980a983e881a5f1394fb6f87c1b
0bfdae92db09a2943a531f9dc0aa53c1c5c70f00
0a8619c8458fe1ab3445b9b6b22ec8ffd86e9e06
9e619f535fa561a5d37d98f3809726bffd6ff91d
26669613e2dc673d55e0f8977d7758477eab6fd6
a4dd4d5f0fdb8cb242dde93cf620f238fccf9e9c
387f126d0749355eed32f75708d488ef6ad17349
095da847e747b732c16c0a7f8516fb535b0f0f1c
ce868b095c0154401e3f4af7f296a575c8701863
dab7df932119ab10bb9fa88c8adbe02064bfee16
b0082076f9f3d81a6b47a692c7263be5b85ff99d
ac811f6f8cbc268d95f1204214614640b928fb3f
0f02f68f9f197ee7ec4b24a32a7b5a4b985ebe9b
f9ea78398949109ba7fec90cbaaf294a446042ca
d04b6e9dd0d933e99848547efc9d17edc437c1be
1ae7e47a6b0e47c8c78af91188007eafc1239835
12f6c129219670ab0d7392434f88751dedace6ed
680c7907325b433856ac1dd916ab63e671fbe4ab
3b6255b5b902a062744912a6a3a82f7fb7279b23