Hi, my colleagues found a performance issue with current libldb.

Fedora 39:

$ rpm -q ldb-tools
ldb-tools-2.8.0-1.fc39.x86_64

$ bash repro.sh 30000
Added 30000 records successfully

real    0m0.534s
user    0m0.931s
sys     0m0.062s

$ bash repro.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m16.232s
user    0m16.122s
sys     0m0.238s

$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    1m32.052s
user    1m30.924s
sys     0m1.235s

CentOS 7 (container from our CI):

[samba@57da0cb767c4 ~]$ rpm -qf /usr/bin/ldbadd
samba-client-4.10.13-1.el7.x86_64

[samba@57da0cb767c4 ~]$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    0m0.871s
user    0m1.231s
sys     0m0.140s
Created attachment 18258 [details]
repro.sh
Can you bisect down to find the exact revision this fails on? Also, can you confirm both runs in as close to identical environments as possible, e.g. both in docker, so we can rule out a change around fsync() etc.?

Also note that the transaction behaviour of ldbadd is about the worst possible: it will open and close a transaction for each LDIF chunk individually.
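A minimal sketch, using the public ldb transaction API, of the difference this makes: ldbadd without batching effectively commits one transaction per LDIF chunk, paying the full index flush each time, while wrapping all adds in one transaction amortises that cost. add_batched(), msgs and num_msgs are hypothetical names, not ldbadd code:

#include <ldb.h>

static int add_batched(struct ldb_context *ldb,
                       struct ldb_message **msgs,
                       unsigned int num_msgs)
{
	unsigned int i;
	int ret;

	ret = ldb_transaction_start(ldb);	/* one transaction ... */
	if (ret != LDB_SUCCESS) {
		return ret;
	}
	for (i = 0; i < num_msgs; i++) {
		ret = ldb_add(ldb, msgs[i]);	/* ... for all records */
		if (ret != LDB_SUCCESS) {
			ldb_transaction_cancel(ldb);
			return ret;
		}
	}
	return ldb_transaction_commit(ldb);	/* indexes flushed once */
}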
I've bisected it on registry.gitlab.com/samba-team/devel/samba/samba-ci-centos8s:9a406973474a7903fe7fd6215226660911ed73c0

b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6 is the first bad commit

commit b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6
Author: Gary Lockyer <gary@catalyst.net.nz>
Date:   Wed Mar 6 15:28:45 2019 +1300

    lib ldb key value: fix index buffering

    As a performance enhancement the key value layer maintains a cache
    of the index records, which is written to disk as part of a prepare
    commit.  This patch adds an extra cache at the operation layer to
    ensure that the cached indexes remain consistent in the event of an
    operation failing.

    Add test to test for index corruption in a failed modify.

    Signed-off-by: Gary Lockyer <gary@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet@samba.org>

    Pair-Programmed-With: Andrew Bartlett <abartlet@samba.org>

    Signed-off-by: Andrew Bartlett <abartlet@samba.org>

 lib/ldb/ldb_key_value/ldb_kv.c       |  22 ---
 lib/ldb/ldb_key_value/ldb_kv.h       |  12 ++
 lib/ldb/ldb_key_value/ldb_kv_index.c | 347 ++++++++++++++++++++++++++++++++---
 lib/ldb/tests/python/api.py          |  82 ++++++++-
 4 files changed, 415 insertions(+), 48 deletions(-)
The patches from Andréas LEROUX do not have any real impact on this issue.

tis-tdbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m9.035s
user    0m9.021s
sys     0m0.283s

tis-ldbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m8.929s
user    0m8.980s
sys     0m0.219s
I've started to comment out code and tried to optimize away allocations [1]. I nailed it down to tdb_traverse(ldb_kv_sub_transaction_traverse). The traverse callback is called 4 times for each of the 10000 entries. If I comment out the tdb_store() inside ldb_kv_sub_transaction_traverse(), the reproducer finishes in less than 1 second instead of 9 seconds. The tdb only has 4 entries, and one of those entries gets updated each time.

The question is: why do we use a tdb for this? I guess the tdb does locking here, and for such a small database tdb is probably not the right tool?

[1] https://git.samba.org/?p=asn/samba.git;a=shortlog;h=refs/heads/asn-ldb
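For reference, a self-contained sketch (not the actual ldb_kv code) of the pattern described above: on sub-transaction commit, every record in the in-memory sub-transaction tdb is copied into the parent transaction tdb via tdb_store(). copy_record() and merge_sub_transaction() are hypothetical names:

#include <tdb.h>

static int copy_record(struct tdb_context *sub_tdb,
		       TDB_DATA key, TDB_DATA data,
		       void *private_data)
{
	struct tdb_context *parent_tdb = private_data;

	/*
	 * tdb_store() re-hashes the key and memcpy()s the whole record
	 * into the destination tdb; this is the copy that a
	 * pointer-based structure would avoid. Returning non-zero
	 * aborts the traverse.
	 */
	return tdb_store(parent_tdb, key, data, TDB_REPLACE);
}

static int merge_sub_transaction(struct tdb_context *sub_tdb,
				 struct tdb_context *parent_tdb)
{
	if (tdb_traverse(sub_tdb, copy_record, parent_tdb) == -1) {
		return -1;
	}
	return 0;
}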
I replied to Volker's comment with much the same conclusion here:
https://gitlab.com/samba-team/samba/-/merge_requests/3616#note_1890744831

Yes, the background here was that Gary was looking for a simple hash lookup or similar that was available in LDB's dependencies. I agree, the best fix here would be a better data structure, as there will never be an optimal value for the hash size. Also, the patch was put under test, and sadly it makes some operations slower.
Just some notes. Looking in perf on a configure.developer build (last line of repro.sh changed to `time (make_ldif $NUM | perf record -- bin/ldbadd -H $DB)`):

  14.32%  ldbadd  libc.so.6                          [.] _int_malloc
  12.93%  ldbadd  libc.so.6                          [.] __memmove_evex_unaligned_erms
  10.65%  ldbadd  libtalloc-private-samba.so         [.] _tc_free_children_internal
  10.59%  ldbadd  libtalloc-private-samba.so         [.] _tc_free_internal
   8.28%  ldbadd  libc.so.6                          [.] _int_free
   7.64%  ldbadd  libtalloc-private-samba.so         [.] __talloc_with_prefix
   6.50%  ldbadd  libc.so.6                          [.] unlink_chunk.constprop.0
   4.38%  ldbadd  libtalloc-private-samba.so         [.] talloc_chunk_from_ptr
   2.74%  ldbadd  libc.so.6                          [.] malloc
   2.58%  ldbadd  libldb-key-value-private-samba.so  [.] ldb_kv_dn_list_load
   2.50%  ldbadd  libtalloc-private-samba.so         [.] tc_alloc_pool
   2.39%  ldbadd  libtalloc-private-samba.so         [.] _talloc_memdup
   2.18%  ldbadd  libtalloc-private-samba.so         [.] _talloc_named_const
   2.05%  ldbadd  libc.so.6                          [.] cfree@GLIBC_2.2.5
   1.22%  ldbadd  libtalloc-private-samba.so         [.] tc_memlimit_update_on_free
   0.94%  ldbadd  libtalloc-private-samba.so         [.] __talloc
   0.89%  ldbadd  libtalloc-private-samba.so         [.] _talloc_chunk_set_free
   0.78%  ldbadd  libtalloc-private-samba.so         [.] talloc_memlimit_check
   0.74%  ldbadd  libtalloc-private-samba.so         [.] talloc_memlimit_grow
   0.49%  ldbadd  libtalloc-private-samba.so         [.] _tc_set_name_const
   0.22%  ldbadd  libtalloc-private-samba.so         [.] memcpy@plt
   0.20%  ldbadd  libtalloc-private-samba.so         [.] malloc@plt

It's almost all fiddling with memory, apart from __memmove_evex_unaligned_erms() and ldb_kv_dn_list_load(). And `perf record -g` says:

-   64.35%     0.01%  ldbadd  libldb-key-value-private-samba.so  [.] ldb_kv_add
   - 64.35% ldb_kv_add
      - 39.25% ldb_kv_add_internal
         - 38.46% ldb_kv_index_add_new
            - 38.46% ldb_kv_index_add_all
               - 38.39% ldb_kv_index_add_el
                  - 38.37% ldb_kv_index_add1
                     - 37.25% ldb_kv_dn_list_load
                        - 21.19% _talloc_memdup
                           - 18.90% _talloc_named_const
                              - 16.18% __talloc
                                 - 12.79% __talloc_with_prefix
                                      2.32% tc_alloc_pool
                                   2.29% talloc_chunk_from_ptr
                             1.14% talloc_chunk_from_ptr
                          12.44% __memmove_evex_unaligned_erms
      - 25.02% ldb_kv_sub_transaction_commit
         - 25.02% ldb_kv_index_sub_transaction_commit
            - 24.99% tdb_traverse
               - 24.98% tdb_traverse_internal
                  - 24.89% ldb_kv_sub_transaction_traverse
                     - 24.55% _talloc_free
                        - 24.54% _talloc_free_internal
                           - _tc_free_internal
                              - 24.34% _tc_free_children_internal
                                 - 12.44% _tc_free_internal
                                      1.06% tc_memlimit_update_on_free
                                      0.74% _talloc_chunk_set_free
                                      0.54% _tc_free_children_internal
                                   1.28% cfree@GLIBC_2.2.5
                                0.50% _tc_free_children_internal

which again blames ldb_kv_dn_list_load() for spending an inordinate amount of time allocating memory for ldb_kv_index_sub_transaction_commit() to free. We probably don't need to do all that.

Elapsed time seems slightly worse than quadratically related to list size.
That last comment was on the asn-ldb branch, but origin/master is pretty much the same.
Created attachment 18356 [details]
patch to avoid copying key strings

(In reply to Douglas Bagnall from comment #7)

> Elapsed time seems slightly worse than quadratically related to list size.

That made me suspect this bit:

-	for (x = 0; x < list2->count; x++) {
-		dns[x].length = list2->dn[x].length;
-		dns[x].data = talloc_memdup(
-			dns,
-			list2->dn[x].data,
-			list2->dn[x].length);
-		if (dns[x].data == NULL) {
-			TALLOC_FREE(dns);
-			return LDB_ERR_OPERATIONS_ERROR;
-		}
-	}

which loops over all the dns that have been added so far. If this is replaced with a simple memcpy(), then list2->dn[x].data and list->dn[x].data will point to the same strings, but this is probably OK for the sub-transaction because we are never mutating the strings in place, and probably explicitly not freeing them anywhere -- but if we are (e.g. in the delete case), we can just stop doing that.

The result is still quadratic, since the memcpy grows with list2->count, but probably not super-quadratic, since we avoid all the angst talloc goes through when freeing long lists.

$ bash repro-15590.sh 10000 indexes

BEFORE:
real    0m18.368s
user    0m18.248s
sys     0m0.252s

AFTER:
real    0m0.642s
user    0m0.619s
sys     0m0.129s

This is on master, with only the attached patch.
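In other words, a sketch of the replacement, using the names from the quoted code and assuming (as argued above) that aliasing the strings is safe:

/*
 * One memcpy of the ldb_val headers instead of list2->count
 * talloc_memdup() calls; dns[x].data then aliases list2->dn[x].data.
 */
memcpy(dns, list2->dn, sizeof(struct ldb_val) * list2->count);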
Created attachment 18357 [details]
patch for asn-ldb tree

Putting a more succinct version of the patch on Andreas' RBTREE experiments gives

real    0m0.591s
user    0m0.571s
sys     0m0.125s

which is a slight improvement.
I think this patch is a great approach as it can be easily back-ported! I think we want to apply it and continue work from there.

The issue is that the in-memory tdb is still not optimal, see:
https://gitlab.com/samba-team/samba/-/merge_requests/3616

So we might want the patch from 3616 too, as it can also be back-ported easily.
It is a huge improvement but not the ultimate solution yet :-)

$ bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m0.542s
user    0m0.538s
sys     0m0.210s

$ bash repro_dev_ldb.sh 30000 indexes
Added 2 records successfully
Added 30000 records successfully

real    0m3.531s
user    0m2.158s
sys     0m2.086s

$ bash repro_dev_ldb.sh 50000 indexes
Added 2 records successfully
Added 50000 records successfully

real    0m9.215s
user    0m4.732s
sys     0m5.780s
(In reply to Andreas Schneider from comment #12)

It was not entirely successful: https://gitlab.com/samba-team/devel/samba/-/pipelines/1349612471
On a typical 64‐bit system — and assuming the indices are 128‐bit GUIDs — ‘struct ldb_val’ takes up the same amount of space as storing the data inline like ‘uint8_t data[LDB_KV_GUID_SIZE]’.

If we make that change (storing data inline), then we can apply Douglas’s patch as‐is; we’ll use less memory, and need fewer heap allocations; and because there’s no associated memory to keep track of, we shouldn’t have to worry about use‐after‐frees.

What do you think, Andreas? Is this idea at all sensible? How easy would it be to cope with LDB_KV_INDEXING_VERSION databases that aren’t using GUIDs?
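A sketch of the size comparison, assuming an LP64 64-bit ABI; guid_entry is a hypothetical name, and the ldb_val layout below mirrors the definition in ldb.h:

#include <stddef.h>
#include <stdint.h>

#define LDB_KV_GUID_SIZE 16	/* 128-bit GUID, as in the ldb_kv code */

/* current: 8-byte pointer + 8-byte length = 16 bytes per entry,
 * plus a separate talloc'd 16-byte blob that data points at */
struct ldb_val_like {
	uint8_t *data;
	size_t length;
};

/* proposed: the same 16 bytes, holding the GUID itself, with no
 * extra allocation and nothing to free */
struct guid_entry {
	uint8_t data[LDB_KV_GUID_SIZE];
};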
Jo, I never really worked with ldb, which means I don't understand much of it. However, here is my understanding of the issue:

We have two in-memory databases. One holds the information for transactions and the other holds information for sub-transactions. I think the problem is that we copy/duplicate the data from the sub-transaction database to the transaction database with tdb_store(). We do that in ldb_kv_sub_transaction_traverse(). I think that tdb_store() is the super expensive operation.

If we used a different data structure, like an rbtree instead of an in-memory tdb, we could just remove a node from the sub-transaction rbtree and add it to the transaction rbtree. We would just do pointer operations, avoiding the duplication of the memory.
If we only do tdb_traverse() and never do a search, a linked list might even work :-)
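A self-contained sketch of the pointer-only handoff the last two comments describe, here with an intrusive singly linked list (all names hypothetical):

#include <stddef.h>

struct index_entry {
	struct index_entry *next;
	/* key and DN-list payload would live here */
};

struct index_cache {
	struct index_entry *head;
};

/* Move every entry from sub to parent: O(1) per node and no record
 * data copied, unlike the tdb_store() path. */
static void merge_lists(struct index_cache *parent,
			struct index_cache *sub)
{
	while (sub->head != NULL) {
		struct index_entry *e = sub->head;
		sub->head = e->next;	/* unlink from sub    */
		e->next = parent->head;	/* relink into parent */
		parent->head = e;
	}
}

The catch is that the real traverse callback replaces entries by key, so a plain list would need a linear lookup per record; that is where an rbtree or hash map earns its keep.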
Douglas looked into ldb_kv_dn_list_load(), which, as far as I understand, loads the data from the on-disk database into the in-memory tdb. I'm not sure if this could be improved; it looks more like a cost we have and can't really avoid. However, I'm not the expert.
(In reply to Andreas Schneider from comment #17)

I think that function could be improved if it were possible to store the GUIDs in one big array, rather than as individually allocated ldb_val structures. But I’ll discuss this with Douglas when I have the opportunity.
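A sketch of the one-big-array idea, assuming a GUID-keyed (LDB_KV_INDEXING_VERSION) index where every value is exactly LDB_KV_GUID_SIZE bytes; guid_list and guid_at() are hypothetical names:

#include <stdint.h>

#define LDB_KV_GUID_SIZE 16	/* as in the ldb_kv code */

/* one allocation for the whole index record, instead of one
 * talloc_memdup() per GUID in ldb_kv_dn_list_load() */
struct guid_list {
	unsigned int count;
	uint8_t *guids;		/* count * LDB_KV_GUID_SIZE bytes */
};

static inline uint8_t *guid_at(const struct guid_list *list,
			       unsigned int i)
{
	return &list->guids[i * LDB_KV_GUID_SIZE];
}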
I think I have fixed the problem with my earlier patch -- we'll see how it looks in the morning.

BTW, if the repro.sh script is exactly like the production problem, using 'ldbadd -o batch_mode' should solve it:

time (make_ldif $NUM | ldbadd -H $DB -o batch_mode)
Created attachment 18383 [details]
draft patch for master

I think this patch is enough of an answer. A merge request will likely follow soon.
This bug was referenced in samba master:

5f0198d69843c864f2b98a7c0c6305ad789a68a0
1bf9ede94f0a6b41fb18e880e59a8e390f8c21d3
I suppose we'll want to backport the fixes?
Created attachment 18401 [details]
patch for 4.20

Created attachment 18402 [details]
patch for 4.19

Created attachment 18403 [details]
patch for 4.19

Created attachment 18404 [details]
patch for 4.20

Created attachment 18405 [details]
patch for 4.20
This bug was referenced in samba master:

e2a74963fb89f5409c236a0fbe4cd070e1a75a43
Created attachment 18434 [details]
patch for 4.20 v2

I added a third commit with the same bug number to suppress a harmless but plausible static analysis warning, which might as well be in this patch. The 4.20 and 4.19 patches are the same, and also apply to 4.17 and probably further back.
Created attachment 18435 [details]
patch for 4.19 v2
Andreas, I bounced the backport patches back to you for review.
Jule, please apply the patches to the corresponding branches.
Pushed to autobuild-v4-{20,19}-test.
This bug was referenced in samba v4-20-test:

226b0a20bd1272adb958bd83ede03922f8c182e6
76e1024f4c2852caedf5e7bfe5b132c90b2ac5b2
0a99463b3e00dc5bda2759b7c8fa6e334899fe55

This bug was referenced in samba v4-19-test:

9a617692d29de6d2704dceb05e9b4468d2bdfb58
9b33893987d8686cda8695f5a044872d8be7381c
5a15cbb15da64a0ba5c0cfde5ab736712083ea24
Closing out bug report. Thanks!
This bug was referenced in samba v4-19-stable (Release samba-4.19.9):

9a617692d29de6d2704dceb05e9b4468d2bdfb58
9b33893987d8686cda8695f5a044872d8be7381c
5a15cbb15da64a0ba5c0cfde5ab736712083ea24

This bug was referenced in samba v4-20-stable (Release samba-4.20.6):

226b0a20bd1272adb958bd83ede03922f8c182e6
76e1024f4c2852caedf5e7bfe5b132c90b2ac5b2
0a99463b3e00dc5bda2759b7c8fa6e334899fe55