Bug 15590 - libldb: performance issue with indexes
Summary: libldb: performance issue with indexes
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB (show other bugs)
Version: 4.19.5
Hardware: All All
: P5 normal with 15 votes (vote)
Target Milestone: ---
Assignee: Jule Anger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-22 14:16 UTC by Andreas Schneider
Modified: 2024-11-19 14:48 UTC (History)
5 users (show)

See Also:


Attachments
repro.sh (1.59 KB, text/plain)
2024-02-22 14:17 UTC, Andreas Schneider
no flags Details
patch to avoid copying key strings (1.44 KB, patch)
2024-06-25 23:22 UTC, Douglas Bagnall
no flags Details
patch for asn-ldb tree (1.66 KB, patch)
2024-06-25 23:43 UTC, Douglas Bagnall
no flags Details
draft patch for master (7.92 KB, patch)
2024-07-22 22:05 UTC, Douglas Bagnall
no flags Details
patch for 4.20 (113 bytes, patch)
2024-08-12 09:02 UTC, Andreas Schneider
no flags Details
patch for 4.19 (113 bytes, patch)
2024-08-12 09:03 UTC, Andreas Schneider
no flags Details
patch for 4.19 (8.23 KB, patch)
2024-08-12 09:04 UTC, Andreas Schneider
no flags Details
patch for 4.20 (113 bytes, patch)
2024-08-12 09:05 UTC, Andreas Schneider
no flags Details
patch for 4.20 (8.23 KB, patch)
2024-08-12 09:08 UTC, Andreas Schneider
no flags Details
patch for 4.20 v2 (9.72 KB, patch)
2024-09-06 00:37 UTC, Douglas Bagnall
asn: review+
Details
patch for 4.19 v2 (9.72 KB, patch)
2024-09-06 00:38 UTC, Douglas Bagnall
asn: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Schneider 2024-02-22 14:16:06 UTC
Hi,

my colleagues found a performance issue with current libldb.

Fedora 39:
$ rpm -q ldb-tools
ldb-tools-2.8.0-1.fc39.x86_64

$ bash repro.sh 30000
Added 30000 records successfully

real    0m0.534s
user    0m0.931s
sys     0m0.062s


$ bash repro.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m16.232s
user    0m16.122s
sys     0m0.238s

$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    1m32.052s
user    1m30.924s
sys     0m1.235s


CentOS7 (container from out CI):
[samba@57da0cb767c4 ~]$ rpm -qf /usr/bin/ldbadd
samba-client-4.10.13-1.el7.x86_64

[samba@57da0cb767c4 ~]$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    0m0.871s
user    0m1.231s
sys     0m0.140s
Comment 1 Andreas Schneider 2024-02-22 14:17:00 UTC
Created attachment 18258 [details]
repro.sh
Comment 2 Andrew Bartlett 2024-02-29 21:42:30 UTC
Can you bisect down to find the exact revision this fails on?

Also can you confirm both in as close to similar environments, eg both in docker, so we can rule out a change around fsync() etc?

Also note that the transaction behaviour for ldbadd is about the worst possible, it will open and close a transaction for each LDIF chuck individually.
Comment 3 Andreas Schneider 2024-03-01 10:11:00 UTC
I've bisected it on registry.gitlab.com/samba-team/devel/samba/samba-ci-centos8s:9a406973474a7903fe7fd6215226660911ed73c0



b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6 is the first bad commit
commit b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6
Author: Gary Lockyer <gary@catalyst.net.nz>
Date:   Wed Mar 6 15:28:45 2019 +1300

    lib ldb key value: fix index buffering

    As a performance enhancement the key value layer maintains a cache of
    the index records, which is written to disk as part of a prepare commit.
    This patch adds an extra cache at the operation layer to ensure that the
    cached indexes remain consistent in the event of an operation failing.

    Add test to test for index corruption in a failed modify.

    Signed-off-by: Gary Lockyer <gary@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet@samba.org>
    Pair-Programmed-With: Andrew Bartlett <abartlet@samba.org>
    Signed-off-by: Andrew Bartlett <abartlet@samba.org>

 lib/ldb/ldb_key_value/ldb_kv.c       |  22 ---
 lib/ldb/ldb_key_value/ldb_kv.h       |  12 ++
 lib/ldb/ldb_key_value/ldb_kv_index.c | 347 ++++++++++++++++++++++++++++++++---
 lib/ldb/tests/python/api.py          |  82 ++++++++-
 4 files changed, 415 insertions(+), 48 deletions(-)
Comment 4 Andreas Schneider 2024-05-02 12:45:55 UTC
The patches from Andréas LEROUX do not have any real impact on this issue.


tis-tdbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m9.035s
user    0m9.021s
sys     0m0.283s





tis-ldbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m8.929s
user    0m8.980s
sys     0m0.219s
Comment 5 Andreas Schneider 2024-05-02 14:09:34 UTC
I've started to comment out code and tried to optimize away allocations [1].

I nailed it down to tdb_traverse(ldb_kv_sub_transaction_traverse). The traverse callback is called 4 times for each of the 10000 entries.

If I comment out the tdb_store() inside ldb_kv_sub_transaction_traverse(), the reproducer is done in less than 1sec instead of 9 seconds.

If the tdb only has 4 entries and one of those entries gets updated each time. The question is, why do we use a tdb for this? I guess the tdb does locking here and I don't think for such a small database, that tdb is probably not the right thing?


[1] https://git.samba.org/?p=asn/samba.git;a=shortlog;h=refs/heads/asn-ldb
Comment 6 Andrew Bartlett 2024-05-02 20:38:43 UTC
I replied to Volker's comment with much the same conclusion here:

https://gitlab.com/samba-team/samba/-/merge_requests/3616#note_1890744831

Yes, the background here was that Gary was looking for a simple hash lookup or similar that was available in LDB's dependencies.

I agree, the best fix here would be a better data structure, as there will never be an optimal value (for the hash size). (Also the) patch was put under test, and makes some operations slower (sadly).
Comment 7 Douglas Bagnall 2024-06-25 09:09:57 UTC
Just some notes.

Looking in perf on a configure.developer build (last line of repro.sh changed to `time (make_ldif $NUM | perf record -- bin/ldbadd -H $DB)`):

    14.32%  ldbadd   libc.so.6                            [.] _int_malloc
    12.93%  ldbadd   libc.so.6                            [.] __memmove_evex_unaligned_erms
    10.65%  ldbadd   libtalloc-private-samba.so           [.] _tc_free_children_internal
    10.59%  ldbadd   libtalloc-private-samba.so           [.] _tc_free_internal
     8.28%  ldbadd   libc.so.6                            [.] _int_free
     7.64%  ldbadd   libtalloc-private-samba.so           [.] __talloc_with_prefix
     6.50%  ldbadd   libc.so.6                            [.] unlink_chunk.constprop.0
     4.38%  ldbadd   libtalloc-private-samba.so           [.] talloc_chunk_from_ptr
     2.74%  ldbadd   libc.so.6                            [.] malloc
     2.58%  ldbadd   libldb-key-value-private-samba.so    [.] ldb_kv_dn_list_load
     2.50%  ldbadd   libtalloc-private-samba.so           [.] tc_alloc_pool
     2.39%  ldbadd   libtalloc-private-samba.so           [.] _talloc_memdup
     2.18%  ldbadd   libtalloc-private-samba.so           [.] _talloc_named_const
     2.05%  ldbadd   libc.so.6                            [.] cfree@GLIBC_2.2.5
     1.22%  ldbadd   libtalloc-private-samba.so           [.] tc_memlimit_update_on_free
     0.94%  ldbadd   libtalloc-private-samba.so           [.] __talloc
     0.89%  ldbadd   libtalloc-private-samba.so           [.] _talloc_chunk_set_free
     0.78%  ldbadd   libtalloc-private-samba.so           [.] talloc_memlimit_check
     0.74%  ldbadd   libtalloc-private-samba.so           [.] talloc_memlimit_grow
     0.49%  ldbadd   libtalloc-private-samba.so           [.] _tc_set_name_const
     0.22%  ldbadd   libtalloc-private-samba.so           [.] memcpy@plt
     0.20%  ldbadd   libtalloc-private-samba.so           [.] malloc@plt


it's almost all fiddling with memory apart from __memmove_evex_unaligned_erms() and ldb_kv_dn_list_load().


And `perf record -g` says

-   64.35%     0.01%  ldbadd   libldb-key-value-private-samba.so     [.] ldb_kv_add
   - 64.35% ldb_kv_add
      - 39.25% ldb_kv_add_internal
         - 38.46% ldb_kv_index_add_new
            - 38.46% ldb_kv_index_add_all
               - 38.39% ldb_kv_index_add_el
                  - 38.37% ldb_kv_index_add1
                     - 37.25% ldb_kv_dn_list_load
                        - 21.19% _talloc_memdup
                           - 18.90% _talloc_named_const
                              - 16.18% __talloc
                                 - 12.79% __talloc_with_prefix
                                      2.32% tc_alloc_pool
                                      2.29% talloc_chunk_from_ptr
                                   1.14% talloc_chunk_from_ptr
                          12.44% __memmove_evex_unaligned_erms
      - 25.02% ldb_kv_sub_transaction_commit
         - 25.02% ldb_kv_index_sub_transaction_commit
            - 24.99% tdb_traverse
               - 24.98% tdb_traverse_internal
                  - 24.89% ldb_kv_sub_transaction_traverse
                     - 24.55% _talloc_free
                        - 24.54% _talloc_free_internal
                           - _tc_free_internal
                              - 24.34% _tc_free_children_internal
                                 - 12.44% _tc_free_internal
                                      1.06% tc_memlimit_update_on_free
                                      0.74% _talloc_chunk_set_free
                                      0.54% _tc_free_children_internal
                                   1.28% cfree@GLIBC_2.2.5
                                   0.50% _tc_free_children_internal

which again blames ldb_kv_dn_list_load() for spending an inordinate amount of time allocating memory for ldb_kv_index_sub_transaction_commit() to free. We probably don't need to do all that.

Elapsed time seems slightly worse than quadratically related to list size.
Comment 8 Douglas Bagnall 2024-06-25 09:18:31 UTC
That last comment was on the asn-ldb branch, but origin/master is pretty much the same.
Comment 9 Douglas Bagnall 2024-06-25 23:22:47 UTC
Created attachment 18356 [details]
patch to avoid copying key strings

(In reply to Douglas Bagnall from comment #7)
> Elapsed time seems slightly worse than quadratically related to list size.

That made me suspect this bit:

-		for (x = 0; x < list2->count; x++) {
-			dns[x].length = list2->dn[x].length;
-			dns[x].data = talloc_memdup(
-				dns,
-				list2->dn[x].data,
-				list2->dn[x].length);
-			if (dns[x].data == NULL) {
-				TALLOC_FREE(dns);
-				return LDB_ERR_OPERATIONS_ERROR;
-			}
-		}

which loops over all the dns that have been added so far. If this is replaced with a simple memcpy(), then list2->dn[x].data and list->dn[x].data will point to the same strings, but this is probably OK for the sub-transaction because we are never mutating the string in place, and probably explicitly not freeing it anywhere -- but if we are (e.g. the delete case), we can just stop doing that.

The result is still quadratic, since the memcpy grows with list2->count, but probably not super-quadratic, since we avoid all the angst talloc goes through when freeing long lists.

$ bash repro-15590.sh 10000 indexes

BEFORE:
real	0m18.368s
user	0m18.248s
sys	0m0.252s

AFTER:
real	0m0.642s
user	0m0.619s
sys	0m0.129s

This is on master, with only the attached patch.
Comment 10 Douglas Bagnall 2024-06-25 23:43:59 UTC
Created attachment 18357 [details]
patch for asn-ldb tree

Putting a more succinct version of the patch on Andreas' RBTREE experiments gives

real	0m0.591s
user	0m0.571s
sys	0m0.125s

which is a slight improvement.
Comment 11 Andreas Schneider 2024-06-26 06:09:33 UTC
I think this patch is great approach as it can be easily back-ported! I think we want to apply that and continue work from there.


The issue is that in-memory tdb is still not optimal, see:

https://gitlab.com/samba-team/samba/-/merge_requests/3616

So we might want the patch from 3616 too, as it can be back-ported easily.
Comment 12 Andreas Schneider 2024-06-26 06:11:37 UTC
It is a huge improvement but not the ultimate solution yet :-)

$ bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m0.542s
user    0m0.538s
sys     0m0.210s

$ bash repro_dev_ldb.sh 30000 indexes
Added 2 records successfully
Added 30000 records successfully

real    0m3.531s
user    0m2.158s
sys     0m2.086s

$ bash repro_dev_ldb.sh 50000 indexes
Added 2 records successfully
Added 50000 records successfully

real    0m9.215s
user    0m4.732s
sys     0m5.780s
Comment 13 Douglas Bagnall 2024-06-26 21:51:58 UTC
(In reply to Andreas Schneider from comment #12)
it was not entirely successful:
https://gitlab.com/samba-team/devel/samba/-/pipelines/1349612471
Comment 14 Jennifer Sutton 2024-07-11 05:42:28 UTC
On a typical 64‐bit system — and assuming the indices are 128‐bit GUIDs — ‘struct ldb_val’ takes up the same amount of space as storing the data inline like ‘uint8_t data[LDB_KV_GUID_SIZE]’. If we make that change (storing data inline), then we can apply Douglas’s patch as‐is; we’ll use less memory, and need fewer heap allocations; and because there’s no associated memory to keep track of, we shouldn’t have to worry about use‐after‐frees.

What do you think, Andreas? Is this idea at all sensible? How easy would it be to cope with LDB_KV_INDEXING_VERSION databases that aren’t using GUIDs?
Comment 15 Andreas Schneider 2024-07-11 07:56:14 UTC
Jo, I never really worked with ldb. This means I don't understand much of it. 

However here is my understanding of the issue:

We have two in-memory databases. One holds the information for transactions and the other holds information for sub-transactions. I think the problem is that we copy/duplicate the data from the sub-transaction database to the transaction database with tdb_store(). We do that in ldb_kv_sub_transaction_traverse(). I think that tdb_store() is the super expensive operation.

If we would use a different data structure, like an rbtree instead of in-memory tdb, we could just remove a node from the sub-transaction rbtree and add it to the transaction rbtree. We would just do pointer operations avoiding the duplication of the memory.
Comment 16 Andreas Schneider 2024-07-11 10:54:27 UTC
If we only to tdb_traverse() and never do a search, a linked list might even work :-)
Comment 17 Andreas Schneider 2024-07-12 08:26:09 UTC
Douglas looked into ldb_kv_dn_list_load(), which is, as far as I understand, loading the data from the on-disk database into the memory tdb. I'm not sure if this could be improved and it looks more like a cost we have and can't really avoid. However I'm not the expert.
Comment 18 Jennifer Sutton 2024-07-14 23:39:32 UTC
(In reply to Andreas Schneider from comment #17)
I think that function could be improved if it were possible to store the GUIDs in one big array, rather than as individually allocated ldb_val structures. But I’ll discuss this with Douglas when I have the opportunity.
Comment 19 Douglas Bagnall 2024-07-21 12:19:54 UTC
I think I have fixed the problem with my earlier patch -- we'll see how it looks in the morning.

BTW, if the repro.sh script is exactly like the production problem, using 'ldbadd -o batch_mode' should solve it:

time (make_ldif $NUM | ldbadd -H $DB -o batch_mode)
Comment 20 Douglas Bagnall 2024-07-22 22:05:20 UTC
Created attachment 18383 [details]
draft patch for master

I think this patch is enough of an answer.

A merge request is likely soon.
Comment 21 Samba QA Contact 2024-07-27 23:52:04 UTC
This bug was referenced in samba master:

5f0198d69843c864f2b98a7c0c6305ad789a68a0
1bf9ede94f0a6b41fb18e880e59a8e390f8c21d3
Comment 22 Jennifer Sutton 2024-08-02 02:02:44 UTC
I suppose we shall presumably want to backport the fixes?
Comment 23 Andreas Schneider 2024-08-12 09:02:25 UTC
Created attachment 18401 [details]
patch for 4.20
Comment 24 Andreas Schneider 2024-08-12 09:03:23 UTC
Created attachment 18402 [details]
patch for 4.19
Comment 25 Andreas Schneider 2024-08-12 09:04:47 UTC
Created attachment 18403 [details]
patch for 4.19
Comment 26 Andreas Schneider 2024-08-12 09:05:25 UTC
Created attachment 18404 [details]
patch for 4.20
Comment 27 Andreas Schneider 2024-08-12 09:08:53 UTC
Created attachment 18405 [details]
patch for 4.20
Comment 28 Samba QA Contact 2024-08-28 05:40:04 UTC
This bug was referenced in samba master:

e2a74963fb89f5409c236a0fbe4cd070e1a75a43
Comment 29 Douglas Bagnall 2024-09-06 00:37:57 UTC
Created attachment 18434 [details]
patch for 4.20 v2

I added a third commit with the same bug number to suppress a harmless but plausible static analysis warning, which might as well be in this patch.

The 4.20 and 4.19 patches are the same, and also apply to 4.17 and probably further back.
Comment 30 Douglas Bagnall 2024-09-06 00:38:45 UTC
Created attachment 18435 [details]
patch for 4.19 v2
Comment 31 Douglas Bagnall 2024-10-04 23:35:10 UTC
Andreas, I bounced the backport patches back to you for review.
Comment 32 Andreas Schneider 2024-10-07 06:06:12 UTC
Jule, please apply the patches to the corresponding branches.
Comment 33 Jule Anger 2024-10-07 08:27:52 UTC
Pushed to autobuild-v4-{20,19}-test.
Comment 34 Samba QA Contact 2024-10-07 09:46:04 UTC
This bug was referenced in samba v4-20-test:

226b0a20bd1272adb958bd83ede03922f8c182e6
76e1024f4c2852caedf5e7bfe5b132c90b2ac5b2
0a99463b3e00dc5bda2759b7c8fa6e334899fe55
Comment 35 Samba QA Contact 2024-10-07 15:23:04 UTC
This bug was referenced in samba v4-19-test:

9a617692d29de6d2704dceb05e9b4468d2bdfb58
9b33893987d8686cda8695f5a044872d8be7381c
5a15cbb15da64a0ba5c0cfde5ab736712083ea24
Comment 36 Jule Anger 2024-10-09 09:04:00 UTC
Closing out bug report.

Thanks!
Comment 37 Samba QA Contact 2024-10-17 12:04:54 UTC
This bug was referenced in samba v4-19-stable (Release samba-4.19.9):

9a617692d29de6d2704dceb05e9b4468d2bdfb58
9b33893987d8686cda8695f5a044872d8be7381c
5a15cbb15da64a0ba5c0cfde5ab736712083ea24
Comment 38 Samba QA Contact 2024-11-19 14:48:28 UTC
This bug was referenced in samba v4-20-stable (Release samba-4.20.6):

226b0a20bd1272adb958bd83ede03922f8c182e6
76e1024f4c2852caedf5e7bfe5b132c90b2ac5b2
0a99463b3e00dc5bda2759b7c8fa6e334899fe55