Bug 15590 - libldb: performance issue with indexes
Summary: libldb: performance issue with indexes
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB (show other bugs)
Version: 4.19.5
Hardware: All All
: P5 normal with 15 votes (vote)
Target Milestone: ---
Assignee: Douglas Bagnall
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-22 14:16 UTC by Andreas Schneider
Modified: 2024-07-22 22:05 UTC (History)
4 users (show)

See Also:


Attachments
repro.sh (1.59 KB, text/plain)
2024-02-22 14:17 UTC, Andreas Schneider
no flags Details
patch to avoid copying key strings (1.44 KB, patch)
2024-06-25 23:22 UTC, Douglas Bagnall
no flags Details
patch for asn-ldb tree (1.66 KB, patch)
2024-06-25 23:43 UTC, Douglas Bagnall
no flags Details
draft patch for master (7.92 KB, patch)
2024-07-22 22:05 UTC, Douglas Bagnall
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Schneider 2024-02-22 14:16:06 UTC
Hi,

my colleagues found a performance issue with current libldb.

Fedora 39:
$ rpm -q ldb-tools
ldb-tools-2.8.0-1.fc39.x86_64

$ bash repro.sh 30000
Added 30000 records successfully

real    0m0.534s
user    0m0.931s
sys     0m0.062s


$ bash repro.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m16.232s
user    0m16.122s
sys     0m0.238s

$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    1m32.052s
user    1m30.924s
sys     0m1.235s


CentOS7 (container from out CI):
[samba@57da0cb767c4 ~]$ rpm -qf /usr/bin/ldbadd
samba-client-4.10.13-1.el7.x86_64

[samba@57da0cb767c4 ~]$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    0m0.871s
user    0m1.231s
sys     0m0.140s
Comment 1 Andreas Schneider 2024-02-22 14:17:00 UTC
Created attachment 18258 [details]
repro.sh
Comment 2 Andrew Bartlett 2024-02-29 21:42:30 UTC
Can you bisect down to find the exact revision this fails on?

Also can you confirm both in as close to similar environments, eg both in docker, so we can rule out a change around fsync() etc?

Also note that the transaction behaviour for ldbadd is about the worst possible, it will open and close a transaction for each LDIF chuck individually.
Comment 3 Andreas Schneider 2024-03-01 10:11:00 UTC
I've bisected it on registry.gitlab.com/samba-team/devel/samba/samba-ci-centos8s:9a406973474a7903fe7fd6215226660911ed73c0



b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6 is the first bad commit
commit b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6
Author: Gary Lockyer <gary@catalyst.net.nz>
Date:   Wed Mar 6 15:28:45 2019 +1300

    lib ldb key value: fix index buffering

    As a performance enhancement the key value layer maintains a cache of
    the index records, which is written to disk as part of a prepare commit.
    This patch adds an extra cache at the operation layer to ensure that the
    cached indexes remain consistent in the event of an operation failing.

    Add test to test for index corruption in a failed modify.

    Signed-off-by: Gary Lockyer <gary@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet@samba.org>
    Pair-Programmed-With: Andrew Bartlett <abartlet@samba.org>
    Signed-off-by: Andrew Bartlett <abartlet@samba.org>

 lib/ldb/ldb_key_value/ldb_kv.c       |  22 ---
 lib/ldb/ldb_key_value/ldb_kv.h       |  12 ++
 lib/ldb/ldb_key_value/ldb_kv_index.c | 347 ++++++++++++++++++++++++++++++++---
 lib/ldb/tests/python/api.py          |  82 ++++++++-
 4 files changed, 415 insertions(+), 48 deletions(-)
Comment 4 Andreas Schneider 2024-05-02 12:45:55 UTC
The patches from Andréas LEROUX do not have any real impact on this issue.


tis-tdbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m9.035s
user    0m9.021s
sys     0m0.283s





tis-ldbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m8.929s
user    0m8.980s
sys     0m0.219s
Comment 5 Andreas Schneider 2024-05-02 14:09:34 UTC
I've started to comment out code and tried to optimize away allocations [1].

I nailed it down to tdb_traverse(ldb_kv_sub_transaction_traverse). The traverse callback is called 4 times for each of the 10000 entries.

If I comment out the tdb_store() inside ldb_kv_sub_transaction_traverse(), the reproducer is done in less than 1sec instead of 9 seconds.

If the tdb only has 4 entries and one of those entries gets updated each time. The question is, why do we use a tdb for this? I guess the tdb does locking here and I don't think for such a small database, that tdb is probably not the right thing?


[1] https://git.samba.org/?p=asn/samba.git;a=shortlog;h=refs/heads/asn-ldb
Comment 6 Andrew Bartlett 2024-05-02 20:38:43 UTC
I replied to Volker's comment with much the same conclusion here:

https://gitlab.com/samba-team/samba/-/merge_requests/3616#note_1890744831

Yes, the background here was that Gary was looking for a simple hash lookup or similar that was available in LDB's dependencies.

I agree, the best fix here would be a better data structure, as there will never be an optimal value (for the hash size). (Also the) patch was put under test, and makes some operations slower (sadly).
Comment 7 Douglas Bagnall 2024-06-25 09:09:57 UTC
Just some notes.

Looking in perf on a configure.developer build (last line of repro.sh changed to `time (make_ldif $NUM | perf record -- bin/ldbadd -H $DB)`):

    14.32%  ldbadd   libc.so.6                            [.] _int_malloc
    12.93%  ldbadd   libc.so.6                            [.] __memmove_evex_unaligned_erms
    10.65%  ldbadd   libtalloc-private-samba.so           [.] _tc_free_children_internal
    10.59%  ldbadd   libtalloc-private-samba.so           [.] _tc_free_internal
     8.28%  ldbadd   libc.so.6                            [.] _int_free
     7.64%  ldbadd   libtalloc-private-samba.so           [.] __talloc_with_prefix
     6.50%  ldbadd   libc.so.6                            [.] unlink_chunk.constprop.0
     4.38%  ldbadd   libtalloc-private-samba.so           [.] talloc_chunk_from_ptr
     2.74%  ldbadd   libc.so.6                            [.] malloc
     2.58%  ldbadd   libldb-key-value-private-samba.so    [.] ldb_kv_dn_list_load
     2.50%  ldbadd   libtalloc-private-samba.so           [.] tc_alloc_pool
     2.39%  ldbadd   libtalloc-private-samba.so           [.] _talloc_memdup
     2.18%  ldbadd   libtalloc-private-samba.so           [.] _talloc_named_const
     2.05%  ldbadd   libc.so.6                            [.] cfree@GLIBC_2.2.5
     1.22%  ldbadd   libtalloc-private-samba.so           [.] tc_memlimit_update_on_free
     0.94%  ldbadd   libtalloc-private-samba.so           [.] __talloc
     0.89%  ldbadd   libtalloc-private-samba.so           [.] _talloc_chunk_set_free
     0.78%  ldbadd   libtalloc-private-samba.so           [.] talloc_memlimit_check
     0.74%  ldbadd   libtalloc-private-samba.so           [.] talloc_memlimit_grow
     0.49%  ldbadd   libtalloc-private-samba.so           [.] _tc_set_name_const
     0.22%  ldbadd   libtalloc-private-samba.so           [.] memcpy@plt
     0.20%  ldbadd   libtalloc-private-samba.so           [.] malloc@plt


it's almost all fiddling with memory apart from __memmove_evex_unaligned_erms() and ldb_kv_dn_list_load().


And `perf record -g` says

-   64.35%     0.01%  ldbadd   libldb-key-value-private-samba.so     [.] ldb_kv_add
   - 64.35% ldb_kv_add
      - 39.25% ldb_kv_add_internal
         - 38.46% ldb_kv_index_add_new
            - 38.46% ldb_kv_index_add_all
               - 38.39% ldb_kv_index_add_el
                  - 38.37% ldb_kv_index_add1
                     - 37.25% ldb_kv_dn_list_load
                        - 21.19% _talloc_memdup
                           - 18.90% _talloc_named_const
                              - 16.18% __talloc
                                 - 12.79% __talloc_with_prefix
                                      2.32% tc_alloc_pool
                                      2.29% talloc_chunk_from_ptr
                                   1.14% talloc_chunk_from_ptr
                          12.44% __memmove_evex_unaligned_erms
      - 25.02% ldb_kv_sub_transaction_commit
         - 25.02% ldb_kv_index_sub_transaction_commit
            - 24.99% tdb_traverse
               - 24.98% tdb_traverse_internal
                  - 24.89% ldb_kv_sub_transaction_traverse
                     - 24.55% _talloc_free
                        - 24.54% _talloc_free_internal
                           - _tc_free_internal
                              - 24.34% _tc_free_children_internal
                                 - 12.44% _tc_free_internal
                                      1.06% tc_memlimit_update_on_free
                                      0.74% _talloc_chunk_set_free
                                      0.54% _tc_free_children_internal
                                   1.28% cfree@GLIBC_2.2.5
                                   0.50% _tc_free_children_internal

which again blames ldb_kv_dn_list_load() for spending an inordinate amount of time allocating memory for ldb_kv_index_sub_transaction_commit() to free. We probably don't need to do all that.

Elapsed time seems slightly worse than quadratically related to list size.
Comment 8 Douglas Bagnall 2024-06-25 09:18:31 UTC
That last comment was on the asn-ldb branch, but origin/master is pretty much the same.
Comment 9 Douglas Bagnall 2024-06-25 23:22:47 UTC
Created attachment 18356 [details]
patch to avoid copying key strings

(In reply to Douglas Bagnall from comment #7)
> Elapsed time seems slightly worse than quadratically related to list size.

That made me suspect this bit:

-		for (x = 0; x < list2->count; x++) {
-			dns[x].length = list2->dn[x].length;
-			dns[x].data = talloc_memdup(
-				dns,
-				list2->dn[x].data,
-				list2->dn[x].length);
-			if (dns[x].data == NULL) {
-				TALLOC_FREE(dns);
-				return LDB_ERR_OPERATIONS_ERROR;
-			}
-		}

which loops over all the dns that have been added so far. If this is replaced with a simple memcpy(), then list2->dn[x].data and list->dn[x].data will point to the same strings, but this is probably OK for the sub-transaction because we are never mutating the string in place, and probably explicitly not freeing it anywhere -- but if we are (e.g. the delete case), we can just stop doing that.

The result is still quadratic, since the memcpy grows with list2->count, but probably not super-quadratic, since we avoid all the angst talloc goes through when freeing long lists.

$ bash repro-15590.sh 10000 indexes

BEFORE:
real	0m18.368s
user	0m18.248s
sys	0m0.252s

AFTER:
real	0m0.642s
user	0m0.619s
sys	0m0.129s

This is on master, with only the attached patch.
Comment 10 Douglas Bagnall 2024-06-25 23:43:59 UTC
Created attachment 18357 [details]
patch for asn-ldb tree

Putting a more succinct version of the patch on Andreas' RBTREE experiments gives

real	0m0.591s
user	0m0.571s
sys	0m0.125s

which is a slight improvement.
Comment 11 Andreas Schneider 2024-06-26 06:09:33 UTC
I think this patch is great approach as it can be easily back-ported! I think we want to apply that and continue work from there.


The issue is that in-memory tdb is still not optimal, see:

https://gitlab.com/samba-team/samba/-/merge_requests/3616

So we might want the patch from 3616 too, as it can be back-ported easily.
Comment 12 Andreas Schneider 2024-06-26 06:11:37 UTC
It is a huge improvement but not the ultimate solution yet :-)

$ bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m0.542s
user    0m0.538s
sys     0m0.210s

$ bash repro_dev_ldb.sh 30000 indexes
Added 2 records successfully
Added 30000 records successfully

real    0m3.531s
user    0m2.158s
sys     0m2.086s

$ bash repro_dev_ldb.sh 50000 indexes
Added 2 records successfully
Added 50000 records successfully

real    0m9.215s
user    0m4.732s
sys     0m5.780s
Comment 13 Douglas Bagnall 2024-06-26 21:51:58 UTC
(In reply to Andreas Schneider from comment #12)
it was not entirely successful:
https://gitlab.com/samba-team/devel/samba/-/pipelines/1349612471
Comment 14 Jo Sutton 2024-07-11 05:42:28 UTC
On a typical 64‐bit system — and assuming the indices are 128‐bit GUIDs — ‘struct ldb_val’ takes up the same amount of space as storing the data inline like ‘uint8_t data[LDB_KV_GUID_SIZE]’. If we make that change (storing data inline), then we can apply Douglas’s patch as‐is; we’ll use less memory, and need fewer heap allocations; and because there’s no associated memory to keep track of, we shouldn’t have to worry about use‐after‐frees.

What do you think, Andreas? Is this idea at all sensible? How easy would it be to cope with LDB_KV_INDEXING_VERSION databases that aren’t using GUIDs?
Comment 15 Andreas Schneider 2024-07-11 07:56:14 UTC
Jo, I never really worked with ldb. This means I don't understand much of it. 

However here is my understanding of the issue:

We have two in-memory databases. One holds the information for transactions and the other holds information for sub-transactions. I think the problem is that we copy/duplicate the data from the sub-transaction database to the transaction database with tdb_store(). We do that in ldb_kv_sub_transaction_traverse(). I think that tdb_store() is the super expensive operation.

If we would use a different data structure, like an rbtree instead of in-memory tdb, we could just remove a node from the sub-transaction rbtree and add it to the transaction rbtree. We would just do pointer operations avoiding the duplication of the memory.
Comment 16 Andreas Schneider 2024-07-11 10:54:27 UTC
If we only to tdb_traverse() and never do a search, a linked list might even work :-)
Comment 17 Andreas Schneider 2024-07-12 08:26:09 UTC
Douglas looked into ldb_kv_dn_list_load(), which is, as far as I understand, loading the data from the on-disk database into the memory tdb. I'm not sure if this could be improved and it looks more like a cost we have and can't really avoid. However I'm not the expert.
Comment 18 Jo Sutton 2024-07-14 23:39:32 UTC
(In reply to Andreas Schneider from comment #17)
I think that function could be improved if it were possible to store the GUIDs in one big array, rather than as individually allocated ldb_val structures. But I’ll discuss this with Douglas when I have the opportunity.
Comment 19 Douglas Bagnall 2024-07-21 12:19:54 UTC
I think I have fixed the problem with my earlier patch -- we'll see how it looks in the morning.

BTW, if the repro.sh script is exactly like the production problem, using 'ldbadd -o batch_mode' should solve it:

time (make_ldif $NUM | ldbadd -H $DB -o batch_mode)
Comment 20 Douglas Bagnall 2024-07-22 22:05:20 UTC
Created attachment 18383 [details]
draft patch for master

I think this patch is enough of an answer.

A merge request is likely soon.