Bug 15590 - libldb: peformance issue with indexes
Summary: libldb: peformance issue with indexes
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB (show other bugs)
Version: 4.19.5
Hardware: All All
: P5 normal with 15 votes (vote)
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-22 14:16 UTC by Andreas Schneider
Modified: 2024-05-02 20:50 UTC (History)
3 users (show)

See Also:


Attachments
repro.sh (1.59 KB, text/plain)
2024-02-22 14:17 UTC, Andreas Schneider
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Schneider 2024-02-22 14:16:06 UTC
Hi,

my colleagues found a performance issue with current libldb.

Fedora 39:
$ rpm -q ldb-tools
ldb-tools-2.8.0-1.fc39.x86_64

$ bash repro.sh 30000
Added 30000 records successfully

real    0m0.534s
user    0m0.931s
sys     0m0.062s


$ bash repro.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m16.232s
user    0m16.122s
sys     0m0.238s

$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    1m32.052s
user    1m30.924s
sys     0m1.235s


CentOS7 (container from out CI):
[samba@57da0cb767c4 ~]$ rpm -qf /usr/bin/ldbadd
samba-client-4.10.13-1.el7.x86_64

[samba@57da0cb767c4 ~]$ bash repro.sh 20000 indexes
Added 2 records successfully
Added 20000 records successfully

real    0m0.871s
user    0m1.231s
sys     0m0.140s
Comment 1 Andreas Schneider 2024-02-22 14:17:00 UTC
Created attachment 18258 [details]
repro.sh
Comment 2 Andrew Bartlett 2024-02-29 21:42:30 UTC
Can you bisect down to find the exact revision this fails on?

Also can you confirm both in as close to similar environments, eg both in docker, so we can rule out a change around fsync() etc?

Also note that the transaction behaviour for ldbadd is about the worst possible, it will open and close a transaction for each LDIF chuck individually.
Comment 3 Andreas Schneider 2024-03-01 10:11:00 UTC
I've bisected it on registry.gitlab.com/samba-team/devel/samba/samba-ci-centos8s:9a406973474a7903fe7fd6215226660911ed73c0



b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6 is the first bad commit
commit b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6
Author: Gary Lockyer <gary@catalyst.net.nz>
Date:   Wed Mar 6 15:28:45 2019 +1300

    lib ldb key value: fix index buffering

    As a performance enhancement the key value layer maintains a cache of
    the index records, which is written to disk as part of a prepare commit.
    This patch adds an extra cache at the operation layer to ensure that the
    cached indexes remain consistent in the event of an operation failing.

    Add test to test for index corruption in a failed modify.

    Signed-off-by: Gary Lockyer <gary@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet@samba.org>
    Pair-Programmed-With: Andrew Bartlett <abartlet@samba.org>
    Signed-off-by: Andrew Bartlett <abartlet@samba.org>

 lib/ldb/ldb_key_value/ldb_kv.c       |  22 ---
 lib/ldb/ldb_key_value/ldb_kv.h       |  12 ++
 lib/ldb/ldb_key_value/ldb_kv_index.c | 347 ++++++++++++++++++++++++++++++++---
 lib/ldb/tests/python/api.py          |  82 ++++++++-
 4 files changed, 415 insertions(+), 48 deletions(-)
Comment 4 Andreas Schneider 2024-05-02 12:45:55 UTC
The patches from Andréas LEROUX do not have any real impact on this issue.


tis-tdbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m9.035s
user    0m9.021s
sys     0m0.283s





tis-ldbfind.patch:

bash repro_dev_ldb.sh 10000 indexes
Added 2 records successfully
Added 10000 records successfully

real    0m8.929s
user    0m8.980s
sys     0m0.219s
Comment 5 Andreas Schneider 2024-05-02 14:09:34 UTC
I've started to comment out code and tried to optimize away allocations [1].

I nailed it down to tdb_traverse(ldb_kv_sub_transaction_traverse). The traverse callback is called 4 times for each of the 10000 entries.

If I comment out the tdb_store() inside ldb_kv_sub_transaction_traverse(), the reproducer is done in less than 1sec instead of 9 seconds.

If the tdb only has 4 entries and one of those entries gets updated each time. The question is, why do we use a tdb for this? I guess the tdb does locking here and I don't think for such a small database, that tdb is probably not the right thing?


[1] https://git.samba.org/?p=asn/samba.git;a=shortlog;h=refs/heads/asn-ldb
Comment 6 Andrew Bartlett 2024-05-02 20:38:43 UTC
I replied to Volker's comment with much the same conclusion here:

https://gitlab.com/samba-team/samba/-/merge_requests/3616#note_1890744831

Yes, the background here was that Gary was looking for a simple hash lookup or similar that was available in LDB's dependencies.

I agree, the best fix here would be a better data structure, as there will never be an optimal value (for the hash size). (Also the) patch was put under test, and makes some operations slower (sadly).