Hi, my colleagues found a performance issue with current libldb. Fedora 39: $ rpm -q ldb-tools ldb-tools-2.8.0-1.fc39.x86_64 $ bash repro.sh 30000 Added 30000 records successfully real 0m0.534s user 0m0.931s sys 0m0.062s $ bash repro.sh 10000 indexes Added 2 records successfully Added 10000 records successfully real 0m16.232s user 0m16.122s sys 0m0.238s $ bash repro.sh 20000 indexes Added 2 records successfully Added 20000 records successfully real 1m32.052s user 1m30.924s sys 0m1.235s CentOS7 (container from out CI): [samba@57da0cb767c4 ~]$ rpm -qf /usr/bin/ldbadd samba-client-4.10.13-1.el7.x86_64 [samba@57da0cb767c4 ~]$ bash repro.sh 20000 indexes Added 2 records successfully Added 20000 records successfully real 0m0.871s user 0m1.231s sys 0m0.140s
Created attachment 18258 [details] repro.sh
Can you bisect down to find the exact revision this fails on? Also can you confirm both in as close to similar environments, eg both in docker, so we can rule out a change around fsync() etc? Also note that the transaction behaviour for ldbadd is about the worst possible, it will open and close a transaction for each LDIF chuck individually.
I've bisected it on registry.gitlab.com/samba-team/devel/samba/samba-ci-centos8s:9a406973474a7903fe7fd6215226660911ed73c0 b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6 is the first bad commit commit b6b5b5fe355fee2a4096e9214831cb88c7a2a4c6 Author: Gary Lockyer <gary@catalyst.net.nz> Date: Wed Mar 6 15:28:45 2019 +1300 lib ldb key value: fix index buffering As a performance enhancement the key value layer maintains a cache of the index records, which is written to disk as part of a prepare commit. This patch adds an extra cache at the operation layer to ensure that the cached indexes remain consistent in the event of an operation failing. Add test to test for index corruption in a failed modify. Signed-off-by: Gary Lockyer <gary@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Pair-Programmed-With: Andrew Bartlett <abartlet@samba.org> Signed-off-by: Andrew Bartlett <abartlet@samba.org> lib/ldb/ldb_key_value/ldb_kv.c | 22 --- lib/ldb/ldb_key_value/ldb_kv.h | 12 ++ lib/ldb/ldb_key_value/ldb_kv_index.c | 347 ++++++++++++++++++++++++++++++++--- lib/ldb/tests/python/api.py | 82 ++++++++- 4 files changed, 415 insertions(+), 48 deletions(-)
The patches from Andréas LEROUX do not have any real impact on this issue. tis-tdbfind.patch: bash repro_dev_ldb.sh 10000 indexes Added 2 records successfully Added 10000 records successfully real 0m9.035s user 0m9.021s sys 0m0.283s tis-ldbfind.patch: bash repro_dev_ldb.sh 10000 indexes Added 2 records successfully Added 10000 records successfully real 0m8.929s user 0m8.980s sys 0m0.219s
I've started to comment out code and tried to optimize away allocations [1]. I nailed it down to tdb_traverse(ldb_kv_sub_transaction_traverse). The traverse callback is called 4 times for each of the 10000 entries. If I comment out the tdb_store() inside ldb_kv_sub_transaction_traverse(), the reproducer is done in less than 1sec instead of 9 seconds. If the tdb only has 4 entries and one of those entries gets updated each time. The question is, why do we use a tdb for this? I guess the tdb does locking here and I don't think for such a small database, that tdb is probably not the right thing? [1] https://git.samba.org/?p=asn/samba.git;a=shortlog;h=refs/heads/asn-ldb
I replied to Volker's comment with much the same conclusion here: https://gitlab.com/samba-team/samba/-/merge_requests/3616#note_1890744831 Yes, the background here was that Gary was looking for a simple hash lookup or similar that was available in LDB's dependencies. I agree, the best fix here would be a better data structure, as there will never be an optimal value (for the hash size). (Also the) patch was put under test, and makes some operations slower (sadly).