Bug 14404 - LDB 2.1.3 ppc64 test error: test_free_list_no_read_lock, test_free_list_read_lock, test_free_list_stale_reader
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Build
Version: 4.12.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-09 13:18 UTC by Andreas Hasenack
Modified: 2024-04-19 08:22 UTC

See Also:


Description Andreas Hasenack 2020-06-09 13:18:22 UTC
I'm trying to update ldb from 2.1.2 to 2.1.3 but am seeing these test
failures that only happen on ppc64el:

[==========] Running 3 test(s).
[ RUN      ] test_free_list_no_read_lock
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend
'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error:
Failure!Test setup failed
[  ERROR   ] test_free_list_no_read_lock
[ RUN      ] test_free_list_read_lock
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend
'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error:
Failure!Test setup failed
[  ERROR   ] test_free_list_read_lock
[ RUN      ] test_free_list_stale_reader
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend
'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error:
Failure!Test setup failed
[  ERROR   ] test_free_list_stale_reader
[==========] 3 test(s) run.
[  PASSED  ] 0 test(s).
testsuite returned 0
Running Python test with /usr/bin/python3: tests/python/api.py
Running Python test with /usr/bin/python3: tests/python/index.py
Running Python test with /usr/bin/python3: tests/python/repack.py
Python testsuite returned 0
make[2]: *** [Makefile:17: test] Error 3

On the same machine and environment, 2.1.2 passes. lmdb is 0.9.24-1,
talloc is 2.3.1, tevent is 0.10.2, tdb is 1.4.3

Any idea what is going on?
Comment 1 Andreas Hasenack 2020-06-09 13:19:14 UTC
I understand ppc64el is not a very common architecture, and am happy to provide more debugging information.
Comment 2 Andreas Hasenack 2020-06-09 14:45:24 UTC
From what I could tell, this is the bit that fails:
ldb_key_value/ldb_kv_cache.c:487
        /* OK now we definitely have a @BASEINFO record so fetch it */
        r = ldb_kv_search_dn1(module, baseinfo_dn, baseinfo, 0);
        if (r != LDB_SUCCESS) {
                goto failed_and_unlock;
        }

r has value 2



If I bump lmdb_env_size in tests/ldb_lmdb_free_list_test.c's setup() from 65536 to 262144:
const char *options[] = {"lmdb_env_size:262144", NULL};

Then the test goes a bit further:
$ ./bin/ldb_lmdb_free_list_test
[==========] Running 3 test(s).
[ RUN      ] test_free_list_no_read_lock
[  ERROR   ] --- 0xffffffffffff87b8 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:292: error: Failure!
[  FAILED  ] test_free_list_no_read_lock
[ RUN      ] test_free_list_read_lock
[       OK ] test_free_list_read_lock
[ RUN      ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[  ERROR   ] --- 0xffffffffffff87b8 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:618: error: Failure!
[  FAILED  ] test_free_list_stale_reader
[==========] 3 test(s) run.
[  PASSED  ] 1 test(s).
[  FAILED  ] 2 test(s), listed below:
[  FAILED  ] test_free_list_no_read_lock
[  FAILED  ] test_free_list_stale_reader

 2 FAILED TEST(S)
[  ERROR   ] --- 0x2 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:328: error: Failure!
[  FAILED  ] test_free_list_no_read_lock
[ RUN      ] test_free_list_read_lock
[       OK ] test_free_list_read_lock
[ RUN      ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[  ERROR   ] --- 0xffffffffffff87b8 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:618: error: Failure!
[  FAILED  ] test_free_list_stale_reader
[==========] 3 test(s) run.
[  PASSED  ] 1 test(s).
[  FAILED  ] 2 test(s), listed below:
[  FAILED  ] test_free_list_no_read_lock
[  FAILED  ] test_free_list_stale_reader

 2 FAILED TEST(S)
Comment 3 Andreas Hasenack 2020-06-09 14:50:17 UTC
If I bump it further, say to 4194304, then I get yet another result:

$ ./bin/ldb_lmdb_free_list_test
[==========] Running 3 test(s).
[ RUN      ] test_free_list_no_read_lock
[       OK ] test_free_list_no_read_lock
[ RUN      ] test_free_list_read_lock
[  ERROR   ] --- 0 != 0x33
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:436: error: Failure!
[  FAILED  ] test_free_list_read_lock
[ RUN      ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[       OK ] test_free_list_stale_reader
[==========] 3 test(s) run.
[  PASSED  ] 2 test(s).
[  FAILED  ] 1 test(s), listed below:
[  FAILED  ] test_free_list_read_lock

 1 FAILED TEST(S)
[  ERROR   ] --- 0 != 0x2
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:484: error: Failure!
[  FAILED  ] test_free_list_read_lock
[ RUN      ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[       OK ] test_free_list_stale_reader
[==========] 3 test(s) run.
[  PASSED  ] 2 test(s).
[  FAILED  ] 1 test(s), listed below:
[  FAILED  ] test_free_list_read_lock

 1 FAILED TEST(S)
Comment 4 Evgeny Sinelnikov 2020-07-09 13:25:17 UTC
I reproduced it locally:

[builder@localhost ldb-2.0.11]$ LD_LIBRARY_PATH=bin/default/ LDB_MODULES_PATH=bin/default/modules/ ./bin/ldb_lmdb_free_list_test    
[==========] Running 3 test(s).
[ RUN      ] test_free_list_no_read_lock
ldb: unable to dlopen bin/default/modules//asq.c.25.d : bin/default/modules//asq.c.25.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//asq.c.25.o : bin/default/modules//asq.c.25.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//paged_searches.c.29.d : bin/default/modules//paged_searches.c.29.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//paged_searches.c.29.o : bin/default/modules//paged_searches.c.29.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//rdn_name.c.31.d : bin/default/modules//rdn_name.c.31.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//rdn_name.c.31.o : bin/default/modules//rdn_name.c.31.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//skel.c.35.d : bin/default/modules//skel.c.35.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//skel.c.35.o : bin/default/modules//skel.c.35.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//sort.c.27.d : bin/default/modules//sort.c.27.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//sort.c.27.o : bin/default/modules//sort.c.27.o: only ET_DYN and ET_EXEC can be loaded

ldb_kv_cache_load
lmdb_parse_record
ldb_kv_cache_load
ldb_kv_search_dn1 (ldb_kv_search_key)
ldb_kv_search_key (fetch_and_parse for DN=@BASEINFO)
lmdb_parse_record lmdb->error = -30782 (ldb_mdb_error = 2)

Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[  ERROR   ] test_free_list_no_read_lock
[ RUN      ] test_free_list_read_lock

ldb_kv_cache_load
lmdb_parse_record
ldb_kv_cache_load
ldb_kv_search_dn1 (ldb_kv_search_key)
ldb_kv_search_key (fetch_and_parse for DN=@BASEINFO)
lmdb_parse_record lmdb->error = -30782 (ldb_mdb_error = 2)

Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[  ERROR   ] test_free_list_read_lock
[ RUN      ] test_free_list_stale_reader

ldb_kv_cache_load
lmdb_parse_record
ldb_kv_cache_load
ldb_kv_search_dn1 (ldb_kv_search_key)
ldb_kv_search_key (fetch_and_parse for DN=@BASEINFO)
lmdb_parse_record lmdb->error = -30782 (ldb_mdb_error = 2)

Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[   LINE   ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[  ERROR   ] test_free_list_stale_reader
[==========] 3 test(s) run.
[  PASSED  ] 0 test(s).

It looks like fetch_and_parse for @BASEINFO fails when trying to open the db:
lmdb->error = mdb_dbi_open(txn, NULL, 0, &dbi);
Comment 5 Evgeny Sinelnikov 2020-07-09 13:46:47 UTC
So, mdb_dbi_open() returns MDB_BAD_TXN.

http://www.lmdb.tech/doc/group__errors.html#gadcc86b34304f2ae761e3ba7c2eb88ba9

#define MDB_BAD_TXN   (-30782)
Transaction must abort, has a child, or is invalid

./ldb_mdb/ldb_mdb.c:380
static int lmdb_parse_record(struct ldb_kv_private *ldb_kv,
                             struct ldb_val key,
                             int (*parser)(struct ldb_val key,
                                           struct ldb_val data,
                                           void *private_data),
                             void *ctx)
{
        struct lmdb_private *lmdb = ldb_kv->lmdb_private;
        MDB_val mdb_key;
        MDB_val mdb_data;
        MDB_txn *txn = NULL;
        MDB_dbi dbi;
        struct ldb_val data;

        txn = get_current_txn(lmdb);
        if (txn == NULL) {
                ldb_debug(lmdb->ldb, LDB_DEBUG_FATAL, "No transaction active");
                lmdb->error = MDB_PANIC;
                return ldb_mdb_error(lmdb->ldb, lmdb->error);
        }

        lmdb->error = mdb_dbi_open(txn, NULL, 0, &dbi);
        if (lmdb->error != MDB_SUCCESS) {
                return ldb_mdb_error(lmdb->ldb, lmdb->error);
        }
Comment 6 Mathieu Parent 2020-11-11 11:41:47 UTC
Here are the Debian architectures where I need to skip this test:
'alpha', 'ia64', 'mips64el', 'ppc64el', 'ppc64', 'sparc64'

mips64el now fails with ldb 2.2.0; it was not failing with 2.1.4 (or perhaps it only passed by chance).

Log: https://buildd.debian.org/status/fetch.php?pkg=ldb&arch=mips64el&ver=2%3A2.2.0-1&stamp=1605093081&raw=0

Patch: https://salsa.debian.org/samba-team/ldb/-/blob/master/debian/patches/Skip-ldb_lmdb_free_list_test-on-ppc64el-ppc64-and-sp.patch
Comment 7 Douglas Bagnall 2024-04-19 08:22:03 UTC
(In reply to Evgeny Sinelnikov from comment #5)
> #define MDB_BAD_TXN   (-30782)
> Transaction must abort, has a child, or is invalid

Note, in comment 2 we see 

> 0xffffffffffff87b8

which is -30792 or MDB_MAP_FULL.

It may or may not be the same thing.

Can people still reproduce this?