I'm trying to update ldb from 2.1.2 to 2.1.3 but am seeing these test failures, which only happen on ppc64el:

[==========] Running 3 test(s).
[ RUN ] test_free_list_no_read_lock
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[ ERROR ] test_free_list_no_read_lock
[ RUN ] test_free_list_read_lock
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[ ERROR ] test_free_list_read_lock
[ RUN ] test_free_list_stale_reader
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[ ERROR ] test_free_list_stale_reader
[==========] 3 test(s) run.
[ PASSED ] 0 test(s).
testsuite returned 0
Running Python test with /usr/bin/python3: tests/python/api.py
Running Python test with /usr/bin/python3: tests/python/index.py
Running Python test with /usr/bin/python3: tests/python/repack.py
Python testsuite returned 0
make[2]: *** [Makefile:17: test] Error 3

On the same machine and environment, 2.1.2 passes. lmdb is 0.9.24-1, talloc is 2.3.1, tevent is 0.10.2 and tdb is 1.4.3.

Any idea what is going on?
I understand ppc64el is not a very common architecture, and am happy to provide more debugging information.
From what I could tell, this is the bit that fails:

ldb_key_value/ldb_kv_cache.c:487

	/* OK now we definitely have a @BASEINFO record so fetch it */
	r = ldb_kv_search_dn1(module, baseinfo_dn, baseinfo, 0);
	if (r != LDB_SUCCESS) {
		goto failed_and_unlock;
	}

r has value 2.

If I bump lmdb_env_size in the setup() of tests/ldb_lmdb_free_list_test.c from 65536 to 262144:

	const char *options[] = {"lmdb_env_size:262144", NULL};

then the test gets a bit further:

$ ./bin/ldb_lmdb_free_list_test
[==========] Running 3 test(s).
[ RUN ] test_free_list_no_read_lock
[ ERROR ] --- 0xffffffffffff87b8 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:292: error: Failure!
[ FAILED ] test_free_list_no_read_lock
[ RUN ] test_free_list_read_lock
[ OK ] test_free_list_read_lock
[ RUN ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[ ERROR ] --- 0xffffffffffff87b8 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:618: error: Failure!
[ FAILED ] test_free_list_stale_reader
[==========] 3 test(s) run.
[ PASSED ] 1 test(s).
[ FAILED ] 2 test(s), listed below:
[ FAILED ] test_free_list_no_read_lock
[ FAILED ] test_free_list_stale_reader
2 FAILED TEST(S)
[ ERROR ] --- 0x2 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:328: error: Failure!
[ FAILED ] test_free_list_no_read_lock
[ RUN ] test_free_list_read_lock
[ OK ] test_free_list_read_lock
[ RUN ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[ ERROR ] --- 0xffffffffffff87b8 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:618: error: Failure!
[ FAILED ] test_free_list_stale_reader
[==========] 3 test(s) run.
[ PASSED ] 1 test(s).
[ FAILED ] 2 test(s), listed below:
[ FAILED ] test_free_list_no_read_lock
[ FAILED ] test_free_list_stale_reader
2 FAILED TEST(S)
If I bump it further, say to 4194304, then I get yet another result:

$ ./bin/ldb_lmdb_free_list_test
[==========] Running 3 test(s).
[ RUN ] test_free_list_no_read_lock
[ OK ] test_free_list_no_read_lock
[ RUN ] test_free_list_read_lock
[ ERROR ] --- 0 != 0x33
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:436: error: Failure!
[ FAILED ] test_free_list_read_lock
[ RUN ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[ OK ] test_free_list_stale_reader
[==========] 3 test(s) run.
[ PASSED ] 2 test(s).
[ FAILED ] 1 test(s), listed below:
[ FAILED ] test_free_list_read_lock
1 FAILED TEST(S)
[ ERROR ] --- 0 != 0x2
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:484: error: Failure!
[ FAILED ] test_free_list_read_lock
[ RUN ] test_free_list_stale_reader
LMDB Stale readers, deleted (1)
[ OK ] test_free_list_stale_reader
[==========] 3 test(s) run.
[ PASSED ] 2 test(s).
[ FAILED ] 1 test(s), listed below:
[ FAILED ] test_free_list_read_lock
1 FAILED TEST(S)
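Since the outcome keeps changing as lmdb_env_size grows, one cheap data point to collect on the failing builder is the OS page size. LMDB derives its page size from the operating system's (up to a build-time maximum), and ppc64el kernels are commonly built with 64 KiB pages, so the same map size may hold far fewer pages there than on amd64. Whether that explains these failures is not established in this report. The minimal sketch below only prints the page size and how many OS pages each lmdb_env_size tried above covers; pages_in_map() and report_page_budget() are hypothetical helper names, not part of ldb or LMDB.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: number of whole pages of page_size bytes that fit
 * in a map of map_size bytes; returns -1 if page_size is not positive. */
long pages_in_map(long map_size, long page_size)
{
	if (page_size <= 0)
		return -1;
	return map_size / page_size;
}

/* Hypothetical helper: print the OS page size and the number of OS pages
 * covered by each lmdb_env_size value tried above. These are OS pages;
 * LMDB may cap its own page size below the OS value. */
void report_page_budget(void)
{
	static const long sizes[] = { 65536, 262144, 4194304 };
	long psize = sysconf(_SC_PAGESIZE);
	size_t i;

	printf("OS page size: %ld bytes\n", psize);
	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("lmdb_env_size:%ld -> %ld OS pages\n",
		       sizes[i], pages_in_map(sizes[i], psize));
}
```

Calling report_page_budget() from a trivial main() on both an amd64 box and the ppc64el builder would show, for example, whether the 65536-byte map used by the test spans 16 pages on one (4 KiB pages) but only a single page on the other (64 KiB pages).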
I can reproduce it locally:

[builder@localhost ldb-2.0.11]$ LD_LIBRARY_PATH=bin/default/ LDB_MODULES_PATH=bin/default/modules/ ./bin/ldb_lmdb_free_list_test
[==========] Running 3 test(s).
[ RUN ] test_free_list_no_read_lock
ldb: unable to dlopen bin/default/modules//asq.c.25.d : bin/default/modules//asq.c.25.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//asq.c.25.o : bin/default/modules//asq.c.25.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//paged_searches.c.29.d : bin/default/modules//paged_searches.c.29.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//paged_searches.c.29.o : bin/default/modules//paged_searches.c.29.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//rdn_name.c.31.d : bin/default/modules//rdn_name.c.31.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//rdn_name.c.31.o : bin/default/modules//rdn_name.c.31.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//skel.c.35.d : bin/default/modules//skel.c.35.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//skel.c.35.o : bin/default/modules//skel.c.35.o: only ET_DYN and ET_EXEC can be loaded
ldb: unable to dlopen bin/default/modules//sort.c.27.d : bin/default/modules//sort.c.27.d: invalid ELF header
ldb: unable to dlopen bin/default/modules//sort.c.27.o : bin/default/modules//sort.c.27.o: only ET_DYN and ET_EXEC can be loaded
ldb_kv_cache_load
lmdb_parse_record
ldb_kv_cache_load
ldb_kv_search_dn1 (ldb_kv_search_key)
ldb_kv_search_key (fetch_and_parse for DN=@BASEINFO)
lmdb_parse_record
lmdb->error = -30782 (ldb_mdb_error = 2)
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[ ERROR ] test_free_list_no_read_lock
[ RUN ] test_free_list_read_lock
ldb_kv_cache_load
lmdb_parse_record
ldb_kv_cache_load
ldb_kv_search_dn1 (ldb_kv_search_key)
ldb_kv_search_key (fetch_and_parse for DN=@BASEINFO)
lmdb_parse_record
lmdb->error = -30782 (ldb_mdb_error = 2)
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[ ERROR ] test_free_list_read_lock
[ RUN ] test_free_list_stale_reader
ldb_kv_cache_load
lmdb_parse_record
ldb_kv_cache_load
ldb_kv_search_dn1 (ldb_kv_search_key)
ldb_kv_search_key (fetch_and_parse for DN=@BASEINFO)
lmdb_parse_record
lmdb->error = -30782 (ldb_mdb_error = 2)
Failed to connect to 'mdb://lmdb_free_list_test.ldb' with backend 'mdb': Unable to load ltdb cache records for backend 'ldb_mdb backend'
Could not run test: 0x1 != 0
[ LINE ] --- ../../tests/ldb_lmdb_free_list_test.c:165: error: Failure!Test setup failed
[ ERROR ] test_free_list_stale_reader
[==========] 3 test(s) run.
[ PASSED ] 0 test(s).

It looks like fetch_and_parse for @BASEINFO fails when it tries to open the database:

	lmdb->error = mdb_dbi_open(txn, NULL, 0, &dbi);
So, mdb_dbi_open() returns MDB_BAD_TXN:

http://www.lmdb.tech/doc/group__errors.html#gadcc86b34304f2ae761e3ba7c2eb88ba9

#define MDB_BAD_TXN (-30782)
Transaction must abort, has a child, or is invalid

./ldb_mdb/ldb_mdb.c:380

static int lmdb_parse_record(struct ldb_kv_private *ldb_kv,
			     struct ldb_val key,
			     int (*parser)(struct ldb_val key,
					   struct ldb_val data,
					   void *private_data),
			     void *ctx)
{
	struct lmdb_private *lmdb = ldb_kv->lmdb_private;
	MDB_val mdb_key;
	MDB_val mdb_data;
	MDB_txn *txn = NULL;
	MDB_dbi dbi;
	struct ldb_val data;

	txn = get_current_txn(lmdb);
	if (txn == NULL) {
		ldb_debug(lmdb->ldb, LDB_DEBUG_FATAL, "No transaction active");
		lmdb->error = MDB_PANIC;
		return ldb_mdb_error(lmdb->ldb, lmdb->error);
	}

	lmdb->error = mdb_dbi_open(txn, NULL, 0, &dbi);
	if (lmdb->error != MDB_SUCCESS) {
		return ldb_mdb_error(lmdb->ldb, lmdb->error);
	}
Here are the Debian architectures where I need to skip this test: 'alpha', 'ia64', 'mips64el', 'ppc64el', 'ppc64', 'sparc64'.

mips64el now fails in ldb 2.2.0; it was not failing in 2.1.4 (or it passed there by chance).

Log: https://buildd.debian.org/status/fetch.php?pkg=ldb&arch=mips64el&ver=2%3A2.2.0-1&stamp=1605093081&raw=0
Patch: https://salsa.debian.org/samba-team/ldb/-/blob/master/debian/patches/Skip-ldb_lmdb_free_list_test-on-ppc64el-ppc64-and-sp.patch
(In reply to Evgeny Sinelnikov from comment #5)
> #define MDB_BAD_TXN (-30782)
> Transaction must abort, has a child, or is invalid

Note that in comment 2 we see

> 0xffffffffffff87b8

which is -30792, or MDB_MAP_FULL. It may or may not be the same thing.

Can people still reproduce this?
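The conversion from 0xffffffffffff87b8 to -30792 follows from the test framework (cmocka) printing compared values as unsigned 64-bit hex, so a negative LMDB return code appears as a large 0xffff... value. A minimal sketch of that decoding is below. The LOCAL_MDB_* constants are values copied from lmdb.h for LMDB 0.9.x, and as_signed64() and lmdb_err_name() are hypothetical helpers; real code would include <lmdb.h> and call mdb_strerror() instead of keeping a partial local table.

```c
#include <stdint.h>

/* Values copied from lmdb.h (LMDB 0.9.x). Real code should include
 * <lmdb.h> and use mdb_strerror() rather than this partial local table. */
#define LOCAL_MDB_SUCCESS  0
#define LOCAL_MDB_MAP_FULL (-30792)
#define LOCAL_MDB_BAD_TXN  (-30782)

/* Hypothetical helper: undo cmocka's unsigned 64-bit view of a value.
 * The two's-complement reinterpretation is done arithmetically so that
 * no implementation-defined out-of-range conversion is involved. */
int64_t as_signed64(uint64_t printed)
{
	if (printed <= (uint64_t)INT64_MAX)
		return (int64_t)printed;
	return -(int64_t)(UINT64_MAX - printed) - 1;
}

/* Hypothetical helper: name the few LMDB codes discussed in this report. */
const char *lmdb_err_name(int64_t err)
{
	switch (err) {
	case LOCAL_MDB_SUCCESS:
		return "MDB_SUCCESS";
	case LOCAL_MDB_MAP_FULL:
		return "MDB_MAP_FULL";
	case LOCAL_MDB_BAD_TXN:
		return "MDB_BAD_TXN";
	default:
		return "unknown";
	}
}
```

For comparison, the same conversion leaves the 0x33 from comment 3 as 51, a small positive value that matches LDB_ERR_BUSY in ldb.h rather than a raw LMDB code.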
I believe I'm seeing this same error on Gentoo sparc64, on all versions we currently have available (2.7.2, 2.8.0, 2.8.1, 2.9.0, 2.9.1). The ldb test source now uses a 1MB DB size, so out of the gate I see the same errors as comment 2, with "0xffffffffffff87b8 != 0". I then bumped that size to 10MB and got the same errors as comment 3. If anyone has any suggestions, I'd be happy to test and report back.