With valgrind on v4-2*:

==3636== Invalid write of size 8
==3636==    at 0x151F3D: ctdb_lock_context_destructor (ctdb_lock.c:276)
==3636==    by 0x58B3618: _talloc_free_internal (talloc.c:993)
==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free (talloc.c:1594)
==3636==    by 0x15292E: ctdb_lock_handler (ctdb_lock.c:471)
==3636==    by 0x56A535A: epoll_event_loop (tevent_epoll.c:728)
==3636==    by 0x56A535A: epoll_event_loop_once (tevent_epoll.c:926)
==3636==    by 0x56A3826: std_event_loop_once (tevent_standard.c:114)
==3636==    by 0x569FFFC: _tevent_loop_once (tevent.c:533)
==3636==    by 0x56A019A: tevent_common_loop_wait (tevent.c:637)
==3636==    by 0x56A37C6: std_event_loop_wait (tevent_standard.c:140)
==3636==    by 0x11E03A: ctdb_start_daemon (ctdb_daemon.c:1320)
==3636==    by 0x118557: main (ctdbd.c:321)
==3636==  Address 0x9c5b660 is 96 bytes inside a block of size 120 free'd
==3636==    at 0x4C29D17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3636==    by 0x58B32D3: _talloc_free_internal (talloc.c:1063)
==3636==    by 0x58B3232: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58B3232: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58B3232: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58B3232: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
==3636==    by 0x58AD692: _talloc_free (talloc.c:1594)
==3636==    by 0x11EC30: daemon_incoming_packet (ctdb_daemon.c:844)
==3636==    by 0x136F4A: lock_fetch_callback (ctdb_ltdb_server.c:268)
==3636==    by 0x152489: process_callbacks (ctdb_lock.c:353)
==3636==    by 0x152489: ctdb_lock_handler (ctdb_lock.c:468)
==3636==    by 0x56A535A: epoll_event_loop (tevent_epoll.c:728)
==3636==    by 0x56A535A: epoll_event_loop_once (tevent_epoll.c:926)
==3636==    by 0x56A3826: std_event_loop_once (tevent_standard.c:114)
==3636==    by 0x569FFFC: _tevent_loop_once (tevent.c:533)
==3636==    by 0x56A019A: tevent_common_loop_wait (tevent.c:637)
==3636==    by 0x56A37C6: std_event_loop_wait (tevent_standard.c:140)
==3636==    by 0x11E03A: ctdb_start_daemon (ctdb_daemon.c:1320)
==3636==    by 0x118557: main (ctdbd.c:321)

Backtrace in production with ctdb 2.5.3*:

#0  0x00007fa672d54635 in raise () from /lib64/libc.so.6
#1  0x00007fa672d55e15 in abort () from /lib64/libc.so.6
#2  0x000000000045926b in smb_panic (why=0x48e593 "internal error") at lib/util/fault.c:162
#3  0x00000000004594fd in fault_report (sig=11) at lib/util/fault.c:179
#4  sig_fault (sig=11) at lib/util/fault.c:194
#5  <signal handler called>
#6  ctdb_lock_context_destructor (lock_ctx=0x1571850) at server/ctdb_lock.c:276
#7  0x000000000045f8aa in _talloc_free_internal (ptr=0x1571850, location=0x487fd9 "server/ctdb_lock.c:471") at lib/talloc/talloc.c:872
#8  0x000000000045f693 in _talloc_free_children_internal (ptr=0xb4fba0, location=0x487fd9 "server/ctdb_lock.c:471") at lib/talloc/talloc.c:1355
#9  _talloc_free_internal (ptr=0xb4fba0, location=0x487fd9 "server/ctdb_lock.c:471") at lib/talloc/talloc.c:892
#10 0x000000000044394f in ctdb_lock_handler (ev=<value optimized out>, tfd=<value optimized out>, flags=<value optimized out>, private_data=<value optimized out>) at server/ctdb_lock.c:471
#11 0x000000000046a54e in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at lib/tevent/tevent_epoll.c:736
#12 epoll_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at lib/tevent/tevent_epoll.c:931
#13 0x0000000000467be6 in std_event_loop_once (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent_standard.c:112
#14 0x0000000000464b9d in _tevent_loop_once (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent.c:530
#15 0x0000000000464c1b in tevent_common_loop_wait (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent.c:634
#16 0x0000000000467b56 in std_event_loop_wait (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent_standard.c:138
#17 0x0000000000409cd7 in ctdb_start_daemon (ctdb=0xaa2f40, do_fork=<value optimized out>, use_syslog=<value optimized out>) at server/ctdb_daemon.c:1320
#18 0x000000000040536f in main (argc=<value optimized out>, argv=<value optimized out>) at server/ctdbd.c:323

The issue is that the request member of the lock context is invalid in ctdb_lock_context_destructor():

(gdb) p *lock_ctx
$6 = {next = 0x144f410, prev = 0x11f4210, type = LOCK_RECORD, ctdb = 0xaa2f40, ctdb_db = 0xaccaa0, key = {
    dptr = 0x1263c20 "....", dsize = 24}, priority = 0, auto_mark = true, request = 0x134ac970, child = 377590,
  fd = {673, 674}, tfd = 0x128ad60, ttimer = 0x0, start_time = {tv_sec = 1432141949, tv_usec = 954169},
  key_hash = 3388878912, can_schedule = false}
(gdb) p *lock_ctx->request
Cannot access memory at address 0x134ac970
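
To illustrate the failure mode, here is a minimal sketch of the general pattern, not the ctdb code and not the eventual patch. It assumes that the destructor's 8-byte write is a back-pointer being cleared through lock_ctx->request, which is an inference from the valgrind and gdb output above and not something confirmed in this report. The struct and function names (lock_request, lock_pair_create, lock_request_destroy, lock_context_destroy) are hypothetical, and plain malloc/free stand in for talloc. The idea is that whichever side is freed first clears the pointer held by the other side, so the later destructor never writes through a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct lock_context;

/* Hypothetical stand-in for a lock request that points back at its context. */
struct lock_request {
    struct lock_context *lctx;
};

/* Hypothetical stand-in for the lock context seen in the gdb output. */
struct lock_context {
    struct lock_request *request;
};

/* Allocate a context and a request that reference each other. */
static struct lock_context *lock_pair_create(void)
{
    struct lock_context *ctx = calloc(1, sizeof(*ctx));
    struct lock_request *req = calloc(1, sizeof(*req));

    if (ctx == NULL || req == NULL) {
        free(ctx);
        free(req);
        return NULL;
    }
    ctx->request = req;
    req->lctx = ctx;
    return ctx;
}

/* Free the request; detach it from the context first so the context
 * is not left holding a pointer to freed memory. */
static void lock_request_destroy(struct lock_request *req)
{
    if (req == NULL) {
        return;
    }
    if (req->lctx != NULL) {
        req->lctx->request = NULL;
    }
    free(req);
}

/* Free the context; only touch the request if it is still attached. */
static void lock_context_destroy(struct lock_context *ctx)
{
    if (ctx == NULL) {
        return;
    }
    if (ctx->request != NULL) {
        ctx->request->lctx = NULL;
    }
    free(ctx);
}
```

With this arrangement, freeing the request from inside a callback and then freeing the context afterwards (the order both traces suggest) leaves ctx->request as NULL by the time the context is destroyed, so there is no write into the freed block.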
Metze, thanks for following up on this defect. I need to understand why this is happening. Will look at this closely once I am home.
Created attachment 11111 [details] CTDB locking patches
The content of attachment 11111 [details] has been deleted for the following reason: incomplete fixes
Created attachment 11119 [details] Proposed but untested patches
Patches are now tested, pushing to master.
Created attachment 11151 [details] Backported patches for v4-2 branch
Pushed to autobuild-v4-2-test.
(In reply to Karolin Seeger from comment #7) Pushed to v4-2-test. Closing out bug report. Thanks!