Bug 11293 - invalid write in ctdb_lock_context_destructor
Summary: invalid write in ctdb_lock_context_destructor
Status: RESOLVED FIXED
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: 4.2.1
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
 
Reported: 2015-05-27 13:35 UTC by Stefan Metzmacher
Modified: 2015-06-22 17:23 UTC

Attachments
CTDB locking patches (deleted)
2015-06-01 07:11 UTC, Amitay Isaacs
Proposed but untested patches (13.64 KB, text/plain)
2015-06-02 11:00 UTC, Stefan Metzmacher
Backported patches for v4-2 branch (14.48 KB, patch)
2015-06-15 02:06 UTC, Amitay Isaacs
metze: review+

Description Stefan Metzmacher 2015-05-27 13:35:11 UTC
With valgrind on v4-2*

    ==3636== Invalid write of size 8
    ==3636==    at 0x151F3D: ctdb_lock_context_destructor (ctdb_lock.c:276)
    ==3636==    by 0x58B3618: _talloc_free_internal (talloc.c:993)
    ==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
    ==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
    ==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
    ==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
    ==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
    ==3636==    by 0x58AD692: _talloc_free (talloc.c:1594)
    ==3636==    by 0x15292E: ctdb_lock_handler (ctdb_lock.c:471)
    ==3636==    by 0x56A535A: epoll_event_loop (tevent_epoll.c:728)
    ==3636==    by 0x56A535A: epoll_event_loop_once (tevent_epoll.c:926)
    ==3636==    by 0x56A3826: std_event_loop_once (tevent_standard.c:114)
    ==3636==    by 0x569FFFC: _tevent_loop_once (tevent.c:533)
    ==3636==    by 0x56A019A: tevent_common_loop_wait (tevent.c:637)
    ==3636==    by 0x56A37C6: std_event_loop_wait (tevent_standard.c:140)
    ==3636==    by 0x11E03A: ctdb_start_daemon (ctdb_daemon.c:1320)
    ==3636==    by 0x118557: main (ctdbd.c:321)
    ==3636==  Address 0x9c5b660 is 96 bytes inside a block of size 120 free'd
    ==3636==    at 0x4C29D17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==3636==    by 0x58B32D3: _talloc_free_internal (talloc.c:1063)
    ==3636==    by 0x58B3232: _talloc_free_children_internal (talloc.c:1472)
    ==3636==    by 0x58B3232: _talloc_free_internal (talloc.c:1019)
    ==3636==    by 0x58B3232: _talloc_free_children_internal (talloc.c:1472)
    ==3636==    by 0x58B3232: _talloc_free_internal (talloc.c:1019)
    ==3636==    by 0x58AD692: _talloc_free_children_internal (talloc.c:1472)
    ==3636==    by 0x58AD692: _talloc_free_internal (talloc.c:1019)
    ==3636==    by 0x58AD692: _talloc_free (talloc.c:1594)
    ==3636==    by 0x11EC30: daemon_incoming_packet (ctdb_daemon.c:844)
    ==3636==    by 0x136F4A: lock_fetch_callback (ctdb_ltdb_server.c:268)
    ==3636==    by 0x152489: process_callbacks (ctdb_lock.c:353)
    ==3636==    by 0x152489: ctdb_lock_handler (ctdb_lock.c:468)
    ==3636==    by 0x56A535A: epoll_event_loop (tevent_epoll.c:728)
    ==3636==    by 0x56A535A: epoll_event_loop_once (tevent_epoll.c:926)
    ==3636==    by 0x56A3826: std_event_loop_once (tevent_standard.c:114)
    ==3636==    by 0x569FFFC: _tevent_loop_once (tevent.c:533)
    ==3636==    by 0x56A019A: tevent_common_loop_wait (tevent.c:637)
    ==3636==    by 0x56A37C6: std_event_loop_wait (tevent_standard.c:140)
    ==3636==    by 0x11E03A: ctdb_start_daemon (ctdb_daemon.c:1320)
    ==3636==    by 0x118557: main (ctdbd.c:321)

Backtrace in production with ctdb 2.5.3*:

  #0  0x00007fa672d54635 in raise () from /lib64/libc.so.6
  #1  0x00007fa672d55e15 in abort () from /lib64/libc.so.6
  #2  0x000000000045926b in smb_panic (why=0x48e593 "internal error") at lib/util/fault.c:162
  #3  0x00000000004594fd in fault_report (sig=11) at lib/util/fault.c:179
  #4  sig_fault (sig=11) at lib/util/fault.c:194
  #5  <signal handler called>
  #6  ctdb_lock_context_destructor (lock_ctx=0x1571850) at server/ctdb_lock.c:276
  #7  0x000000000045f8aa in _talloc_free_internal (ptr=0x1571850, location=0x487fd9 "server/ctdb_lock.c:471") at lib/talloc/talloc.c:872
  #8  0x000000000045f693 in _talloc_free_children_internal (ptr=0xb4fba0, location=0x487fd9 "server/ctdb_lock.c:471") at lib/talloc/talloc.c:1355
  #9  _talloc_free_internal (ptr=0xb4fba0, location=0x487fd9 "server/ctdb_lock.c:471") at lib/talloc/talloc.c:892
  #10 0x000000000044394f in ctdb_lock_handler (ev=<value optimized out>, tfd=<value optimized out>, flags=<value optimized out>, private_data=<value optimized out>) at server/ctdb_lock.c:471
  #11 0x000000000046a54e in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at lib/tevent/tevent_epoll.c:736
  #12 epoll_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at lib/tevent/tevent_epoll.c:931
  #13 0x0000000000467be6 in std_event_loop_once (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent_standard.c:112
  #14 0x0000000000464b9d in _tevent_loop_once (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent.c:530
  #15 0x0000000000464c1b in tevent_common_loop_wait (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent.c:634
  #16 0x0000000000467b56 in std_event_loop_wait (ev=0xab1360, location=0x46ed7f "server/ctdb_daemon.c:1320") at lib/tevent/tevent_standard.c:138
  #17 0x0000000000409cd7 in ctdb_start_daemon (ctdb=0xaa2f40, do_fork=<value optimized out>, use_syslog=<value optimized out>) at server/ctdb_daemon.c:1320
  #18 0x000000000040536f in main (argc=<value optimized out>, argv=<value optimized out>) at server/ctdbd.c:323

The issue is that the request member of the lock context is invalid in
ctdb_lock_context_destructor():

  (gdb) p *lock_ctx
  $6 = {next = 0x144f410, prev = 0x11f4210, type = LOCK_RECORD, ctdb = 0xaa2f40, ctdb_db = 0xaccaa0, key = {
      dptr = 0x1263c20 "....", dsize = 24}, priority = 0, auto_mark = true, request = 0x134ac970, child = 377590, fd = {673,
      674}, tfd = 0x128ad60, ttimer = 0x0, start_time = {tv_sec = 1432141949, tv_usec = 954169}, key_hash = 3388878912, can_schedule = false}
  (gdb) p *lock_ctx->request
  Cannot access memory at address 0x134ac970
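
The two traces suggest the following sequence: a callback run from process_callbacks() frees the block that lock_ctx->request points into, and the later talloc_free() in ctdb_lock_handler() runs ctdb_lock_context_destructor(), which still writes through that stale pointer. Below is a minimal sketch, in plain C without talloc, of one way to keep the two objects consistent. The struct layouts and the helpers lock_pair_create(), lock_request_destroy() and lock_context_destroy() are hypothetical; they are not taken from the CTDB source or from the attached patches. The only idea illustrated is that whichever side is freed first clears the other side's back-pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct lock_request;

/* Hypothetical, heavily reduced stand-in for the lock context in the gdb dump. */
struct lock_context {
	struct lock_request *request; /* back-pointer to the caller's handle */
};

/* Hypothetical stand-in for the handle owned by the caller. */
struct lock_request {
	struct lock_context *lctx;
};

/* Allocate a context and a request that point at each other. */
static bool lock_pair_create(struct lock_context **ctx_out,
			     struct lock_request **req_out)
{
	struct lock_context *ctx = calloc(1, sizeof(*ctx));
	struct lock_request *req = calloc(1, sizeof(*req));

	if (ctx == NULL || req == NULL) {
		free(ctx);
		free(req);
		return false;
	}
	ctx->request = req;
	req->lctx = ctx;
	*ctx_out = ctx;
	*req_out = req;
	return true;
}

/* Freeing the request first: detach it from the context so the
 * context never keeps a pointer into freed memory. */
static void lock_request_destroy(struct lock_request *req)
{
	if (req->lctx != NULL) {
		assert(req->lctx->request == req);
		req->lctx->request = NULL;
	}
	free(req);
}

/* Freeing the context first: write through the back-pointer only if
 * the request is still attached. */
static void lock_context_destroy(struct lock_context *ctx)
{
	if (ctx->request != NULL) {
		ctx->request->lctx = NULL;
	}
	free(ctx);
}
```

With this arrangement the two objects can be destroyed in either order. In a talloc-based program the same effect would normally be obtained by attaching a destructor to each object rather than by calling explicit destroy functions.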
Comment 1 Amitay Isaacs 2015-05-30 22:06:20 UTC
Metze, thanks for following up on this defect.  I need to understand why this is happening.  Will look at this closely once I am home.
Comment 2 Amitay Isaacs 2015-06-01 07:11:47 UTC
Created attachment 11111
CTDB locking patches
Comment 3 Amitay Isaacs 2015-06-01 14:56:40 UTC
The content of attachment 11111 has been deleted for the following reason:

incomplete fixes
Comment 4 Stefan Metzmacher 2015-06-02 11:00:08 UTC
Created attachment 11119
Proposed but untested patches
Comment 5 Amitay Isaacs 2015-06-12 08:13:01 UTC
Patches are now tested, pushing to master.
Comment 6 Amitay Isaacs 2015-06-15 02:06:36 UTC
Created attachment 11151
Backported patches for v4-2 branch
Comment 7 Karolin Seeger 2015-06-17 18:13:01 UTC
Pushed to autobuild-v4-2-test.
Comment 8 Karolin Seeger 2015-06-22 17:23:40 UTC
(In reply to Karolin Seeger from comment #7)
Pushed to v4-2-test.
Closing out bug report.

Thanks!