Bug 6210 - abort()'s in tevent cleanup
Alias: None
Product: Samba 4.0
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: Other Linux
Importance: P1 regression
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Andrew Bartlett
Depends on:
Reported: 2009-03-24 21:57 UTC by Andrew Kroeger
Modified: 2009-05-22 01:10 UTC

Debugging output referenced in initial bug report. (12.94 KB, text/plain)
2009-03-24 21:58 UTC, Andrew Kroeger
samba.log when running without '-M single' (25.80 KB, text/plain)
2009-03-25 00:13 UTC, Andrew Kroeger
gdb backtrace when running with '-M single' (8.69 KB, text/plain)
2009-03-25 00:14 UTC, Andrew Kroeger

Description Andrew Kroeger 2009-03-24 21:57:33 UTC
ldbedit (actually all ldb* command-line tools) abort() with a "Bad talloc magic value - unknown value" error.  I first hit the error when running with the --help option, but it also occurs when actually editing.

The comments below reference gdb debugging output which I will attach after opening this bug.  Note that I added the talloc_enable_leak_report_full() call shown in the debugging output to help my debugging efforts.

From the debugging output:
- Once the tevent_context has been set up in ldb_init():
  - ev_ctx == (struct tevent_context *) 0x110e370
  - ev_ctx->pipe_fde == (struct tevent_fd *) 0x110e430
  - ev_ctx->debug_ops.debug == 0x469588 <ldb_tevent_debug>

After these values are set, I call talloc_report_full() to verify the current talloc stack looks good.  Continuing the execution to the exit() call [line 106 in the debugging output] shows the help text displays correctly.  I then use the value of the ev_ctx pointer from above to verify the values of pipe_fde and debug_ops.debug, which are still correct.

Continuing the execution from there results in the abort() call.  The talloc error message can be seen in frame #2 of the backtrace.  Backtrace frames #4, #5 and #6 show that we are trying to free the ev_ctx->debug_ops.debug pointer (the function pointer passed to tevent_set_debug() in line 18 of the debugging output).  However, backtrace frame #7 indicates that talloc_free() was called on line 150 of ../lib/tevent/tevent.c, which is talloc_free(ev->pipe_fde).  This indicates some sort of corruption in the cleanup process, as the values of pipe_fde and debug_ops.debug were verified before the cleanup started as part of the exit() call.

Note that in all my attempts to debug the error, the talloc_free() call always aborts while attempting to free the address of the ldb_tevent_debug function, which is passed to tevent_set_debug().
Comment 1 Andrew Kroeger 2009-03-24 21:58:29 UTC
Created attachment 4010 [details]
Debugging output referenced in initial bug report.
Comment 2 Andrew Kroeger 2009-03-25 00:12:36 UTC
The original reported issue is not confined to the ldb* command-line tools - it also happens within the samba4 daemon.

When running without the '-M single' arguments, there are numerous abort() calls.  I will attach the samba.log from one of these runs.

When running with '-M single', there is only a single abort(), after stopping with SIGTERM.  I will attach a gdb backtrace from this run.
Comment 3 Andrew Kroeger 2009-03-25 00:13:36 UTC
Created attachment 4011 [details]
samba.log when running without '-M single'
Comment 4 Andrew Kroeger 2009-03-25 00:14:13 UTC
Created attachment 4012 [details]
gdb backtrace when running with '-M single'
Comment 5 Matthias Dieter Wallnöfer 2009-04-13 08:24:54 UTC
I can't reproduce it now, so I mark it as "FIXED". If it isn't, please reopen!
Comment 6 Andrew Kroeger 2009-04-15 02:55:26 UTC
I reproduced this again just over the weekend (4.0.0alpha8-GIT-a1d8315).  I'm running 64-bit Fedora 9 w/ all updates.  If something has been committed since the revision I am working with that is expected to fix the issue I have reported, please let me know.

I am changing the severity of this bug to blocker.  I understand that there may be issues reproducing the bug under different installations/configurations; however, I believe what I have observed should be corrected before the next alpha is cut.

If there are any problems reproducing this issue, please post through this bug, as I cannot currently commit to a schedule of availability.  I will respond to posts on this bug and am willing to work with someone with better gdb-fu to track down the issue.
Comment 7 Andrew Kroeger 2009-05-22 01:10:32 UTC
Okay, I finally got some time to look into this again.  Got quite a surprise too...

I have shell scripts to do all my builds (including clean builds), installs, provisions, etc., and I just found out my clean build script (which uses "git clean -dfx") was not completely cleaning my build environment.  As it turns out, git clean (even with the "-dfx" arguments) only cleans from the current directory down, and my clean build script was being run in the source4 directory.  Therefore, none of the common directories that have been moved for the merged build (including tevent) were being cleaned.  Once I realized what was happening and ran the git clean from the top-level samba directory, all works fine.
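
For anyone who lands here with the same symptom, the git clean behaviour described above can be reproduced in a throwaway repository.  This is only a minimal sketch: the directory and file names (source4, lib/tevent, stale.o) are made up to mirror the layout mentioned in this report and are not taken from the real tree, and it assumes git and mktemp are available.

```shell
# Throwaway repository; all names below are illustrative assumptions.
demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir -p source4 lib/tevent
touch source4/Makefile lib/tevent/tevent.c
git add .                                   # tracked files (no commit needed)
touch source4/stale.o lib/tevent/stale.o    # untracked "build output"

# Cleaning from a subdirectory only removes untracked files at or below it.
cd source4
git clean -dfxq
cd "$demo"
if [ -e lib/tevent/stale.o ]; then
    after_subdir_clean=present              # stale object outside source4/ survives
else
    after_subdir_clean=missing
fi
echo "lib/tevent/stale.o after cleaning in source4/: $after_subdir_clean"

# Cleaning from the top of the working tree removes every untracked file.
cd "$(git rev-parse --show-toplevel)"
git clean -dfxq
```

A clean-build script can guard against this by changing to the directory printed by "git rev-parse --show-toplevel" before calling git clean, rather than relying on wherever the script happens to be started.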

No bug here, so closing as INVALID.