OS: Ubuntu Server 14.04 LTS, running as a VM in VirtualBox 4.3.10
Host OS: Ubuntu Server 12.04 LTS
Samba: 4.1.7, compiled from source in the VM

Version 4.1.7 as AD server ran without problems for about 10 days. Today I got several crashes (see the log snippet below). Sorry, I cannot attach the core dump: contrary to what the log says, it was not written, and only an outdated core dump is present. This looks similar to Bug 9853, but now also occurs in 4.1.7.

Regards
Ingo

========= extracted from log.smbd =======================
[2014/05/06 11:09:00.566988, 0] ../source3/smbd/open.c:1582(defer_open_done)
  PANIC: assert failed at ../source3/smbd/open.c(1582): ret
[2014/05/06 11:09:00.567884, 0] ../source3/lib/util.c:785(smb_panic_s3)
  PANIC (pid 9135): assert failed: ret
[2014/05/06 11:09:00.571179, 0] ../source3/lib/util.c:896(log_stack_trace)
  BACKTRACE: 20 stack frames:
   #0 /usr/local/samba/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f0aab57aac7]
   #1 /usr/local/samba/lib/libsmbconf.so.0(smb_panic_s3+0x6f) [0x7f0aab57a912]
   #2 /usr/local/samba/lib/libsamba-util.so.0(smb_panic+0x28) [0x7f0aacff4c40]
   #3 /usr/local/samba/lib/private/libsmbd_base.so(+0x125053) [0x7f0aacb6d053]
   #4 /usr/local/samba/lib/private/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7f0aac835a01]
   #5 /usr/local/samba/lib/private/libtevent.so.0(+0x5a33) [0x7f0aac835a33]
   #6 /usr/local/samba/lib/private/libtevent.so.0(+0x5d6f) [0x7f0aac835d6f]
   #7 /usr/local/samba/lib/private/libtevent.so.0(tevent_common_loop_timer_delay+0x246) [0x7f0aac83a95a]
   #8 /usr/local/samba/lib/libsmbconf.so.0(run_events_poll+0x76) [0x7f0aab596571]
   #9 /usr/local/samba/lib/libsmbconf.so.0(+0x3ecd6) [0x7f0aab596cd6]
   #10 /usr/local/samba/lib/private/libtevent.so.0(_tevent_loop_once+0xf4) [0x7f0aac8340a3]
   #11 /usr/local/samba/lib/private/libsmbd_base.so(smbd_process+0x12a4) [0x7f0aacb9a4f8]
   #12 /usr/local/samba/sbin/smbd(+0x9abf) [0x7f0aad669abf]
   #13 /usr/local/samba/lib/libsmbconf.so.0(run_events_poll+0x54f) [0x7f0aab596a4a]
   #14 /usr/local/samba/lib/libsmbconf.so.0(+0x3ecd6) [0x7f0aab596cd6]
   #15 /usr/local/samba/lib/private/libtevent.so.0(_tevent_loop_once+0xf4) [0x7f0aac8340a3]
   #16 /usr/local/samba/sbin/smbd(+0xa737) [0x7f0aad66a737]
   #17 /usr/local/samba/sbin/smbd(main+0x178e) [0x7f0aad66c019]
   #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f0aa9dc9ec5]
   #19 /usr/local/samba/sbin/smbd(+0x57f9) [0x7f0aad6657f9]
[2014/05/06 11:09:00.579449, 0] ../source3/lib/dumpcore.c:317(dump_core)
  dumping core in /usr/local/samba/var/cores/smbd
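For context, the "PANIC: assert failed at ../source3/smbd/open.c(1582): ret" line is the message Samba's assertion machinery emits when the asserted expression ("ret") evaluates to false; the server then panics, logs the backtrace above and attempts to dump core. The following is a minimal, self-contained C sketch of that pattern; panic_demo() and ASSERT_DEMO() are simplified stand-ins, not Samba's actual smb_panic()/SMB_ASSERT() implementations.

/* Simplified stand-ins for Samba's panic/assert helpers (illustrative
 * only -- the real smb_panic()/SMB_ASSERT() additionally log a stack
 * trace and attempt to dump core, as seen in the log above). */

#include <stdio.h>
#include <stdlib.h>

static void panic_demo(const char *why)
{
        fprintf(stderr, "PANIC: %s\n", why);
        abort();
}

#define ASSERT_DEMO(b) do { \
        if (!(b)) { \
                fprintf(stderr, "PANIC: assert failed at %s(%d): %s\n", \
                        __FILE__, __LINE__, #b); \
                panic_demo("assert failed: " #b); \
        } \
} while (0)

int main(void)
{
        /* 'ret' plays the role of the return value that defer_open_done()
         * asserts on; a false value produces exactly this kind of panic. */
        int ret = 0;
        ASSERT_DEMO(ret);
        return 0;
}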
We are hitting a very similar error condition on Samba 4.1.8, where the assert in defer_open_done is triggered. Sometimes it is reached via dbwrap_record_watch_done and other times via tevent_common_loop_timer_delay. I chose to update this bug because the version number is closest to 4.1.8, but it appears that bug 10407 could also be the same issue.

I am attaching a tarball with a core file, some smbd level-10 log files (separated by pid) and a sanitized copy of my smb.conf. We can reproduce the problem fairly easily on our test systems; if you need a network trace, please let me know.

Looking at the logs, the last action before the assert was the "can't find mid 1072" message inside schedule_deferred_open_message_smb2. But earlier in the log I see:

[2014/06/16 15:34:44.693096, 10, pid=16863, effective(0, 0), real(0, 0)] ../source3/smbd/smb2_create.c:1217(remove_deferred_open_message_smb2_internal)
  remove_deferred_open_message_smb2_internal: mid 1072

which sounds to me like mid 1072 was supposed to have been removed already. We tried this using a max protocol of SMB2 and the result was the same; we could not reproduce the issue when setting max protocol to NT1.

GDB bt full follows below:

Core was generated by `smbd'.
Program terminated with signal 6, Aborted.
#0 0x00007fe2fe32d215 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0 0x00007fe2fe32d215 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = <value optimized out>
        selftid = 16863
#1 0x00007fe2fe32eb2b in abort () at abort.c:93
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7fe301a30000, sa_sigaction = 0x7fe301a30000}, sa_mask = {__val = {0, 140612903889912, 140612961752272, 0, 17179869184, 0, 0, 4294967295, 140734141726912, 4294967296, 401258, 2633384, 0, 0, 1, 140612930478080}}, sa_flags = 25347954, sa_restorer = 0x1}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007fe2ffc95f51 in dump_core () at ../source3/lib/dumpcore.c:336
        called = true
        __FUNCTION__ = "dump_core"
#3 0x00007fe2ffc7f1f2 in smb_panic_s3 (why=<value optimized out>) at ../source3/lib/util.c:808
        cmd = <value optimized out>
        result = <value optimized out>
        __FUNCTION__ = "smb_panic_s3"
#4 0x00007fe3013e99c1 in smb_panic (why=0x7fe301109bab "assert failed: ret") at ../lib/util/fault.c:159
No locals.
#5 0x00007fe300fbd47a in defer_open_done (req=0x0) at ../source3/smbd/open.c:1587
        state = 0x7fe302c1c850
        status = {v = 0}
        ret = <value optimized out>
        __FUNCTION__ = "defer_open_done"
#6 0x00007fe2ffc93cf6 in dbwrap_record_watch_done (subreq=0x0) at ../source3/lib/dbwrap/dbwrap_watch.c:378
        req = 0x7fe302c54790
        state = 0x7fe302c54920
        rec = 0x7fe302c24bf0
        ret = 0
#7 0x00007fe2fe696632 in tevent_common_loop_immediate (ev=0x7fe302b7c430) at ../tevent_immediate.c:135
        im = 0x7fe302c438a0
        handler = 0x7fe2fe696ef0 <tevent_req_trigger>
        private_data = 0x7fe302c437c0
#8 0x00007fe2ffc9e318 in run_events_poll (ev=0x7fe302b7c430, pollrtn=0, pfds=0x0, num_pfds=0) at ../source3/lib/events.c:192
        state = <value optimized out>
        pollfd_idx = <value optimized out>
        fde = <value optimized out>
        __FUNCTION__ = "run_events_poll"
#9 0x00007fe2ffc9e608 in s3_event_loop_once (ev=0x7fe302b7c430, location=<value optimized out>) at ../source3/lib/events.c:303
        state = 0x7fe302b7af20
        timeout = 2147483647
        num_pfds = <value optimized out>
        ret = <value optimized out>
        poll_errno = <value optimized out>
#10 0x00007fe2fe695d8d in _tevent_loop_once (ev=0x7fe302b7c430, location=0x7fe3011197a8 "../source3/smbd/process.c:3695") at ../tevent.c:530
        ret = <value optimized out>
        nesting_stack_ptr = 0x0
#11 0x00007fe300fe0d10 in smbd_process (ev_ctx=0x7fe302b7c430, msg_ctx=<value optimized out>, sock_fd=<value optimized out>, interactive=128) at ../source3/smbd/process.c:3695
        frame = 0x7fe302baa250
        conn = <value optimized out>
        sconn = <value optimized out>
        ss = {ss_family = 2, __ss_align = 0, __ss_padding = "\000\000\000\000\000\000\000\000e\005\350\253¤\263\275\000\000\244\001\343\177\000\000rǂ\001\343\177\000\000\001\000\000\000\343\177\000\000\000\000\000\000\000\000\000\000 ŷ\002\343\177\000\000\230\310\354\000\343\177\000\000e\005\350\253¤\263\275\000\000\000\000\000\000\000\000\037\000\000\000\000\000\000\000\060ķ\002\343\177\000\000 ŷ\002\343\177\000\000\245'\203\001\343\177\000"}
        sa = 0x7fff388688a0
        sa_socklen = <value optimized out>
        local_address = 0x7fe302b913a0
        remote_address = 0x7fe302b88490
        locaddr = <value optimized out>
        remaddr = <value optimized out>
        rhost = 0x0
        ret = <value optimized out>
        tmp = <value optimized out>
        __FUNCTION__ = "smbd_process"
#12 0x00007fe301a488b4 in smbd_accept_connection (ev=0x7fe302b7c430, fde=<value optimized out>, flags=<value optimized out>, private_data=<value optimized out>) at ../source3/smbd/server.c:628
        status = <value optimized out>
        s = 0x0
        msg_ctx = 0x7fe302b7c520
        addr = {ss_family = 2, __ss_align = 0, __ss_padding = "\000\212\206\070\377\177\000\000\000\000\000\000\000\000\000\000\300\211\206\070\377\177\000\000\354\370\233\070\377\177\000\000\001\000\000\000\000\000\000\000 \257\267\002\343\177\000\000 \252\270\002\343\177\000\000*P9\376\342\177\000\000dF\237S\000\000\000\000\321\a>\001\343\177\000\000\060ķ\002\343\177\000\000Z\240i\376\342\177\000\000;\000\000\000\000\000\000\000T@\017\000\000\000\000"}
        in_addrlen = 16
        fd = 31
        pid = 0
        unique_id = 13669450450014307685
        __FUNCTION__ = "smbd_accept_connection"
#13 0x00007fe2ffc9e564 in run_events_poll (ev=0x7fe302b7c430, pollrtn=<value optimized out>, pfds=0x7fe302b8aa20, num_pfds=5) at ../source3/lib/events.c:257
        pfd = <value optimized out>
        flags = <value optimized out>
        state = <value optimized out>
        pollfd_idx = 0x7fe302b84a00
        fde = 0x7fe302b90170
        __FUNCTION__ = "run_events_poll"
#14 0x00007fe2ffc9e6d6 in s3_event_loop_once (ev=0x7fe302b7c430, location=<value optimized out>) at ../source3/lib/events.c:326
        state = 0x7fe302b7af20
        timeout = 60000
        num_pfds = 5
        ret = 1
        poll_errno = 4
#15 0x00007fe2fe695d8d in _tevent_loop_once (ev=0x7fe302b7c430, location=0x7fe301a4ce04 "../source3/smbd/server.c:950") at ../tevent.c:530
        ret = <value optimized out>
        nesting_stack_ptr = 0x0
#16 0x00007fe301a4a7cd in smbd_parent_loop (argc=<value optimized out>, argv=<value optimized out>) at ../source3/smbd/server.c:950
        ret = <value optimized out>
        frame = 0x7fe302b7d4f0
#17 main (argc=<value optimized out>, argv=<value optimized out>) at ../source3/smbd/server.c:1582
        is_daemon = <value optimized out>
        interactive = <value optimized out>
        Fork = true
        no_process_group = <value optimized out>
        log_stdout = 139
        ports = 0x0
        profile_level = 0x0
        opt = <value optimized out>
        pc = <value optimized out>
        print_build_options = <value optimized out>
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7fe2fecb3420, val = 0, descrip = 0x7fe301a4c6ca "Help options:", argDescrip = 0x0}, {longName = 0x7fe301a4c6d8 "daemon", shortName = 68 'D', argInfo = 0, arg = 0x0, val = 1000, descrip = 0x7fe301a4c6df "Become a daemon (default)", argDescrip = 0x0}, {longName = 0x7fe301a4c6f9 "interactive", shortName = 105 'i', argInfo = 0, arg = 0x0, val = 1001, descrip = 0x7fe301a4d248 "Run interactive (not a daemon)", argDescrip = 0x0}, {longName = 0x7fe301a4c705 "foreground", shortName = 70 'F', argInfo = 0, arg = 0x0, val = 1002, descrip = 0x7fe301a4d268 "Run daemon in foreground (for daemontools, etc.)", argDescrip = 0x0}, {longName = 0x7fe301a4c710 "no-process-group", shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 1003, descrip = 0x7fe301a4d2a0 "Don't create a new process group", argDescrip = 0x0}, {longName = 0x7fe301a4c721 "log-stdout", shortName = 83 'S', argInfo = 0, arg = 0x0, val = 1004, descrip = 0x7fe301a4c72c "Log to stdout", argDescrip = 0x0}, {longName = 0x7fe301a4c73a "build-options", shortName = 98 'b', argInfo = 0, arg = 0x0, val = 98, descrip = 0x7fe301a4c748 "Print build options", argDescrip = 0x0}, {longName = 0x7fe301a4c75c "port", shortName = 112 'p', argInfo = 1, arg = 0x7fff38868ef8, val = 0, descrip = 0x7fe301a4c761 "Listen on the specified ports", argDescrip = 0x0}, {longName = 0x7fe301a4c77f "profiling-level", shortName = 80 'P', argInfo = 1, arg = 0x7fff38868ef0, val = 0, descrip = 0x7fe301a4c78f "Set profiling level", argDescrip = 0x7fe301a4c7a3 "PROFILE_LEVEL"}, {longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7fe30030f860, val = 0, descrip = 0x7fe301a4c7b1 "Common samba options:", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7fe30030f040, val = 0, descrip = 0x7fe301a4d2c8 "Build-time configuration overrides:", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        parent = <value optimized out>
        frame = <value optimized out>
        status = <value optimized out>
        ev_ctx = 0x7fe302b7c430
        msg_ctx = 0x7fe302b7c520
        server_id = {pid = 14997, task_id = 0, vnn = 4294967295, unique_id = 9428862371530993319}
        se = <value optimized out>
        np_dir = <value optimized out>
        smbd_shim_fns = {cancel_pending_lock_requests_by_fid = 0x7fe300fc9a60 <smbd_cancel_pending_lock_requests_by_fid>, send_stat_cache_delete_message = 0x7fe300fcfb10 <smbd_send_stat_cache_delete_message>, change_to_root_user = 0x7fe300fb7fa0 <smbd_change_to_root_user>, contend_level2_oplocks_begin = 0x7fe301019e30 <smbd_contend_level2_oplocks_begin>, contend_level2_oplocks_end = 0x7fe30101a000 <smbd_contend_level2_oplocks_end>, become_root = 0x7fe300fb8090 <smbd_become_root>, unbecome_root = 0x7fe300fb80c0 <smbd_unbecome_root>, exit_server = 0x7fe301012af0 <smbd_exit_server>, exit_server_cleanly = 0x7fe301012b10 <smbd_exit_server_cleanly>}
        __FUNCTION__ = "main"
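The log analysis above suggests a simple ordering problem: the deferred-open entry for mid 1072 is removed (e.g. by a cancel or teardown) before the record-watch completion fires, so schedule_deferred_open_message_smb2 reports "can't find mid" and defer_open_done asserts on that result. Below is a self-contained toy model of that race, assuming this reading of the logs is correct; all names and data structures (defer_request, cancel_request, reschedule_request, watch_fired) are hypothetical illustrations, not Samba code.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy deferred-open list keyed by message id (mid). */
#define MAX_DEFERRED 8
static uint64_t deferred_mids[MAX_DEFERRED];
static size_t num_deferred;

/* A create request gets parked while it waits for a contended record. */
static void defer_request(uint64_t mid)
{
        deferred_mids[num_deferred++] = mid;
}

/* Plays the role of remove_deferred_open_message_smb2_internal():
 * a cancel (or teardown) drops the entry from the list. */
static void cancel_request(uint64_t mid)
{
        for (size_t i = 0; i < num_deferred; i++) {
                if (deferred_mids[i] == mid) {
                        deferred_mids[i] = deferred_mids[--num_deferred];
                        return;
                }
        }
}

/* Plays the role of schedule_deferred_open_message_smb2(): returns
 * false ("can't find mid ...") if the request is no longer listed. */
static bool reschedule_request(uint64_t mid)
{
        for (size_t i = 0; i < num_deferred; i++) {
                if (deferred_mids[i] == mid) {
                        return true;
                }
        }
        fprintf(stderr, "can't find mid %llu\n", (unsigned long long)mid);
        return false;
}

/* Plays the role of defer_open_done(): asserting that the reschedule
 * succeeded is what turns the harmless ordering into a panic. */
static void watch_fired(uint64_t mid)
{
        bool ret = reschedule_request(mid);
        assert(ret);
}

int main(void)
{
        defer_request(1072);
        cancel_request(1072);   /* entry removed first ...              */
        watch_fired(1072);      /* ... watch fires later: assert aborts */
        return 0;
}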
Created attachment 10037 [details] tarball w/ core, logs, smb.conf
(In reply to comment #2)
> Created attachment 10037 [details]
> tarball w/ core, logs, smb.conf

Thanks! With that info I got it reproduced. Expect a fix soon.
Created attachment 10042 [details]
patch for master

This is a reproducer for the bug including a fix for master. This fix might be a little intrusive and will need some patch cosmetics, but I'd like to present it already now for review.
Created attachment 10043 [details]
Quick patch for 4.1

This is a quick, minimal patch that should do it for 4.1.
(In reply to comment #4)
> Created attachment 10042 [details]
> patch for master
>
> This is a reproducer for the bug including a fix for master. This fix might be
> a little intrusive and will need some patch cosmetics, but I'd like to present
> it already now for review.

Woohoo! GREAT CATCH! That must have taken a while.
Ok, I've gone through the code for master *very* carefully, and run the torture3 test against an unpatched server (crashes + wireshark trace) and against a patched server (no crash + wireshark trace :-).

I particularly like the cleanups to remove the anonymous blob in the pending message lists and replace it with a known struct (I knew that was a horrible hack, but couldn't figure out how to fix it properly :-).

I'm currently running a local make test. I'll update when (if:-) it passes. Is there anything else you need on this before you're happy with a push to master?

Cheers,

Jeremy.
(In reply to comment #7)
> I'm currently running a local make test. I'll update when (if:-) it passes. Is
> there anything else you need on this before you're happy with a push to master?

It just survived an autobuild on sn-devel. The patches as such are good, I think. There are one or two commit messages that are missing Signed-off-by tags and could be a bit more elaborate. Second, we need to run the new test in autobuild. But now it's late here :-)

What do you think about the 4.1 quick patch? It should functionally do it as well.

Volker
(In reply to comment #8)
> (In reply to comment #7)
> > I'm currently running a local make test. I'll update when (if:-) it passes.

It passed :-).

> It just survived an autobuild on sn-devel. The patches as such are good, I think.
> There are one or two commit messages that are missing Signed-off-by tags and could
> be a bit more elaborate. Second, we need to run the new test in autobuild. But
> now it's late here :-)

Sure, update the commit messages and plug the new test into autobuild, but the functionality + code are good to go into master IMHO. Feel free to add my 'Reviewed-by:' tags to the master patchset + push.

> What do you think about the 4.1 quick patch? It should functionally do it as well.

Haven't had time to test a 4.1.x build against the new test yet. Will do this no later than Monday next week (might get to this over the weekend). Bit busy at the plugfest here at the moment, then getting on a plane in 1 hour :-).
Comment on attachment 10043 [details]
Quick patch for 4.1

Tested - this patch fixes it for 4.1.x. Volker, if you're happy with this, request my +1 and I'll give it and re-assign to Karolin for inclusion.

Cheers,

Jeremy.
*** Bug 10498 has been marked as a duplicate of this bug. ***
Comment on attachment 10043 [details]
Quick patch for 4.1

LGTM.
Re-assigning to Karolin for inclusion in 4.1.next. Jeremy.
(In reply to comment #13)
> Re-assigning to Karolin for inclusion in 4.1.next.
> Jeremy.

Pushed to autobuild-v4-1-test.
Pushed to v4-1-test. Closing out bug report. Thanks!
Jeremy/Volker,

We are using a Samba 4.3.11+ stack, which already contains the changes for this issue, but we are still seeing the same crash at one of our customers.

(gdb) bt
#0  0x00007f41937335e5 in raise () from /lib64/libc.so.6
#1  0x00007f4193734dc5 in abort () from /lib64/libc.so.6
#2  0x00007f4194e258c3 in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f4194e137d1 in smb_panic_s3 (why=0x7f41973a551c "assert failed: ret") at ../source3/lib/util.c:812
#4  0x00007f41976acd55 in smb_panic (why=0x7f41973a551c "assert failed: ret") at ../lib/util/fault.c:166
#5  0x00007f41971d086c in defer_open_done (req=0x0) at ../source3/smbd/open.c:2298
#6  0x00007f419664e318 in _tevent_req_notify_callback (req=0x564467687a90, location=0x7f4194e55d90 "../source3/lib/dbwrap/dbwrap_watch.c:389") at ../lib/tevent/tevent_req.c:112
#7  0x00007f419664e3ed in tevent_req_finish (req=0x564467687a90, state=TEVENT_REQ_DONE, location=0x7f4194e55d90 "../source3/lib/dbwrap/dbwrap_watch.c:389") at ../lib/tevent/tevent_req.c:149
#8  0x00007f419664e414 in _tevent_req_done (req=0x564467687a90, location=0x7f4194e55d90 "../source3/lib/dbwrap/dbwrap_watch.c:389") at ../lib/tevent/tevent_req.c:155
#9  0x00007f4194e1c26d in dbwrap_record_watch_done (subreq=0x0) at ../source3/lib/dbwrap/dbwrap_watch.c:389

(gdb) p *state->xconn->smb2.requests->subreq
$4 = {async = {fn = 0, private_data = 0x0}, data = 0x5589eb25d4a0, private_print = 0, private_cancel = 0x7ffb8be01b2b <smbd_smb2_create_cancel>, private_cleanup = {fn = 0, state = TEVENT_REQ_INIT}, internal = {
    private_type = 0x7ffb8bf9386a "struct smbd_smb2_create_state", create_location = 0x7ffb8bf93848 "../source3/smbd/smb2_create.c:504",
    finish_location = 0x7ffb8bf94c08 "../source3/smbd/smb2_create.c:1622", cancel_location = 0x0, state = **TEVENT_REQ_USER_ERROR***,

finish_location maps to the call "tevent_req_nterror(req, NT_STATUS_CANCELLED)", which means the request was cancelled while it was on the deferred queue. As per the bug fix, the dbwrap_record_watch_done callback should have dropped cancelled requests, but that does not appear to happen here.
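If that reading is right, the deferred request reached TEVENT_REQ_USER_ERROR via tevent_req_nterror(req, NT_STATUS_CANCELLED) while its record watch was still pending, so by the time the watch completed there was nothing valid left to reschedule. Building on the toy model sketched earlier in this thread (hypothetical names, not the actual 4.3.x code path), the defensive shape one would expect from the fix looks roughly like this:

/* A tolerant variant of watch_fired() from the sketch above
 * (hypothetical, not the actual 4.3.x code): a mid that is no longer
 * on the list is treated as an already-cancelled request and the
 * completion is simply dropped instead of asserted on. */
static void watch_fired_tolerant(uint64_t mid)
{
        bool ret = reschedule_request(mid);
        if (!ret) {
                /* Request was cancelled (NT_STATUS_CANCELLED) while it
                 * sat on the deferred queue; nothing left to retry. */
                fprintf(stderr, "mid %llu already gone, dropping watch "
                        "completion\n", (unsigned long long)mid);
                return;
        }
        /* ... otherwise continue with the normal retry of the open ... */
}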