Bug 10593 - PANIC: assert failed at ../source3/smbd/open.c(1582): ret
Summary: PANIC: assert failed at ../source3/smbd/open.c(1582): ret
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.1.7
Hardware: x64 Linux
: P5 critical (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
: 10498 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-05-06 11:49 UTC by Ingo Denner
Modified: 2018-03-27 21:14 UTC (History)
2 users (show)

See Also:


Attachments
tarball w/ core, logs, smb.conf (71 bytes, text/plain)
2014-06-17 14:10 UTC, John Mulligan
no flags Details
patch for master (47.06 KB, patch)
2014-06-20 14:28 UTC, Volker Lendecke
no flags Details
Quick patch for 4.1 (945 bytes, patch)
2014-06-20 14:29 UTC, Volker Lendecke
jra: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ingo Denner 2014-05-06 11:49:28 UTC
OS     : ubuntu server 14.04 LTS running as VM in Virtualbox 4.3.10
Host OS: ubuntu server 12.04 LTS
Samba4 : 4.1.7 compiled from source in the VM

V 4.1.7 as AD-server ran without problems for about 10 days. Today i got several crashes (see log-snippet below).
Sorry, i can not attach the core-dump, it was not written in difference to what the log says. There is only an outdated core dump.
It seems to be similar to Bug 9853 but now also in V 4.1.7
Regards Ingo

========= extracted from log.smbd =======================
[2014/05/06 11:09:00.566988,  0] ../source3/smbd/open.c:1582(defer_open_done)
  PANIC: assert failed at ../source3/smbd/open.c(1582): ret
[2014/05/06 11:09:00.567884,  0] ../source3/lib/util.c:785(smb_panic_s3)
  PANIC (pid 9135): assert failed: ret
[2014/05/06 11:09:00.571179,  0] ../source3/lib/util.c:896(log_stack_trace)
  BACKTRACE: 20 stack frames:
   #0 /usr/local/samba/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f0aab57aac7]
   #1 /usr/local/samba/lib/libsmbconf.so.0(smb_panic_s3+0x6f) [0x7f0aab57a912]
   #2 /usr/local/samba/lib/libsamba-util.so.0(smb_panic+0x28) [0x7f0aacff4c40]
   #3 /usr/local/samba/lib/private/libsmbd_base.so(+0x125053) [0x7f0aacb6d053]
   #4 /usr/local/samba/lib/private/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7f0aac835a01]
   #5 /usr/local/samba/lib/private/libtevent.so.0(+0x5a33) [0x7f0aac835a33]
   #6 /usr/local/samba/lib/private/libtevent.so.0(+0x5d6f) [0x7f0aac835d6f]
   #7 /usr/local/samba/lib/private/libtevent.so.0(tevent_common_loop_timer_delay+0x246) [0x7f0aac83a95a]
   #8 /usr/local/samba/lib/libsmbconf.so.0(run_events_poll+0x76) [0x7f0aab596571]
   #9 /usr/local/samba/lib/libsmbconf.so.0(+0x3ecd6) [0x7f0aab596cd6]
   #10 /usr/local/samba/lib/private/libtevent.so.0(_tevent_loop_once+0xf4) [0x7f0aac8340a3]
   #11 /usr/local/samba/lib/private/libsmbd_base.so(smbd_process+0x12a4) [0x7f0aacb9a4f8]
   #12 /usr/local/samba/sbin/smbd(+0x9abf) [0x7f0aad669abf]
   #13 /usr/local/samba/lib/libsmbconf.so.0(run_events_poll+0x54f) [0x7f0aab596a4a]
   #14 /usr/local/samba/lib/libsmbconf.so.0(+0x3ecd6) [0x7f0aab596cd6]
   #15 /usr/local/samba/lib/private/libtevent.so.0(_tevent_loop_once+0xf4) [0x7f0aac8340a3]
   #16 /usr/local/samba/sbin/smbd(+0xa737) [0x7f0aad66a737]
   #17 /usr/local/samba/sbin/smbd(main+0x178e) [0x7f0aad66c019]
   #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f0aa9dc9ec5]
   #19 /usr/local/samba/sbin/smbd(+0x57f9) [0x7f0aad6657f9]
[2014/05/06 11:09:00.579449,  0] ../source3/lib/dumpcore.c:317(dump_core)
  dumping core in /usr/local/samba/var/cores/smbd
Comment 1 John Mulligan 2014-06-17 14:04:25 UTC
We are hitting a very similar error condition on Samba 4.1.8, where the assert statement in defer_open_done is triggered. Sometimes it is called from dbwrap_record_watch_done and other times tevent_common_loop_timer_delay.
I chose to update this bug because the version number is closer to 4.1.8 than the others but it appears that bug 10407 could also be the same issue.

I am attaching a core file, some smbd level 10 log files (seperated by pid) and
a sanitized copy of my smb.conf file in a tarball. We can reproduce fairly
easily on our test systems. If you need a network trace please let me know.

Looking at the logs I see that the last action performed before the assert
was that it got the "can't find mid 1072" message inside schedule_deferred_open_message_smb2. But earlier in the log I see it reads:
[2014/06/16 15:34:44.693096, 10, pid=16863, effective(0, 0), real(0, 0)] ../source3/smbd/smb2_create.c:1217(remove_deferred_open_message_smb2_internal)
  remove_deferred_open_message_smb2_internal: mid 1072
Which sounds to me like the mid 1072 was supposed to be already removed.

We tried this using max protocol of SMB2 and the result was the same.  We
could not reproduce the issue when setting max protocol to NT1.


GDB bt full follows below:

Core was generated by `smbd'.
Program terminated with signal 6, Aborted.
#0  0x00007fe2fe32d215 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0  0x00007fe2fe32d215 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = <value optimized out>
        selftid = 16863
#1  0x00007fe2fe32eb2b in abort () at abort.c:93
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7fe301a30000,
            sa_sigaction = 0x7fe301a30000}, sa_mask = {__val = {0, 140612903889912,
              140612961752272, 0, 17179869184, 0, 0, 4294967295, 140734141726912,
              4294967296, 401258, 2633384, 0, 0, 1, 140612930478080}}, 
          sa_flags = 25347954, sa_restorer = 0x1}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fe2ffc95f51 in dump_core () at ../source3/lib/dumpcore.c:336
        called = true
        __FUNCTION__ = "dump_core"
#3  0x00007fe2ffc7f1f2 in smb_panic_s3 (why=<value optimized out>)
    at ../source3/lib/util.c:808
        cmd = <value optimized out>
        result = <value optimized out>
        __FUNCTION__ = "smb_panic_s3"
#4  0x00007fe3013e99c1 in smb_panic (why=0x7fe301109bab "assert failed: ret")
    at ../lib/util/fault.c:159
No locals.
#5  0x00007fe300fbd47a in defer_open_done (req=0x0) at ../source3/smbd/open.c:1587
        state = 0x7fe302c1c850
        status = {v = 0}
        ret = <value optimized out>
        __FUNCTION__ = "defer_open_done"
#6  0x00007fe2ffc93cf6 in dbwrap_record_watch_done (subreq=0x0)
    at ../source3/lib/dbwrap/dbwrap_watch.c:378
        req = 0x7fe302c54790
        state = 0x7fe302c54920
        rec = 0x7fe302c24bf0
        ret = 0
#7  0x00007fe2fe696632 in tevent_common_loop_immediate (ev=0x7fe302b7c430)
    at ../tevent_immediate.c:135
        im = 0x7fe302c438a0
        handler = 0x7fe2fe696ef0 <tevent_req_trigger>
        private_data = 0x7fe302c437c0
#8  0x00007fe2ffc9e318 in run_events_poll (ev=0x7fe302b7c430, pollrtn=0, pfds=0x0,
    num_pfds=0) at ../source3/lib/events.c:192
        state = <value optimized out>
        pollfd_idx = <value optimized out>
        fde = <value optimized out>
        __FUNCTION__ = "run_events_poll"
#9  0x00007fe2ffc9e608 in s3_event_loop_once (ev=0x7fe302b7c430,
    location=<value optimized out>) at ../source3/lib/events.c:303
        state = 0x7fe302b7af20
        timeout = 2147483647
        num_pfds = <value optimized out>
        ret = <value optimized out>
        poll_errno = <value optimized out>
#10 0x00007fe2fe695d8d in _tevent_loop_once (ev=0x7fe302b7c430,
    location=0x7fe3011197a8 "../source3/smbd/process.c:3695") at ../tevent.c:530
        ret = <value optimized out>
        nesting_stack_ptr = 0x0
#11 0x00007fe300fe0d10 in smbd_process (ev_ctx=0x7fe302b7c430,
    msg_ctx=<value optimized out>, sock_fd=<value optimized out>, interactive=128)
    at ../source3/smbd/process.c:3695
        frame = 0x7fe302baa250
        conn = <value optimized out>
        sconn = <value optimized out>
        ss = {ss_family = 2, __ss_align = 0,
          __ss_padding = "\000\000\000\000\000\000\000\000e\005\350\253¤\263\275\000\000\244\001\343\177\000\000rǂ\001\343\177\000\000\001\000\000\000\343\177\000\000\000\000\000\000\000\000\000\000 ŷ\002\343\177\000\000\230\310\354\000\343\177\000\000e\005\350\253¤\263\275\000\000\000\000\000\000\000\000\037\000\000\000\000\000\000\000\060ķ\002\343\177\000\000 ŷ\002\343\177\000\000\245'\203\001\343\177\000"}
        sa = 0x7fff388688a0
        sa_socklen = <value optimized out>
        local_address = 0x7fe302b913a0
        remote_address = 0x7fe302b88490
        locaddr = <value optimized out>
        remaddr = <value optimized out>
        rhost = 0x0
        ret = <value optimized out>
        tmp = <value optimized out>
        __FUNCTION__ = "smbd_process"
#12 0x00007fe301a488b4 in smbd_accept_connection (ev=0x7fe302b7c430,
    fde=<value optimized out>, flags=<value optimized out>,
    private_data=<value optimized out>) at ../source3/smbd/server.c:628
        status = <value optimized out>
        s = 0x0
        msg_ctx = 0x7fe302b7c520
        addr = {ss_family = 2, __ss_align = 0,
          __ss_padding = "\000\212\206\070\377\177\000\000\000\000\000\000\000\000\000\000\300\211\206\070\377\177\000\000\354\370\233\070\377\177\000\000\001\000\000\000\000\000\000\000 \257\267\002\343\177\000\000 \252\270\002\343\177\000\000*P9\376\342\177\000\000dF\237S\000\000\000\000\321\a>\001\343\177\000\000\060ķ\002\343\177\000\000Z\240i\376\342\177\000\000;\000\000\000\000\000\000\000T@\017\000\000\000\000"}
        in_addrlen = 16
        fd = 31
        pid = 0
        unique_id = 13669450450014307685
        __FUNCTION__ = "smbd_accept_connection"
#13 0x00007fe2ffc9e564 in run_events_poll (ev=0x7fe302b7c430,
    pollrtn=<value optimized out>, pfds=0x7fe302b8aa20, num_pfds=5)
    at ../source3/lib/events.c:257
        pfd = <value optimized out>
        flags = <value optimized out>
        state = <value optimized out>
        pollfd_idx = 0x7fe302b84a00
        fde = 0x7fe302b90170
        __FUNCTION__ = "run_events_poll"
#14 0x00007fe2ffc9e6d6 in s3_event_loop_once (ev=0x7fe302b7c430,
    location=<value optimized out>) at ../source3/lib/events.c:326
        state = 0x7fe302b7af20
        timeout = 60000
        num_pfds = 5
        ret = 1
        poll_errno = 4
#15 0x00007fe2fe695d8d in _tevent_loop_once (ev=0x7fe302b7c430,
    location=0x7fe301a4ce04 "../source3/smbd/server.c:950") at ../tevent.c:530
        ret = <value optimized out>
        nesting_stack_ptr = 0x0
#16 0x00007fe301a4a7cd in smbd_parent_loop (argc=<value optimized out>,
    argv=<value optimized out>) at ../source3/smbd/server.c:950
        ret = <value optimized out>
        frame = 0x7fe302b7d4f0
#17 main (argc=<value optimized out>, argv=<value optimized out>)
    at ../source3/smbd/server.c:1582
        is_daemon = <value optimized out>
        interactive = <value optimized out>
        Fork = true
        no_process_group = <value optimized out>
        log_stdout = 139
        ports = 0x0
        profile_level = 0x0
        opt = <value optimized out>
        pc = <value optimized out>
        print_build_options = <value optimized out>
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4,
            arg = 0x7fe2fecb3420, val = 0, descrip = 0x7fe301a4c6ca "Help options:",
            argDescrip = 0x0}, {longName = 0x7fe301a4c6d8 "daemon", shortName = 68 'D',
            argInfo = 0, arg = 0x0, val = 1000,
            descrip = 0x7fe301a4c6df "Become a daemon (default)", argDescrip = 0x0}, {
            longName = 0x7fe301a4c6f9 "interactive", shortName = 105 'i', argInfo = 0,
            arg = 0x0, val = 1001,
            descrip = 0x7fe301a4d248 "Run interactive (not a daemon)",
            argDescrip = 0x0}, {longName = 0x7fe301a4c705 "foreground",
            shortName = 70 'F', argInfo = 0, arg = 0x0, val = 1002,
            descrip = 0x7fe301a4d268 "Run daemon in foreground (for daemontools, etc.)",
            argDescrip = 0x0}, {longName = 0x7fe301a4c710 "no-process-group",
            shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 1003,
            descrip = 0x7fe301a4d2a0 "Don't create a new process group",
            argDescrip = 0x0}, {longName = 0x7fe301a4c721 "log-stdout",
            shortName = 83 'S', argInfo = 0, arg = 0x0, val = 1004,
            descrip = 0x7fe301a4c72c "Log to stdout", argDescrip = 0x0}, {
            longName = 0x7fe301a4c73a "build-options", shortName = 98 'b', argInfo = 0,
            arg = 0x0, val = 98, descrip = 0x7fe301a4c748 "Print build options",
            argDescrip = 0x0}, {longName = 0x7fe301a4c75c "port", shortName = 112 'p',
            argInfo = 1, arg = 0x7fff38868ef8, val = 0,
            descrip = 0x7fe301a4c761 "Listen on the specified ports", argDescrip = 0x0},
          {longName = 0x7fe301a4c77f "profiling-level", shortName = 80 'P', argInfo = 1,
            arg = 0x7fff38868ef0, val = 0,
            descrip = 0x7fe301a4c78f "Set profiling level",
            argDescrip = 0x7fe301a4c7a3 "PROFILE_LEVEL"}, {longName = 0x0,
            shortName = 0 '\000', argInfo = 4, arg = 0x7fe30030f860, val = 0,
            descrip = 0x7fe301a4c7b1 "Common samba options:", argDescrip = 0x0}, {
            longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7fe30030f040,
            val = 0, descrip = 0x7fe301a4d2c8 "Build-time configuration overrides:",
            argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0,
            arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        parent = <value optimized out>
        frame = <value optimized out>
        status = <value optimized out>
        ev_ctx = 0x7fe302b7c430
        msg_ctx = 0x7fe302b7c520
        server_id = {pid = 14997, task_id = 0, vnn = 4294967295,
          unique_id = 9428862371530993319}
        se = <value optimized out>
        np_dir = <value optimized out>
        smbd_shim_fns = {
          cancel_pending_lock_requests_by_fid = 0x7fe300fc9a60 <smbd_cancel_pending_lock_requests_by_fid>,
          send_stat_cache_delete_message = 0x7fe300fcfb10 <smbd_send_stat_cache_delete_message>, change_to_root_user = 0x7fe300fb7fa0 <smbd_change_to_root_user>,
          contend_level2_oplocks_begin = 0x7fe301019e30 <smbd_contend_level2_oplocks_begin>, contend_level2_oplocks_end = 0x7fe30101a000 <smbd_contend_level2_oplocks_end>,
          become_root = 0x7fe300fb8090 <smbd_become_root>,
          unbecome_root = 0x7fe300fb80c0 <smbd_unbecome_root>,
          exit_server = 0x7fe301012af0 <smbd_exit_server>,
          exit_server_cleanly = 0x7fe301012b10 <smbd_exit_server_cleanly>}
        __FUNCTION__ = "main"
Comment 2 John Mulligan 2014-06-17 14:10:44 UTC
Created attachment 10037 [details]
tarball w/ core, logs, smb.conf
Comment 3 Volker Lendecke 2014-06-20 10:31:32 UTC
(In reply to comment #2)
> Created attachment 10037 [details]
> tarball w/ core, logs, smb.conf

Thanks! With that info I got it reproduced. Expect a fix soon.
Comment 4 Volker Lendecke 2014-06-20 14:28:20 UTC
Created attachment 10042 [details]
patch for master

This is a reproducer for the bug including a fix for master. This fix might be a little intrusive and will need some patch cosmetics, but I'd like to present it already now for review.
Comment 5 Volker Lendecke 2014-06-20 14:29:20 UTC
Created attachment 10043 [details]
Quick patch for 4.1

This is a quick, minimal patch that should do it for 4.1
Comment 6 Jeremy Allison 2014-06-20 16:26:57 UTC
(In reply to comment #4)
> Created attachment 10042 [details]
> patch for master
> 
> This is a reproducer for the bug including a fix for master. This fix might be
> a little intrusive and will need some patch cosmetics, but I'd like to present
> it already now for review.

Woohoo ! GREAT CATCH ! That must have taken a while.
Comment 7 Jeremy Allison 2014-06-20 17:58:19 UTC
Ok, I've gone through the code for master *very* carefully, and run the torture3 against an unpatched server (crashes + wireshark trace) and against a patched server (no crash + wireshark trace :-).

I particularly like the cleanups to remove the anonymous blob in the pending message lists and replace it with a known struct (I knew that was a horrible hack, but couldn't figure out how to fix it properly :-).

I'm currently running a local make test. I'll update when (if:-) it passes. Is there anything else you need on this before you're happy with push to master ?

Cheers,

Jeremy.
Comment 8 Volker Lendecke 2014-06-20 19:44:32 UTC
(In reply to comment #7)
> I'm currently running a local make test. I'll update when (if:-) it passes. Is
> there anything else you need on this before you're happy with push to master ?

It just survived a autobuild on sn-devel. The patches as such are good I think. There's one or two commit messages that are missing signed-off-by's and could be a bit more elaborate. Second, we need to run the new test in autobuild. But now it's late here :-)

What do you think about the 4.1 quick patch? Should functionally do it as well.

Volker
Comment 9 Jeremy Allison 2014-06-20 19:53:42 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I'm currently running a local make test. I'll update when (if:-) it passes. 

It passed :-).

> It just survived a autobuild on sn-devel. The patches as such are good I think.
> There's one or two commit messages that are missing signed-off-by's and could
> be a bit more elaborate. Second, we need to run the new test in autobuild. But
> now it's late here :-)

Sure, update the commit messages and plug in the new test to autobuild, but the functionality + code is good to go into master IMHO.

Feel free to add my 'Reviewed-by:' tags to the master patchset + push.

> What do you think about the 4.1 quick patch? Should functionally do it as well.

Haven't had time to test a 4.1.x build against the new test yet. Will do this no later than Monday next week (might get to this over the weekend). Bit busy at the plugfest here at the moment, then getting on a plane in 1 hour :-).
Comment 10 Jeremy Allison 2014-06-23 20:38:52 UTC
Comment on attachment 10043 [details]
Quick patch for 4.1

Tested - this patch fixes it for 4.1.x.

Volker, if you're happy with this request my +1 and I'll give it and re-assign to Karolin for inclusion.

Cheers,

Jeremy.
Comment 11 Volker Lendecke 2014-06-24 06:52:09 UTC
*** Bug 10498 has been marked as a duplicate of this bug. ***
Comment 12 Jeremy Allison 2014-06-24 17:34:10 UTC
Comment on attachment 10043 [details]
Quick patch for 4.1

LGTM.
Comment 13 Jeremy Allison 2014-06-24 17:34:32 UTC
Re-assigning to Karolin for inclusion in 4.1.next.
Jeremy.
Comment 14 Karolin Seeger 2014-07-01 07:11:19 UTC
(In reply to comment #13)
> Re-assigning to Karolin for inclusion in 4.1.next.
> Jeremy.

Pushed to autobuild-v4-1-test.
Comment 15 Karolin Seeger 2014-07-17 18:10:18 UTC
Pushed to v4-1-test.
Closing out bug report.

Thanks!
Comment 16 Hemanth 2018-03-27 21:14:42 UTC
Jeremy/Volker,
We are using samba 4.3.11+ stack which has the actual changes for this issue. But we are still seeing the same issue at one of our customers.


(gdb) bt
#0  0x00007f41937335e5 in raise () from /lib64/libc.so.6
#1  0x00007f4193734dc5 in abort () from /lib64/libc.so.6
#2  0x00007f4194e258c3 in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f4194e137d1 in smb_panic_s3 (why=0x7f41973a551c "assert failed: ret") at ../source3/lib/util.c:812
#4  0x00007f41976acd55 in smb_panic (why=0x7f41973a551c "assert failed: ret") at ../lib/util/fault.c:166
#5  0x00007f41971d086c in defer_open_done (req=0x0) at ../source3/smbd/open.c:2298
#6  0x00007f419664e318 in _tevent_req_notify_callback (req=0x564467687a90, 
    location=0x7f4194e55d90 "../source3/lib/dbwrap/dbwrap_watch.c:389") at ../lib/tevent/tevent_req.c:112
#7  0x00007f419664e3ed in tevent_req_finish (req=0x564467687a90, state=TEVENT_REQ_DONE, 
    location=0x7f4194e55d90 "../source3/lib/dbwrap/dbwrap_watch.c:389") at ../lib/tevent/tevent_req.c:149
#8  0x00007f419664e414 in _tevent_req_done (req=0x564467687a90, location=0x7f4194e55d90 "../source3/lib/dbwrap/dbwrap_watch.c:389")
    at ../lib/tevent/tevent_req.c:155
#9  0x00007f4194e1c26d in dbwrap_record_watch_done (subreq=0x0) at ../source3/lib/dbwrap/dbwrap_watch.c:389


(gdb) p *state->xconn->smb2.requests->subreq
$4 = {async = {fn = 0, private_data = 0x0}, data = 0x5589eb25d4a0, private_print = 0,
 private_cancel = 0x7ffb8be01b2b <smbd_smb2_create_cancel>, private_cleanup = {fn = 0, state = TEVENT_REQ_INIT}, internal = {
   private_type = 0x7ffb8bf9386a "struct smbd_smb2_create_state", create_location = 0x7ffb8bf93848 "../source3/smbd/smb2_create.c:504",
   finish_location = 0x7ffb8bf94c08 "../source3/smbd/smb2_create.c:1622", cancel_location = 0x0, state = **TEVENT_REQ_USER_ERROR***,

finish_location actually maps to the setting "tevent_req_nterror(req, NT_STATUS_CANCELLED)". Which means that the request has been cancelled when it was in deferred queue. As per the bug fix, dbwrap_record_watch_done callback should have dropped the cancelled requests. Looks like its not the case here.