Created attachment 13722 [details] mdssd coredump "external"

I have compiled Samba 4.7 myself from the official sources, with Spotlight search enabled. It runs in a Docker container on 3 Ubuntu 16.04 systems without much trouble, but on a 4th it crashes, generating core dumps.

Reproducing the bug is tricky; it seems to be triggered by someone launching a search client-side. Sadly this is our only site not standardized on a single OSX version due to software constraints. The other sites all run OSX 10.12 Sierra, but this one uses versions ranging from 10.8 to 10.13 (which could be related?).

When I set `rpc_server:mdssvc` to `external`, the crash only affects the client's search ability, but when I set it to `embedded`, it takes down the smbd instance too (which is normal, I suppose). So it is currently set to `external`, and the attached coredump comes from that configuration.

When it happens, I get traces like this:

samba_1 | Bad talloc magic value - wrong talloc version used/mixed
samba_1 | PANIC (pid 23177): Bad talloc magic value - wrong talloc version used/mixed
samba_1 | BACKTRACE: 36 stack frames:
samba_1 | #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f7675b8333f]
samba_1 | #1 /usr/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7f7675b83190]
samba_1 | #2 /usr/lib/libsamba-util.so.0(smb_panic+0x28) [0x7f76782d42ff]
samba_1 | #3 /usr/lib/samba/libtalloc.so.2(+0x251c) [0x7f76773eb51c]
samba_1 | #4 /usr/lib/samba/libtalloc.so.2(+0x2536) [0x7f76773eb536]
samba_1 | #5 /usr/lib/samba/libtalloc.so.2(+0x25c2) [0x7f76773eb5c2]
samba_1 | #6 /usr/lib/samba/libtalloc.so.2(_talloc_free+0x36) [0x7f76773ed914]
samba_1 | #7 /usr/lib/samba/libsmbd-base-samba4.so(+0x886ef) [0x7f7677c8e6ef]
samba_1 | #8 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7f7676fd8f52]
samba_1 | #9 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7f7676fd9027]
samba_1 | #10 /usr/lib/samba/libtevent.so.0(+0x714f) [0x7f7676fd914f]
samba_1 | #11 /usr/lib/samba/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7f7676fd833d]
samba_1 | #12 /usr/lib/samba/libtevent.so.0(+0xf681) [0x7f7676fe1681]
samba_1 | #13 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f7676fde417]
samba_1 | #14 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f7676fd71c6]
samba_1 | #15 /usr/sbin/smbd(+0x9d4d) [0x555bcc322d4d]
samba_1 | #16 /usr/lib/libsmbconf.so.0(prefork_add_children+0x1a8) [0x7f7675ba249f]
samba_1 | #17 /usr/lib/libsmbconf.so.0(pfh_manage_pool+0x13a) [0x7f7675ba3ad8]
samba_1 | #18 /usr/sbin/smbd(+0xa49c) [0x555bcc32349c]
samba_1 | #19 /usr/lib/libsmbconf.so.0(+0x532ea) [0x7f7675b942ea]
samba_1 | #20 /usr/lib/libsmbconf.so.0(+0x5336c) [0x7f7675b9436c]
samba_1 | #21 /usr/lib/libsmbconf.so.0(+0x511c1) [0x7f7675b921c1]
samba_1 | #22 /usr/lib/samba/libmessages-dgm-samba4.so(+0x8bc9) [0x7f76733a4bc9]
samba_1 | #23 /usr/lib/samba/libmessages-dgm-samba4.so(+0x76e0) [0x7f76733a36e0]
samba_1 | #24 /usr/lib/samba/libmessages-dgm-samba4.so(+0x751a) [0x7f76733a351a]
samba_1 | #25 /usr/lib/samba/libtevent.so.0(+0xf0e3) [0x7f7676fe10e3]
samba_1 | #26 /usr/lib/samba/libtevent.so.0(+0xf71b) [0x7f7676fe171b]
samba_1 | #27 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f7676fde417]
samba_1 | #28 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f7676fd71c6]
samba_1 | #29 /usr/lib/samba/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7f7676fd74dd]
samba_1 | #30 /usr/lib/samba/libtevent.so.0(+0xc4b9) [0x7f7676fde4b9]
samba_1 | #31 /usr/lib/samba/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7f7676fd7580]
samba_1 | #32 /usr/sbin/smbd(start_mdssd+0x4a0) [0x555bcc323d5b]
samba_1 | #33 /usr/sbin/smbd(main+0x168a) [0x555bcc32983c]
samba_1 | #34 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f7674412830]
samba_1 | #35 /usr/sbin/smbd(_start+0x29) [0x555bcc31f0c9]
samba_1 | coredump is handled by helper binary specified at /proc/sys/kernel/core_pattern

I can provide the compile flags if necessary. Apart from the Spotlight one, the most notable are `--bundled-libraries=ALL` and `--with-static-modules=ALL`, I think.
Hey, someone is actually using my baby! :)

Can you recompile with debug symbols (CFLAGS="-g -O0") so we get stack backtraces with symbols? Then also add "panic action = sleep 100000" to the global section of smb.conf. That way the crashed process stays alive so you can attach with a debugger:

# gdb -p PID
gdb> bt full
...

Fwiw, corefiles can only be analysed on systems that exactly match the originating host.
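For reference, the panic action setting requested above would sit in smb.conf roughly as follows. This is only a minimal fragment; the comment line is illustrative and the rest of the existing [global] section is assumed to stay unchanged.

```ini
[global]
    # Keep a panicking process alive so that gdb can be attached to it.
    panic action = sleep 100000
```

After a panic, the PID to pass to `gdb -p` is the one printed in the "PANIC (pid ...)" log line.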
Ah, that makes it a bit tricky, since these are actually running in production right now (the tests on our pilot site went without a hiccup), and our Samba instance is running in a Docker container. On the target system there's also no GCC or GDB installed... I'll see what I can do, though. I've been trying to reproduce it locally on a test VM and on our test setup, without much success so far. I can try building a debug container; I'll get back to you.
Created attachment 13724 [details] GDB mdssvc stacktrace
Ok, I've been able to simulate this crash on my test environment. I got the following trace:

samba_1 | BACKTRACE: 36 stack frames:
samba_1 | #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f4b7b2f833f]
samba_1 | #1 /usr/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7f4b7b2f8190]
samba_1 | #2 /usr/lib/libsamba-util.so.0(smb_panic+0x28) [0x7f4b7da492ff]
samba_1 | #3 /usr/lib/samba/libtalloc.so.2(+0x251c) [0x7f4b7cb6051c]
samba_1 | #4 /usr/lib/samba/libtalloc.so.2(+0x2536) [0x7f4b7cb60536]
samba_1 | #5 /usr/lib/samba/libtalloc.so.2(+0x25c2) [0x7f4b7cb605c2]
samba_1 | #6 /usr/lib/samba/libtalloc.so.2(_talloc_free+0x36) [0x7f4b7cb62914]
samba_1 | #7 /usr/lib/samba/libsmbd-base-samba4.so(+0x886ef) [0x7f4b7d4036ef]
samba_1 | #8 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7f4b7c74df52]
samba_1 | #9 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7f4b7c74e027]
samba_1 | #10 /usr/lib/samba/libtevent.so.0(+0x714f) [0x7f4b7c74e14f]
samba_1 | #11 /usr/lib/samba/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7f4b7c74d33d]
samba_1 | #12 /usr/lib/samba/libtevent.so.0(+0xf681) [0x7f4b7c756681]
samba_1 | #13 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f4b7c753417]
samba_1 | #14 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f4b7c74c1c6]
samba_1 | #15 /usr/sbin/smbd(+0x9d4d) [0x558bdc805d4d]
samba_1 | #16 /usr/lib/libsmbconf.so.0(prefork_add_children+0x1a8) [0x7f4b7b31749f]
samba_1 | #17 /usr/lib/libsmbconf.so.0(pfh_manage_pool+0x13a) [0x7f4b7b318ad8]
samba_1 | #18 /usr/sbin/smbd(+0xa49c) [0x558bdc80649c]
samba_1 | #19 /usr/lib/libsmbconf.so.0(+0x532ea) [0x7f4b7b3092ea]
samba_1 | #20 /usr/lib/libsmbconf.so.0(+0x5336c) [0x7f4b7b30936c]
samba_1 | #21 /usr/lib/libsmbconf.so.0(+0x511c1) [0x7f4b7b3071c1]
samba_1 | #22 /usr/lib/samba/libmessages-dgm-samba4.so(+0x8bc9) [0x7f4b78b19bc9]
samba_1 | #23 /usr/lib/samba/libmessages-dgm-samba4.so(+0x76e0) [0x7f4b78b186e0]
samba_1 | #24 /usr/lib/samba/libmessages-dgm-samba4.so(+0x751a) [0x7f4b78b1851a]
samba_1 | #25 /usr/lib/samba/libtevent.so.0(+0xf0e3) [0x7f4b7c7560e3]
samba_1 | #26 /usr/lib/samba/libtevent.so.0(+0xf71b) [0x7f4b7c75671b]
samba_1 | #27 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f4b7c753417]
samba_1 | #28 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f4b7c74c1c6]
samba_1 | #29 /usr/lib/samba/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7f4b7c74c4dd]
samba_1 | #30 /usr/lib/samba/libtevent.so.0(+0xc4b9) [0x7f4b7c7534b9]
samba_1 | #31 /usr/lib/samba/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7f4b7c74c580]
samba_1 | #32 /usr/sbin/smbd(start_mdssd+0x4a0) [0x558bdc806d5b]
samba_1 | #33 /usr/sbin/smbd(main+0x168a) [0x558bdc80c83c]
samba_1 | #34 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f4b79b87830]
samba_1 | #35 /usr/sbin/smbd(_start+0x29) [0x558bdc8020c9]
samba_1 | smb_panic(): calling panic action [sleep 100000]

The full gdb stacktrace is attached.

It seems we can trigger it by doing an advanced Spotlight search with multiple search conditions combined. You can do this by searching for something in Finder, then clicking the "+" button and adding another condition.

From time to time we also get errors like the following, and the client hangs at that point (when searching for something like "250 10", without quotes):

samba_1 | Tracker query error: GDBus.Error:org.freedesktop.Tracker1.SparqlError.Internal: Operation was cancelled
samba_1 | query in error state
samba_1 | bad context: [0x4009,0x6b000080]
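Many frames in the trace above carry only a library path and an offset, for example libsmbd-base-samba4.so(+0x886ef). As a rough sketch, the helper below pulls such frames out of the log text and builds one addr2line command per library. The function names `parse_frames` and `addr2line_commands` are made up for this illustration, and it assumes that the printed offsets can be passed to addr2line as-is and that the binaries were built with debug symbols; neither has been verified against this build.

```python
import re
from collections import defaultdict

# Matches frames such as:
#   #7 /usr/lib/samba/libsmbd-base-samba4.so(+0x886ef) [0x7f7677c8e6ef]
#   #6 /usr/lib/samba/libtalloc.so.2(_talloc_free+0x36) [0x7f76773ed914]
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+(?P<module>\S+?)"
    r"\((?P<symbol>[^()+]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in the log text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frame = match.groupdict()
        frame["index"] = int(frame["index"])
        frames.append(frame)
    return frames


def addr2line_commands(frames):
    """Build one addr2line command per module for frames without a symbol."""
    offsets_by_module = defaultdict(list)
    for frame in frames:
        if not frame["symbol"]:
            offsets_by_module[frame["module"]].append(frame["offset"])
    return [
        "addr2line -f -C -e {} {}".format(module, " ".join(offsets))
        for module, offsets in offsets_by_module.items()
    ]
```

The generated commands have to be run on the host (or in the container) that holds the exact binaries from the crash, for the same reason the corefile is not portable.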
Created attachment 13725 [details] Possible patch for master Can you try the attached patch? Disclaimer: I'm not 100% sure this is correct, so ideally test this on a test system...
Created attachment 13728 [details] GDB mdssvc stacktrace #2

Got another crash; I can now reproduce this reliably on my local test VM.

samba_1 | BACKTRACE: 62 stack frames:
samba_1 | #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7fda5792233f]
samba_1 | #1 /usr/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7fda57922190]
samba_1 | #2 /usr/lib/libsamba-util.so.0(smb_panic+0x28) [0x7fda5a0732ff]
samba_1 | #3 /usr/lib/samba/libtalloc.so.2(+0x251c) [0x7fda5918a51c]
samba_1 | #4 /usr/lib/samba/libtalloc.so.2(+0x2536) [0x7fda5918a536]
samba_1 | #5 /usr/lib/samba/libtalloc.so.2(+0x25c2) [0x7fda5918a5c2]
samba_1 | #6 /usr/lib/samba/libtalloc.so.2(+0x4210) [0x7fda5918c210]
samba_1 | #7 /usr/lib/samba/libtalloc.so.2(_talloc_get_type_abort+0x4c) [0x7fda5918c3ab]
samba_1 | #8 /usr/lib/samba/libsmbd-base-samba4.so(+0x887af) [0x7fda59a2d7af]
samba_1 | #9 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(g_simple_async_result_complete+0x87) [0x7fda4f057457]
samba_1 | #10 /usr/lib/x86_64-linux-gnu/libtracker-sparql-1.0.so.0(+0x7bdd) [0x7fda4f370bdd]
samba_1 | #11 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(g_simple_async_result_complete+0x87) [0x7fda4f057457]
samba_1 | #12 /usr/lib/x86_64-linux-gnu/libtracker-sparql-1.0.so.0(+0x11b75) [0x7fda4f37ab75]
samba_1 | #13 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1 | #14 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x881ee) [0x7fda4f0691ee]
samba_1 | #15 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x6e17b) [0x7fda4f04f17b]
samba_1 | #16 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1 | #17 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x881ee) [0x7fda4f0691ee]
samba_1 | #18 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x6c89d) [0x7fda4f04d89d]
samba_1 | #19 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1 | #20 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x881ee) [0x7fda4f0691ee]
samba_1 | #21 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x6b9f3) [0x7fda4f04c9f3]
samba_1 | #22 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1 | #23 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b79) [0x7fda4f068b79]
samba_1 | #24 /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_context_dispatch+0x15a) [0x7fda4eac704a]
samba_1 | #25 /lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x4a3f0) [0x7fda4eac73f0]
samba_1 | #26 /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_loop_run+0xc2) [0x7fda4eac7712]
samba_1 | #27 /usr/lib/samba/libsmbd-base-samba4.so(+0x8be86) [0x7fda59a30e86]
samba_1 | #28 /usr/lib/samba/libsmbd-base-samba4.so(mds_dispatch+0x214) [0x7fda59a310bd]
samba_1 | #29 /usr/lib/samba/libsmbd-base-samba4.so(_mdssvc_cmd+0x40c) [0x7fda59a37160]
samba_1 | #30 /usr/lib/samba/libsmbd-base-samba4.so(+0x92c56) [0x7fda59a37c56]
samba_1 | #31 /usr/lib/samba/libsmbd-base-samba4.so(+0x7d0b0) [0x7fda59a220b0]
samba_1 | #32 /usr/lib/samba/libsmbd-base-samba4.so(+0x7cc48) [0x7fda59a21c48]
samba_1 | #33 /usr/lib/samba/libsmbd-base-samba4.so(+0x7da43) [0x7fda59a22a43]
samba_1 | #34 /usr/lib/samba/libsmbd-base-samba4.so(process_complete_pdu+0xde) [0x7fda59a22b37]
samba_1 | #35 /usr/lib/samba/libsmbd-base-samba4.so(named_pipe_packet_process+0x195) [0x7fda59a4305e]
samba_1 | #36 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1 | #37 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1 | #38 /usr/lib/samba/libtevent.so.0(_tevent_req_done+0x25) [0x7fda58d7804f]
samba_1 | #39 /usr/lib/libdcerpc-binding.so.0(+0x1d3fb) [0x7fda50cab3fb]
samba_1 | #40 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1 | #41 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1 | #42 /usr/lib/samba/libtevent.so.0(_tevent_req_done+0x25) [0x7fda58d7804f]
samba_1 | #43 /usr/lib/samba/libsamba-sockets-samba4.so(+0xccbf) [0x7fda574cecbf]
samba_1 | #44 /usr/lib/samba/libsamba-sockets-samba4.so(+0xceea) [0x7fda574ceeea]
samba_1 | #45 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1 | #46 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1 | #47 /usr/lib/samba/libtevent.so.0(_tevent_req_done+0x25) [0x7fda58d7804f]
samba_1 | #48 /usr/lib/samba/libsamba-sockets-samba4.so(+0xc210) [0x7fda574ce210]
samba_1 | #49 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1 | #50 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1 | #51 /usr/lib/samba/libtevent.so.0(+0x714f) [0x7fda58d7814f]
samba_1 | #52 /usr/lib/samba/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7fda58d7733d]
samba_1 | #53 /usr/lib/samba/libtevent.so.0(+0xf681) [0x7fda58d80681]
samba_1 | #54 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7fda58d7d417]
samba_1 | #55 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7fda58d761c6]
samba_1 | #56 /usr/sbin/smbd(+0x9d4d) [0x558f64c45d4d]
samba_1 | #57 /usr/lib/libsmbconf.so.0(prefork_create_pool+0x3d9) [0x7fda579410d5]
samba_1 | #58 /usr/sbin/smbd(start_mdssd+0x308) [0x558f64c46bc3]
samba_1 | #59 /usr/sbin/smbd(main+0x168a) [0x558f64c4c83c]
samba_1 | #60 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fda561b1830]
samba_1 | #61 /usr/sbin/smbd(_start+0x29) [0x558f64c420c9]
samba_1 | smb_panic(): calling panic action [sleep 100000]

To simulate this I mirrored the production share with only the filenames and empty files (using `cp -r --attributes-only`), which somehow seems to affect this.

Some background: this was migrated from what was originally a Mac server on that site to a Linux/Samba system on ZFS. It's a rather large fileshare with over 350000 files, with names created by Mac users, meaning they're an absolute mess (containing various weird Unicode characters, ':', '&', '?', starting with spaces, ending with spaces or dots, ...).
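For anyone who wants to build a similar names-only mirror without GNU cp, here is a minimal Python sketch of the idea. The function name `mirror_names_only` is hypothetical, and unlike `cp -r --attributes-only` it does not carry over ownership, permissions or timestamps; it only recreates directory and file names with zero-length files.

```python
import os


def mirror_names_only(src_root, dst_root):
    """Recreate the tree under src_root in dst_root using empty files.

    Returns the number of files created. File contents are never read.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Map the current source directory onto the destination tree.
        relative = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Opening in "w" mode creates (or truncates to) an empty file.
            with open(os.path.join(target_dir, name), "w"):
                pass
            created += 1
    return created
```

Directories that contain no files are still created, since every directory visited by `os.walk` is passed to `os.makedirs`.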
(In reply to Bart Meuris from comment #6) Oh, this looks like a different crash. Is that with or without the proposed patch?
That is with the patch applied, sorry, should have mentioned that.
Hm, at least a different crash.
Do you need additional information for this? If necessary, I can set up a test environment that reproduces this behavior on a Linode box and give you access.
(In reply to Bart Meuris from comment #10) No, thanks. Analyzing this would take quite some time, probably several days and I don't have that atm.