Bug 13098 - Spotlight search mdssvc coredumps
Summary: Spotlight search mdssvc coredumps
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: DCE-RPCs and pipes
Version: 4.7.0
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Ralph Böhme
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-25 09:05 UTC by Bart Meuris
Modified: 2017-11-07 07:56 UTC

See Also:


Attachments
mdssd coredump "external" (1013.72 KB, application/x-gzip)
2017-10-25 09:05 UTC, Bart Meuris
GDB mdssvc stacktrace (13.85 KB, text/plain)
2017-10-26 11:15 UTC, Bart Meuris
Possible patch for master (1.05 KB, patch)
2017-10-26 11:47 UTC, Ralph Böhme
GDB mdssvc stacktrace #2 (15.83 KB, text/plain)
2017-10-26 13:20 UTC, Bart Meuris

Description Bart Meuris 2017-10-25 09:05:29 UTC
Created attachment 13722
mdssd coredump "external"

I have compiled Samba 4.7 myself from the official sources, with Spotlight search enabled. It runs in a Docker container on 3 Ubuntu 16.04 systems without many problems, but on a 4th it crashes, generating core dumps.

Reproducing the bug is tricky. It seems to be triggered by someone launching a search client-side. Sadly this is our only site not standardized on one single OSX version due to software constraints. The other sites all run OSX 10.12 Sierra, but this one uses versions ranging from 10.8 to 10.13 (which could be related?).

When I set `rpc_server:mdssvc` to `external`, a crash only affects the clients' search ability, but when I set it to `embedded`, it takes down the smbd instance too (which I suppose is expected). So it is currently set to `external`, and the attached coredump is from that configuration.

When it happens, I get traces like this:

samba_1  | Bad talloc magic value - wrong talloc version used/mixed
samba_1  | PANIC (pid 23177): Bad talloc magic value - wrong talloc version used/mixed
samba_1  | BACKTRACE: 36 stack frames:
samba_1  |  #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f7675b8333f]
samba_1  |  #1 /usr/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7f7675b83190]
samba_1  |  #2 /usr/lib/libsamba-util.so.0(smb_panic+0x28) [0x7f76782d42ff]
samba_1  |  #3 /usr/lib/samba/libtalloc.so.2(+0x251c) [0x7f76773eb51c]
samba_1  |  #4 /usr/lib/samba/libtalloc.so.2(+0x2536) [0x7f76773eb536]
samba_1  |  #5 /usr/lib/samba/libtalloc.so.2(+0x25c2) [0x7f76773eb5c2]
samba_1  |  #6 /usr/lib/samba/libtalloc.so.2(_talloc_free+0x36) [0x7f76773ed914]
samba_1  |  #7 /usr/lib/samba/libsmbd-base-samba4.so(+0x886ef) [0x7f7677c8e6ef]
samba_1  |  #8 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7f7676fd8f52]
samba_1  |  #9 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7f7676fd9027]
samba_1  |  #10 /usr/lib/samba/libtevent.so.0(+0x714f) [0x7f7676fd914f]
samba_1  |  #11 /usr/lib/samba/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7f7676fd833d]
samba_1  |  #12 /usr/lib/samba/libtevent.so.0(+0xf681) [0x7f7676fe1681]
samba_1  |  #13 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f7676fde417]
samba_1  |  #14 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f7676fd71c6]
samba_1  |  #15 /usr/sbin/smbd(+0x9d4d) [0x555bcc322d4d]
samba_1  |  #16 /usr/lib/libsmbconf.so.0(prefork_add_children+0x1a8) [0x7f7675ba249f]
samba_1  |  #17 /usr/lib/libsmbconf.so.0(pfh_manage_pool+0x13a) [0x7f7675ba3ad8]
samba_1  |  #18 /usr/sbin/smbd(+0xa49c) [0x555bcc32349c]
samba_1  |  #19 /usr/lib/libsmbconf.so.0(+0x532ea) [0x7f7675b942ea]
samba_1  |  #20 /usr/lib/libsmbconf.so.0(+0x5336c) [0x7f7675b9436c]
samba_1  |  #21 /usr/lib/libsmbconf.so.0(+0x511c1) [0x7f7675b921c1]
samba_1  |  #22 /usr/lib/samba/libmessages-dgm-samba4.so(+0x8bc9) [0x7f76733a4bc9]
samba_1  |  #23 /usr/lib/samba/libmessages-dgm-samba4.so(+0x76e0) [0x7f76733a36e0]
samba_1  |  #24 /usr/lib/samba/libmessages-dgm-samba4.so(+0x751a) [0x7f76733a351a]
samba_1  |  #25 /usr/lib/samba/libtevent.so.0(+0xf0e3) [0x7f7676fe10e3]
samba_1  |  #26 /usr/lib/samba/libtevent.so.0(+0xf71b) [0x7f7676fe171b]
samba_1  |  #27 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f7676fde417]
samba_1  |  #28 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f7676fd71c6]
samba_1  |  #29 /usr/lib/samba/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7f7676fd74dd]
samba_1  |  #30 /usr/lib/samba/libtevent.so.0(+0xc4b9) [0x7f7676fde4b9]
samba_1  |  #31 /usr/lib/samba/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7f7676fd7580]
samba_1  |  #32 /usr/sbin/smbd(start_mdssd+0x4a0) [0x555bcc323d5b]
samba_1  |  #33 /usr/sbin/smbd(main+0x168a) [0x555bcc32983c]
samba_1  |  #34 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f7674412830]
samba_1  |  #35 /usr/sbin/smbd(_start+0x29) [0x555bcc31f0c9]
samba_1  | coredump is handled by helper binary specified at /proc/sys/kernel/core_pattern


I can provide the compile flags if necessary. Apart from the Spotlight one, the most notable are `--bundled-libraries=ALL` and `--with-static-modules=ALL`, I think.
Comment 1 Ralph Böhme 2017-10-25 13:58:37 UTC
Hey, someone is actually using my baby! :)

Can you recompile with debug symbols (CFLAGS="-g -O0") so we get stack backtraces with symbols?
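
Many frames in the panic trace from the description appear only as a module plus offset, e.g. `libsmbd-base-samba4.so(+0x886ef)`. As a rough sketch (not part of Samba), the following Python helper picks those unresolved frames out of a log and builds one `addr2line` invocation per frame. The function names `unresolved_frames` and `addr2line_commands` are made up for illustration, and the sketch assumes the libraries on the host still carry symbol or debug information; on a stripped build `addr2line` will only print `??`.

```python
import re

# Matches frames such as:
#   samba_1  |  #7 /usr/lib/samba/libsmbd-base-samba4.so(+0x886ef) [0x7f7677c8e6ef]
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+(?P<module>\S+?)"
    r"\((?P<symbol>[^()+]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
)


def unresolved_frames(log_text):
    """Yield (frame index, module path, offset) for frames without a symbol name."""
    for match in FRAME_RE.finditer(log_text):
        if not match.group("symbol"):
            yield (
                int(match.group("index")),
                match.group("module"),
                match.group("offset"),
            )


def addr2line_commands(log_text):
    """Build one addr2line argument list per unresolved frame (hypothetical helper)."""
    return [
        # -f prints the function name, -C demangles, -e selects the binary.
        ["addr2line", "-f", "-C", "-e", module, offset]
        for _index, module, offset in unresolved_frames(log_text)
    ]
```

Each returned list can be passed to `subprocess.run` inside the container that holds the matching binaries.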

Then also add "panic action = sleep 100000" to the global section of smb.conf. That way the crashed process stays alive so you can attach with a debugger:

# gdb -p PID
gdb> bt full
...

Fwiw, corefiles can only be analysed on systems that match exactly the originating host.
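
When the crashed process is kept alive by the panic action, the interactive gdb session above can also be run non-interactively, which may be easier inside a container. This is a minimal sketch assuming `gdb` is installed where the process runs; `gdb_backtrace_command` and `capture_backtrace` are illustrative names, and the PID has to be taken from the PANIC log line.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb invocation that prints a full backtrace."""
    # -batch exits after the commands given with -ex have run.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt full"]


def capture_backtrace(pid, timeout=120):
    """Attach gdb to the sleeping process and return its output as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # gdb may exit non-zero; the output is still useful
    )
    return result.stdout
```

For example, `capture_backtrace(23177)` would target the PID shown in the first panic message, provided that process is still sleeping in the panic action.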
Comment 2 Bart Meuris 2017-10-25 15:03:44 UTC
Ah, that makes it a bit tricky, since these are actually running in production right now (the tests on our pilot site went without a hiccup), and our Samba instance is running in a Docker container. On the target system there's also no GCC or GDB installed...

I'll see what I can do. I've been trying to reproduce it locally on a test VM and on our test setup, without much success so far. I can try building a debug container; I'll get back to you.
Comment 3 Bart Meuris 2017-10-26 11:15:09 UTC
Created attachment 13724
GDB mdssvc stacktrace
Comment 4 Bart Meuris 2017-10-26 11:29:23 UTC
Ok, I've been able to simulate this crash on my test environment. I got the following trace:

samba_1  | BACKTRACE: 36 stack frames:
samba_1  |  #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7f4b7b2f833f]
samba_1  |  #1 /usr/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7f4b7b2f8190]
samba_1  |  #2 /usr/lib/libsamba-util.so.0(smb_panic+0x28) [0x7f4b7da492ff]
samba_1  |  #3 /usr/lib/samba/libtalloc.so.2(+0x251c) [0x7f4b7cb6051c]
samba_1  |  #4 /usr/lib/samba/libtalloc.so.2(+0x2536) [0x7f4b7cb60536]
samba_1  |  #5 /usr/lib/samba/libtalloc.so.2(+0x25c2) [0x7f4b7cb605c2]
samba_1  |  #6 /usr/lib/samba/libtalloc.so.2(_talloc_free+0x36) [0x7f4b7cb62914]
samba_1  |  #7 /usr/lib/samba/libsmbd-base-samba4.so(+0x886ef) [0x7f4b7d4036ef]
samba_1  |  #8 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7f4b7c74df52]
samba_1  |  #9 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7f4b7c74e027]
samba_1  |  #10 /usr/lib/samba/libtevent.so.0(+0x714f) [0x7f4b7c74e14f]
samba_1  |  #11 /usr/lib/samba/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7f4b7c74d33d]
samba_1  |  #12 /usr/lib/samba/libtevent.so.0(+0xf681) [0x7f4b7c756681]
samba_1  |  #13 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f4b7c753417]
samba_1  |  #14 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f4b7c74c1c6]
samba_1  |  #15 /usr/sbin/smbd(+0x9d4d) [0x558bdc805d4d]
samba_1  |  #16 /usr/lib/libsmbconf.so.0(prefork_add_children+0x1a8) [0x7f4b7b31749f]
samba_1  |  #17 /usr/lib/libsmbconf.so.0(pfh_manage_pool+0x13a) [0x7f4b7b318ad8]
samba_1  |  #18 /usr/sbin/smbd(+0xa49c) [0x558bdc80649c]
samba_1  |  #19 /usr/lib/libsmbconf.so.0(+0x532ea) [0x7f4b7b3092ea]
samba_1  |  #20 /usr/lib/libsmbconf.so.0(+0x5336c) [0x7f4b7b30936c]
samba_1  |  #21 /usr/lib/libsmbconf.so.0(+0x511c1) [0x7f4b7b3071c1]
samba_1  |  #22 /usr/lib/samba/libmessages-dgm-samba4.so(+0x8bc9) [0x7f4b78b19bc9]
samba_1  |  #23 /usr/lib/samba/libmessages-dgm-samba4.so(+0x76e0) [0x7f4b78b186e0]
samba_1  |  #24 /usr/lib/samba/libmessages-dgm-samba4.so(+0x751a) [0x7f4b78b1851a]
samba_1  |  #25 /usr/lib/samba/libtevent.so.0(+0xf0e3) [0x7f4b7c7560e3]
samba_1  |  #26 /usr/lib/samba/libtevent.so.0(+0xf71b) [0x7f4b7c75671b]
samba_1  |  #27 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7f4b7c753417]
samba_1  |  #28 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7f4b7c74c1c6]
samba_1  |  #29 /usr/lib/samba/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7f4b7c74c4dd]
samba_1  |  #30 /usr/lib/samba/libtevent.so.0(+0xc4b9) [0x7f4b7c7534b9]
samba_1  |  #31 /usr/lib/samba/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7f4b7c74c580]
samba_1  |  #32 /usr/sbin/smbd(start_mdssd+0x4a0) [0x558bdc806d5b]
samba_1  |  #33 /usr/sbin/smbd(main+0x168a) [0x558bdc80c83c]
samba_1  |  #34 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f4b79b87830]
samba_1  |  #35 /usr/sbin/smbd(_start+0x29) [0x558bdc8020c9]
samba_1  | smb_panic(): calling panic action [sleep 100000]


The full gdb stacktrace is attached.

It seems we can trigger it by doing an advanced Spotlight search with multiple search conditions combined. You can do this by searching for something in Finder, then clicking the "+" button and adding another condition.

From time to time we also get errors like the following, and the client hangs at that point (when searching for something like "250 10", without the quotes):

samba_1  | Tracker query error: GDBus.Error:org.freedesktop.Tracker1.SparqlError.Internal: Operation was cancelled
samba_1  | query in error state
samba_1  | bad context: [0x4009,0x6b000080]
Comment 5 Ralph Böhme 2017-10-26 11:47:09 UTC
Created attachment 13725
Possible patch for master

Can you try the attached patch? Disclaimer: I'm not 100% sure this is correct, so ideally test this on a test system...
Comment 6 Bart Meuris 2017-10-26 13:20:10 UTC
Created attachment 13728
GDB mdssvc stacktrace #2

Got another crash; I can now reproduce this reliably on my local test VM.

samba_1  | BACKTRACE: 62 stack frames:
samba_1  |  #0 /usr/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7fda5792233f]
samba_1  |  #1 /usr/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7fda57922190]
samba_1  |  #2 /usr/lib/libsamba-util.so.0(smb_panic+0x28) [0x7fda5a0732ff]
samba_1  |  #3 /usr/lib/samba/libtalloc.so.2(+0x251c) [0x7fda5918a51c]
samba_1  |  #4 /usr/lib/samba/libtalloc.so.2(+0x2536) [0x7fda5918a536]
samba_1  |  #5 /usr/lib/samba/libtalloc.so.2(+0x25c2) [0x7fda5918a5c2]
samba_1  |  #6 /usr/lib/samba/libtalloc.so.2(+0x4210) [0x7fda5918c210]
samba_1  |  #7 /usr/lib/samba/libtalloc.so.2(_talloc_get_type_abort+0x4c) [0x7fda5918c3ab]
samba_1  |  #8 /usr/lib/samba/libsmbd-base-samba4.so(+0x887af) [0x7fda59a2d7af]
samba_1  |  #9 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(g_simple_async_result_complete+0x87) [0x7fda4f057457]
samba_1  |  #10 /usr/lib/x86_64-linux-gnu/libtracker-sparql-1.0.so.0(+0x7bdd) [0x7fda4f370bdd]
samba_1  |  #11 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(g_simple_async_result_complete+0x87) [0x7fda4f057457]
samba_1  |  #12 /usr/lib/x86_64-linux-gnu/libtracker-sparql-1.0.so.0(+0x11b75) [0x7fda4f37ab75]
samba_1  |  #13 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1  |  #14 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x881ee) [0x7fda4f0691ee]
samba_1  |  #15 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x6e17b) [0x7fda4f04f17b]
samba_1  |  #16 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1  |  #17 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x881ee) [0x7fda4f0691ee]
samba_1  |  #18 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x6c89d) [0x7fda4f04d89d]
samba_1  |  #19 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1  |  #20 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x881ee) [0x7fda4f0691ee]
samba_1  |  #21 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x6b9f3) [0x7fda4f04c9f3]
samba_1  |  #22 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b43) [0x7fda4f068b43]
samba_1  |  #23 /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0(+0x87b79) [0x7fda4f068b79]
samba_1  |  #24 /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_context_dispatch+0x15a) [0x7fda4eac704a]
samba_1  |  #25 /lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x4a3f0) [0x7fda4eac73f0]
samba_1  |  #26 /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_loop_run+0xc2) [0x7fda4eac7712]
samba_1  |  #27 /usr/lib/samba/libsmbd-base-samba4.so(+0x8be86) [0x7fda59a30e86]
samba_1  |  #28 /usr/lib/samba/libsmbd-base-samba4.so(mds_dispatch+0x214) [0x7fda59a310bd]
samba_1  |  #29 /usr/lib/samba/libsmbd-base-samba4.so(_mdssvc_cmd+0x40c) [0x7fda59a37160]
samba_1  |  #30 /usr/lib/samba/libsmbd-base-samba4.so(+0x92c56) [0x7fda59a37c56]
samba_1  |  #31 /usr/lib/samba/libsmbd-base-samba4.so(+0x7d0b0) [0x7fda59a220b0]
samba_1  |  #32 /usr/lib/samba/libsmbd-base-samba4.so(+0x7cc48) [0x7fda59a21c48]
samba_1  |  #33 /usr/lib/samba/libsmbd-base-samba4.so(+0x7da43) [0x7fda59a22a43]
samba_1  |  #34 /usr/lib/samba/libsmbd-base-samba4.so(process_complete_pdu+0xde) [0x7fda59a22b37]
samba_1  |  #35 /usr/lib/samba/libsmbd-base-samba4.so(named_pipe_packet_process+0x195) [0x7fda59a4305e]
samba_1  |  #36 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1  |  #37 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1  |  #38 /usr/lib/samba/libtevent.so.0(_tevent_req_done+0x25) [0x7fda58d7804f]
samba_1  |  #39 /usr/lib/libdcerpc-binding.so.0(+0x1d3fb) [0x7fda50cab3fb]
samba_1  |  #40 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1  |  #41 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1  |  #42 /usr/lib/samba/libtevent.so.0(_tevent_req_done+0x25) [0x7fda58d7804f]
samba_1  |  #43 /usr/lib/samba/libsamba-sockets-samba4.so(+0xccbf) [0x7fda574cecbf]
samba_1  |  #44 /usr/lib/samba/libsamba-sockets-samba4.so(+0xceea) [0x7fda574ceeea]
samba_1  |  #45 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1  |  #46 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1  |  #47 /usr/lib/samba/libtevent.so.0(_tevent_req_done+0x25) [0x7fda58d7804f]
samba_1  |  #48 /usr/lib/samba/libsamba-sockets-samba4.so(+0xc210) [0x7fda574ce210]
samba_1  |  #49 /usr/lib/samba/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7fda58d77f52]
samba_1  |  #50 /usr/lib/samba/libtevent.so.0(+0x7027) [0x7fda58d78027]
samba_1  |  #51 /usr/lib/samba/libtevent.so.0(+0x714f) [0x7fda58d7814f]
samba_1  |  #52 /usr/lib/samba/libtevent.so.0(tevent_common_loop_immediate+0x1f5) [0x7fda58d7733d]
samba_1  |  #53 /usr/lib/samba/libtevent.so.0(+0xf681) [0x7fda58d80681]
samba_1  |  #54 /usr/lib/samba/libtevent.so.0(+0xc417) [0x7fda58d7d417]
samba_1  |  #55 /usr/lib/samba/libtevent.so.0(_tevent_loop_once+0x10f) [0x7fda58d761c6]
samba_1  |  #56 /usr/sbin/smbd(+0x9d4d) [0x558f64c45d4d]
samba_1  |  #57 /usr/lib/libsmbconf.so.0(prefork_create_pool+0x3d9) [0x7fda579410d5]
samba_1  |  #58 /usr/sbin/smbd(start_mdssd+0x308) [0x558f64c46bc3]
samba_1  |  #59 /usr/sbin/smbd(main+0x168a) [0x558f64c4c83c]
samba_1  |  #60 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fda561b1830]
samba_1  |  #61 /usr/sbin/smbd(_start+0x29) [0x558f64c420c9]
samba_1  | smb_panic(): calling panic action [sleep 100000]

To simulate this I mirrored the production share with only the filenames and empty files (using `cp -r --attributes-only`), which somehow seems to affect this.

Some background: this share was migrated from what was originally a Mac server on that site to a Linux/Samba system on ZFS. It's a rather large file share with over 350000 files, with names created by Mac users, meaning they're an absolute mess (containing various weird Unicode characters, ':', '&', '?', starting with spaces, ending with spaces or dots, ...).
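
For anyone trying to build a similar test share, the mirroring step can be approximated in Python. This is only a sketch of the idea: the hypothetical `mirror_names` function recreates the directory tree with zero-length files of the same names, but unlike `cp -r --attributes-only` it does not copy ownership, timestamps, or extended attributes.

```python
import os


def mirror_names(src_root, dst_root):
    """Recreate the tree under src_root in dst_root using empty files.

    Returns the number of files created.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Path of the current directory relative to the source root.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Opening in "w" mode creates a zero-length file with the same name.
            with open(os.path.join(target_dir, name), "w"):
                pass
            created += 1
    return created
```

Names with leading spaces, trailing dots, or characters such as ':' and '?' are kept as they are, since those are the names suspected of playing a role here.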
Comment 7 Ralph Böhme 2017-10-26 13:37:38 UTC
(In reply to Bart Meuris from comment #6)
Oh, this looks like a different crash. Is that with or without the proposed patch?
Comment 8 Bart Meuris 2017-10-26 14:30:59 UTC
That is with the patch applied, sorry, should have mentioned that.
Comment 9 Ralph Böhme 2017-10-26 17:54:00 UTC
Hm, at least a different crash.
Comment 10 Bart Meuris 2017-11-06 16:44:24 UTC
Do you need additional information for this?

If necessary, I can set up a test environment to simulate this behavior on a Linode box and give you access.
Comment 11 Ralph Böhme 2017-11-06 16:59:52 UTC
(In reply to Bart Meuris from comment #10)
No, thanks. Analyzing this would take quite some time, probably several days and I don't have that atm.