Created attachment 16748: ZSTD-compressed core dump from smbd 4.14.6 in an Arch container

I can reproduce a core dump on new authenticated connections from a macOS Big Sur 11.5.2 client to a Fedora 34 or Arch host running Samba 4.14.6. The crash still reproduces with F35 (rawhide) and 4.15.0rc2. It only happens when variable substitutions are used in share paths. I could not reproduce it on F33 or Ubuntu 20.04, both of which ship older 4.13.x Samba versions.

In my case, the crash prevents configuring a Time Machine destination or running a backup to the Samba host. It also leaves dead share mounts on the macOS client, because connections are reset when smbd restarts after the core dump.

Stack trace from the server logs (every line is prefixed with "Aug 23 11:23:53 localhost smbd[1516824]:"):

    [2021/08/23 11:23:53.181474, 0] ../../source3/smbd/msdfs.c:360(create_conn_struct_as_root)
    create_conn_struct_as_root: Failed to canonicalize sharepath
    [2021/08/23 11:23:53.181704, 0] ../../source3/lib/popt_common.c:68(popt_s3_talloc_log_fn)
    Bad talloc magic value - unknown value
    [2021/08/23 11:23:53.181745, 0] ../../lib/util/fault.c:172(smb_panic_log)
    ===============================================================
    [2021/08/23 11:23:53.181779, 0] ../../lib/util/fault.c:173(smb_panic_log)
    INTERNAL ERROR: Bad talloc magic value - unknown value in pid 1516824 (4.14.6)
    [2021/08/23 11:23:53.181813, 0] ../../lib/util/fault.c:177(smb_panic_log)
    If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
    [2021/08/23 11:23:53.181846, 0] ../../lib/util/fault.c:182(smb_panic_log)
    ===============================================================
    [2021/08/23 11:23:53.181877, 0] ../../lib/util/fault.c:183(smb_panic_log)
    PANIC (pid 1516824): Bad talloc magic value - unknown value in 4.14.6
    [2021/08/23 11:23:53.182621, 0] ../../lib/util/fault.c:287(log_stack_trace)
    BACKTRACE: 29 stack frames:
    #0 /lib64/libsamba-util.so.0(log_stack_trace+0x34) [0x7feac1123804]
    #1 /lib64/libsamba-util.so.0(smb_panic+0xd) [0x7feac1123a5d]
    #2 /lib64/libtalloc.so.2(+0x3712) [0x7feac0a37712]
    #3 /usr/lib64/samba/libsmbd-base-samba4.so(create_conn_struct_cwd+0x89) [0x7feac0eeb9f9]
    #4 /usr/lib64/samba/libsmbd-base-samba4.so(mds_init_ctx+0x1c8) [0x7feac0f6c0b8]
    #5 /usr/lib64/samba/libsmbd-base-samba4.so(_mdssvc_open+0x11d) [0x7feac0f6cebd]
    #6 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1d8b3f) [0x7feac0f6db3f]
    #7 /lib64/libdcerpc-server-core.so.0(+0xafa8) [0x7feac0cd6fa8]
    #8 /lib64/libdcerpc-binding.so.0(+0x1167f) [0x7feac0a7067f]
    #9 /usr/lib64/samba/libsamba-sockets-samba4.so(+0xe04b) [0x7feabfeb004b]
    #10 /usr/lib64/samba/libsamba-sockets-samba4.so(+0x652e) [0x7feabfea852e]
    #11 /lib64/libtevent.so.0(tevent_common_invoke_immediate_handler+0x192) [0x7feac0a19742]
    #12 /lib64/libtevent.so.0(tevent_common_loop_immediate+0x1e) [0x7feac0a1976e]
    #13 /lib64/libtevent.so.0(+0xe000) [0x7feac0a1d000]
    #14 /lib64/libtevent.so.0(+0x669b) [0x7feac0a1569b]
    #15 /lib64/libtevent.so.0(_tevent_loop_once+0x98) [0x7feac0a17da8]
    #16 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7feac0a17e9b]
    #17 /lib64/libtevent.so.0(+0x670b) [0x7feac0a1570b]
    #18 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_process+0x840) [0x7feac0ee6920]
    #19 /usr/sbin/smbd(+0xcbcd) [0x55796c65dbcd]
    #20 /lib64/libtevent.so.0(tevent_common_invoke_fd_handler+0x95) [0x7feac0a194f5]
    #21 /lib64/libtevent.so.0(+0xe21f) [0x7feac0a1d21f]
    #22 /lib64/libtevent.so.0(+0x669b) [0x7feac0a1569b]
    #23 /lib64/libtevent.so.0(_tevent_loop_once+0x98) [0x7feac0a17da8]
    #24 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7feac0a17e9b]
    #25 /lib64/libtevent.so.0(+0x670b) [0x7feac0a1570b]
    #26 /usr/sbin/smbd(main+0x1e1d) [0x55796c65a79d]
    #27 /lib64/libc.so.6(__libc_start_main+0xd5) [0x7feac070ab75]
    #28 /usr/sbin/smbd(_start+0x2e) [0x55796c65a8fe]
    [2021/08/23 11:23:53.183093, 0] ../../source3/lib/dumpcore.c:317(dump_core)
    coredump is handled by helper binary specified at /proc/sys/kernel/core_pattern

Example share configurations that trigger the issue:

    # core dump upon connection from macOS Big Sur
    [homes]
    comment = Networked home for %u
    path = %H
    writable = yes
    browsable = no
    read only = no
    map archive = yes

    # core dump upon connection from macOS Big Sur
    # works fine if I replace '%u' with a literal username
    [backups-timemachine]
    comment = Time Machine Backups
    path = /data/backups/%u/timemachine
    browsable = yes
    writable = yes
    valid users = @shared
    create mask = 0600
    directory mask = 0700
    spotlight = no
    vfs objects = acl_xattr catia fruit streams_xattr
    fruit:time machine = yes
    fruit:time machine max size = 1T
    fruit:nfs_aces = no

Example share configuration without substitutions that works as expected:

    [shared]
    comment = Shared files
    path = /data/shared
    browsable = yes
    writable = yes

Repro steps:

    # On the Linux host
    docker run --name f34 -it -p 4450:445 --entrypoint=/bin/bash fedora:34
    # (the following commands run inside the container)
    dnf install -y samba
    useradd foo
    smbpasswd -a foo
    cat << EOF >> /etc/samba/smb.conf
    [backups]
    comment = User Data Directories
    path = /data/backups/%u
    browseable = Yes
    read only = No
    inherit acls = Yes
    EOF
    mkdir -p /data/backups/foo
    /usr/sbin/smbd --foreground --no-process-group

    # On macOS
    ssh linux-host -L 4450:localhost:4450
    <connect to localhost:4450 as foo and mount the 'backups' share>

I've attached a core dump from a test similar to the above, run in an Arch Docker container.
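For reference, %u in a share path stands for the session's username, so "path = /data/backups/%u" should resolve to /data/backups/foo when user foo connects. The sketch below is a hypothetical helper, expand_percent_u(), that performs only that one substitution; it is not Samba's substitution code, which handles many more variables (such as the %H used by the [homes] example).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Samba's API): return a newly malloc'ed copy of
 * `path` with every "%u" replaced by `user`, or NULL on allocation failure.
 * Only %u is handled. The caller must free() the result.
 */
char *expand_percent_u(const char *path, const char *user)
{
	assert(path != NULL && user != NULL);

	size_t ulen = strlen(user);
	size_t count = 0;
	const char *p;

	/* First pass: count occurrences of "%u" to size the output buffer. */
	for (p = path; (p = strstr(p, "%u")) != NULL; p += 2) {
		count++;
	}

	char *out = malloc(strlen(path) - 2 * count + ulen * count + 1);
	if (out == NULL) {
		return NULL;
	}

	/* Second pass: copy literal text, splicing in `user` for each "%u". */
	char *o = out;
	const char *s = path;
	while ((p = strstr(s, "%u")) != NULL) {
		memcpy(o, s, (size_t)(p - s));
		o += p - s;
		memcpy(o, user, ulen);
		o += ulen;
		s = p + 2;
	}
	strcpy(o, s);  /* remaining tail, including the terminating NUL */
	return out;
}
```

Called as expand_percent_u("/data/backups/%u", "foo"), it returns a heap-allocated "/data/backups/foo", which is the directory the repro steps create.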
Can you add the line

    panic action = /bin/sleep 999999

to the [global] section of your smb.conf, reproduce the problem, then attach gdb to the parent of the sleep process and run "bt" to get a proper stack backtrace, please? That should help us track this down.
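The panic action trick works because the crashing smbd runs the configured command and waits for it to finish; frames #0 to #2 of the backtrace in the reply (wait4, do_system, smb_panic_s3) show smbd blocked in exactly that wait. A minimal sketch of the pattern, assuming nothing about Samba's internals beyond those frames: a hypothetical run_panic_action() that hands the command to system(3) and blocks until it exits.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical sketch of a "panic action" hook (not Samba's code): run the
 * configured command through the shell and block until it exits. While a
 * command such as "/bin/sleep 999999" runs, the calling process stays alive,
 * so gdb can attach to it. Returns the command's exit status, or -1 if
 * system() failed or the command did not exit normally.
 */
int run_panic_action(const char *cmd)
{
	assert(cmd != NULL);
	fflush(NULL);            /* flush pending stdio output before forking */
	int rc = system(cmd);    /* runs /bin/sh -c cmd and waits for it */
	if (rc == -1 || !WIFEXITED(rc)) {
		return -1;
	}
	return WEXITSTATUS(rc);
}
```

With "/bin/sleep 999999" as the command, the call would not return for roughly 11.5 days, which is the window in which gdb can attach to the blocked process.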
Done! Here's the backtrace after installing the relevant debuginfo packages on F34:

    #0 0x00007ff16d22aaca in wait4 () from /lib64/libc.so.6
    #1 0x00007ff16d1a809b in do_system () from /lib64/libc.so.6
    #2 0x00007ff16d7b7faf in smb_panic_s3 (why=<optimized out>) at ../../source3/lib/util.c:840
    #3 0x00007ff16db9ea6e in smb_panic (why=0x7ff16d4b9070 "Bad talloc magic value - unknown value") at ../../lib/util/fault.c:197
    #4 0x00007ff16d4b2712 in _talloc_free.cold () from /lib64/libtalloc.so.2
    #5 0x00007ff16d9669f9 in create_conn_struct_cwd (mem_ctx=0x55d9be556d00, ev=0x55d9be50bc60, msg=0x55d9be5103d0, session_info=0x55d9be54b860, snum=<optimized out>, path=0x55d9be560f80 "/data/backups/%u", c=0x55d9be556d90) at ../../source3/smbd/msdfs.c:529
    #6 0x00007ff16d9e70b8 in mds_init_ctx (mem_ctx=mem_ctx@entry=0x55d9be547f90, ev=0x55d9be50bc60, msg_ctx=msg_ctx@entry=0x55d9be5103d0, session_info=session_info@entry=0x55d9be54b860, snum=snum@entry=1, sharename=sharename@entry=0x55d9be546350 "data", path=0x55d9be560f00 "/data/backups/%u") at ../../source3/rpc_server/mdssvc/mdssvc.c:1680
    #7 0x00007ff16d9e7ebd in create_mdssvc_policy_handle (handle=0x55d9be509180, path=0x55d9be560f00 "/data/backups/%u", sharename=<optimized out>, snum=1, p=0x55d9be556300, mem_ctx=0x55d9be547f90) at ../../source3/rpc_server/mdssvc/srv_mdssvc_nt.c:97
    #8 _mdssvc_open (p=0x55d9be556300, r=0x55d9be546220) at ../../source3/rpc_server/mdssvc/srv_mdssvc_nt.c:147
    #9 0x00007ff16d9e8b3f in mdssvc__op_dispatch_internal (dce_call=0x55d9be547f90, mem_ctx=<optimized out>, r=0x55d9be546220, dispatch=S3COMPAT_RPC_DISPATCH_EXTERNAL) at ./librpc/gen_ndr/ndr_mdssvc_scompat.c:120
    #10 0x00007ff16d751fa8 in dcesrv_request (call=0x55d9be547f90) at ../../librpc/rpc/dcesrv_core.c:1895
    #11 dcesrv_process_ncacn_packet (blob=..., pkt=<optimized out>, dce_conn=0x55d9be53d6a0) at ../../librpc/rpc/dcesrv_core.c:2291
    #12 dcesrv_read_fragment_done (subreq=<optimized out>) at ../../librpc/rpc/dcesrv_core.c:2832
    #13 0x00007ff16d4eb67f in dcerpc_read_ncacn_packet_done (subreq=<optimized out>) at ../../librpc/rpc/dcerpc_util.c:967
    #14 0x00007ff16c92b04b in tstream_readv_pdu_readv_done (subreq=0x55d9be55c9a0) at ../../lib/tsocket/tsocket_helpers.c:319
    #15 0x00007ff16c92352e in tstream_readv_done (subreq=<optimized out>) at ../../lib/tsocket/tsocket.c:604
    #16 0x00007ff16d494742 in tevent_common_invoke_immediate_handler () from /lib64/libtevent.so.0
    #17 0x00007ff16d49476e in tevent_common_loop_immediate () from /lib64/libtevent.so.0
    #18 0x00007ff16d498000 in epoll_event_loop_once () from /lib64/libtevent.so.0
    #19 0x00007ff16d49069b in std_event_loop_once () from /lib64/libtevent.so.0
    #20 0x00007ff16d492da8 in _tevent_loop_once () from /lib64/libtevent.so.0
    #21 0x00007ff16d492e9b in tevent_common_loop_wait () from /lib64/libtevent.so.0
    #22 0x00007ff16d49070b in std_event_loop_wait () from /lib64/libtevent.so.0
    #23 0x00007ff16d961920 in smbd_process (ev_ctx=0x55d9be50bc60, msg_ctx=<optimized out>, dce_ctx=<optimized out>, sock_fd=52, interactive=<optimized out>) at ../../source3/smbd/process.c:4232
    #24 0x000055d9bd523bcd in smbd_accept_connection (ev=0x55d9be50bc60, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/server.c:1020
    #25 0x00007ff16d4944f5 in tevent_common_invoke_fd_handler () from /lib64/libtevent.so.0
    #26 0x00007ff16d49821f in epoll_event_loop_once () from /lib64/libtevent.so.0
    #27 0x00007ff16d49069b in std_event_loop_once () from /lib64/libtevent.so.0
    #28 0x00007ff16d492da8 in _tevent_loop_once () from /lib64/libtevent.so.0
    #29 0x00007ff16d492e9b in tevent_common_loop_wait () from /lib64/libtevent.so.0
    #30 0x00007ff16d49070b in std_event_loop_wait () from /lib64/libtevent.so.0
    #31 0x000055d9bd52079d in smbd_parent_loop (parent=0x55d9be523500, ev_ctx=0x55d9be50bc60) at ../../source3/smbd/server.c:1367
    #32 main (argc=<optimized out>, argv=<optimized out>) at ../../source3/smbd/server.c:2220
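Frames #5 to #7 show the mdssvc code path (mds_init_ctx, _mdssvc_open) receiving the share path with the variable still unexpanded, "/data/backups/%u", and the server log in the original report shows "Failed to canonicalize sharepath" just before the panic. That is consistent with the literal path not existing on disk. As a rough illustration only, assuming the canonicalization step behaves like POSIX realpath(3) (Samba's actual implementation may differ), a hypothetical try_canonicalize() helper fails with ENOENT for such a path:

```c
#define _XOPEN_SOURCE 700    /* expose realpath(3) in strict C modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return 0 if `path` resolves to an existing location,
 * otherwise the errno value reported by realpath(3). realpath() is only a
 * stand-in here for Samba's own share-path canonicalization.
 */
int try_canonicalize(const char *path)
{
	assert(path != NULL);
	char *resolved = realpath(path, NULL);   /* NULL: libc allocates */
	if (resolved == NULL) {
		return errno;
	}
	free(resolved);
	return 0;
}
```

On the reporter's host, try_canonicalize("/data/backups/%u") would be expected to return ENOENT unless a directory literally named "%u" exists, while "/data/backups/foo" resolves because the repro steps create it.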
Oh, I see the problem.

    509 NTSTATUS create_conn_struct_cwd(TALLOC_CTX *mem_ctx,
    510                                 struct tevent_context *ev,
    511                                 struct messaging_context *msg,
    512                                 const struct auth_session_info *session_info,
    513                                 int snum,
    514                                 const char *path,
    515                                 struct connection_struct **c)
    516 {
    517         NTSTATUS status;
    518
    519         become_root();
    520         status = create_conn_struct_as_root(mem_ctx,
    521                                             ev,
    522                                             msg,
    523                                             c,
    524                                             snum,
    525                                             path,
    526                                             session_info);
    527         unbecome_root();
    528         if (!NT_STATUS_IS_OK(status)) {
    529                 TALLOC_FREE(c);
    530                 return status;
    531         }
    532
    533         return NT_STATUS_OK;
    534 }

It's the TALLOC_FREE(c) on line 529, in the error path, that is failing. Just remove that line. In the error case, *c has not been assigned, and c itself is not the start of a talloc allocation: it points at a pointer *within* a TALLOC'ed struct owned by the caller. Freeing that interior address is what trips the "Bad talloc magic value" check. That TALLOC_FREE(c) just shouldn't be there; it's a bug in the error path.
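To see why freeing that interior address panics instead of silently corrupting memory: talloc keeps a header with a magic value in front of every allocation it hands out and checks that header on free. The toy allocator below is a simplified model of that idea; toy_alloc(), toy_free(), TOY_MAGIC, and the struct names are all hypothetical, and the layout is not talloc's real one. It also shows the corrected error-path pattern: on failure, leave *c alone and free nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary value chosen for this sketch; talloc uses its own constant. */
#define TOY_MAGIC 0x7a110c00u

/* Header placed in front of every toy allocation (simplified model). */
union toy_header {
	uint32_t magic;
	max_align_t align;   /* keeps the user pointer suitably aligned */
};

/* Allocate zeroed memory preceded by a header carrying TOY_MAGIC. */
void *toy_alloc(size_t size)
{
	union toy_header *h = calloc(1, sizeof(*h) + size);
	if (h == NULL) {
		return NULL;
	}
	h->magic = TOY_MAGIC;
	return h + 1;        /* caller's memory starts right after the header */
}

/*
 * Free memory obtained from toy_alloc(). Returns 0 on success and -1 if the
 * bytes in front of `ptr` do not hold TOY_MAGIC, i.e. `ptr` is not the start
 * of a toy allocation. (Real talloc aborts with "Bad talloc magic value".)
 */
int toy_free(void *ptr)
{
	if (ptr == NULL) {
		return 0;
	}
	unsigned char *hdr = (unsigned char *)ptr - sizeof(union toy_header);
	uint32_t magic;
	memcpy(&magic, hdr, sizeof(magic));
	if (magic != TOY_MAGIC) {
		return -1;
	}
	free(hdr);
	return 0;
}

struct toy_conn {
	int dummy;
};

/* Hypothetical stand-in for the caller's talloc'ed context struct. */
struct toy_ctx {
	char name[64];           /* keeps `conn` well away from offset 0 */
	struct toy_conn *conn;   /* filled in by create_conn_toy() */
};

/*
 * Hypothetical analogue of create_conn_struct_cwd() with the fix applied:
 * `c` is the address of a field inside the caller's allocation, so the
 * error path must neither free `c` nor touch *c.
 */
int create_conn_toy(struct toy_conn **c, int simulate_failure)
{
	assert(c != NULL);
	if (simulate_failure) {
		/* The buggy version effectively did toy_free(c) here. */
		return -1;
	}
	*c = toy_alloc(sizeof(**c));
	return (*c == NULL) ? -1 : 0;
}
```

Passing &ctx->conn to toy_free() returns -1, because the bytes in front of that field are ordinary struct data rather than a header. Real talloc aborts at that point instead, which is the panic in the logs.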
Ralph, this one is in your code I think. I have a fix, but I'd like you to look it over :-).
Created attachment 16749: git-am fix for master. Passes "make test TESTS=samba.tests.blackbox.mdsearch". I'm going to put it through CI now.
CI passes. The MR is: https://gitlab.com/samba-team/samba/-/merge_requests/2125
Stewart, can you confirm this patch fixes the issue for you, please?
This bug was referenced in samba master: b4d8c62c4e8191e05fd03dd096a0bc989e224ed3 857045f3a236dea125200dd09279d677e513682b
Rebuilt my OS package (samba-4.14.6-0.fc34.x86_64) yesterday with the patch and can confirm it fixes the core dumps, and Big Sur can now configure the networked backup destination :) Thanks!
Created attachment 16753: git-am fix for 4.15.rcNext and 4.14.next. Cherry-picked from master. Applies cleanly to 4.15.rcNext and 4.14.next.
Reassigning to Jule for inclusion in 4.14 and 4.15.
Pushed to autobuild-v4-{15,14}-test.
This bug was referenced in samba v4-15-test: 2ed234deee381cd15d7b7867136c5bbd78f5448c 57b266e23c459c8d0675ec17c8a5275f9c797781
This bug was referenced in samba v4-15-stable (Release samba-4.15.0rc5): 2ed234deee381cd15d7b7867136c5bbd78f5448c 57b266e23c459c8d0675ec17c8a5275f9c797781
This bug was referenced in samba v4-14-test: b00fed3b698cc78a377d71e0574c878e262c4808 97dc8c0dcccbcecd3a8f8f3872b47d3a3c6e8036
Closing out bug report. Thanks!
This bug was referenced in samba v4-14-stable (Release samba-4.14.8): b00fed3b698cc78a377d71e0574c878e262c4808 97dc8c0dcccbcecd3a8f8f3872b47d3a3c6e8036