I am trying to run Samba with the glusterfs VFS plugin and CTDB on CentOS 7, but I am experiencing a crash in smbd (smbd panic) when a Windows client is connected and a virtual public IP address is migrated.

Tested on:
- CentOS 7 x86_64, official repos (4.2.3)
- the releases in http://download.gluster.org/pub/gluster/glusterfs/samba/ (4.2.4)

Replication instructions (consolidated into a shell sketch after the logs below):
1. Connect using a Windows client. I have tested Windows 7 and Windows 10; they connect using SMB2_1 and SMB3_0 respectively.
2. Start reading a large file (in this case a video, which I just started playing).
3. On the host that the file happens to be read from (confirmed with "ctdb ip" and netstat), run:
   # ctdb disable
   or
   # ctdb moveip $addr_1 $other_node_number

This causes the read of the file to fail; smbd on the server where we issued "ctdb disable" spins at 100% CPU and then crashes with a panic. It also leaves smbd processes behind as zombies.

Additional notes:
This does not appear to happen if I do not use the glusterfs VFS module, but instead mount the gluster volume locally on each server and re-export its path.

This is my first bug report to Samba, so if any additional info is required, please let me know.

Crash logs (journalctl and log.smbd at log level 10):

journalctl:

Dec 17 14:43:29 stor01 smbd[25469]: [2015/12/17 14:43:29.677576, 0] ../source3/modules/vfs_glusterfs.c:257(vfs_gluster_connect)
Dec 17 14:43:29 stor01 smbd[25469]: online: Initialized volume from server localhost
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.116591, 0] ../source3/lib/util.c:480(reinit_after_fork)
Dec 17 14:44:14 stor01 smbd[25885]: messaging_reinit() failed: NT_STATUS_INTERNAL_DB_ERROR
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.117441, 0] ../source3/lib/util.c:788(smb_panic_s3)
Dec 17 14:44:14 stor01 smbd[25885]: PANIC (pid 25885): reinit_after_fork failed
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.119102, 0] ../source3/lib/util.c:899(log_stack_trace)
Dec 17 14:44:14 stor01 smbd[25885]: BACKTRACE: 17 stack frames:
Dec 17 14:44:14 stor01 smbd[25885]: #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f96c4480d5a]
Dec 17 14:44:14 stor01 smbd[25885]: #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f96c4480e30]
Dec 17 14:44:14 stor01 smbd[25885]: #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f96c62d087f]
Dec 17 14:44:14 stor01 smbd[25885]: #3 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15f815) [0x7f96c5ed7815]
Dec 17 14:44:14 stor01 smbd[25885]: #4 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15faf1) [0x7f96c5ed7af1]
Dec 17 14:44:14 stor01 smbd[25885]: #5 /usr/lib64/samba/libsmbd-shim-samba4.so(exit_server+0x12) [0x7f96c3e2fb82]
Dec 17 14:44:14 stor01 smbd[25885]: #6 /usr/lib64/samba/libsmbd-base-samba4.so(+0x162bac) [0x7f96c5edabac]
Dec 17 14:44:14 stor01 smbd[25885]: #7 /lib64/libsmbconf.so.0(messaging_dispatch_rec+0x21d) [0x7f96c448a94d]
Dec 17 14:44:14 stor01 smbd[25885]: #8 /lib64/libsmbconf.so.0(+0x2ae35) [0x7f96c448ae35]
Dec 17 14:44:14 stor01 smbd[25885]: #9 /lib64/libsmbconf.so.0(+0x48295) [0x7f96c44a8295]
Dec 17 14:44:14 stor01 smbd[25885]: #10 /lib64/libsmbconf.so.0(run_events_poll+0x16c) [0x7f96c4496bfc]
Dec 17 14:44:14 stor01 smbd[25885]: #11 /lib64/libsmbconf.so.0(+0x36e50) [0x7f96c4496e50]
Dec 17 14:44:14 stor01 smbd[25885]: #12 /lib64/libtevent.so.0(_tevent_loop_once+0x8d) [0x7f96c2ebf0fd]
Dec 17 14:44:14 stor01 smbd[25885]: #13 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7f96c2ebf29b]
Dec 17 14:44:14 stor01 smbd[25885]: #14 /usr/sbin/smbd(main+0x1501) [0x7f96c6931361]
Dec 17 14:44:14 stor01 smbd[25885]: #15 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f96c2b1bb15]
Dec 17 14:44:14 stor01 smbd[25885]: #16 /usr/sbin/smbd(+0x7765) [0x7f96c6931765]
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.120675, 0] ../source3/lib/dumpcore.c:318(dump_core)
Dec 17 14:44:14 stor01 smbd[25885]: dumping core in /var/log/samba/cores/smbd

log.smbd:

[2015/12/29 11:18:50.526099, 10, pid=15992, effective(0, 0), real(0, 0)] ../source3/lib/messages.c:252(messaging_recv_cb)
  messaging_recv_cb: Received message 0x315 len 40 (num_fds:0) from 0:19508
[2015/12/29 11:18:50.526161, 10, pid=15992, effective(0, 0), real(0, 0), class=scavenger] ../source3/smbd/scavenger.c:335(smbd_scavenger_msg)
  smbd_scavenger_msg: 0:15992 got message from 0:19508
[2015/12/29 11:18:50.527411, 10, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:838(ctdbd_control)
  ctdbd_control: Sending ctdb packet
[2015/12/29 11:18:50.527549, 10, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:838(ctdbd_control)
  ctdbd_control: Sending ctdb packet
[2015/12/29 11:18:50.527635, 2, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:180(ctdbd_working)
  Node has status 4, not active
[2015/12/29 11:18:50.527693, 2, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:536(ctdbd_init_connection)
  Node is not working, can not connect
[2015/12/29 11:18:50.527826, 10, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/messages_ctdbd.c:163(messaging_ctdbd_init)
  ctdbd_messaging_connection failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.527875, 1, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/messages.c:358(messaging_reinit)
  messaging_ctdbd_init failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.527911, 0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/util.c:480(reinit_after_fork)
  messaging_reinit() failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.528008, 2, pid=22094, effective(0, 0), real(0, 0), class=scavenger] ../source3/smbd/scavenger.c:253(smbd_scavenger_start)
  reinit_after_fork failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.528068, 4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528125, 5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528173, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528232, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528305, 4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528331, 5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528356, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528390, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528414, 4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528443, 5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528473, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528513, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528546, 4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528572, 5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528616, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528716, 5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528938, 0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/util.c:788(smb_panic_s3)
  PANIC (pid 22094): reinit_after_fork failed
[2015/12/29 11:18:50.530575, 0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/util.c:899(log_stack_trace)
  BACKTRACE: 17 stack frames:
   #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f0ce2b60d5a]
   #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f0ce2b60e30]
   #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f0ce49b087f]
   #3 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15f815) [0x7f0ce45b7815]
   #4 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15faf1) [0x7f0ce45b7af1]
   #5 /usr/lib64/samba/libsmbd-shim-samba4.so(exit_server+0x12) [0x7f0ce250fb82]
   #6 /usr/lib64/samba/libsmbd-base-samba4.so(+0x162bac) [0x7f0ce45babac]
   #7 /lib64/libsmbconf.so.0(messaging_dispatch_rec+0x21d) [0x7f0ce2b6a94d]
   #8 /lib64/libsmbconf.so.0(+0x2ae35) [0x7f0ce2b6ae35]
   #9 /lib64/libsmbconf.so.0(+0x48295) [0x7f0ce2b88295]
   #10 /lib64/libsmbconf.so.0(run_events_poll+0x16c) [0x7f0ce2b76bfc]
   #11 /lib64/libsmbconf.so.0(+0x36e50) [0x7f0ce2b76e50]
   #12 /lib64/libtevent.so.0(_tevent_loop_once+0x8d) [0x7f0ce159f0fd]
   #13 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7f0ce159f29b]
   #14 /usr/sbin/smbd(main+0x1501) [0x7f0ce5011361]
   #15 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f0ce11fbb15]
   #16 /usr/sbin/smbd(+0x7765) [0x7f0ce5011765]
[2015/12/29 11:18:50.532236, 0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/dumpcore.c:318(dump_core)
  dumping core in /var/log/samba/cores/smbd
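For convenience, here are the replication steps above consolidated into one shell transcript. This is only a sketch of my procedure: $PUBLIC_IP and $OTHER_NODE_NUMBER are placeholders, and the service name assumes the stock CentOS 7 packaging (smb.service), none of which is taken from the logs.

  # On any node: see which node currently hosts the public IP the
  # client is connected to ($PUBLIC_IP is a placeholder).
  ctdb ip

  # On the hosting node: confirm the client's SMB connection (TCP 445)
  # terminates here.
  netstat -tn | grep ':445'

  # From the Windows client, start playing a large video file, then
  # disable the hosting node so the public IP migrates:
  ctdb disable
  # ...or move the IP explicitly instead:
  # ctdb moveip $PUBLIC_IP $OTHER_NODE_NUMBER

  # On the disabled node, watch for the panic and the zombies:
  journalctl -f -u smb
  ps axo pid,stat,comm | awk '$2 ~ /Z/ && $3 == "smbd"'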
I analyzed this together with Stewart and asked him to file the bug. I think this is not related to gluster at all: it seems that when durable handles are enabled and one runs (e.g.) ctdb disable on one node, the scavenger daemon fails on the disabled node because it cannot connect to ctdb (obviously). Need to analyze more...
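One quick way to check this theory is sketched below. Note the assumptions: the share name [gluster] and its options are made up for illustration, since the actual smb.conf was not attached to the report.

  # On the node that was disabled, ctdb should report it as DISABLED;
  # this corresponds to "Node has status 4, not active" in the log above.
  ctdb status

  # Hypothetical smb.conf fragment for a re-test -- share name and
  # gluster volume are illustrative only:
  #
  #   [gluster]
  #       vfs objects = glusterfs
  #       glusterfs:volume = gvol0
  #       durable handles = no
  #
  # Reload the running smbds and repeat the failover:
  smbcontrol all reload-config

If the panic no longer occurs with "durable handles = no", that would support the idea that the scavenger, which is started to look after disconnected durable handles, is the component that trips over the unreachable ctdb.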
I have just confirmed that the issue still exists in Samba 4.3.4 by rebuilding the system on Fedora.
(In reply to Stewart Webb from comment #2) Can we have this re-checked with the latest stable versions?