Bug 11689 - smbd panic with glusterfs vfs module, ctdb and windows clients
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.2.4
Hardware: x64 Linux
Importance: P5 critical
Target Milestone: ---
Assigned To: Michael Adam
QA Contact: Samba QA Contact
Reported: 2016-01-22 13:57 UTC by Stewart Webb
Modified: 2016-01-26 11:51 UTC

Description Stewart Webb 2016-01-22 13:57:06 UTC
I am trying to run Samba with the glusterfs vfs module and ctdb on CentOS 7, but am experiencing a crash in smbd (smbd panic) when a Windows client is connected and a virtual public IP address is migrated.

Tested on:
CentOS 7 x86_64 official repos (4.2.3) and releases in
http://download.gluster.org/pub/gluster/glusterfs/samba/ (4.2.4)


Replication Instructions:
Connect using a Windows client. I have tested Windows 7 and Windows 10; they connect using SMB2_1 and SMB3_0 respectively.

Start reading a large file (in this case a video, which I had just started playing).

On the host that the file is being read from (confirmed with ctdb ip and netstat), run:
# ctdb disable
or
# ctdb moveip $addr_1 $other_node_number

This causes the read to fail, and smbd on the node we just disabled spins at 100% CPU and then crashes (with a panic).
The crashed smbd processes also become zombies.
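For clarity, the trigger sequence above can be sketched as a shell session on the affected node (the port-445 grep is only illustrative; `$addr_1` and `$other_node_number` are placeholders as above):

```sh
# Check which node currently hosts the public IP the client is using:
ctdb ip

# Confirm the client's SMB connection terminates on this node:
netstat -tn | grep ':445'

# Either of these migrates the address away and reproduces the crash:
ctdb disable
# or, for a single address:
ctdb moveip $addr_1 $other_node_number
```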

Additional Notes:

This does not appear to happen if I do not use the glusterfs vfs module and instead mount the gluster volume locally on each server and re-export its path.
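To make the comparison concrete, the two setups are sketched below as smb.conf share definitions. The share names, volume name (`gvol0`), and mount point are made up for illustration; `glusterfs:volfile_server` and `glusterfs:volume` are the vfs_glusterfs module's documented parameters.

```ini
; Setup that crashes: smbd talks to gluster directly via the vfs module
[share-vfs]
    path = /
    vfs objects = glusterfs
    glusterfs:volfile_server = localhost
    glusterfs:volume = gvol0

; Setup that appears unaffected: the gluster volume is mounted locally
; on each server (e.g. mount -t glusterfs localhost:/gvol0 /mnt/gvol0)
; and the mount point is re-exported
[share-fuse]
    path = /mnt/gvol0
```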

This is my first bug report to Samba, so if any additional info is required, please let me know.

Crash Logs (journalctl and log.smbd with level 10):
journalctl:
Dec 17 14:43:29 stor01 smbd[25469]: [2015/12/17 14:43:29.677576,  0] ../source3/modules/vfs_glusterfs.c:257(vfs_gluster_connect)
Dec 17 14:43:29 stor01 smbd[25469]:   online: Initialized volume from server localhost
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.116591,  0] ../source3/lib/util.c:480(reinit_after_fork)
Dec 17 14:44:14 stor01 smbd[25885]:   messaging_reinit() failed: NT_STATUS_INTERNAL_DB_ERROR
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.117441,  0] ../source3/lib/util.c:788(smb_panic_s3)
Dec 17 14:44:14 stor01 smbd[25885]:   PANIC (pid 25885): reinit_after_fork failed
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.119102,  0] ../source3/lib/util.c:899(log_stack_trace)
Dec 17 14:44:14 stor01 smbd[25885]:   BACKTRACE: 17 stack frames:
Dec 17 14:44:14 stor01 smbd[25885]:    #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f96c4480d5a]
Dec 17 14:44:14 stor01 smbd[25885]:    #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f96c4480e30]
Dec 17 14:44:14 stor01 smbd[25885]:    #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f96c62d087f]
Dec 17 14:44:14 stor01 smbd[25885]:    #3 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15f815) [0x7f96c5ed7815]
Dec 17 14:44:14 stor01 smbd[25885]:    #4 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15faf1) [0x7f96c5ed7af1]
Dec 17 14:44:14 stor01 smbd[25885]:    #5 /usr/lib64/samba/libsmbd-shim-samba4.so(exit_server+0x12) [0x7f96c3e2fb82]
Dec 17 14:44:14 stor01 smbd[25885]:    #6 /usr/lib64/samba/libsmbd-base-samba4.so(+0x162bac) [0x7f96c5edabac]
Dec 17 14:44:14 stor01 smbd[25885]:    #7 /lib64/libsmbconf.so.0(messaging_dispatch_rec+0x21d) [0x7f96c448a94d]
Dec 17 14:44:14 stor01 smbd[25885]:    #8 /lib64/libsmbconf.so.0(+0x2ae35) [0x7f96c448ae35]
Dec 17 14:44:14 stor01 smbd[25885]:    #9 /lib64/libsmbconf.so.0(+0x48295) [0x7f96c44a8295]
Dec 17 14:44:14 stor01 smbd[25885]:    #10 /lib64/libsmbconf.so.0(run_events_poll+0x16c) [0x7f96c4496bfc]
Dec 17 14:44:14 stor01 smbd[25885]:    #11 /lib64/libsmbconf.so.0(+0x36e50) [0x7f96c4496e50]
Dec 17 14:44:14 stor01 smbd[25885]:    #12 /lib64/libtevent.so.0(_tevent_loop_once+0x8d) [0x7f96c2ebf0fd]
Dec 17 14:44:14 stor01 smbd[25885]:    #13 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7f96c2ebf29b]
Dec 17 14:44:14 stor01 smbd[25885]:    #14 /usr/sbin/smbd(main+0x1501) [0x7f96c6931361]
Dec 17 14:44:14 stor01 smbd[25885]:    #15 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f96c2b1bb15]
Dec 17 14:44:14 stor01 smbd[25885]:    #16 /usr/sbin/smbd(+0x7765) [0x7f96c6931765]
Dec 17 14:44:14 stor01 smbd[25885]: [2015/12/17 14:44:14.120675,  0] ../source3/lib/dumpcore.c:318(dump_core)
Dec 17 14:44:14 stor01 smbd[25885]:   dumping core in /var/log/samba/cores/smbd


log.smbd:
[2015/12/29 11:18:50.526099, 10, pid=15992, effective(0, 0), real(0, 0)] ../source3/lib/messages.c:252(messaging_recv_cb)
  messaging_recv_cb: Received message 0x315 len 40 (num_fds:0) from 0:19508
[2015/12/29 11:18:50.526161, 10, pid=15992, effective(0, 0), real(0, 0), class=scavenger] ../source3/smbd/scavenger.c:335(smbd_scavenger_msg)
  smbd_scavenger_msg: 0:15992 got message from 0:19508
[2015/12/29 11:18:50.527411, 10, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:838(ctdbd_control)
  ctdbd_control: Sending ctdb packet
[2015/12/29 11:18:50.527549, 10, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:838(ctdbd_control)
  ctdbd_control: Sending ctdb packet
[2015/12/29 11:18:50.527635,  2, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:180(ctdbd_working)
  Node has status 4, not active
[2015/12/29 11:18:50.527693,  2, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/ctdbd_conn.c:536(ctdbd_init_connection)
  Node is not working, can not connect
[2015/12/29 11:18:50.527826, 10, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/messages_ctdbd.c:163(messaging_ctdbd_init)
  ctdbd_messaging_connection failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.527875,  1, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/messages.c:358(messaging_reinit)
  messaging_ctdbd_init failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.527911,  0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/util.c:480(reinit_after_fork)
  messaging_reinit() failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.528008,  2, pid=22094, effective(0, 0), real(0, 0), class=scavenger] ../source3/smbd/scavenger.c:253(smbd_scavenger_start)
  reinit_after_fork failed: NT_STATUS_INTERNAL_DB_ERROR
[2015/12/29 11:18:50.528068,  4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528125,  5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528173,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528232,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528305,  4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528331,  5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528356,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528390,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528414,  4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528443,  5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528473,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528513,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528546,  4, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/sec_ctx.c:316(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2015/12/29 11:18:50.528572,  5, pid=22094, effective(0, 0), real(0, 0)] ../libcli/security/security_token.c:53(security_token_debug)
  Security token: (NULL)
[2015/12/29 11:18:50.528616,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/auth/token_util.c:639(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2015/12/29 11:18:50.528716,  5, pid=22094, effective(0, 0), real(0, 0)] ../source3/smbd/uid.c:425(smbd_change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2015/12/29 11:18:50.528938,  0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/util.c:788(smb_panic_s3)
  PANIC (pid 22094): reinit_after_fork failed
[2015/12/29 11:18:50.530575,  0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/util.c:899(log_stack_trace)
  BACKTRACE: 17 stack frames:
   #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f0ce2b60d5a]
   #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f0ce2b60e30]
   #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f0ce49b087f]
   #3 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15f815) [0x7f0ce45b7815]
   #4 /usr/lib64/samba/libsmbd-base-samba4.so(+0x15faf1) [0x7f0ce45b7af1]
   #5 /usr/lib64/samba/libsmbd-shim-samba4.so(exit_server+0x12) [0x7f0ce250fb82]
   #6 /usr/lib64/samba/libsmbd-base-samba4.so(+0x162bac) [0x7f0ce45babac]
   #7 /lib64/libsmbconf.so.0(messaging_dispatch_rec+0x21d) [0x7f0ce2b6a94d]
   #8 /lib64/libsmbconf.so.0(+0x2ae35) [0x7f0ce2b6ae35]
   #9 /lib64/libsmbconf.so.0(+0x48295) [0x7f0ce2b88295]
   #10 /lib64/libsmbconf.so.0(run_events_poll+0x16c) [0x7f0ce2b76bfc]
   #11 /lib64/libsmbconf.so.0(+0x36e50) [0x7f0ce2b76e50]
   #12 /lib64/libtevent.so.0(_tevent_loop_once+0x8d) [0x7f0ce159f0fd]
   #13 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7f0ce159f29b]
   #14 /usr/sbin/smbd(main+0x1501) [0x7f0ce5011361]
   #15 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f0ce11fbb15]
   #16 /usr/sbin/smbd(+0x7765) [0x7f0ce5011765]
[2015/12/29 11:18:50.532236,  0, pid=22094, effective(0, 0), real(0, 0)] ../source3/lib/dumpcore.c:318(dump_core)
  dumping core in /var/log/samba/cores/smbd
Comment 1 Michael Adam 2016-01-24 15:31:31 UTC
I analyzed this together with Stewart and asked him to file the bug.

I think this is not related to gluster at all:

It seems that when durable handles are enabled and we run (e.g.) ctdb disable on one node, the scavenger daemon fails on the disabled node because it (obviously) cannot connect to ctdb.

Need to analyze more...
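For anyone trying to reproduce the durable-handles angle: the sketch below shows the standard smb.conf prerequisites for durable handles in a clustered setup, per smb.conf(5). This is an assumed minimal configuration for illustration, not the reporter's actual config.

```ini
[global]
    clustering = yes          ; run smbd against ctdb

[share]
    durable handles = yes     ; default, shown explicitly
    kernel oplocks = no       ; the three settings below are the
    kernel share modes = no   ; documented prerequisites for durable
    posix locking = no        ; handles to be granted
```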
Comment 2 Stewart Webb 2016-01-25 12:14:12 UTC
I have just confirmed that the issue still exists in Samba 4.3.4 by rebuilding the system on Fedora.