Created attachment 16323 [details]
Capture of the network interface of the SAMBA server while the issue was happening continuously

The issue (these core dumps occurring continuously) made the service totally unusable; we had to roll back to 4.11 to recover. I am also attaching a tcpdump capture monitoring port 445 [`tcpdump -i enp59s0f0 port 445 -w samba_issue_03_nov.pcap`]. The IP 128.142.53.37 is one of the four servers in the cluster. The IPs starting with 188.x.x.x are Windows clients connected to this server while the issue was happening.

2020-11-03T17:09:09.914149+01:00 samba-13 smbd[3137521]: PANIC: assert failed at ../../source3/locking/share_mode_lock.c(448): !share_entries_exist
2020-11-03T17:09:09.914278+01:00 samba-13 smbd[3137521]: PANIC (pid 3137521): assert failed: !share_entries_exist
2020-11-03T17:09:09.914309+01:00 samba-13 smbd[3137521]: BACKTRACE:
2020-11-03T17:09:09.914724+01:00 samba-13 smbd[3137521]: #0 <unknown symbol> [ip=0x7ffb8120341d] [sp=0x7ffe3f47f0b0]
2020-11-03T17:09:09.915172+01:00 samba-13 smbd[3137521]: #1 <unknown symbol> [ip=0x7ffb80c5d619] [sp=0x7ffe3f47f9f0]
2020-11-03T17:09:09.915587+01:00 samba-13 smbd[3137521]: #2 <unknown symbol> [ip=0x7ffb81203631] [sp=0x7ffe3f47fa10]
2020-11-03T17:09:09.915970+01:00 samba-13 smbd[3137521]: #3 <unknown symbol> [ip=0x7ffb80d340ba] [sp=0x7ffe3f47fb20]
2020-11-03T17:09:09.916368+01:00 samba-13 smbd[3137521]: #4 _talloc_free + 0x384 [ip=0x7ffb8041b3a4] [sp=0x7ffe3f47fbb0]
2020-11-03T17:09:09.916769+01:00 samba-13 smbd[3137521]: #5 <unknown symbol> [ip=0x7ffb80e6f9ef] [sp=0x7ffe3f47fbe0]
2020-11-03T17:09:09.917150+01:00 samba-13 smbd[3137521]: #6 <unknown symbol> [ip=0x7ffb80e72c76] [sp=0x7ffe3f47fcb0]
2020-11-03T17:09:09.917534+01:00 samba-13 smbd[3137521]: #7 <unknown symbol> [ip=0x7ffb80e74ae6] [sp=0x7ffe3f47fe90]
2020-11-03T17:09:09.917912+01:00 samba-13 smbd[3137521]: #8 <unknown symbol> [ip=0x7ffb80eabc91] [sp=0x7ffe3f47ff80]
2020-11-03T17:09:09.918287+01:00 samba-13 smbd[3137521]: #9 <unknown symbol> [ip=0x7ffb80ea27cb] [sp=0x7ffe3f4800e0]
2020-11-03T17:09:09.918491+01:00 samba-13 smbd[3137019]: pccaz4493 (ipv4:128.141.155.67:54367) closed connection to service eos
2020-11-03T17:09:09.918664+01:00 samba-13 smbd[3137521]: #10 <unknown symbol> [ip=0x7ffb80ea3f10] [sp=0x7ffe3f480180]
2020-11-03T17:09:09.919041+01:00 samba-13 smbd[3137521]: #11 tevent_common_invoke_fd_handler + 0x83 [ip=0x7ffb7fffa443] [sp=0x7ffe3f480200]
2020-11-03T17:09:09.919406+01:00 samba-13 smbd[3137521]: #12 tevent_wakeup_recv + 0x120f [ip=0x7ffb800009bf] [sp=0x7ffe3f480230]
2020-11-03T17:09:09.919770+01:00 samba-13 smbd[3137521]: #13 tevent_cleanup_pending_signal_handlers + 0xcb [ip=0x7ffb7fffe99b] [sp=0x7ffe3f480290]
2020-11-03T17:09:09.920125+01:00 samba-13 smbd[3137521]: #14 _tevent_loop_once + 0x95 [ip=0x7ffb7fff9b15] [sp=0x7ffe3f4802b0]
2020-11-03T17:09:09.920530+01:00 samba-13 smbd[3137521]: #15 tevent_common_loop_wait + 0x1b [ip=0x7ffb7fff9dbb] [sp=0x7ffe3f4802e0]
2020-11-03T17:09:09.920906+01:00 samba-13 smbd[3137521]: #16 tevent_cleanup_pending_signal_handlers + 0x5b [ip=0x7ffb7fffe92b] [sp=0x7ffe3f480300]
2020-11-03T17:09:09.921277+01:00 samba-13 smbd[3137521]: #17 <unknown symbol> [ip=0x7ffb80e9209f] [sp=0x7ffe3f480320]
2020-11-03T17:09:09.921345+01:00 samba-13 smbd[3137521]: #18 <unknown symbol> [ip=0x55ab7f177d00] [sp=0x7ffe3f4803c0]
2020-11-03T17:09:09.921718+01:00 samba-13 smbd[3137521]: #19 tevent_common_invoke_fd_handler + 0x83 [ip=0x7ffb7fffa443] [sp=0x7ffe3f480490]
2020-11-03T17:09:09.922087+01:00 samba-13 smbd[3137521]: #20 tevent_wakeup_recv + 0x120f [ip=0x7ffb800009bf] [sp=0x7ffe3f4804c0]
2020-11-03T17:09:09.922441+01:00 samba-13 smbd[3137521]: #21 tevent_cleanup_pending_signal_handlers + 0xcb [ip=0x7ffb7fffe99b] [sp=0x7ffe3f480520]
2020-11-03T09:09:16.062867+01:00 samba-13 smbd[2735111]: #22 _tevent_loop_once + 0x95 [ip=0x7f28b7963b15] [sp=0x7ffcccbc2690]
2020-11-03T09:09:16.063220+01:00 samba-13 smbd[2735111]: #23 tevent_common_loop_wait + 0x1b [ip=0x7f28b7963dbb] [sp=0x7ffcccbc26c0]
2020-11-03T09:09:16.063492+01:00 samba-13 smbd[2735129]: pctecrg45 (ipv4:128.141.167.164:64469) connect to service eos initially as user lbossink (uid=131995, gid=2766) (pid 29)
2020-11-03T09:09:16.063630+01:00 samba-13 smbd[2735111]: #24 tevent_cleanup_pending_signal_handlers + 0x5b [ip=0x7f28b796892b] [sp=0x7ffcccbc26e0]
2020-11-03T09:09:16.063659+01:00 samba-13 smbd[2735111]: #25 main + 0x1bd2 [ip=0x556e37f1c2a2] [sp=0x7ffcccbc2700]
2020-11-03T09:09:16.064156+01:00 samba-13 smbd[2735111]: #26 __libc_start_main + 0xf3 [ip=0x7f28b739e6a3] [sp=0x7ffcccbc2ac0]
2020-11-03T09:09:16.064236+01:00 samba-13 smbd[2735111]: #27 _start + 0x2e [ip=0x556e37f1c65e] [sp=0x7ffcccbc2b80]
2020-11-03T09:09:16.064271+01:00 samba-13 smbd[2735111]: coredump is handled by helper binary specified at /proc/sys/kernel/core_pattern
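For reference, the pair of PANIC lines above is exactly the pattern Samba's SMB_ASSERT macro emits: it logs the failing expression with file and line, then calls smb_panic(), which prints the BACKTRACE and dumps core. A minimal standalone sketch of that mechanism (not the verbatim macro from source3/include/smb_macros.h; the real one logs via DEBUG() and goes through smb_panic()/dump_core()):

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch only: mimics the shape of Samba's SMB_ASSERT. */
    #define SMB_ASSERT_SKETCH(b) do { \
            if (!(b)) { \
                    fprintf(stderr, "PANIC: assert failed at %s(%d): %s\n", \
                            __FILE__, __LINE__, #b); \
                    abort(); /* smb_panic() -> dump_core() in the real code */ \
            } \
    } while (0)

    int main(void)
    {
            int share_entries_exist = 1; /* the inconsistent state we hit */
            SMB_ASSERT_SKETCH(!share_entries_exist); /* aborts, as in frame #5 */
            return 0;
    }

Since every failing open aborts the serving smbd, each affected client connection produced a fresh core dump, which is why the service became unusable under load.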
We tried to analyze the issue with the core dumps we got using gdb: the 'share_entries_exist' variable is reported as optimized out, and the same goes for the 'key' variable from the previous line. The full core dump is available if you would like the complete information. In our case this was the most disruptive crash we have had so far, so we are mostly interested in finding the cause of this problem.

(gdb) up 5
#5  0x00007f6b971700ba in share_mode_data_store (rec=0x555fb57a6b00, d=0x555fb57b55a0)
    at ../../source3/locking/share_mode_lock.c:448
448             SMB_ASSERT(!share_entries_exist);
(gdb) l
443
444             if (!d->have_share_modes) {
445                     TDB_DATA key = dbwrap_record_get_key(rec);
446                     bool share_entries_exist;
447                     share_entries_exist = dbwrap_exists(share_entries_db, key);
448                     SMB_ASSERT(!share_entries_exist);
449
450                     TALLOC_FREE(d->delete_tokens);
451                     d->num_delete_tokens = 0;
452
(gdb) p share_entries_db
$2 = (struct db_context *) 0x555fb5779240
(gdb) p *share_entries_db
$3 = {fetch_locked = 0x7f6b96e38d90 <db_ctdb_fetch_locked>,
  try_fetch_locked = 0x7f6b96e38ce0 <db_ctdb_try_fetch_locked>,
  traverse = 0x7f6b96e37350 <db_ctdb_traverse>,
  traverse_read = 0x7f6b96e37290 <db_ctdb_traverse_read>,
  get_seqnum = 0x7f6b96e36fc0 <db_ctdb_get_seqnum>,
  transaction_start = 0x7f6b96e36c60 <db_ctdb_transaction_start>,
  transaction_start_nonblock = 0x0,
  transaction_commit = 0x7f6b96e39330 <db_ctdb_transaction_commit>,
  transaction_cancel = 0x7f6b96e36770 <db_ctdb_transaction_cancel>,
  parse_record = 0x7f6b96e377d0 <db_ctdb_parse_record>,
  parse_record_send = 0x7f6b96e37560 <db_ctdb_parse_record_send>,
  parse_record_recv = 0x7f6b96e37550 <db_ctdb_parse_record_recv>,
  do_locked = 0x0, exists = 0x0, wipe = 0x0, check = 0x0,
  id = 0x7f6b96e368e0 <db_ctdb_id>, name = 0x555fb5779350 "share_entries.tdb",
  private_data = 0x555fb5777a30, lock_order = DBWRAP_LOCK_ORDER_3,
  persistent = false}
(gdb) p key
$4 = <optimized out>
(gdb) p share_entries_exist
$5 = <optimized out>
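One detail from the $3 dump above: the ctdb backend leaves exists = 0x0, so dbwrap_exists() cannot take a backend-specific fast path. As far as we can tell from source3/lib/dbwrap/dbwrap.c, it then falls back to trying to parse the record, roughly like this (paraphrased sketch against Samba's dbwrap headers; helper names assumed, not verbatim):

    /* Paraphrase of the dbwrap_exists() fallback; not verbatim Samba code. */
    static void null_parser(TDB_DATA key, TDB_DATA data, void *private_data)
    {
            /* We only care whether the record could be found at all. */
    }

    bool dbwrap_exists_sketch(struct db_context *db, TDB_DATA key)
    {
            NTSTATUS status;

            if (db->exists != NULL) {       /* NULL for db_ctdb, see $3 above */
                    return db->exists(db, key);
            }
            status = dbwrap_parse_record(db, key, null_parser, NULL);
            return NT_STATUS_IS_OK(status); /* record found => "exists" */
    }

If that reading is correct, the assert effectively compares this smbd's in-memory d->have_share_modes flag against the cluster-wide share_entries.tdb via db_ctdb_parse_record, so any window in which the two views disagree across the four nodes would trip it.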
Here is the backtrace from the core dump abrt caught, extracted with gdb and debug symbols installed, which is more meaningful:

#0  0x00007f6b95e8470f in raise () from /lib64/libc.so.6
#1  0x00007f6b95e6eb25 in abort () from /lib64/libc.so.6
#2  0x00007f6b9708a054 in dump_core () at ../../source3/lib/dumpcore.c:338
#3  0x00007f6b9709965c in smb_panic_s3 (why=0x7f6b9733fd58 "assert failed: !share_entries_exist") at ../../source3/lib/util.c:853
#4  0x00007f6b9763f631 in smb_panic (why=why@entry=0x7f6b9733fd58 "assert failed: !share_entries_exist") at ../../lib/util/fault.c:174
#5  0x00007f6b971700ba in share_mode_data_store (rec=0x555fb57a6b00, d=0x555fb57b55a0) at ../../source3/locking/share_mode_lock.c:448
#6  share_mode_lock_destructor (lck=<optimized out>) at ../../source3/locking/share_mode_lock.c:686
#7  share_mode_lock_destructor (lck=<optimized out>) at ../../source3/locking/share_mode_lock.c:675
#8  0x00007f6b968573a4 in _talloc_free () from /lib64/libtalloc.so.2
#9  0x00007f6b972ab9ef in open_directory (conn=conn@entry=0x555fb5753520, req=req@entry=0x555fb57a4000, smb_dname=smb_dname@entry=0x555fb574ed70, access_mask=<optimized out>, access_mask@entry=1048705, share_access=share_access@entry=7, create_disposition=create_disposition@entry=1, create_options=1, file_attributes=16, pinfo=0x7ffec433af54, result=0x7ffec433af60) at ../../source3/smbd/open.c:4602
#10 0x00007f6b972aec76 in create_file_unixpath (conn=conn@entry=0x555fb5753520, req=0x555fb57a4000, smb_fname=smb_fname@entry=0x555fb574ed70, access_mask=access_mask@entry=1048705, share_access=share_access@entry=7, create_disposition=create_disposition@entry=1, create_options=1, file_attributes=0, oplock_request=0, lease=<optimized out>, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x7ffec433b118, pinfo=0x7ffec433b114) at ../../source3/smbd/open.c:5593
#11 0x00007f6b972b0ae6 in create_file_default (conn=0x555fb5753520, req=0x555fb57a4000, root_dir_fid=<optimized out>, smb_fname=0x555fb574ed70, access_mask=1048705, share_access=7, create_disposition=1, create_options=1, file_attributes=0, oplock_request=0, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x555fb57a5dc8, pinfo=0x555fb57a5ddc, in_context_blobs=0x7ffec433b260, out_context_blobs=0x555fb57b5430) at ../../source3/smbd/open.c:6045
#12 0x00007f6b972e7c91 in smbd_smb2_create_send (in_context_blobs=..., in_name=<optimized out>, in_create_options=<optimized out>, in_create_disposition=<optimized out>, in_share_access=<optimized out>, in_file_attributes=<optimized out>, in_desired_access=<optimized out>, in_impersonation_level=<optimized out>, in_oplock_level=<optimized out>, smb2req=0x555fb579efe0, ev=<optimized out>, mem_ctx=0x555fb579efe0) at ../../source3/smbd/smb2_create.c:983
#13 smbd_smb2_request_process_create (smb2req=smb2req@entry=0x555fb579efe0) at ../../source3/smbd/smb2_create.c:268
#14 0x00007f6b972de7cb in smbd_smb2_request_dispatch (req=req@entry=0x555fb579efe0) at ../../source3/smbd/smb2_server.c:2709
#15 0x00007f6b972dff10 in smbd_smb2_io_handler (fde_flags=<optimized out>, xconn=0x555fb5795ba0) at ../../source3/smbd/smb2_server.c:4060
#16 smbd_smb2_connection_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/smb2_server.c:4098
#17 0x00007f6b96436443 in tevent_common_invoke_fd_handler () from /lib64/libtevent.so.0
#18 0x00007f6b9643c9bf in epoll_event_loop_once () from /lib64/libtevent.so.0
#19 0x00007f6b9643a99b in std_event_loop_once () from /lib64/libtevent.so.0
#20 0x00007f6b96435b15 in _tevent_loop_once () from /lib64/libtevent.so.0
#21 0x00007f6b96435dbb in tevent_common_loop_wait () from /lib64/libtevent.so.0
#22 0x00007f6b9643a92b in std_event_loop_wait () from /lib64/libtevent.so.0
#23 0x00007f6b972ce09f in smbd_process (ev_ctx=0x555fb5752440, msg_ctx=<optimized out>, sock_fd=42, interactive=<optimized out>) at ../../source3/smbd/process.c:4170
#24 0x0000555fb3d50d00 in smbd_accept_connection (ev=0x555fb5752440, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/server.c:1012
#25 0x00007f6b96436443 in tevent_common_invoke_fd_handler () from /lib64/libtevent.so.0
#26 0x00007f6b9643c9bf in epoll_event_loop_once () from /lib64/libtevent.so.0
#27 0x00007f6b9643a99b in std_event_loop_once () from /lib64/libtevent.so.0
#28 0x00007f6b96435b15 in _tevent_loop_once () from /lib64/libtevent.so.0
#29 0x00007f6b96435dbb in tevent_common_loop_wait () from /lib64/libtevent.so.0
#30 0x00007f6b9643a92b in std_event_loop_wait () from /lib64/libtevent.so.0
#31 0x0000555fb3d4b2a2 in smbd_parent_loop (parent=0x555fb576a140, ev_ctx=0x555fb5752440) at ../../source3/smbd/server.c:1359
#32 main (argc=<optimized out>, argv=<optimized out>) at ../../source3/smbd/server.c:2197
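Frames #5 to #8 show that the failing store runs from inside a talloc destructor: when open_directory() frees the share_mode_lock at open.c:4602, _talloc_free() invokes share_mode_lock_destructor(), which calls share_mode_data_store() and hits the assert. A minimal standalone illustration of that mechanism using plain libtalloc (the demo names are made up; build with e.g. cc demo.c -ltalloc):

    #include <stdio.h>
    #include <talloc.h>

    struct lock_demo {
            int dirty;
    };

    /* Runs from inside _talloc_free(), analogous to frame #8; in Samba this
     * is share_mode_lock_destructor() calling share_mode_data_store(). */
    static int lock_demo_destructor(struct lock_demo *lck)
    {
            if (lck->dirty) {
                    printf("writing state back to the database on free\n");
                    /* a failed SMB_ASSERT in this path aborts the whole smbd */
            }
            return 0; /* 0 = allow the free to proceed */
    }

    int main(void)
    {
            TALLOC_CTX *frame = talloc_new(NULL);
            struct lock_demo *lck = talloc_zero(frame, struct lock_demo);

            lck->dirty = 1;
            talloc_set_destructor(lck, lock_demo_destructor);

            talloc_free(frame); /* frees lck, firing the destructor */
            return 0;
    }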
Any update on this? Is there any more input needed to investigate it? As mentioned, this issue was quite disruptive on our machines: the service was completely unusable while it was happening. We had tested version 4.12.7 for a couple of weeks and everything looked ready, but the crashes started exactly when we hit our peak number of users, so in our opinion it is quite important to look into this issue and fix it for newer versions.
(In reply to Aritz Brosa from comment #3) If this is urgent and you don't have the time to wait for someone to volunteer and look into this, you might want to consider dedicated support from your OS vendor or from one of the companies listed here: https://www.samba.org/samba/support/globalsupport.html.
(In reply to Ralph Böhme from comment #4) We understand and appreciate that support is based on community effort; nevertheless, the failed assert seemed quite visible. At the same time, looking at how the code has evolved, version 4.13 got an important refactoring of the problematic part: https://github.com/samba-team/samba/commit/49951b283d98d81a2201341624b4bcdd3eba9ec4. Therefore an eventual fix for this issue could only have targeted the 4.12 series, not future ones. We will test version 4.13 to see whether a similar issue around the share entries reproduces, and will update this bug report once we have a conclusion for the latest version.