Bug 14769 - smbd panic on force-close share during offload write
Summary: smbd panic on force-close share during offload write
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.12.14
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Jule Anger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-25 15:35 UTC by Lev
Modified: 2021-09-07 07:14 UTC
CC List: 1 user

See Also:


Attachments
smbd logs (238.55 KB, text/plain): 2021-07-25 15:35 UTC, Lev, no flags
Raw patch for 4.12.next. (1.63 KB, patch): 2021-08-04 01:26 UTC, Jeremy Allison, no flags
git-am fix for master. (2.57 KB, patch): 2021-08-04 19:30 UTC, Jeremy Allison, no flags
git-am fix for master. (1.28 KB, patch): 2021-08-05 17:24 UTC, Jeremy Allison, no flags
git-am fix for master. (19.30 KB, patch): 2021-08-07 06:39 UTC, Jeremy Allison, no flags
git-am fix for 4.15.rcNext. (20.56 KB, patch): 2021-08-11 20:56 UTC, Jeremy Allison, slow: review+
git-am fix for 4.14.next, 4.13.next. (1.45 KB, patch): 2021-08-11 21:49 UTC, Jeremy Allison, slow: review+

Description Lev 2021-07-25 15:35:43 UTC
Created attachment 16701 [details]
smbd logs

When the connection is force-closed, all of its open file handles (and their fsp structures) are freed:

smbXsrv_tcon_disconnect() ->
	close_cnum() ->
		file_close_conn() ->
			close_file() ->
				file_free() ->
					fsp_free()
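
Nothing ties the pending offload-write request's lifetime to the fsp. A minimal illustrative sketch of the problematic pattern (names simplified; this is not the actual vfs_default.c code):

#include <stdbool.h>
#include <talloc.h>
#include <tevent.h>

struct files_struct;                    /* smbd-internal; stand-in declaration */
extern bool change_to_user_and_service_by_fsp(struct files_struct *fsp);

struct offload_write_state {
        struct files_struct *fsp;       /* raw pointer, not tied to the fsp's lifetime */
};

static void offload_write_done(struct tevent_req *subreq)
{
        struct tevent_req *req = tevent_req_callback_data(
                subreq, struct tevent_req);
        struct offload_write_state *state = tevent_req_data(
                req, struct offload_write_state);

        TALLOC_FREE(subreq);

        /* If `smbcontrol all close-share vol-1` ran while the copy was
         * in flight, fsp_free() has already released state->fsp; this
         * dereference is the use-after-free (the 0x4141... values in
         * the gdb dump below are talloc_fill's pattern). */
        change_to_user_and_service_by_fsp(state->fsp);
}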
					
If the offload write finishes after the connection was already closed, its completion callback uses the already-freed fsp. The panic is easily reproducible: enable talloc_fill and add a sleep in smb_vfs_call_offload_write_send(). On the Windows client I called CopyFile() and, during the sleep, issued `smbcontrol all close-share vol-1`. smbd panics with SIGSEGV:

Thread 1 "smbd" received signal SIGSEGV, Segmentation fault.
0x00007f3ce6c5c31c in change_to_user_and_service (conn=0x4141414141410000, vuid=4702111237675155455) at ../../source3/smbd/uid.c:440
440             int snum = SNUM(conn);
(gdb) bt
#0  0x00007f3ce6c5c31c in change_to_user_and_service (conn=0x4141414141410000, vuid=4702111237675155455) at ../../source3/smbd/uid.c:440
#1  0x00007f3ce6c5c510 in change_to_user_and_service_by_fsp (fsp=0x7f3ccd8828e0) at ../../source3/smbd/uid.c:485
#2  0x00007f3ce6dd8e8d in vfswrap_offload_write_read_done (subreq=0x0) at ../../source3/modules/vfs_default.c:2194
#3  0x00007f3ce59ef91e in _tevent_req_notify_callback (req=0x7f3ccd8e9180, location=0x7f3ce6e5e340 "../../source3/smbd/vfs.c:1869") at ../../lib/tevent/tevent_req.c:141
#4  0x00007f3ce59efa7f in tevent_req_finish (req=0x7f3ccd8e9180, state=TEVENT_REQ_DONE, location=0x7f3ce6e5e340 "../../source3/smbd/vfs.c:1869") at ../../lib/tevent/tevent_req.c:193
#5  0x00007f3ce59efaac in _tevent_req_done (req=0x7f3ccd8e9180, location=0x7f3ce6e5e340 "../../source3/smbd/vfs.c:1869") at ../../lib/tevent/tevent_req.c:199
#6  0x00007f3ce6c7f3bb in smb_vfs_call_pread_done (subreq=0x0) at ../../source3/smbd/vfs.c:1869
#7  0x00007f3ce59ef91e in _tevent_req_notify_callback (req=0x7f3ccd8e9500, location=0x7f3ce6ee5940 "../../source3/modules/vfs_default.c:904") at ../../lib/tevent/tevent_req.c:141
#8  0x00007f3ce59efa7f in tevent_req_finish (req=0x7f3ccd8e9500, state=TEVENT_REQ_DONE, location=0x7f3ce6ee5940 "../../source3/modules/vfs_default.c:904") at ../../lib/tevent/tevent_req.c:193
#9  0x00007f3ce59efaac in _tevent_req_done (req=0x7f3ccd8e9500, location=0x7f3ce6ee5940 "../../source3/modules/vfs_default.c:904") at ../../lib/tevent/tevent_req.c:199
#10 0x00007f3ce6dd4db1 in vfs_pread_done (subreq=0x0) at ../../source3/modules/vfs_default.c:904
#11 0x00007f3ce59ef91e in _tevent_req_notify_callback (req=0x7f3ccd8e4f00, location=0x7f3cde52ab18 "../../lib/pthreadpool/pthreadpool_tevent.c:422") at ../../lib/tevent/tevent_req.c:141
#12 0x00007f3ce59efa7f in tevent_req_finish (req=0x7f3ccd8e4f00, state=TEVENT_REQ_DONE, location=0x7f3cde52ab18 "../../lib/pthreadpool/pthreadpool_tevent.c:422") at ../../lib/tevent/tevent_req.c:193
#13 0x00007f3ce59efaac in _tevent_req_done (req=0x7f3ccd8e4f00, location=0x7f3cde52ab18 "../../lib/pthreadpool/pthreadpool_tevent.c:422") at ../../lib/tevent/tevent_req.c:199
#14 0x00007f3cde525dfa in pthreadpool_tevent_job_done (ctx=0x7f3ccd8a1560, im=0x7f3ccd8e5150, private_data=0x7f3ccd8e50b0) at ../../lib/pthreadpool/pthreadpool_tevent.c:422
#15 0x00007f3ce59ee812 in tevent_common_invoke_immediate_handler (im=0x7f3ccd8e5150, removed=0x0) at ../../lib/tevent/tevent_immediate.c:166
#16 0x00007f3ce59ee918 in tevent_common_loop_immediate (ev=0x7f3ccd8a1560) at ../../lib/tevent/tevent_immediate.c:203
#17 0x00007f3ce59f9135 in epoll_event_loop_once (ev=0x7f3ccd8a1560, location=0x7f3ce6e67a98 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent_epoll.c:918
#18 0x00007f3ce59f5931 in std_event_loop_once (ev=0x7f3ccd8a1560, location=0x7f3ce6e67a98 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent_standard.c:110
#19 0x00007f3ce59ed2a4 in _tevent_loop_once (ev=0x7f3ccd8a1560, location=0x7f3ce6e67a98 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent.c:772
#20 0x00007f3ce59ed5f6 in tevent_common_loop_wait (ev=0x7f3ccd8a1560, location=0x7f3ce6e67a98 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent.c:895
#21 0x00007f3ce59f59d3 in std_event_loop_wait (ev=0x7f3ccd8a1560, location=0x7f3ce6e67a98 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent_standard.c:141
#22 0x00007f3ce59ed699 in _tevent_loop_wait (ev=0x7f3ccd8a1560, location=0x7f3ce6e67a98 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent.c:914
#23 0x00007f3ce6c9ed46 in smbd_process (ev_ctx=0x7f3ccd8a1560, msg_ctx=0x7f3ccd8d9320, sock_fd=38, interactive=false) at ../../source3/smbd/process.c:4170
#24 0x0000556c18fcd39b in smbd_accept_connection (ev=0x7f3ccd8a1560, fde=0x7f3ccd8da9e0, flags=1, private_data=0x7f3ccd93d7a0) at ../../source3/smbd/server.c:1019
#25 0x00007f3ce59ee085 in tevent_common_invoke_fd_handler (fde=0x7f3ccd8da9e0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#26 0x00007f3ce59f8aec in epoll_event_loop (epoll_ev=0x7f3ccd82d660, tvalp=0x7ffc3d3a3d50) at ../../lib/tevent/tevent_epoll.c:736
#27 0x00007f3ce59f91cf in epoll_event_loop_once (ev=0x7f3ccd8a1560, location=0x556c18fd5a08 "../../source3/smbd/server.c:1366") at ../../lib/tevent/tevent_epoll.c:937
#28 0x00007f3ce59f5931 in std_event_loop_once (ev=0x7f3ccd8a1560, location=0x556c18fd5a08 "../../source3/smbd/server.c:1366") at ../../lib/tevent/tevent_standard.c:110
#29 0x00007f3ce59ed2a4 in _tevent_loop_once (ev=0x7f3ccd8a1560, location=0x556c18fd5a08 "../../source3/smbd/server.c:1366") at ../../lib/tevent/tevent.c:772
#30 0x00007f3ce59ed5f6 in tevent_common_loop_wait (ev=0x7f3ccd8a1560, location=0x556c18fd5a08 "../../source3/smbd/server.c:1366") at ../../lib/tevent/tevent.c:895
#31 0x00007f3ce59f59d3 in std_event_loop_wait (ev=0x7f3ccd8a1560, location=0x556c18fd5a08 "../../source3/smbd/server.c:1366") at ../../lib/tevent/tevent_standard.c:141
#32 0x00007f3ce59ed699 in _tevent_loop_wait (ev=0x7f3ccd8a1560, location=0x556c18fd5a08 "../../source3/smbd/server.c:1366") at ../../lib/tevent/tevent.c:914
#33 0x0000556c18fce152 in smbd_parent_loop (ev_ctx=0x7f3ccd8a1560, parent=0x7f3ccd8d9940) at ../../source3/smbd/server.c:1366
#34 0x0000556c18fd0647 in main (argc=3, argv=0x7ffc3d3a4378) at ../../source3/smbd/server.c:2226
(gdb) f 1
#1  0x00007f3ce6c5c510 in change_to_user_and_service_by_fsp (fsp=0x7f3ccd8828e0) at ../../source3/smbd/uid.c:485
485             return change_to_user_and_service(fsp->conn, fsp->vuid);
(gdb) p/x *fsp
$1 = {next = 0x0, prev = 0x0, fnum = 0x8041414141, op = 0x4141414141414141, conn = 0x4141414141410000, fh = 0x7f3ccd882980, num_smb_operations = 0x41414141, file_id = {devid = 0x4141414141414141, inode = 0xffffffffffffffff, extid = 0x0}, 
  initial_allocation_size = 0x4141414141414141, file_pid = 0x4141, vuid = 0x41414141ffffffff, open_time = {tv_sec = 0x4141414141414141, tv_usec = 0x4141414141414141}, access_mask = 0xe2c2dee0, kernel_share_modes_taken = 0x3c, 
  update_write_time_triggered = 0x7f, update_write_time_event = 0x7f3ce2c3c610, update_write_time_on_close = 0x60, close_write_time = {tv_sec = 0x7ffc3d3a1b80, tv_nsec = 0x7ffc3d3a1b88}, write_time_forced = 0x0, oplock_type = 0x0, leases_db_seqnum = 0x0, 
  lease_type = 0x0, lease = 0x4141414141414141, sent_oplock_break = 0x41414141, oplock_timeout = 0x4141414141414141, last_lock_failure = {context = {smblctx = 0x4141414141414141, tid = 0x41414141, pid = {pid = 0x4141414141414141, task_id = 0x41414141, 
        vnn = 0x41414141, unique_id = 0x4141414141414141}}, start = 0x4141414141414141, size = 0x4141414141414141, fnum = 0x4141414141414141, lock_type = 0x41414141, lock_flav = 0x41414141}, current_lock_count = 0x41414141, can_lock = 0x41, 
  can_read = 0x41, can_write = 0x41, modified = 0x41, is_directory = 0x41, aio_write_behind = 0x41, initial_delete_on_close = 0x41, delete_on_close = 0x41, posix_flags = 0x4141414141414141, is_sparse = 0x41, backup_intent = 0x41, use_ofd_locks = 0x41, 
  fsp_name = 0x4141414141414141, name_hash = 0x41414141, mid = 0x4141414141414141, vfs_extension = 0x4141414141414141, fake_file_handle = 0x4141414141414141, notify = 0x4141414141414141, base_fsp = 0x4141414141414141, 
  share_mode_flags_seqnum = 0x41414141, share_mode_flags = 0x4141, brlock_seqnum = 0x41414141, brlock_rec = 0x4141414141414141, dptr = 0x4141414141414141, print_file = 0x4141414141414141, num_aio_requests = 0x41414141, aio_requests = 0x4141414141414141, 
  closing = 0x41, blocked_smb1_lock_reqs = 0x4141414141414141, lock_failure_seen = 0x41, lock_failure_offset = 0x4141414141414141, deferred_close = 0x4141414141414141}
  
If talloc_fill is not enabled, fsp->conn is usually NULL (provided the memory has not been reallocated); in that case change_to_user_and_service() returns false (with the error message "Connection not open") and smbd later panics on the assert in vfswrap_offload_write_cleanup():

[2021/07/25 17:43:21.838902,  1, pid=15161] ../../source3/smbd/uid.c:446(change_to_user_and_service)
  change_to_user_and_service: Connection not open
[2021/07/25 17:43:21.838954,  0, pid=15161, class=vfs] ../../source3/modules/vfs_default.c:1964(vfswrap_offload_write_cleanup)
  PANIC: assert failed at ../../source3/modules/vfs_default.c(1964): ok
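
The failing cleanup path boils down to the following (a simplified sketch modelled on vfswrap_offload_write_cleanup(); the stand-in SMB_ASSERT here is a plain assert, whereas smbd's version panics):

#include <assert.h>
#include <stdbool.h>

#ifndef SMB_ASSERT
#define SMB_ASSERT(b) assert(b)         /* stand-in for smbd's panicking assert */
#endif

struct files_struct;                    /* smbd-internal; stand-in declaration */
extern bool change_to_user_and_service_by_fsp(struct files_struct *fsp);

static void offload_write_cleanup_sketch(struct files_struct *dst_fsp)
{
        bool ok;

        /* With the freed memory recycled rather than filled, fsp->conn
         * usually reads back as NULL, so change_to_user_and_service()
         * logs "Connection not open" and returns false... */
        ok = change_to_user_and_service_by_fsp(dst_fsp);

        /* ...and this is the assert at vfs_default.c:1964 that
         * panics smbd. */
        SMB_ASSERT(ok);
}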
Comment 1 Jeremy Allison 2021-08-04 01:26:36 UTC
Created attachment 16712 [details]
Raw patch for 4.12.next.

Lev <lev@zadarastorage.com>, can you test this patch for 4.12.next?

I have a similar one for master, but want to see if this fixes it for you first.
Comment 2 Lev 2021-08-04 07:01:40 UTC
(In reply to Jeremy Allison from comment #1)

Thanks, Jeremy, yes, the patch fixes the issue in my environment.
-Lev.
Comment 3 Jeremy Allison 2021-08-04 19:30:25 UTC
Created attachment 16716 [details]
git-am fix for master.

Passes CI. MR is:

https://gitlab.com/samba-team/samba/-/merge_requests/2103
Comment 4 Jeremy Allison 2021-08-05 17:24:21 UTC
Created attachment 16719 [details]
git-am fix for master.

Lev, Metze would like a more generic fix that prevents us from forgetting to add the aio request to the fsp handle as we make more FSCTL calls async. Here is the generic fix (it applies cleanly to 4.12).
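
To sketch the idea (not the exact patch; this assumes the generic fix reuses smbd's existing aio_requests bookkeeping via the aio_add_req_to_fsp() helper that the async pread/pwrite paths already use to tie a pending request's lifetime to the file handle):

#include <stdbool.h>
#include <talloc.h>
#include <tevent.h>

struct files_struct;                    /* smbd-internal; stand-in declaration */
/* Existing smbd helper that links a pending request to the fsp so it is
 * torn down together with the file handle (see source3/smbd/aio.c). */
extern bool aio_add_req_to_fsp(struct files_struct *fsp, struct tevent_req *req);

struct offload_write_state {
        struct files_struct *src_fsp;
        struct files_struct *dst_fsp;
};

static struct tevent_req *offload_write_send_sketch(TALLOC_CTX *mem_ctx,
                                                    struct tevent_context *ev,
                                                    struct files_struct *src_fsp,
                                                    struct files_struct *dst_fsp)
{
        struct tevent_req *req;
        struct offload_write_state *state;

        req = tevent_req_create(mem_ctx, &state, struct offload_write_state);
        if (req == NULL) {
                return NULL;
        }
        state->src_fsp = src_fsp;
        state->dst_fsp = dst_fsp;

        /* Tie the request's lifetime to both file handles up front, so
         * a force-close tears the request down instead of letting its
         * callbacks fire against a freed fsp. */
        if (!aio_add_req_to_fsp(src_fsp, req) ||
            !aio_add_req_to_fsp(dst_fsp, req)) {
                tevent_req_oom(req);
                return tevent_req_post(req, ev);
        }

        /* ... start the actual read/write pipeline here ... */
        return req;
}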

Can you give this a test to make sure it still fixes the problem for you whilst I work on adding the tests he requires?

Thanks!

Jeremy.
Comment 5 Jeremy Allison 2021-08-07 06:39:46 UTC
Created attachment 16720 [details]
git-am fix for master.

Generic fix requested by Metze, also with tests. The tests probably won't back-port, but the fix will.

In CI at:

https://gitlab.com/samba-team/devel/samba/-/pipelines/349475260

as MR:

https://gitlab.com/samba-team/samba/-/merge_requests/2103
Comment 6 Lev 2021-08-08 09:10:06 UTC
(In reply to Jeremy Allison from comment #4)

Hi Jeremy, 
The latest (generic) patch fixes the issue as well.
Thanks,
 Lev.
Comment 7 Samba QA Contact 2021-08-11 20:03:04 UTC
This bug was referenced in samba master:

6b6770c2ba83bf25da31623443c19a8de34e5ba4
62cd95096a76d5064b105c1b4971fa3eabd5f85d
0f4a8d26888ec156979a00480ed9886dcac7d426
c551d33c6bd2e74ea3a36bec5575a70d6833b98a
7e7ea761a37f46f758582981bc40404ffd815513
c013509680742ff45b2f5965a5564015da7d466b
Comment 8 Jeremy Allison 2021-08-11 20:56:19 UTC
Created attachment 16729 [details]
git-am fix for 4.15.rcNext.

Cherry-picked from master. Includes the test. Back-ports to 4.14 etc. won't include the test.
Comment 9 Jeremy Allison 2021-08-11 21:49:44 UTC
Created attachment 16730 [details]
git-am fix for 4.14.next, 4.13.next.

Back-port (without the test) of the fix that went into master and will go into 4.15.rcNext.
Comment 10 Jeremy Allison 2021-08-16 23:31:46 UTC
Ping, can we get this into 4.15.0?
Comment 11 Ralph Böhme 2021-08-17 05:07:57 UTC
Reassigning to Jule for inclusion in 4.13, 4.14 and 4.15.
Comment 12 Jule Anger 2021-08-17 07:22:47 UTC
Pushed to autobuild-v4-{15,14,13}-test.
Comment 13 Samba QA Contact 2021-08-17 08:50:05 UTC
This bug was referenced in samba v4-15-test:

04af36c4916713001a3a8b72b81946eb7a084cd1
69c5ab711066871b82bbb6db67642d808d104775
68ceb6c8f05d6c12aa2e1618ac205a9740126458
24b661c01ef54a01c1e73abfd7628781693a3224
654430f6f6f1a0c300be77e215fdb95fb808bf4e
9a23ff2ca2b101cba5614b238afca0c58658fc77
Comment 14 Samba QA Contact 2021-08-17 09:08:03 UTC
This bug was referenced in samba v4-14-test:

618fd6c259458f97530fcde0fb6600e70e6f2478
Comment 15 Samba QA Contact 2021-08-17 10:31:03 UTC
This bug was referenced in samba v4-13-test:

7c9aabe2dd01e77442d95582f17037b006bac9dd
Comment 16 Jule Anger 2021-08-17 12:16:42 UTC
Closing out bug report.

Thanks!
Comment 17 Samba QA Contact 2021-08-24 13:25:39 UTC
This bug was referenced in samba v4-14-stable (Release samba-4.14.7):

618fd6c259458f97530fcde0fb6600e70e6f2478
Comment 18 Samba QA Contact 2021-08-26 09:25:33 UTC
This bug was referenced in samba v4-15-stable (Release samba-4.15.0rc3):

04af36c4916713001a3a8b72b81946eb7a084cd1
69c5ab711066871b82bbb6db67642d808d104775
68ceb6c8f05d6c12aa2e1618ac205a9740126458
24b661c01ef54a01c1e73abfd7628781693a3224
654430f6f6f1a0c300be77e215fdb95fb808bf4e
9a23ff2ca2b101cba5614b238afca0c58658fc77
Comment 19 Samba QA Contact 2021-09-07 07:14:03 UTC
This bug was referenced in samba v4-13-stable (Release samba-4.13.11):

7c9aabe2dd01e77442d95582f17037b006bac9dd