We have a samba server and 2 windows clients, connected to the same share. Client-1 opens a file and immediately closes it, but Windows delays the close request. Then client-2 also tries to open a file. The oplock on client-1 is broken, it now issues real close request. The problem is that process handling connection with client-1 sends a message about oplock break BEFORE it removes share_mode: static NTSTATUS close_normal_file(struct smb_request *req, files_struct *fsp, enum file_close_type close_type) { ... if(fsp->oplock_type) { remove_oplock(fsp); } if (fsp->fh->ref_count == 1) { /* Should we return on error here... ? */ tmp = close_remove_share_mode(fsp, close_type); status = ntstatus_keeperror(status, tmp); } ... } So the client-2 process may see old share_mode, and fail the open request with error NT_STATUS_SHARING_VIOLATION, although client-2 tried to open a file AFTER client-1 closed it. To reproduce the issue I inserted artificial 5sec sleep into close_normal_file() after call to remove_oplock(). Scenario: Client-1: vm80, pid=17685. Client-2: vm86, pid=17743 1. Client-1 creates a file b4 with dwDesiredAccess=GENERIC_READ|GENERIC_WRITE and dwShareMode=0 and immediately closes it. Windows delays the close request, so the file remains with access_mask=0x12019f and oplock=0x3. [2016/08/10 15:25:12.055645, 10, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:4983(create_file_default) create_file: access_mask = 0x12019f file_attributes = 0x80, share_access = 0x0, create_disposition = 0x3 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = b4 [2016/08/10 15:25:12.069567, 5, pid=17685, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:87(set_file_oplock) set_file_oplock: granted oplock 0x3 on file /export/v1:b4, fd1b:90:0/2512722705, tv_sec = 57ab1d28, tv_usec = d978 2. Client-2 tries to open a file with dwDesiredAccess=GENERIC_READ and dwShareMode=FILE_SHARE_READ. [2016/08/10 15:25:17.854027, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:4983(create_file_default) create_file: access_mask = 0x120089 file_attributes = 0x80, share_access = 0x1, create_disposition = 0x3 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = b4 3. share_conflict() returns conflict, so break request is sent to client-1: [2016/08/10 15:25:17.864630, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1025(share_conflict) share_conflict: entry->access_mask = 0x12019f, entry->share_access = 0x0, entry->private_options = 0x0 [2016/08/10 15:25:17.864651, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1036(share_conflict) share_conflict: access_mask = 0x120089, share_access = 0x1 [2016/08/10 15:25:17.864669, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1085(share_conflict) share_conflict: [1] am (0x12019f) & right (0x6) = 0x6 [2016/08/10 15:25:17.864705, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1085(share_conflict) share_conflict: [1] sa (0x1) & share (0x2) = 0x0 [2016/08/10 15:25:17.864725, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1085(share_conflict) share_conflict: check 1 conflict am = 0x12019f, right = 0x6, sa = 0x1, share = 0x2 [2016/08/10 15:25:17.864829, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1484(delay_for_oplock) breaking from 7 to 1 [2016/08/10 15:25:17.864847, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1256(send_break_message) Sending break request to PID 17685 [2016/08/10 15:25:17.866427, 10, pid=17685, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:806(process_oplock_break_message) Got oplock break to 1 message from pid 17743: fd1b:90:0/2512722705 [2016/08/10 15:25:17.867347, 10, pid=17685, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:921(process_oplock_break_message) break_from=7, break_to=1 [2016/08/10 15:25:17.867441, 10, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_break.c:474(send_break_message_smb2) send_break_message_smb2: sending oplock break for file b4, fnum 2050546187, smb2 level 1 4. Client-1 closes the file: [2016/08/10 15:25:17.868508, 10, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_server.c:1967(smbd_smb2_request_dispatch) smbd_smb2_request_dispatch: opcode[SMB2_OP_CLOSE] mid = 5 4.1. Process 17685 removes oplock and sends a message to process 17743: [2016/08/10 15:25:17.868722, 10, pid=17685, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:244(remove_oplock) remove_oplock called for b4 [2016/08/10 15:25:17.871852, 10, pid=17685, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm.c:325(messaging_dgm_send) messaging_dgm_send: Sending message to 17743 4.3. Process 17685 goes to artificial 5sec sleep: [2016/08/10 15:25:17.871991, 0, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:746(close_normal_file) close_normal_file: sleep 0 [2016/08/10 15:25:18.873361, 0, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:746(close_normal_file) close_normal_file: sleep 1 [2016/08/10 15:25:19.874045, 0, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:746(close_normal_file) close_normal_file: sleep 2 [2016/08/10 15:25:20.874759, 0, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:746(close_normal_file) close_normal_file: sleep 3 [2016/08/10 15:25:21.874967, 0, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:746(close_normal_file) close_normal_file: sleep 4 4.2. Process 17743 wakes up, retries file open and fails again in share_conflict, because it still sees share_modes with access_mask=0x120089: [2016/08/10 15:25:17.875106, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/lib/messages.c:252(messaging_recv_cb) messaging_recv_cb: Received message 0xfa3 len 44 (num_fds:0) from 17685 [2016/08/10 15:25:17.883919, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:510(smbd_smb2_create_send) smbd_smb2_create_send: reentrant for file b4 [2016/08/10 15:25:17.885296, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2478(open_file_ntcreate) open_file_ntcreate: fname=b4, dos_attrs=0x80 access_mask=0x120089 share_access=0x1 create_disposition = 0x3 create_options=0x40 unix mode=0744 oplock_request=2 private_flags = 0x0 [2016/08/10 15:25:17.892258, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1025(share_conflict) share_conflict: entry->access_mask = 0x12019f, entry->share_access = 0x0, entry->private_options = 0x0 [2016/08/10 15:25:17.892280, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1036(share_conflict) share_conflict: access_mask = 0x120089, share_access = 0x1 [2016/08/10 15:25:17.892298, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1085(share_conflict) share_conflict: [1] am (0x12019f) & right (0x6) = 0x6 [2016/08/10 15:25:17.892316, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1085(share_conflict) share_conflict: [1] sa (0x1) & share (0x2) = 0x0 [2016/08/10 15:25:17.892334, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1085(share_conflict) share_conflict: check 1 conflict am = 0x12019f, right = 0x6, sa = 0x1, share = 0x2 [2016/08/10 15:25:17.892376, 10, pid=17743, effective(65534, 65534), real(65534, 0), class=locking] ../source3/locking/locking.c:749(share_mode_stale_pid) PID 17685 (index 0 out of 1) still exists [2016/08/10 15:25:17.898002, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:4804(create_file_unixpath) create_file_unixpath: NT_STATUS_SHARING_VIOLATION [2016/08/10 15:25:17.898563, 10, pid=17743, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_server.c:2617(smbd_smb2_request_done_ex) smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_SHARING_VIOLATION] body[8] dyn[yes:1] at ../source3/smbd/smb2_server.c:2788 4.4. Process 17685 removes share_modes, but it's too late, process 17743 already returned error to the client: [2016/08/10 15:25:22.875228, 0, pid=17685, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:751(close_normal_file) close_normal_file: ref_count=1 [2016/08/10 15:25:22.875511, 10, pid=17685, effective(65534, 65534), real(65534, 0), class=locking] ../source3/locking/share_mode_lock.c:167(parse_share_modes) parse_share_modes: share_modes: ARRAY(1) [2016/08/10 15:25:22.876001, 10, pid=17685, effective(65534, 65534), real(65534, 0), class=locking] ../source3/locking/share_mode_lock.c:187(unparse_share_modes) unparse_share_modes: share_modes: ARRAY(0) Attached are testparm output and full logs from both smbd processes.
Created attachment 12352 [details] Output of testparm -sv
Created attachment 12353 [details] Client-1 log
Created attachment 12354 [details] Client-2 log
What exact Samba version are you running? This should not happen since we changed locking.tdb to dbwrap_record_watch for this type of operation.
"remove_oplock called for b4" -- in 4.2.12 this message is in line 235, not 244... :-)
I'm using 4.2.12. I added some my prints/sleeps, so line numbers may differ.
Ok, I see how this can happen. Can you try a quick patch?
Sure, I have an environment ready. What's the patch?
Created attachment 12355 [details] bad hack Here it is... Don't tell anybody it's mine, this is REALLY only for testing!
Thanks, Volker. Yes, the patch solves the issue.
Great analysis - thanks ! Assigning to Volker as he's already looked at it (but this one interests too :-).
Created attachment 12359 [details] Proposed patch for master. Volker, what about this as a possible patch for master ? It splits out the internals of remove_oplock() into a new function remove_oplock_under_lock(), allowing it to be called from both remove_oplock() and also from close_remove_share_mode() whilst the lock is already held.
Created attachment 12360 [details] Test patch for 4.2.x. lev@zadarastorage.com - can you test this back-ported version of the proposed patch for master in your 4.2.x code ? Thanks !
Thanks, Jeremy. The patch fixes the issue in 4.2.12.
(In reply to Jeremy Allison from comment #12) > Volker, what about this as a possible patch for master ? Pushed, thanks! Volker
Created attachment 12362 [details] git-am fix for 4.5.0, 4.4.x, 4.3.x Contains cherry-pick info from master. Applies cleanly to 4.5.0, 4.4.x, 4.3.x.
Jeremy, if there is so much urgency on the review, you should alert me separately. I have it in my inbox, but other customer priorities prevented me from looking at this today.
Volker, there's no point in asking you specifically, and I'm not trying to bug you. I assumed you're busy, that's why I asked others to look at it. It's a pretty easy fix so doing the +1 is a formality for any of the requested engineers. One of my vendors wanted it in the next 4.x releases so I asked others to look. Don't take things too personally. Cheers, Jeremy.
Reassigning to Karolin for inclusion in 4.3, 4.4 and 4.5.
Pushed to autobuild-v4-5-test.
Pushed to v4-5-test.
Pushed to autobuild-v4-{3,4}-test.
Pushed to all branches. Closing out bug report. Thanks!