Created attachment 13296 [details] Log file Errno EINTR is converted into NT_STATUS_RETRY. When open_file_ntcreate() gets status NT_STATUS_RETRY from open_file(), it calls schedule_async_open(). schedule_async_open() creates timer that panics smbd in 20sec: (gdb) bt #0 0x00007fbef1d9ac37 in read_alias_file (fname=<optimized out>, fname_len=<optimized out>) at localealias.c:335 #1 0x00007fbef34a349a in smb_panic_s3 (why=0x7fbef5c8666a "async open timeout") at ../source3/lib/util.c:822 #2 0x00007fbef5f8c14d in smb_panic (why=0x7fbef5c8666a "async open timeout") at ../lib/util/fault.c:166 #3 0x00007fbef5b34a43 in exit_server_common (how=SERVER_EXIT_ABNORMAL, reason=0x7fbef5c8666a "async open timeout") at ../source3/smbd/server_exit.c:240 #4 0x00007fbef5b34afa in smbd_exit_server (explanation=0x7fbef5c8666a "async open timeout") at ../source3/smbd/server_exit.c:258 #5 0x00007fbef2e60e1b in exit_server (reason=0x7fbef5c8666a "async open timeout") at ../source3/lib/smbd_shim.c:123 #6 0x00007fbef5aac0a5 in schedule_async_open_timer (ev=0x7fbef7946e20, te=0x7fbef79d1d70, current_time=..., private_data=0x7fbef7a489c0) at ../source3/smbd/open.c:2607 #7 0x00007fbef4f39ec3 in tevent_common_loop_timer_delay (ev=0x7fbef7946e20) at ../lib/tevent/tevent_timed.c:341 #8 0x00007fbef4f3b9d9 in epoll_event_loop (epoll_ev=0x7fbef79497b0, tvalp=0x7ffd48f56f70) at ../lib/tevent/tevent_epoll.c:659 #9 0x00007fbef4f3c234 in epoll_event_loop_once (ev=0x7fbef7946e20, location=0x7fbef5c95788 "../source3/smbd/process.c:4125") at ../lib/tevent/tevent_epoll.c:926 #10 0x00007fbef4f3911e in std_event_loop_once (ev=0x7fbef7946e20, location=0x7fbef5c95788 "../source3/smbd/process.c:4125") at ../lib/tevent/tevent_standard.c:114 #11 0x00007fbef4f328c3 in _tevent_loop_once (ev=0x7fbef7946e20, location=0x7fbef5c95788 "../source3/smbd/process.c:4125") at ../lib/tevent/tevent.c:533 #12 0x00007fbef4f32b0d in tevent_common_loop_wait (ev=0x7fbef7946e20, location=0x7fbef5c95788 "../source3/smbd/process.c:4125") at ../lib/tevent/tevent.c:637 #13 0x00007fbef4f391c0 in std_event_loop_wait (ev=0x7fbef7946e20, location=0x7fbef5c95788 "../source3/smbd/process.c:4125") at ../lib/tevent/tevent_standard.c:145 #14 0x00007fbef4f32bd8 in _tevent_loop_wait (ev=0x7fbef7946e20, location=0x7fbef5c95788 "../source3/smbd/process.c:4125") at ../lib/tevent/tevent.c:656 #15 0x00007fbef5adde16 in smbd_process (ev_ctx=0x7fbef7946e20, msg_ctx=0x7fbef79473a0, sock_fd=41, interactive=false) at ../source3/smbd/process.c:4125 #16 0x00007fbef6a4ed16 in smbd_accept_connection (ev=0x7fbef7946e20, fde=0x7fbef795f820, flags=1, private_data=0x7fbef795f790) at ../source3/smbd/server.c:1023 #17 0x00007fbef4f3bc16 in epoll_event_loop (epoll_ev=0x7fbef7946fa0, tvalp=0x7ffd48f57330) at ../lib/tevent/tevent_epoll.c:728 #18 0x00007fbef4f3c234 in epoll_event_loop_once (ev=0x7fbef7946e20, location=0x7fbef6a552b7 "../source3/smbd/server.c:1390") at ../lib/tevent/tevent_epoll.c:926 #19 0x00007fbef4f3911e in std_event_loop_once (ev=0x7fbef7946e20, location=0x7fbef6a552b7 "../source3/smbd/server.c:1390") at ../lib/tevent/tevent_standard.c:114 #20 0x00007fbef4f328c3 in _tevent_loop_once (ev=0x7fbef7946e20, location=0x7fbef6a552b7 "../source3/smbd/server.c:1390") at ../lib/tevent/tevent.c:533 #21 0x00007fbef4f32b0d in tevent_common_loop_wait (ev=0x7fbef7946e20, location=0x7fbef6a552b7 "../source3/smbd/server.c:1390") at ../lib/tevent/tevent.c:637 #22 0x00007fbef4f391c0 in std_event_loop_wait (ev=0x7fbef7946e20, location=0x7fbef6a552b7 "../source3/smbd/server.c:1390") at ../lib/tevent/tevent_standard.c:145 #23 0x00007fbef4f32bd8 in _tevent_loop_wait (ev=0x7fbef7946e20, location=0x7fbef6a552b7 "../source3/smbd/server.c:1390") at ../lib/tevent/tevent.c:656 #24 0x00007fbef6a4fb26 in smbd_parent_loop (ev_ctx=0x7fbef7946e20, parent=0x7fbef794a5d0) at ../source3/smbd/server.c:1390 #25 0x00007fbef6a51579 in main (argc=2, argv=0x7ffd48f57938) at ../source3/smbd/server.c:2051
this is by design. Under what circumstances can an open() on a disk file return -1,EINTR ? Note the man page: EINTR While blocked waiting to complete an open of a slow device (e.g., a FIFO; see fifo(7)), the call was interrupted by a sig‐ nal handler; see signal(7). are you doing an open on a slow device here ?
(In reply to Jeremy Allison from comment #1) Be careful, this is man 2 open on Linux, POSIX leaves this door wide open and just states: [EINTR] A signal was caught during open(). It seems we can't rely on open not returning -1/EINTR for a filesystem open. Maybe we should use a different errno from vfs_aio_pthread to signal *real* async open in-progress, so we can distinguish it from a failing open with -1/EINTR.
Yeah let's find out the *exact* circumstances before we start rushing to make changes here. What OS, what filesystem, what type of file etc. This code has been robust for a *long* while, so what changed for them ?
To be blunt - if open (2) on a filesystem file can return EINTR then we're screwed in many more code paths than just this one. This has never been the case on just about any system I've ever worked on (and I've worked on a lot :-). On a fifo or filsysytem pipe etc. maybe. But not on a regular file.
If a system returns -1,EINTR on open (2) the fix should be a custom patch that does: $ git diff diff --git a/source3/modules/vfs_default.c b/source3/modules/vfs_default.c index d339f39aa06..cb87ad2b026 100644 --- a/source3/modules/vfs_default.c +++ b/source3/modules/vfs_default.c @@ -566,7 +566,10 @@ static int vfswrap_open(vfs_handle_struct *handle, goto out; } - result = open(smb_fname->base_name, flags, mode); + do { + result = open(smb_fname->base_name, flags, mode); + } while (result == -1 && errno == EINTR); + out: END_PROFILE(syscall_open); return result; because that system is broken :-).
I guess what happens is that we somehow mistakenly did a blocking open on a file that is kernel oplocked by another process. We do a blocking open on a file when be the oplock holding smbd released the oplock and we got signalled via dbwrap_watch on the locking.tdb record of the file. So the real question is: why is the second blocking open still being blocked? Race condition? Something else? This would also mean that checking for EINTR in vfs_default is probably not the correct fix.
But we shouldn't be doing the open on a file with a kernel oplock, we should have requested the holding smbd to remove that oplock before trying to do an open. Looking briefly at the logic inside oplock.c we do remove the kernel oplock under lock, so the sending smbd shouldn't be proceeding until the kernel oplock is gone.
(In reply to Jeremy Allison from comment #7) Exactly, something is fishy here. For some unknown reason we are doing a blocking open on a kernel oplocked file that later get's the EINTR.
Can we get a reproducer that shows how we end up in this position ? I'm glad though - I was pretty sure that under normal conditions no system returns -1,EINTR on a disk file :-).
Hi Jeremy and Ralph, I finally found the scenario how smbd can do a blocking open on a kernel oplocked file. The flow is: 1. smbd-1 opens a file(nonblocking), and sets oplock. 2. smbd-2 opens a file(nonblocking), open is deferred. 3. smbd-1 releases an oplock and notifies smbd-2. 4. smbd-2 calls defer_open_done() and reschedules the open. 5. smbd-1 closes the file, opens it again and sets the oplock back. 6. smbd-2 calls open_file_ntcreate(). first_open_attempt will be false, so open is blocking. If smbd-2 has opened and took oplock for some other file beforehand, and now this file is opened by some 3rd process, the blocking open will return EINTR. Attached are log files from both smbd processes. Log summary is below. Isn't is safer to do always a non blocking open? -Lev. ================================================== OPEN file-0 [2017/09/10 11:00:47.688781, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:501(smbd_smb2_create_send) smbd_smb2_create: name[file-0] [2017/09/10 11:00:47.690660, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5527(create_file_default) create_file: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = file-0 [2017/09/10 11:00:47.690706, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5000(create_file_unixpath) create_file_unixpath: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 ea_list = 0x(nil), sd = 0x(nil), fname = file-0 [2017/09/10 11:00:47.696769, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2972(open_file_ntcreate) open_file_ntcreate: fname=file-0, dos_attrs=0x80 access_mask=0x120196 share_access=0x3 create_disposition = 0x5 create_options=0x40 unix mode=0744 oplock_request=2 private_flags = 0x0 [2017/09/10 11:00:47.697416, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3190(open_file_ntcreate) open_file_ntcreate: file file-0 - O_NONBLOCK [2017/09/10 11:00:47.697449, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3239(open_file_ntcreate) calling open_file with flags=0x2 flags2=0xA40 mode=0744, access_mask = 0x120196, open_access_mask = 0x120196 [2017/09/10 11:00:47.700334, 5, pid=7301, effective(65534, 65534), real(65534, 0), class=vfs] ../source3/modules/vfs_default.c:564(vfswrap_open) open(file-0, flags=0x20802, mode=0x1e4) [2017/09/10 11:00:47.700371, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:689(fd_open) fd_open: name file-0, flags = 04002 mode = 0744, fd = 37. [2017/09/10 11:00:47.700866, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1322(open_file) nobody opened file file-0 read=No write=Yes (numopen=2) SET OPLOCK file-0 [2017/09/10 11:00:47.708230, 0, pid=7301, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock_linux.c:155(linux_set_kernel_oplock) linux_set_kernel_oplock: got kernel oplock on file file-0, file_id = fd2c:83:0 gen_id = 919656476 [2017/09/10 11:00:47.708269, 0, pid=7301, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:89(set_file_oplock) set_file_oplock: granted oplock 0x3 on file /export/v1:file-0, fd2c:83:0/919656476, tv_sec = 59b4f12f, tv_usec = a8a3a [2017/09/10 11:00:47.708507, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2208(grant_fsp_oplock_type) grant_fsp_oplock_type: oplock type 0x3 on file file-0 OPEN file-1 [2017/09/10 11:01:36.144809, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:501(smbd_smb2_create_send) smbd_smb2_create: name[file-1] [2017/09/10 11:01:36.146468, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5527(create_file_default) create_file: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:01:36.146535, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5000(create_file_unixpath) create_file_unixpath: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:01:36.153974, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2972(open_file_ntcreate) open_file_ntcreate: fname=file-1, dos_attrs=0x80 access_mask=0x120196 share_access=0x3 create_disposition = 0x5 create_options=0x40 unix mode=0744 oplock_request=2 private_flags = 0x0 [2017/09/10 11:01:36.154079, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3190(open_file_ntcreate) open_file_ntcreate: file file-1 - O_NONBLOCK [2017/09/10 11:01:36.155113, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3239(open_file_ntcreate) calling open_file with flags=0x2 flags2=0xA40 mode=0744, access_mask = 0x120196, open_access_mask = 0x120196 [2017/09/10 11:01:36.159259, 5, pid=6519, effective(65534, 65534), real(65534, 0), class=vfs] ../source3/modules/vfs_default.c:564(vfswrap_open) open(file-1, flags=0x20802, mode=0x1e4) [2017/09/10 11:01:36.159294, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:689(fd_open) fd_open: name file-1, flags = 04002 mode = 0744, fd = 37. [2017/09/10 11:01:36.159332, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1322(open_file) nobody opened file file-1 read=No write=Yes (numopen=2) SET OPLOCK file-1 [2017/09/10 11:01:36.166851, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock_linux.c:155(linux_set_kernel_oplock) linux_set_kernel_oplock: got kernel oplock on file file-1, file_id = fd2c:84:0 gen_id = 3899747005 [2017/09/10 11:01:36.166887, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:89(set_file_oplock) set_file_oplock: granted oplock 0x3 on file /export/v1:file-1, fd2c:84:0/3899747005, tv_sec = 59b4f160, tv_usec = 23cbf [2017/09/10 11:01:36.167239, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2208(grant_fsp_oplock_type) grant_fsp_oplock_type: oplock type 0x3 on file file-1 OPEN file-1 [2017/09/10 11:01:55.834259, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:501(smbd_smb2_create_send) smbd_smb2_create: name[file-1] [2017/09/10 11:01:55.835836, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5527(create_file_default) create_file: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:01:55.835910, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5000(create_file_unixpath) create_file_unixpath: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:01:55.842655, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2972(open_file_ntcreate) open_file_ntcreate: fname=file-1, dos_attrs=0x80 access_mask=0x120196 share_access=0x3 create_disposition = 0x5 create_options=0x40 unix mode=0744 oplock_request=2 private_flags = 0x0 [2017/09/10 11:01:55.843386, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3190(open_file_ntcreate) open_file_ntcreate: file file-1 - O_NONBLOCK [2017/09/10 11:01:55.843426, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3239(open_file_ntcreate) calling open_file with flags=0x2 flags2=0xA40 mode=0744, access_mask = 0x120196, open_access_mask = 0x120196 BREAK OPLOCK file-1 [2017/09/10 11:01:55.847982, 0, pid=6519, effective(0, 0), real(0, 0), class=locking] ../source3/smbd/oplock.c:43(break_kernel_oplock) break_kernel_oplock: break oplock 0x3 on file /export/v1:file-1 [2017/09/10 11:01:55.849370, 0, pid=6519, effective(0, 0), real(0, 0), class=locking] ../source3/smbd/oplock.c:1002(process_kernel_oplock_break) Got kernel oplock break message from pid 6519: fd2c:84:0/3899747005 [2017/09/10 11:01:55.849407, 0, pid=6519, effective(0, 0), real(0, 0), class=locking] ../source3/smbd/oplock.c:658(initial_break_processing) initial_break_processing: called for fd2c:84:0/3899747005 Current oplocks_open (exclusive = 1, levelII = 0) [2017/09/10 11:01:55.849456, 0, pid=6519, effective(0, 0), real(0, 0)] ../source3/smbd/smb2_break.c:477(send_break_message_smb2) send_break_message_smb2: sending oplock break for file file-1, fnum 4224302802, smb2 level 0 [2017/09/10 11:01:55.855116, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_break.c:183(smbd_smb2_oplock_break_send) smbd_smb2_oplock_break_send: file-1 - fnum 4224302802, samba level 0 [2017/09/10 11:01:55.855815, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_break.c:194(smbd_smb2_oplock_break_send) smbd_smb2_oplock_break_send: got SMB2 oplock break (0) from client for file file-1, fnum 4224302802 DEFER OPEN file-1 [2017/09/10 11:01:55.850443, 3, pid=7301, effective(65534, 65534), real(65534, 0), class=vfs] ../source3/modules/vfs_default.c:566(vfswrap_open) open(file-1, flags=0x20802, mode=0x1e4): errno=11 [2017/09/10 11:01:55.850783, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:689(fd_open) fd_open: name file-1, flags = 04002 mode = 0744, fd = -1. Resource temporarily unavailable [2017/09/10 11:01:55.856243, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1179(open_file) Error opening file file-1 (NT_STATUS_NETWORK_BUSY) (local_flags=2626) (flags=2626) [2017/09/10 11:01:55.856286, 1, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3249(open_file_ntcreate) open_file /export/v1:file-1 with flags=0x2 flags2=0xA40 mode=0744, access_mask = 0x120196, open_access_mask = 0x120196 returned NT_STATUS_NETWORK_BUSY [2017/09/10 11:02:00.862790, 10, pid=7301, effective(65534, 65534), real(65534, 0), class=locking] ../source3/locking/locking.c:758(share_mode_stale_pid) PID 6519 (index 0 out of 1) still exists [2017/09/10 11:02:00.862855, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1802(delay_for_oplock) breaking from 7 to 0 [2017/09/10 11:02:00.862946, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1574(send_break_message) Sending break request to PID 6519 [2017/09/10 11:02:00.865682, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2287(defer_open) defer_open: request time [2017/09/10 11:01:55] timeout [2017/09/10 11:02:55] mid [63] delayed_for_oplocks [yes] kernel_oplock [yes] file_id [fd2c:84:0] [2017/09/10 11:02:00.865782, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2309(defer_open) defer_open: defering mid 63 [2017/09/10 11:02:00.865875, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:1729(push_deferred_open_message_smb2) push_deferred_open_message_smb2: timeout at 2017/09/10 11:02:55.835958 [2017/09/10 11:02:00.866334, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3296(open_file_ntcreate) Sent oplock break request to kernel oplock holder [2017/09/10 11:02:00.868842, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5346(create_file_unixpath) create_file_unixpath: NT_STATUS_SHARING_VIOLATION [2017/09/10 11:02:00.869548, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5624(create_file_default) create_file: NT_STATUS_SHARING_VIOLATION [2017/09/10 11:02:05.856380, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:260(remove_oplock) remove_oplock: remove_oplock called for file-1 [2017/09/10 11:02:05.870051, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:227(remove_oplock_under_lock) remove_oplock_under_lock: remove share oplock for file file-1, fnum 4224302802, fd2c:84:0 [2017/09/10 11:02:05.870099, 10, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock_linux.c:167(linux_release_kernel_oplock) linux_release_kernel_oplock: file file-1, file_id = fd2c:84:0 gen_id = 3899747005 has kernel oplock state of 2. [2017/09/10 11:02:05.886035, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:808(process_oplock_break_message) Got oplock break to 0 message from pid 7301: fd2c:84:0/3899747005 [2017/09/10 11:02:05.886092, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:658(initial_break_processing) initial_break_processing: called for fd2c:84:0/3899747005 Current oplocks_open (exclusive = 0, levelII = 0) [2017/09/10 11:02:05.886139, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:690(initial_break_processing) initial_break_processing: file file-1 (file_id = fd2c:84:0 gen_id = 3899747005) has no oplock. Allowing break to succeed regardless. DEFERRED OPEN file-1 [2017/09/10 11:02:05.877719, 10, pid=7301, effective(0, 0), real(0, 0)] ../source3/lib/messages.c:152(messaging_recv_cb) messaging_recv_cb: Received message 0xfa3 len 44 (num_fds:0) from 6519 [2017/09/10 11:02:05.887345, 0, pid=7301, effective(0, 0), real(0, 0)] ../source3/smbd/open.c:2385(defer_open_done) defer_open_done: scheduling mid 63 [2017/09/10 11:02:05.888029, 10, pid=7301, effective(0, 0), real(0, 0)] ../source3/smbd/smb2_create.c:1657(schedule_deferred_open_message_smb2) schedule_deferred_open_message_smb2: re-processing mid 63 [2017/09/10 11:02:05.893839, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5527(create_file_default) create_file: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:02:05.901149, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2972(open_file_ntcreate) open_file_ntcreate: fname=file-1, dos_attrs=0x80 access_mask=0x120196 share_access=0x3 create_disposition = 0x5 create_options=0x40 unix mode=0744 oplock_request=2 private_flags = 0x0 [2017/09/10 11:02:05.901187, 10, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:1562(remove_deferred_open_message_smb2_internal) remove_deferred_open_message_smb2_internal: mid 63 [2017/09/10 11:02:05.901791, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3193(open_file_ntcreate) open_file_ntcreate: file file-1 - O_BLOCK CLOSE file-1 [2017/09/10 11:02:13.401314, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_close.c:227(smbd_smb2_close) smbd_smb2_close: file-1 - fnum 4224302802 [2017/09/10 11:02:13.405565, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/close.c:788(close_normal_file) nobody closed file file-1 (numopen=1) NT_STATUS_OK OPEN file-1 [2017/09/10 11:02:15.706929, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/smb2_create.c:501(smbd_smb2_create_send) smbd_smb2_create: name[file-1] [2017/09/10 11:02:15.708977, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5527(create_file_default) create_file: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 root_dir_fid = 0x0, ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:02:15.709024, 10, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:5000(create_file_unixpath) create_file_unixpath: access_mask = 0x120196 file_attributes = 0x80, share_access = 0x3, create_disposition = 0x5 create_options = 0x40 oplock_request = 0x2 private_flags = 0x0 ea_list = 0x(nil), sd = 0x(nil), fname = file-1 [2017/09/10 11:02:15.716262, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2972(open_file_ntcreate) open_file_ntcreate: fname=file-1, dos_attrs=0x80 access_mask=0x120196 share_access=0x3 create_disposition = 0x5 create_options=0x40 unix mode=0744 oplock_request=2 private_flags = 0x0 [2017/09/10 11:02:15.717056, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3190(open_file_ntcreate) open_file_ntcreate: file file-1 - O_NONBLOCK [2017/09/10 11:02:15.717089, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3239(open_file_ntcreate) calling open_file with flags=0x2 flags2=0xA40 mode=0744, access_mask = 0x120196, open_access_mask = 0x120196 [2017/09/10 11:02:15.723667, 5, pid=6519, effective(65534, 65534), real(65534, 0), class=vfs] ../source3/modules/vfs_default.c:564(vfswrap_open) open(file-1, flags=0x20802, mode=0x1e4) [2017/09/10 11:02:15.723702, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:689(fd_open) fd_open: name file-1, flags = 04002 mode = 0744, fd = 28. [2017/09/10 11:02:15.723759, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:1322(open_file) nobody opened file file-1 read=No write=Yes (numopen=2) SET OPLOCK file-1 [2017/09/10 11:02:15.731063, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock_linux.c:155(linux_set_kernel_oplock) linux_set_kernel_oplock: got kernel oplock on file file-1, file_id = fd2c:84:0 gen_id = 2610068764 [2017/09/10 11:02:15.731179, 0, pid=6519, effective(65534, 65534), real(65534, 0), class=locking] ../source3/smbd/oplock.c:89(set_file_oplock) set_file_oplock: granted oplock 0x3 on file /export/v1:file-1, fd2c:84:0/2610068764, tv_sec = 59b4f187, tv_usec = ad1de [2017/09/10 11:02:15.731224, 0, pid=6519, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:2208(grant_fsp_oplock_type) grant_fsp_oplock_type: oplock type 0x3 on file file-1 CONTINUE DEFERRED OPEN file-1 - BLOCKING OPEN ON A KERNEL OPLOCKED FILE [2017/09/10 11:02:35.902060, 0, pid=7301, effective(65534, 65534), real(65534, 0)] ../source3/smbd/open.c:3239(open_file_ntcreate) calling open_file with flags=0x2 flags2=0x240 mode=0744, access_mask = 0x120196, open_access_mask = 0x120196 OPEN file-0 FROM NFS - blocked open(file-1) returns EINTR: [2017/09/10 11:02:39.051401, 0, pid=7301, effective(65534, 65534), real(65534, 0), class=vfs] ../source3/modules/vfs_default.c:566(vfswrap_open) open(file-1, flags=0x20002, mode=0x1e4): errno=4
Created attachment 13564 [details] Reproduction logs
Yes, I can't see why we don't just always add in O_NONBLOCK by default here. Lev, can you add this and run 'make test' ? Thanks, Jeremy.
(In reply to Jeremy Allison from comment #12) Lev - did you do this change and test it ? I noticed that by opening bug: https://bugzilla.samba.org/show_bug.cgi?id=13058 you're still getting issues with the smbd1 does a break -> close -> open code paths, so did removing the 'first_open_attempt' at least solve this issue ? Getting feedback here from you would be very helpful, as I think the same change may fix both this bug *and* 13058.
(In reply to Jeremy Allison from comment #13) Hello Jeremy, Unfortunately I do not have an environment to run the complete samba test suite (BTW is there some manual describing what is needed for this environment, how to setup it, just running "make test" generated lots of AD related errors). I tried to apply the following simple patch: diff --git a/source3/smbd/open.c b/source3/smbd/open.c index 3368bae..f87a7d2 100644 --- a/source3/smbd/open.c +++ b/source3/smbd/open.c @@ -3171,7 +3171,7 @@ static NTSTATUS open_file_ntcreate(connection_struct *conn, flags2 &= ~(O_CREAT|O_TRUNC); } - if (first_open_attempt && lp_kernel_oplocks(SNUM(conn))) { + if (lp_kernel_oplocks(SNUM(conn))) { /* * With kernel oplocks the open breaking an oplock * blocks until the oplock holder has given up the Notice that delay_for_oplock() still gets first_open_attempt as a parameter. I manually ran few torture tests, but this is definitely not enough. At the same time at customer site I applied the patch above and reverted commit b35a296a27a0807c780f2a9e7af2f2e93feefaa8. I only left panic on kernel oplock break timeout. While samba is the only running server, everything looks fine, no blocking opens, no panics. But once customer also runs docker container on the same share (this is the reason we have to enable kernel oplocks), we get "Kernel oplock holder didn't respond to break message" few times a day. So for some reason oplock is not released even without b35a296a27a0807c780f2a9e7af2f2e93feefaa8. I do not know if it is related to the patch above or not. I'm debugging these panics now. Thanks, -Lev.
Created attachment 13743 [details] git-am *TEST* patch for 4.7.next, 4.6.next. Can you try this patch ? It's a test that patch that may address both this issue and also bug #13058 (I've also posted that patch there).
It seems to me that the proposed fix for this defect is included in the fix for bug #13121. Can we close this defect?
Yes, you're correct. Closing this one - thanks !