Bug 13121 - Non-smbd processes using kernel oplocks can hang smbd.
Summary: Non-smbd processes using kernel oplocks can hang smbd.
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on: 13058
Blocks:
  Show dependency treegraph
 
Reported: 2017-11-06 18:29 UTC by Jeremy Allison
Modified: 2020-12-11 08:25 UTC (History)
3 users (show)

See Also:


Attachments
git-am fix for master. (13.46 KB, patch)
2017-11-09 21:28 UTC, Jeremy Allison
slow: review+
Details
git-am fix for 4.7.next. (13.85 KB, patch)
2017-11-13 19:54 UTC, Jeremy Allison
slow: review+
Details
git-am fix for 4.6.next. (13.89 KB, patch)
2017-11-13 19:57 UTC, Jeremy Allison
slow: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Allison 2017-11-06 18:29:02 UTC
Currently smbd tries a non-blocking open for a kernel oplock if there is no smbd open record the *first* time it gets a EWOULDBLOCK error from the kernel.

Subsequent opens try non-blocking. This is incorrect. Without inter-process communication from the non-smbd process the best we can do is to try non-blocking opens every <poll> time-periods (where my heuristic is poll=1 second) until either the break timeout period expires of we get success.

Patch to follow.
Comment 1 Jeremy Allison 2017-11-06 18:30:11 UTC
Arg. The above comment should say:

"Subsequent opens try a blocking open"

of course.
Comment 2 Jeremy Allison 2017-11-09 21:28:40 UTC
Created attachment 13763 [details]
git-am fix for master.

Includes regression test case. NB. The patch attached to bug:

https://bugzilla.samba.org/show_bug.cgi?id=13058

must be applied first.
Comment 3 Ralph Böhme 2017-11-10 21:55:33 UTC
Generally looks good. This is not a formal review yet, I'd like to take a closer look tomorrow.

One thing that itches me is: anyone actually bothered testing whether we call poll that damn oplocked fd?
Comment 4 Jeremy Allison 2017-11-10 22:03:39 UTC
My regression test patch tests that case doesn't it ? If you run at debug level 10 you can certainly see the smbd polling for an allowed open every 1 second. Or am I not understanding your test case ?
Comment 5 Ralph Böhme 2017-11-10 22:11:12 UTC
(In reply to Jeremy Allison from comment #4)
poll as in poll()? I don't see that in the test, but maybe I'm missing something. I'll take a closer look tomorrow.
Comment 6 Jeremy Allison 2017-11-10 22:15:13 UTC
I don't understand. Who would call poll() ? The test code forks, then the child process takes the oplock locally and just waits for a signal from the kernel using pause(). The parent client process uses smb2 calls to drive the smbd server to request an open on the oplocked file. That then causes pause() to return (got a signal) and we check it was a RT_SIGNAL meaning the oplock was requested to be broken. The child process waits 3 seconds to allow the parent to check that the requesting smbd isn't blocked by doing another SMB2_CREATE request, then just removes the lease and exits.

Who needs to poll() ?
Comment 7 Jeremy Allison 2017-11-10 22:30:39 UTC
Are you confused by my initial comment where I say:

"the best we can do is to try non-blocking opens every <poll> time-periods (where my heuristic is poll=1 second)" ?

I don't mean poll() the system call here, but poll in the generic English "polling for a change" sense.

And yeah, I tested that the smbd does do the check every 1 second whilst the non-smbd process has the file oplocked. I did this by doing a debug level 10 and counting the number of:

DBG_DEBUG("kernel_oplock_poll_open_timer fired. Retying open !\n");

message I got from the timer firing. And yeah, it was correct :-).
Comment 8 Ralph Böhme 2017-11-10 22:39:29 UTC
I meant what I asked: can we poll() a file fd such that when poll() indicates readability the kernel oplock was released by the other process. That would give us a signalling mechanism and we don't have to do a sleep/retry dance.

This is not about the test, if that worked we could use it in smbd. I guess it does not work, but I was wondering if anyone ever actually tested this.
Comment 9 Jeremy Allison 2017-11-10 22:42:02 UTC
Sadly no, I don't believe that works. I think a periodic wakeup and retry is the best we can do. I'm happy to write test code and check though once I'm dug out of my current work hole :-).

In the meantime, I still think my patch is good :-).
Comment 10 Jeremy Allison 2017-11-10 23:29:22 UTC
(In reply to Jeremy Allison from comment #9)

Thinking about it some, that *can't* work. Remember, we don't even get an fd to poll on until the open completes. We just get -1, EWOULDBLOCK back when trying to open the leased file.

So yep, doing a 1 second check is the best we can do without cooperation from the holding process.

Cheer up, I've done worse things in Samba in the past :-).
Comment 11 Ralph Böhme 2017-11-11 14:16:29 UTC
(In reply to Jeremy Allison from comment #10)
Wuahahaha, I shouldn't come up with clever ideas late at night! :)))
Comment 12 Ralph Böhme 2017-11-11 14:43:26 UTC
Comment on attachment 13763 [details]
git-am fix for master.

Lgtm & pushed.
Comment 13 Jeremy Allison 2017-11-13 19:54:09 UTC
Created attachment 13775 [details]
git-am fix for 4.7.next.

Cherry-picked from master.
Comment 14 Jeremy Allison 2017-11-13 19:57:46 UTC
Created attachment 13776 [details]
git-am fix for 4.6.next.

Back-port from master.
Comment 15 Ralph Böhme 2017-11-13 20:00:46 UTC
Reassigning to Karolin for inclusion in 4.6 and 4.7.
Comment 16 Karolin Seeger 2017-11-14 11:47:19 UTC
(In reply to Ralph Böhme from comment #15)
Pushed to autobuild-v4-{7,6}-test.
Comment 17 Karolin Seeger 2017-11-15 11:57:25 UTC
(In reply to Karolin Seeger from comment #16)
Pushed to both branches.
Closing out bug report.

Thanks!