Bug 11173 - SMB1 Server disconnect can cause timeout on client write error.
Summary: SMB1 Server disconnect can cause timeout on client write error.
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: libsmbclient (show other bugs)
Version: 4.2.0rc4
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Blocks: 11079
  Show dependency treegraph
Reported: 2015-03-19 17:43 UTC by Jeremy Allison
Modified: 2015-06-15 14:54 UTC (History)
2 users (show)

See Also:

git-am fix for master. (1.20 KB, patch)
2015-03-19 17:45 UTC, Jeremy Allison
no flags Details
git-am fix for 4.2.next, 4.1.next. (1.44 KB, patch)
2015-03-20 16:45 UTC, Jeremy Allison
obnox: review+
ddiss: review+

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Allison 2015-03-19 17:43:04 UTC
This is a pretty interesting one !

In the SMB1 case, if a writev fails we notive inside libcli/smb/smbXcli_base.c:smb1cli_req_writev_done() when 

nwritten = writev_recv(subreq, &err);

comes back as -1. We then call:

        if (nwritten == -1) {
                NTSTATUS status = map_nt_error_from_unix_common(err);
                smbXcli_conn_disconnect(state->conn, status);

smbXcli_conn_disconnect() calls tevent_req_nterror(req, status) on all the reqs on the pending queue.

However, in the SMB1 case - because we can have one way writes, we don't add the req onto the pending queue until *after* the writev_recv() completed *successfully*.

We can see this in the original code for smb1cli_req_writev_done(), which looks like:

static void smb1cli_req_writev_done(struct tevent_req *subreq)
        struct tevent_req *req =
                struct tevent_req);
        struct smbXcli_req_state *state =
                struct smbXcli_req_state);
        ssize_t nwritten;
        int err;

        nwritten = writev_recv(subreq, &err);
        if (nwritten == -1) {
                NTSTATUS status = map_nt_error_from_unix_common(err);
                smbXcli_conn_disconnect(state->conn, status);

        if (state->one_way) {
                state->inbuf = NULL;

        if (!smbXcli_req_set_pending(req)) {
                tevent_req_nterror(req, NT_STATUS_NO_MEMORY);

Note how smbXcli_req_set_pending(req) isn't called until *after* we would have returned early if nwritten == -1. So the call to smbXcli_conn_disconnect() in this case doesn't cancel the current outstanding tevent req, and so it's left on the event queue until it times out with the default timeout of 20 seconds.

The solution is a patch which adds a:

tevent_req_nterror(req, status);

just after the call to smbXcli_conn_disconnect(state->conn, status) and so correctly cancels the current request.

Note this isn't needed in any other calls that use smbXcli_conn_disconnect() on error, as they are either SMB1 reads (which by definition already have the original sender on the pending queue) or are SMB2 requests - and in the SMB2 code path we add the tevent req's to the pending queue *before* we do the writev_send - there are no one-way sends in SMB2.

Patch has been tested in bug 11079, which is where this behavior was first seen.

Comment 1 Jeremy Allison 2015-03-19 17:45:29 UTC
Created attachment 10892 [details]
git-am fix for master.
Comment 2 Jeremy Allison 2015-03-19 17:51:58 UTC
Comment on attachment 10892 [details]
git-am fix for master.

FYI: Fix confirmed by reporter in bug report:

Comment 3 Jeremy Allison 2015-03-20 16:45:56 UTC
Created attachment 10897 [details]
git-am fix for 4.2.next, 4.1.next.

Cherry-pick from master. Applies cleanly to 4.2.next, 4.1.next.
Comment 4 David Disseldorp 2015-03-20 16:58:06 UTC
@Karolin, please merge to the maintenance branches.
Comment 5 Karolin Seeger 2015-03-23 20:19:22 UTC
Pushed to autobuild-v4-[1|2]-test.
Comment 6 Karolin Seeger 2015-03-27 20:14:33 UTC
(In reply to Karolin Seeger from comment #5)
Pushed to bith branches.
Closing out bug report.