Bug 11224 - Bug in cancelling of compound requests that want to go async but aren't allowed
Summary: Bug in cancelling of compound requests that want to go async but aren't allowed
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-18 12:33 UTC by Ralph Böhme
Modified: 2015-06-16 13:45 UTC (History)
4 users (show)

See Also:


Attachments
Patch for master (2.77 KB, patch)
2015-04-18 12:33 UTC, Ralph Böhme
slow: review? (metze)
slow: review? (jra)
Details
Capture showing the compound request and hang (6.71 KB, application/octet-stream)
2015-04-18 12:36 UTC, Ralph Böhme
no flags Details
git-am back-port for 4.2.next, 4.1.next. (3.17 KB, patch)
2015-04-21 23:27 UTC, Jeremy Allison
slow: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ralph Böhme 2015-04-18 12:33:46 UTC
Created attachment 10965 [details]
Patch for master

I think we have a subtle bug in our handling of compound requests where one of the requests is a read or write that wants to go async. It's caused by a OS X client sending a compound request containing a read or write where the read or write is not the last operation in the chain. This is normally seen in a request by a Mac to create or read a Minshall/French symlink:

smb2 compound request chain: create/getinfo/write/close

In this case, according to [MS-SMB2] section 3.3.5.2.7, footnote 206 we must fail the request with STATUS_INTERNAL_ERROR in case the write requires async processing.

It seems on a normal, that is non-clustered, fileserver the write will usually not require async processing thus the bug is never triggered. I saw this bug on a clustered Samba server, it can be triggered by adding a delay to asys_pwrite_do():

--- a/source3/lib/asys/asys.c
+++ b/source3/lib/asys/asys.c
@@ -189,6 +189,7 @@ static void asys_pwrite_do(void *private_data)
        struct asys_job *job = (struct asys_job *)private_data;
        struct asys_pwrite_args *args = &job->args.pwrite_args;
 
+       sleep(1);
        job->ret = pwrite(args->fildes, args->buf, args->nbyte, args->offset);
        if (job->ret == -1) {
                job->err = errno;

This ensures the aio request is not already finished at the beginning of smbd_smb2_request_pending_queue().

In smbd_smb2_request_pending_queue() the request is then cancelled, but we miss to call tevent_req_error on the smb2 subreq, as a result the server hangs.

Attached patch fixes this for me.

Thoughts?
Comment 1 Ralph Böhme 2015-04-18 12:36:39 UTC
Created attachment 10966 [details]
Capture showing the compound request and hang
Comment 2 Jeremy Allison 2015-04-21 19:38:14 UTC
Oh, that's a *really* smart catch Ralph - well done !

I'll push to master and when it's in we'll get it into 4.2.next, 4.1.next.
Comment 3 Jeremy Allison 2015-04-21 23:27:31 UTC
Created attachment 10976 [details]
git-am back-port for 4.2.next, 4.1.next.

Cherry-picked from the commit that went into master, + bug url in comment.
Comment 4 Jeremy Allison 2015-04-22 17:47:48 UTC
Re-assigning to Karolin for inclusion in 4.2.next, 4.1.next.
Comment 5 Karolin Seeger 2015-04-27 19:52:08 UTC
Pushed to autobuild-v4-[1|2]-test.
Comment 6 Karolin Seeger 2015-05-04 18:42:16 UTC
Pushed to both branches.
Closing out bug report.

Thanks!