Bug 3808 - smbd sending second illegal transact2 response to a single request after file not found error
Summary: smbd sending second illegal transact2 response to a single request after file...
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services (show other bugs)
Version: 3.0.23
Hardware: x86 Linux
: P1 regression
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-26 12:33 UTC by Steve French
Modified: 2006-05-30 14:02 UTC (History)
1 user (show)

See Also:


Attachments
pruned ethereal trace (9.81 KB, application/octet-stream)
2006-05-26 12:59 UTC, Dave Kleikamp
no flags Details
level 10 trace of smbd, ethereal trace (128 byte capture size) (943.54 KB, application/octet-stream)
2006-05-29 16:22 UTC, Steve French
no flags Details
samba 3.0.23pre2 level 10 log and ethereal trace (750.30 KB, application/octet-stream)
2006-05-29 18:29 UTC, Steve French
no flags Details
Patch for 3.0 SVN code. (4.33 KB, patch)
2006-05-30 00:08 UTC, Jeremy Allison
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Steve French 2006-05-26 12:33:56 UTC
In testing from the cifs vfs, shaggy and I noticed errors similar to:

CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8040

 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 47e20001 . . . . . . . . . . . . . . � G
 07270064 0000020a d . ' . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8200

We have seen these errors reproducibly on both x86 and ppc64 and on both samba 3.0.22 (SuSE) and on current samba 3 trunk.  We have an ethereal trace of it as well.  The 1st response is always the same length, and always has been the same nt status code, the 2nd always the same length just slightly larger since it is not an error (wct=10, bcc=0).
 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 47ef0001 . . . . . . . . . . . . . . � G
 12850064 0000020a d . . . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8740

 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 47fd0001 . . . . . . . . . . . . . . � G
 1ddd0064 0000020a d . � . .
 CIFS VFS: close with pending writes
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8200

 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 480d0001 . . . . . . . . . . . . . . . H
 24710064 0000020a d . q $ .

CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8040

 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 47e20001 . . . . . . . . . . . . . . � G
 07270064 0000020a d . ' . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8200

 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 47ef0001 . . . . . . . . . . . . . . � G
 12850064 0000020a d . . . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xd07f8740

 0000003a 424d53ff 00000032 c0418000 : . . . � S M B 2 . . . . . A �
 00000000 00000000 00000000 47fd0001 . . . . . . . . . . . . . . � G
 1ddd0064 0000020a d . � . .
 CIFS VFS: close with pending writes
Comment 1 Steve French 2006-05-26 12:41:44 UTC
This does not hurt the cifs vfs much with signing off since the duplicate is discarded, but it could indicate a more serious problem ... but not sure what would happen with signing on but I would expect that this would be a disaster with signing turned on as the second response would probably cause the session to get killed.
Comment 2 Steve French 2006-05-26 12:51:37 UTC
I have an ethereal trace generated by 10 process run of fsstress from a single client (current mainline kernel cifs code, but fails similarly in samba when earlier cli versions run too)  "/usr/libexec/ltp/testcases/bin/fsstress -d /mnt/cifs/fstress  -l 100 -n 100 -p 10 -X"

Unfortunately since it is over 1MB (actually over 20MB) it is too big to provide as attachment to defect
Comment 3 Steve French 2006-05-26 12:56:14 UTC
I have an ethereal trace generated by 10 process run of fsstress from a single client (current mainline kernel cifs code, but fails similarly in samba when earlier cli versions run too)  "/usr/libexec/ltp/testcases/bin/fsstress -d /mnt/cifs/fstress  -l 100 -n 100 -p 10 -X"

Unfortunately since it is over 1MB (actually over 20MB) it is too big to provide as attachment to defect
Comment 4 Dave Kleikamp 2006-05-26 12:59:55 UTC
Created attachment 1916 [details]
pruned ethereal trace

This is the trace file that only contains 3 ranges of packets.  Filter on smb.mid == 1831, 4741 & 7645 for the applicable packets
Comment 5 Dave Kleikamp 2006-05-26 13:04:15 UTC
Should change the version to 3.0.22 - server is running ubuntu Hoary.  I can't change that field.
Comment 6 Volker Lendecke 2006-05-26 16:29:58 UTC
Can this be reproduced against 3.0.23rc1? We've changed the trans, trans2 and nttrans handling since 3.0.22.

Volker
Comment 7 Steve French 2006-05-29 10:17:14 UTC
This fails on trunk and 3.0.22 so almost certainly fails against 3.0.23rc1.

Will extract SAMBA_3_0 branch and check
Comment 8 Steve French 2006-05-29 10:35:10 UTC
Yes - always fails against 3.0.22rc (SAMBA_3_0 branch extracted 30 minutes ago)

In a short run of fstress (mounted loopback, current mainline kernel 2.6.17-rc)

        ./fsstress -d /mnt/stress-test-dir -n 200 -p 20

generated seven duplicates
Comment 9 Volker Lendecke 2006-05-29 13:52:34 UTC
Can you please upload a debug level 10 log of latest 3_0 with this error?

Thanks,

Volker
Comment 10 Jeremy Allison 2006-05-29 14:16:18 UTC
Ok - this is a blocker for 3.0.23. Steve - can you upload an ethereal trace with the request and duplicate replies in it, also a debug level 10. Can you tell us how to reproduce this with tools we have (uploading or putting on samba.org a cifsfs build for SuSE 9.3, 10 or 10.1 would do with instructions to reproduce).
Jeremy.
Comment 11 Steve French 2006-05-29 16:07:50 UTC
OK - will see if level 10 will slow it down too much or not.

Note that I think this would fail with various cifs - but certainly SuSE 10.1 or SLES10 betas should generate it with the fsstress syntax below (fsstress is part of the standard open source testcases in the LTP, Linux Test Project).  Although it generates fewer failures with less load - the amount of traffic is not that bad with 20 procs and number = 2000
Comment 12 Jeremy Allison 2006-05-29 16:20:13 UTC
Steve, you know better than that how to tell us how to reproduce a problem :-).
I take it what you were trying to say was :

"Use SuSE 10.1 and mount a CIFS share using these parameters (missing) to localhost to the same box, then run fsstress using the following parameters (missing) against the mounted share".

Or not ? If so, please tell me (a) what paramters you using on the CIFS mount (b) where to download fsstress and (c) what command line you ran it with.

Jeremy.
Comment 13 Steve French 2006-05-29 16:22:31 UTC
Created attachment 1917 [details]
level 10 trace of smbd, ethereal trace (128 byte capture size)

At least six occurances: 

 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xe9150040

 0000003a 424d53ff 00000032 c0418000 : . . . \uffff S M B 2 . . . . . A
 00000000 00000000 00000000 12050001 . . . . . . . . . . . . . . . .
 81580064 0000020a d . X . .
 CIFS VFS: close with pending writes
 CIFS VFS: close with pending writes
 CIFS VFS: close with pending writes
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xe9150740

 0000003a 424d53ff 00000032 c0418000 : . . . \uffff S M B 2 . . . . . A
 00000000 00000000 00000000 12080001 . . . . . . . . . . . . . . . .
 83760064 0000020a d . v . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xe91503c0

 0000003a 424d53ff 00000032 c0418000 : . . . \uffff S M B 2 . . . . . A
 00000000 00000000 00000000 120c0001 . . . . . . . . . . . . . . . .
 83a20064 0000020a d . \uffff . .
 CIFS VFS: close with pending writes
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xe56a9c80

 0000003a 424d53ff 00000032 c0418000 : . . . \uffff S M B 2 . . . . . A
 00000000 00000000 00000000 12090001 . . . . . . . . . . . . . . . .
 8a3c0064 0000020a d . < . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xe56a9ac0

 0000003a 424d53ff 00000032 c0418000 : . . . \uffff S M B 2 . . . . . A
 00000000 00000000 00000000 12060001 . . . . . . . . . . . . . . . .
 8bd80064 0000020a d . \uffff . .
 CIFS VFS: No task to wake, unknown frame rcvd!
Received Data is: : dump of 37 bytes of data at 0xe9150900

 0000003a 424d53ff 00000032 c0418000 : . . . \uffff S M B 2 . . . . . A
 00000000 00000000 00000000 120d0001 . . . . . . . . . . . . . . . .
 8fce0064 0000020a d . \uffff . .

Note that this was generated with less load (only 10 processes, and count = 150)
./fsstress -d /mnt/stress-test-dir -n 150 -p 10
Comment 14 Steve French 2006-05-29 16:23:18 UTC
One suspicious frame are the two (!) responses to request at frame 5901 - frames 5902 and 5905
Comment 15 Jeremy Allison 2006-05-29 17:14:10 UTC
It's hard to tell what's going on here as the packets are truncated. Do you have full traces with the entire packets ? Interesting this happens after an oplock break request/reply - although I can't see anything in the server code that is related to this.
Jeremy.
Comment 16 Jeremy Allison 2006-05-29 17:17:16 UTC
To be honest I'm suspicious of a client bug here. The server code is very simple to handle trans2 - it doesn't appear possible for a duplicate mid. The fact that it happens against 3.0.22 and 3.0.23RC1 which have *completely* different implementations of trans2 packet handling is also suspicious. If we stuffed it up it's likely we'd have done it in one or the other implementation, not both. Very little of this code is load dependent in the server....
Jeremy.
Comment 17 Steve French 2006-05-29 18:12:36 UTC
This seems pretty cut and dried ... we have at least three ethereal traces some on client some on server confirming that there are two responses to a single request.  Search by mid for the mids which match what the client reports in its dmesg (across the whole ethereal trace) shows the matching triples (request, response, response)
Comment 18 Steve French 2006-05-29 18:29:01 UTC
Created attachment 1918 [details]
samba 3.0.23pre2 level 10 log and ethereal trace

OK - here is the same thing.  Smaller run (fstress -n 120 -p 8 ie only eight processes) but with capture size set larger to 256 bytes.   The duplicates are to requests in frame 1720 (frames 1721 and 1724) and 1675 (frames 1676 and 1679).

This time I traced against Samba Version 3.0.23pre2-SVN-build-15928
Comment 19 Jeremy Allison 2006-05-29 19:41:00 UTC
Ok, the really interesting thing is this only happens after an oplock break request has been sent and the client responds. The server seems to be sending a duplicate mid reply for the response to this....
Let me look closer...
Jeremy.
Comment 20 Jeremy Allison 2006-05-29 20:08:36 UTC
Your log.smbd's don't match the dumps with the mid's. Your smbd logs start after the mid's in question making them useless...
Can you set 
max log size = 0
so they don't get truncated. I'm *very* interested in seeing a debug level 10 log showing the activity over those duplicate mids.
Jeremy.
Comment 21 Jeremy Allison 2006-05-29 20:27:10 UTC
Ok - I see the bug.... very subtle. Delay for oplocks biting us in the behind....
Give me a few hours to fix this.
Jeremy.
Comment 22 Jeremy Allison 2006-05-29 20:55:13 UTC
Explaination of the problem :

client :

open (/foo) - with exclusive oplock. Server returns file handle
setfileinfo (/foo) -> set EOF. - using the *pathname*.

Internally smbd does a second open with the INTERNAL_OPEN_ONLY flag set. This causes an oplock break (should it ? I need to do smbtorture tests)- and returns. The trans2 setfilepathinfo caller doesn't check if the open was deferred and just returns PATH_NOT_FOUND. When the client responds to the oplock break the server reschedules the open from "wait for oplocks" and calls into the trans2 setfilepathinfo code again. This time the set eof call succeeds and the code returns a success code - FOR THE SAME MID that the PATH_NOT_FOUND error was returned on.

Clearly there are 3 possible fixes. 

1). Modify the code so that INTERNAL_OPEN_ONLY calls never defer or break oplocks (this is how it used to work). This may or may not be correct, depending on how W2K3 behaves.
2). Change the INTERNAL_OPEN_ONLY callers so they'll cope with defered opens - this is harder.
3). For qfilepathinfo and setfilepathinfo calls that may call open_file_ntcreate, make them do a search for any handles open on the same pathname and re-use the handle and turn it into a handle based call. Only smbtorture will tell if this is correct.

Volker (or others) please don't fix this if I don't get to the fix before May 30th (tomorrow my time) as I'm pretty confident in working through this and making it robust - I just might need a little time (and some Samba4 torture code).

Jeremy.

Comment 23 Steve French 2006-05-29 21:07:07 UTC
Yes - that explanation makes sense.
Comment 24 Steve French 2006-05-29 21:10:16 UTC
Yes - that explanation makes sense.  I hope option three works out - but in general, anything that reduces oplock breaks is preferred from my perspective (no sense killing performance on heavy use single client unless absolutely required).
Comment 25 Jeremy Allison 2006-05-29 21:13:00 UTC
Well you already know if you have the pathname you're given open with appropriate permissions. Why doesn't the client turn this into a handle based call (or will this violate encapsulation) ?

To be honest option 3 is unlikely, as it's probably not what W2K3 does as this might violate encapsulation on the server too.

Damn you Steve :-). A client bug would have been much easier :-) :-).

Jeremy.
Comment 26 Steve French 2006-05-29 21:40:29 UTC
Remember that Linux kernel has the strange idea of defining setattr (e.g. setting the file size) and getattr as (inode based) and not being handle based even if the operation that triggerred it was.

The difficulty with changing setpathinfo to setfileinfo is not just finding a writeable file handle with the correct permissions (I hope generic write is good enough) but also is that I can't guarantee how long it would stay open - although I can probably play tricks and refuse to close the file while a setfileinfo is pending.  It may be doable to change a few cases to setfileinfo but not all



 
Comment 27 Jeremy Allison 2006-05-29 23:38:11 UTC
I now have a patch for 3.0 that is a combination of techniques (1) and (2).
I'd like to remove INTERNAL_OPEN_ONLY but that may have to wait a little, it still has valid uses in the printing code. The copy_file code still uses it but probably needs converting to deferred mode.
The bad news for you (Steve) is that W2K3 will break oplocks on set-eof and set-allocation on a pathname - so I'll have to do the same. Sucks to be a client I guess :-). I'll post the patch once I've tested it.
Jeremy.
Comment 28 Jeremy Allison 2006-05-30 00:08:10 UTC
Created attachment 1919 [details]
Patch for 3.0 SVN code.

Patch for bug #3308 to stop us returning duplicate
mid replies on path based set-eof trans2 calls.
Needs modification for HEAD (as in head open_file_ntcreateX
properly returns NTSTATUS - I'll fix this tomorrow my
time). Secondly it still fails the Samba4 RAW-OPLOCK
smbtorture because of an interesting case. Our oplock
code always returns "break to level 2" if it can.
In this case (path-based set-eof or set-allocation size
on an exclusive oplocked file) W2K3 always sends a 
break-to-none. We send the break to none (from level2)
after we've done the write for eof or allocation size.
I need to work out some way of telling our break code
to always break to none (might need to extend the message
field).

Steve, this code has been committed to SAMBA_3_0 (but not
HEAD) for you to test against.

Jeremy.
Comment 29 Steve French 2006-05-30 09:43:30 UTC
Fix seems to work.  Tried as many as 30 processes with fsstress. No duplicate responses logged. May even get fewer "close with pending writes" cases although that may be coincidental not a sideffect of the change (cifs tries not to request to close a file while a write is in flight on it)
Comment 30 Steve French 2006-05-30 09:56:29 UTC
interesting case - client has exclusive oplock but no write permission to the file that is why we see the setpathinfo. cifs_setattr already had code to try to avoid oplock break by doing setfileinfo (rather than setpathinfo) of the file size but in this case the file is open GENERIC_READ so find_writeable_file routine can not find a handle to use - even though we have the exclusive oplock.  It might be possible to defer the setpathinfo in this kind of case but risky.
Comment 31 Jeremy Allison 2006-05-30 14:02:47 UTC
Should be fixed in SAMBA_3_0 and HEAD.
Jeremy.