Hi, when appending text to a file, Samba duplicates the content of the file before it appends the new text. client% echo 1 > test; echo 2 >> test; cat test; echo +++; echo 3 >> test; cat -A test 1 2 +++ 1$ 2$ 3$ ^@^@^@^@ server% cat -A test 1$ 2$ 1$ 2$ 3$ Running the same test on the server gives: server% echo 1 > test; echo 2 >> test; cat test; echo +++; echo 3 >> test; cat -A test 1 2 +++ 1$ 2$ 3$ I'm using Samba 3.4.2 from Debian. The filesystem on the server is ext3. The OS of the server is Linux 2.6.30 and the OS of the client is Linux 2.6.31-rc5. Regards, Jörg
Created attachment 4957 [details] Dump of the network traffic while reproducing this bug
From that network trace I don't see what Samba does wrong. After exactly those actions, what is the file contents on the server (log in with ssh, don't look via smbfs/cifs please!) Volker
Ah, you've shown that already, right? Can you create a new file and strace the smbd while doing that? Thanks, Volker
Created attachment 4958 [details] Log with log level 10 while reproducing this bug
Created attachment 4959 [details] Strace of the smbd process while reproducing this bug
Thanks a lot! It's very likely it's the O_APPEND that kills us. Need to filter that out, I need to find the right spot. Give me some minutes. Volker
Comment on attachment 4959 [details] Strace of the smbd process while reproducing this bug I think interesting in this strace is that the appending of 3 causes the appending of the whole content: % grep -E '^(pwr|open.*WR)' samba.st open("joerg/test", O_WRONLY|O_CREAT|O_LARGEFILE|O_NOFOLLOW, 0644) = 30 pwrite64(30, "1\n"..., 2, 0) = 2 open("joerg/test", O_WRONLY|O_APPEND|O_LARGEFILE|O_NOFOLLOW) = 30 pwrite64(30, "2\n"..., 2, 2) = 2 open("joerg/test", O_WRONLY|O_APPEND|O_LARGEFILE|O_NOFOLLOW) = 30 pwrite64(30, "1\n2\n3\n"..., 6, 0) = 6
Created attachment 4960 [details] Patch Can you try the attached patch? Thanks, Volker
I've started the build with the patch. I'll report tomorrow if it fixed the problem.
wireshark 1.3.1 doesn't decode the set_path_info posix open which makes this harder to read, but presumably O_APPEND passed in on smb open?
Maybe. There's another, internal variant of this bug (57908 for reference) where the client asks for GENERIC_WRITE in the ntcreate. This together with POSIX_SEMANTICS in the open flags leads to smbd using O_APPEND in its open(2) syscall which then leads to the same kind of data corruption, coming from frame 36 in the sniff. Volker
Created attachment 4962 [details] kernel patch -- don't send SMB_O_APPEND on O_APPEND opens This patch is untested... Yes cifs does send SMB_O_APPEND when the file is opened with O_APPEND. It probably shouldn't for the reasons detailed in this patch description. At some point we should probably define what the offset in an SMB write call means in conjunction with a SMB_O_APPEND open, but for now it's probably best to just avoid using that flag altogether. Steve, thoughts?
Volker pointed out in comment #11 that GENERIC_WRITE also means an implicit O_APPEND too. We'll have to fix cifs to use a combination of flags too. It's just as well though since the open flag conversion routines are a real mess anyway.
(In reply to comment #12) Jeff, then what distinguishes between (posix) open of a file with and without O_APPEND flag?
(In reply to comment #14) > (In reply to comment #12) > > Jeff, then what distinguishes between (posix) open of a file with and without > O_APPEND flag? > If I understand your question correctly, the answer is "nothing". Appends in this case would be entirely handled by the client writing to the correct offset rather than expecting the server to append writes to the file. I think trying to pass O_APPEND to the server is going to be *really* difficult with buffered writes. Suppose we do this: Open a file that's currently 1k long with O_APPEND and append 1K of data to it via buffered writes. The pagecache now has this data appended. How do we ensure that when we go to flush this page to the server, that we just append the data that was written and *don't* try to write out the existing data that was in the file? On a related note, how would we handle seeks here? I don't think we have anything like a SMB seek call, do we? I think the only thing we can do for the time being is just try to make sure that we never ask the server to open in append mode.
So, if an app does not open a file with O_APPEND, cifs should send appropriate flags (e.g. instead of GENERIC_WRITE, split it and send a collection of rights sans FILE_APPEND_DATA). But if an app opens a file with O_APPEND, cifs clients needs to maintain two offsets, one which is used to write data to its own pages on the client and other implicit/explicit offset as pagecache flushes those pages via routines like cifs_writepages out to the server? Is it possible that pagecache could decide to evict some random dirty pages that belong to this file?
(In reply to comment #16) > So, if an app does not open a file with O_APPEND, cifs should send > appropriate flags (e.g. instead of GENERIC_WRITE, split it and send > a collection of rights sans FILE_APPEND_DATA). Correct. > But if an app opens a file with O_APPEND, cifs clients needs to maintain > two offsets, one which is used to write data to its own pages on the client > and other implicit/explicit offset as pagecache flushes those pages > via routines like cifs_writepages out to the server? > Is it possible that pagecache could decide to evict some random dirty pages > that belong to this file? > I think this is more complex than it needs to be. O_APPEND is generally just dealt with on open, at least with network filesystems. Unlike with local filesystems, we can never guarantee that the page and attribute caches are consistent with the actual state of the file on the server. I think basically just ignoring O_APPEND within CIFS is the best thing for now. The only thing we probably need to do is make sure that we revalidate the file size when we see that O_APPEND Is present. The VFS will take care of making sure that the r/w offset of the filp is set correctly based on the file size.
Actually there is an SMBleek call, but it's essentially meaningless. Currently, when the server gets O_APPEND from a posix client it will open the file with O_APPEND. Which means all offsets in further write calls are ignored. This means you can actually do a race-free append (unlike NFS), but if it's causing the clients so much trouble maybe we should just filter O_APPEND out permanently from the server as Volker's patch does. Jeremy.
Really? I still think we should keep this in the server for future, smarter clients. Volker
Another comment: The subtle problem with GENERIC_WRITE and POSIX_SEMANTICS only was caused by piggybacking posix open on top of an already overloaded call (createfile). This should never have happened in the first place. In the trans2 posix open it's much easier to handle. Volker
(In reply to comment #18) > Actually there is an SMBleek call, but it's essentially meaningless. > > Currently, when the server gets O_APPEND from a posix client it will open the > file with O_APPEND. Which means all offsets in further write calls are ignored. > > This means you can actually do a race-free append (unlike NFS), but if it's > causing the clients so much trouble maybe we should just filter O_APPEND out > permanently from the server as Volker's patch does. Yes...given that the protocol is stateful you can do a race-free append here. The problem however is how to manage that with buffered writes and the pagecache. It's this last part that introduces a lot of complexity. I'm not at all sure how to deal with that in a race-free way.
True, this is probably a real nightmare to get right. But I would like to keep this element in the protocol to enable it when a smart client comes around the corner. In particular as there's an easy fix: Just don't ask for APPEND when opening a file if you (for good reason) can't handle it. Volker
I can understand why sending O_APPEND explicitly with a file offset is confusing (for posix open) - but is there a reason why GENERIC_WRITE on NTCreateX must imply append? If we filter O_APPEND out of flags we send on the client on (posix) open for the buffered case (since it is hard to get right in that case presumably) - for the directio case could O_APPEND make sense? Is their an analogous case in NFSv4 where both the file offset and O_APPEND are sent?
GENERIC_WRITE includes the APPEND right by definition of MS. The fact that this leads to O_APPEND in the smbd open(2) call is unfortunate, but a direct consequence of trying to piggyback posix semantics on top of an already overloaded createfile call. Volker
This is the part I don't quite understand... Just because we asked for permission to append to the file doesn't mean that we want to open it with O_APPEND. So why is samba doing that here? We'll obviously need to fix this in the client too to account for older servers, but I wonder whether the server side needs to be fixed too? On another note, 2.6.32 will be shipping soon. If we want this patch to make it we probably need to act fast.
Because when defining the unix extensions, apparently someone decided to do so. It must have been between Jeremy, Steve and others defining that protocol detail. Please ask Steve about the history of this, I am certain he can answer that. There is just no other bit in the already fixed set of bits in the SMB createfile call. Where would you have put it? That is what I mean that createfile should have never been overloaded with the posix semantics. Volker
I do not see this problem with older Samba Version 3.0.28-0.el5.8. The contents of the file on the server are 1 2 3 And smb.conf has really simple stanza [smbt] path = /tmp/smb_t browseable = Yes read only = No guest ok = Yes writable = yes and nothing special in [global] section as well. cifs client is 1.61 Not sure if this version of Samba had posix semantics over NTCreateAndX.
(In reply to comment #8) > Created an attachment (id=4960) [details] > Patch > > Can you try the attached patch? I've applied this patch to Samba 3.4.3 and this fixed my problem. The content of the file stays as expected. Thanks.
Ok, here is what MSDN says about FILE_APPEND_DATA: http://msdn.microsoft.com/en-us/library/ms804358.aspx "If the caller sets only the FILE_APPEND_DATA and SYNCHRONIZE flags, it can write only to the end of the file, and any offset information about write operations to the file is ignored. The file will automatically be extended as necessary for this type of operation." So Windows will do O_APPEND in certain cases. The problem isn't in the specification of the POSIX on the wire flags - there is a separate SMB_O_APPEND flag for this. Actually, in the 3.4.x and 3.3.x code we are safe from this problem occurring with a NtCreateX call - there is no codepath in these branches that ever sets FILE_FLAG_POSIX_SEMANTICS for a client issued Windows open - only on a client issued POSIX open. So doing NTCreateX or any other Windows open call from the client even when POSIX pathnames has been selected will be safe from doing O_APPEND. Volker - in comment 11 you say: "Maybe. There's another, internal variant of this bug (57908 for reference) where the client asks for GENERIC_WRITE in the ntcreate. This together with POSIX_SEMANTICS in the open flags leads to smbd using O_APPEND in its open(2) syscall which then leads to the same kind of data corruption, coming from frame 36 in the sniff." I don't have access to that internal bug report. What version of Samba is that using ? I currently can't see a way an NTCreateX call can end up with FILE_FLAG_POSIX_SEMANTICS set. Jeremy.
(In reply to comment #29) > I don't have access to that internal bug report. What version of Samba is that > using ? /usr/sbin/smbd -V Version 3.4.2-ctdb-12
>> Just because we asked for permission to append to the file doesn't mean that we >> want to open it with O_APPEND > when defining the unix extensions, apparently someone decided to do so. > It must have been between Jeremy, Steve and others defining that protocol > detail. Please ask Steve about the history of this In defining the Unix and POSIX extensions AFAIK O_APPEND did not come up, and certainly not GENERIC_WRITE as a proxy for the O_APPEND flag. For the O_APPEND case, the clients revalidate file size and write to the corresponding end of file. As Jeremy noted, asking for FILE_APPEND_DATA but not asking for write permission could be used to indicate O_APPEND (in Windows), but I don't know of unix clients which do that (in Linux the VFS's page cache gets in the way so would be hard to do this unless we bypass the page cache for files open for append). GENERIC_WRITE, since it includes write, not just append, permission was not meant to indicate append only. In NTCreateX we did think that the posix flag would be useful to indicate that we wanted posix semantics (in particular posix pathnames) but I don't remember any discussions about the posix flag changing the meaning for append on NTCreateX.
What I'm seeing in that logfile: reply_ntcreate_and_X: flags = 0x2, access_mask = 0xc0000000 file_attributes = 0x1000080, share_access = 0x7, create_disposition = 0x5 create_options = 0x40 root_dir_fid = 0x0, fname = snlrk51_1/cifs/cifs4/test3/f8.blt access_mask = 0xc0000000 to me means SEC_GENERIC_READ|SEC_GENERIC_WRITE. When looking at file_generic_mapping, SEC_GENERIC_WRITE will lead to FILE_GENERIC_WRITE which implies FILE_APPEND_DATA. file_attributes = 0x1000080 to me means FILE_ATTRIBUTE_NORMAL|FILE_FLAG_POSIX_SEMANTICS. To me it also seems that file_attributes end up in open_file_ntcreate as new_dos_attributes. Looking at line open.c:1506, FILE_FLAG_POSIX_SEMANTICS in new_dos_attributes leads to posix_open=True. Then looking at line 1738ff this together with FILE_APPEND_DATA in access_mask (coming in from FILE_GENERIC_WRITE) leads to O_APPEND in flags2. I might be wrong, but this is what I think is happening. The simple fix for this would be not to use SEC_GENERIC_WRITE but the translation just omitting FILE_APPEND_DATA. Volker
Oh - is the FILE_FLAG_POSIX_SEMANTICS coming in *OFF THE WIRE* !!! That explains it. Ok, I think the correct server fix is to AND off FILE_FLAG_POSIX_SEMANTICS from all Windows open calls that send 32-bit attributes. No Windows open should be able to set that flag - it should only get set in the posix_open, posix_mkdir calls. Thanks for clarifying. Jeremy.
And of course the client fix for older servers is to never set the FILE_FLAG_POSIX_SEMANTICS bit on any Windows create calls - that avoids the FILE_APPEND_DATA -> O_APPEND problem. Jeremy.
Created attachment 4982 [details] Server patch. I think this is the correct server patch - keeps O_APPEND for POSIX clients but prevents any Windows opens from setting it - even if they're using GENERIC_WRITE access on open request. Volker please confirm. Jeremy.
Created attachment 4983 [details] git-am format patch for 3.4.4.
Created attachment 4984 [details] git-am format patch for 3.3.10.
Created attachment 4985 [details] git-am format patch for 3.2.x One for the (obsolete) 3.2.x patches page ? Jeremy.
Comment on attachment 4982 [details] Server patch. I disagree. Why is it more difficult for the client to not send something that it can not swallow than for the server to filter it out, forever blocking correct clients?
This all comes down to what POSIX_SEMANTICS is supposed to indicate. The SNIA spec says: ----------[snip]----------- Indicates that the file is to be accessed according to POSIX rules. This includes allowing multiple files with names differing only in case, for file systems that support such naming. (Use care when using this option because files created with this flag may not be accessible by applications written for MS-DOS, Windows 3.x, or Windows NT.) ----------[snip]----------- ...which seems to indicate that it's just intended to affect how the server parses pathnames. I haven't found any references to the idea that server should be treating the permission flags as open intent flags when POSIX_SEMANTICS is set (then again, I haven't looked for that info either). If that's the case, should samba be changing how it deals with flags when doing an open with the POSIX_SEMANTICS bit set? Note that I'm OK with fixing the client clearing this bit, but I'd like to understand why you think that samba's behavior is correct here.
It's essentially over. 3.5 will not respect POSIX_SEMANTICS on ntcreate at all anymore. Volker
Volker is correct - we need to allow FILE_FLAG_POSIX_SEMANTICS to change the pathname processing, but not the append behavior. Updated patches to follow. Jeremy.
Created attachment 4987 [details] git-am format patch for 3.4.4.
Comment on attachment 4984 [details] git-am format patch for 3.3.10. This patch isn't needed for 3.3.10 - it NtCreateX and the NTTrans varient already use a different open code path (create_file()) which strips off the FILE_FLAG_POSIX_SEMANTICS from file_attributes. The posix open and mkdir paths use a different call (open_file_ntcreate()) which doesn't remove FILE_FLAG_POSIX_SEMANTICS. Jeremy.
Comment on attachment 4985 [details] git-am format patch for 3.2.x This patch isn't needed for 3.2.x - NtCreateX and the NTTrans varient already use a different open code path (create_file()) which strips off the FILE_FLAG_POSIX_SEMANTICS from file_attributes. The posix open and mkdir paths use a different call (open_file_ntcreate()) which doesn't remove FILE_FLAG_POSIX_SEMANTICS. Jeremy.
Ok I think the only patch needed is the git-am patch for 3.4.4 in comment #43. Jeremy.
Created attachment 4990 [details] Simplified version of the 3.4.4 patch. This is the more simple version of the patch (removed the trans2.c changes as they are not needed). This restores the behavior of 3.3.x and 3.2.x of masking off the FILE_FLAG_POSIX_SEMANTICS in NtCreateX and NTTrans-CreateX once the pathname has been processed. Jeremy.
http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=4ff1c3f3f0cfa96e88c4bfef06897d95549ee1d9 checked into cifs-2.6.git (will request upstream merge soon if no objections) so will fix posix open on 2.6.29 through 2.6.31 to not set this flag (O_APPEND) - eventually can add it back in for the non-cached case ("forcedirectio" e.g.)
Comment on attachment 4990 [details] Simplified version of the 3.4.4 patch. Tested against POSIX-APPEND, works. Thanks! Volker
(In reply to comment #49) > (From update of attachment 4990 [details]) > Tested against POSIX-APPEND, works. Thanks! > > Volker > Pushed to v3-4-test. Closing out bug report. Thanks!