Bug 6898 - Samba duplicates file content on appending
Summary: Samba duplicates file content on appending
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: File services (show other bugs)
Version: 3.4.2
Hardware: x86 Linux
: P3 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-17 15:31 UTC by Jörg Sommer (mail address dead)
Modified: 2015-03-26 21:31 UTC (History)
5 users (show)

See Also:


Attachments
Dump of the network traffic while reproducing this bug (9.02 KB, application/octet-stream)
2009-11-17 15:33 UTC, Jörg Sommer (mail address dead)
no flags Details
Log with log level 10 while reproducing this bug (187.00 KB, text/plain)
2009-11-17 15:41 UTC, Jörg Sommer (mail address dead)
no flags Details
Strace of the smbd process while reproducing this bug (21.49 KB, text/plain)
2009-11-17 15:45 UTC, Jörg Sommer (mail address dead)
no flags Details
Patch (782 bytes, text/x-diff)
2009-11-17 16:00 UTC, Volker Lendecke
no flags Details
kernel patch -- don't send SMB_O_APPEND on O_APPEND opens (1.12 KB, patch)
2009-11-18 05:42 UTC, Jeff Layton
no flags Details
Server patch. (839 bytes, patch)
2009-11-23 11:44 UTC, Jeremy Allison
vl: review-
Details
git-am format patch for 3.4.4. (1.46 KB, patch)
2009-11-23 12:16 UTC, Jeremy Allison
no flags Details
git-am format patch for 3.3.10. (1.59 KB, patch)
2009-11-23 12:28 UTC, Jeremy Allison
no flags Details
git-am format patch for 3.2.x (1.59 KB, patch)
2009-11-23 12:30 UTC, Jeremy Allison
no flags Details
git-am format patch for 3.4.4. (5.72 KB, patch)
2009-11-23 19:08 UTC, Jeremy Allison
no flags Details
Simplified version of the 3.4.4 patch. (3.84 KB, patch)
2009-11-23 21:22 UTC, Jeremy Allison
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jörg Sommer (mail address dead) 2009-11-17 15:31:24 UTC
Hi,

when appending text to a file, Samba duplicates the content of the file before it appends the new text.

client% echo 1 > test; echo 2 >> test; cat test; echo +++; echo 3 >> test; cat -A test
1
2
+++
1$
2$
3$
^@^@^@^@

server% cat -A test
1$
2$
1$
2$
3$

Running the same test on the server gives:
server% echo 1 > test; echo 2 >> test; cat test; echo +++; echo 3 >> test; cat -A test
1
2
+++
1$
2$
3$

I'm using Samba 3.4.2 from Debian. The filesystem on the server is ext3. The OS of the server is Linux 2.6.30 and the OS of the client is Linux 2.6.31-rc5.

Regards, Jörg
Comment 1 Jörg Sommer (mail address dead) 2009-11-17 15:33:48 UTC
Created attachment 4957 [details]
Dump of the network traffic while reproducing this bug
Comment 2 Volker Lendecke 2009-11-17 15:38:34 UTC
From that network trace I don't see what Samba does wrong. After exactly those actions, what is the file contents on the server (log in with ssh, don't look via smbfs/cifs please!)

Volker
Comment 3 Volker Lendecke 2009-11-17 15:40:26 UTC
Ah, you've shown that already, right?

Can you create a new file and strace the smbd while doing that?

Thanks,

Volker
Comment 4 Jörg Sommer (mail address dead) 2009-11-17 15:41:08 UTC
Created attachment 4958 [details]
Log with log level 10 while reproducing this bug
Comment 5 Jörg Sommer (mail address dead) 2009-11-17 15:45:33 UTC
Created attachment 4959 [details]
Strace of the smbd process while reproducing this bug
Comment 6 Volker Lendecke 2009-11-17 15:54:36 UTC
Thanks a lot! It's very likely it's the O_APPEND that kills us. Need to filter that out, I need to find the right spot. Give me some minutes.

Volker
Comment 7 Jörg Sommer (mail address dead) 2009-11-17 15:55:57 UTC
Comment on attachment 4959 [details]
Strace of the smbd process while reproducing this bug

I think interesting in this strace is that the appending of 3 causes the appending of the whole content:

% grep -E '^(pwr|open.*WR)' samba.st
open("joerg/test", O_WRONLY|O_CREAT|O_LARGEFILE|O_NOFOLLOW, 0644) = 30
pwrite64(30, "1\n"..., 2, 0)            = 2
open("joerg/test", O_WRONLY|O_APPEND|O_LARGEFILE|O_NOFOLLOW) = 30
pwrite64(30, "2\n"..., 2, 2)            = 2
open("joerg/test", O_WRONLY|O_APPEND|O_LARGEFILE|O_NOFOLLOW) = 30
pwrite64(30, "1\n2\n3\n"..., 6, 0)      = 6
Comment 8 Volker Lendecke 2009-11-17 16:00:44 UTC
Created attachment 4960 [details]
Patch

Can you try the attached patch?

Thanks,

Volker
Comment 9 Jörg Sommer (mail address dead) 2009-11-17 16:55:39 UTC
I've started the build with the patch. I'll report tomorrow if it fixed the problem.
Comment 10 Steve French 2009-11-17 19:48:23 UTC
wireshark 1.3.1 doesn't decode the set_path_info posix open which makes this harder to read, but presumably O_APPEND passed in on smb open?
Comment 11 Volker Lendecke 2009-11-17 23:52:28 UTC
Maybe. There's another, internal variant of this bug (57908 for reference) where the client asks for GENERIC_WRITE in the ntcreate. This together with POSIX_SEMANTICS in the open flags leads to smbd using O_APPEND in its open(2) syscall which then leads to the same kind of data corruption, coming from frame 36 in the sniff.

Volker
Comment 12 Jeff Layton 2009-11-18 05:42:11 UTC
Created attachment 4962 [details]
kernel patch -- don't send SMB_O_APPEND on O_APPEND opens

This patch is untested...

Yes cifs does send SMB_O_APPEND when the file is opened with O_APPEND. It probably shouldn't for the reasons detailed in this patch description.

At some point we should probably define what the offset in an SMB write call means in conjunction with a SMB_O_APPEND open, but for now it's probably best to just avoid using that flag altogether.

Steve, thoughts?
Comment 13 Jeff Layton 2009-11-18 06:03:55 UTC
Volker pointed out in comment #11 that GENERIC_WRITE also means an implicit O_APPEND too. We'll have to fix cifs to use a combination of flags too. It's just as well though since the open flag conversion routines are a real mess anyway.
Comment 14 shirishpargaonkar@gmail.com 2009-11-18 06:07:45 UTC
(In reply to comment #12)

Jeff, then what distinguishes between (posix) open of a file with and without
O_APPEND flag?
Comment 15 Jeff Layton 2009-11-18 08:17:47 UTC
(In reply to comment #14)
> (In reply to comment #12)
> 
> Jeff, then what distinguishes between (posix) open of a file with and without
> O_APPEND flag?
> 

If I understand your question correctly, the answer is "nothing". Appends in this case would be entirely handled by the client writing to the correct offset rather than expecting the server to append writes to the file.

I think trying to pass O_APPEND to the server is going to be *really* difficult with buffered writes. Suppose we do this:

Open a file that's currently 1k long with O_APPEND and append 1K of data to it via buffered writes. The pagecache now has this data appended. How do we ensure that when we go to flush this page to the server, that we just append the data that was written and *don't* try to write out the existing data that was in the file?

On a related note, how would we handle seeks here? I don't think we have anything like a SMB seek call, do we?

I think the only thing we can do for the time being is just try to make sure that we never ask the server to open in append mode.
Comment 16 shirishpargaonkar@gmail.com 2009-11-18 11:51:45 UTC
So, if an app does not open a file with O_APPEND, cifs should send
appropriate flags (e.g. instead of GENERIC_WRITE, split it and send
a collection of rights sans FILE_APPEND_DATA).
But if an app opens a file with O_APPEND, cifs clients needs to maintain
two offsets, one which is used to write data to its own pages on the client
and other implicit/explicit offset as pagecache flushes those pages 
via routines like cifs_writepages out to the server?
Is it possible that pagecache could decide to evict some random dirty pages
that belong to this file?
Comment 17 Jeff Layton 2009-11-18 12:29:11 UTC
(In reply to comment #16)
> So, if an app does not open a file with O_APPEND, cifs should send
> appropriate flags (e.g. instead of GENERIC_WRITE, split it and send
> a collection of rights sans FILE_APPEND_DATA).

Correct.

> But if an app opens a file with O_APPEND, cifs clients needs to maintain
> two offsets, one which is used to write data to its own pages on the client
> and other implicit/explicit offset as pagecache flushes those pages 
> via routines like cifs_writepages out to the server?
> Is it possible that pagecache could decide to evict some random dirty pages
> that belong to this file?
> 

I think this is more complex than it needs to be. O_APPEND is generally just dealt with on open, at least with network filesystems. Unlike with local filesystems, we can never guarantee that the page and attribute caches are consistent with the actual state of the file on the server. I think basically just ignoring O_APPEND within CIFS is the best thing for now.

The only thing we probably need to do is make sure that we revalidate the file size when we see that O_APPEND Is present. The VFS will take care of making sure that the r/w offset of the filp is set correctly based on the file size.
Comment 18 Jeremy Allison 2009-11-18 12:30:26 UTC
Actually there is an SMBleek call, but it's essentially meaningless.

Currently, when the server gets O_APPEND from a posix client it will open the file with O_APPEND. Which means all offsets in further write calls are ignored.

This means you can actually do a race-free append (unlike NFS), but if it's causing the clients so much trouble maybe we should just filter O_APPEND out permanently from the server as Volker's patch does.

Jeremy.
Comment 19 Volker Lendecke 2009-11-18 12:39:25 UTC
Really? I still think we should keep this in the server for future, smarter clients.

Volker
Comment 20 Volker Lendecke 2009-11-18 12:41:15 UTC
Another comment: The subtle problem with GENERIC_WRITE and POSIX_SEMANTICS only was caused by piggybacking posix open on top of an already overloaded call (createfile). This should never have happened in the first place. In the trans2 posix open it's much easier to handle.

Volker
Comment 21 Jeff Layton 2009-11-18 13:01:17 UTC
(In reply to comment #18)
> Actually there is an SMBleek call, but it's essentially meaningless.
> 
> Currently, when the server gets O_APPEND from a posix client it will open the
> file with O_APPEND. Which means all offsets in further write calls are ignored.
> 
> This means you can actually do a race-free append (unlike NFS), but if it's
> causing the clients so much trouble maybe we should just filter O_APPEND out
> permanently from the server as Volker's patch does.

Yes...given that the protocol is stateful you can do a race-free append here. The problem however is how to manage that with buffered writes and the pagecache. It's this last part that introduces a lot of complexity. I'm not at all sure how to deal with that in a race-free way.
Comment 22 Volker Lendecke 2009-11-18 13:07:43 UTC
True, this is probably a real nightmare to get right. But I would like to keep this element in the protocol to enable it when a smart client comes around the corner. In particular as there's an easy fix: Just don't ask for APPEND when opening a file if you (for good reason) can't handle it.

Volker
Comment 23 Steve French 2009-11-19 21:48:43 UTC
I can understand why sending O_APPEND explicitly with a file offset is confusing (for posix open) - but is there a reason why GENERIC_WRITE on NTCreateX must imply append?

If we filter O_APPEND out of flags we send on the client on (posix) open for the buffered case (since it is hard to get right in that case presumably) - for the directio case could O_APPEND make sense?

Is their an analogous case in NFSv4 where both the file offset and O_APPEND are sent?
Comment 24 Volker Lendecke 2009-11-20 05:54:55 UTC
GENERIC_WRITE includes the APPEND right by definition of MS. The fact that this leads to O_APPEND in the smbd open(2) call is unfortunate, but a direct consequence of trying to piggyback posix semantics on top of an already overloaded createfile call.

Volker
Comment 25 Jeff Layton 2009-11-20 06:01:44 UTC
This is the part I don't quite understand...

Just because we asked for permission to append to the file doesn't mean that we want to open it with O_APPEND. So why is samba doing that here?

We'll obviously need to fix this in the client too to account for older servers, but I wonder whether the server side needs to be fixed too?

On another note, 2.6.32 will be shipping soon. If we want this patch to make it we probably need to act fast.
Comment 26 Volker Lendecke 2009-11-20 06:18:51 UTC
Because when defining the unix extensions, apparently someone decided to do so. It must have been between Jeremy, Steve and others defining that protocol detail. Please ask Steve about the history of this, I am certain he can answer that.

There is just no other bit in the already fixed set of bits in the SMB createfile call. Where would you have put it? That is what I mean that createfile should have never been overloaded with the posix semantics.

Volker
Comment 27 shirishpargaonkar@gmail.com 2009-11-20 12:29:01 UTC
I do not see this problem with older Samba Version 3.0.28-0.el5.8.
The contents of the file on the server are
1
2
3

And smb.conf has really simple stanza

[smbt]
        path =  /tmp/smb_t
        browseable = Yes
        read only = No
        guest ok = Yes
        writable = yes

and nothing special in [global] section as well.
cifs client is 1.61

Not sure if this version of Samba had posix semantics over NTCreateAndX.
Comment 28 Jörg Sommer (mail address dead) 2009-11-20 14:36:06 UTC
(In reply to comment #8)
> Created an attachment (id=4960) [details]
> Patch
> 
> Can you try the attached patch?

I've applied this patch to Samba 3.4.3 and this fixed my problem. The content of the file stays as expected. Thanks.

Comment 29 Jeremy Allison 2009-11-20 19:12:42 UTC
Ok, here is what MSDN says about FILE_APPEND_DATA:

http://msdn.microsoft.com/en-us/library/ms804358.aspx

"If the caller sets only the FILE_APPEND_DATA and SYNCHRONIZE flags, it can write only to the end of the file, and any offset information about write operations to the file is ignored. The file will automatically be extended as necessary for this type of operation."

So Windows will do O_APPEND in certain cases. The problem isn't in the specification of the POSIX on the wire flags - there is a separate SMB_O_APPEND flag for this.

Actually, in the 3.4.x and 3.3.x code we are safe from this problem occurring with a NtCreateX call - there is no codepath in these branches that ever sets FILE_FLAG_POSIX_SEMANTICS for a client issued Windows open - only on a client issued POSIX open.

So doing NTCreateX or any other Windows open call from the client even when POSIX pathnames has been selected will be safe from doing O_APPEND.

Volker - in comment 11 you say:

"Maybe. There's another, internal variant of this bug (57908 for reference)
where the client asks for GENERIC_WRITE in the ntcreate. This together with
POSIX_SEMANTICS in the open flags leads to smbd using O_APPEND in its open(2)
syscall which then leads to the same kind of data corruption, coming from frame
36 in the sniff."

I don't have access to that internal bug report. What version of Samba is that using ? I currently can't see a way an NTCreateX call can end up with FILE_FLAG_POSIX_SEMANTICS set.

Jeremy.
Comment 30 shirishpargaonkar@gmail.com 2009-11-21 06:21:44 UTC
(In reply to comment #29)
> I don't have access to that internal bug report. What version of Samba is that
> using ?

/usr/sbin/smbd -V
Version 3.4.2-ctdb-12
Comment 31 Steve French 2009-11-22 19:28:08 UTC
>> Just because we asked for permission to append to the file doesn't mean that we
>> want to open it with O_APPEND
> when defining the unix extensions, apparently someone decided to do so.
> It must have been between Jeremy, Steve and others defining that protocol
> detail. Please ask Steve about the history of this

In defining the Unix and POSIX extensions AFAIK O_APPEND did not come up, and certainly not GENERIC_WRITE as a proxy for the O_APPEND flag.  For the O_APPEND case, the clients revalidate file size and write to the corresponding end of file.   As Jeremy noted, asking for FILE_APPEND_DATA but not asking for write permission could be used to indicate O_APPEND (in Windows), but I don't know of unix clients which do that (in Linux the VFS's page cache gets in the way so would be hard to do this unless we bypass the page cache for files open for append).   GENERIC_WRITE, since it includes write, not just append, permission was not meant to indicate append only.  In NTCreateX we did think that the posix flag would be useful to indicate that we wanted posix semantics (in particular posix pathnames) but I don't remember any discussions about the posix flag changing the meaning for append on NTCreateX.

Comment 32 Volker Lendecke 2009-11-23 03:36:56 UTC
What I'm seeing in that logfile:

  reply_ntcreate_and_X: flags = 0x2, access_mask = 0xc0000000 file_attributes = 0x1000080, share_access = 0x7, create_disposition = 0x5 create_options = 0x40 root_dir_fid = 0x0, fname = snlrk51_1/cifs/cifs4/test3/f8.blt

access_mask = 0xc0000000 to me means SEC_GENERIC_READ|SEC_GENERIC_WRITE. When looking at file_generic_mapping, SEC_GENERIC_WRITE will lead to FILE_GENERIC_WRITE which implies FILE_APPEND_DATA.

file_attributes = 0x1000080 to me means FILE_ATTRIBUTE_NORMAL|FILE_FLAG_POSIX_SEMANTICS. To me it also seems that file_attributes end up in open_file_ntcreate as new_dos_attributes. Looking at line open.c:1506, FILE_FLAG_POSIX_SEMANTICS in new_dos_attributes leads to posix_open=True. Then looking at line 1738ff this together with FILE_APPEND_DATA in access_mask (coming in from FILE_GENERIC_WRITE) leads to O_APPEND in flags2.

I might be wrong, but this is what I think is happening. The simple fix for this would be not to use SEC_GENERIC_WRITE but the translation just omitting FILE_APPEND_DATA.

Volker
Comment 33 Jeremy Allison 2009-11-23 11:32:55 UTC
Oh - is the FILE_FLAG_POSIX_SEMANTICS coming in *OFF THE WIRE* !!! That explains it. Ok, I think the correct server fix is to AND off FILE_FLAG_POSIX_SEMANTICS from all Windows open calls that send 32-bit attributes. No Windows open should be able to set that flag - it should only get set in the posix_open, posix_mkdir calls.

Thanks for clarifying.

Jeremy.
Comment 34 Jeremy Allison 2009-11-23 11:34:53 UTC
And of course the client fix for older servers is to never set the FILE_FLAG_POSIX_SEMANTICS bit on any Windows create calls - that avoids the FILE_APPEND_DATA -> O_APPEND problem.
Jeremy.
Comment 35 Jeremy Allison 2009-11-23 11:44:24 UTC
Created attachment 4982 [details]
Server patch.

I think this is the correct server patch - keeps O_APPEND for POSIX clients but prevents any Windows opens from setting it - even if they're using GENERIC_WRITE access on open request.

Volker please confirm.

Jeremy.
Comment 36 Jeremy Allison 2009-11-23 12:16:38 UTC
Created attachment 4983 [details]
git-am format patch for 3.4.4.
Comment 37 Jeremy Allison 2009-11-23 12:28:00 UTC
Created attachment 4984 [details]
git-am format patch for 3.3.10.
Comment 38 Jeremy Allison 2009-11-23 12:30:27 UTC
Created attachment 4985 [details]
git-am format patch for 3.2.x

One for the (obsolete) 3.2.x patches page ?
Jeremy.
Comment 39 Volker Lendecke 2009-11-23 14:02:12 UTC
Comment on attachment 4982 [details]
Server patch.

I disagree. Why is it more difficult for the client to not send something that it can not swallow than for the server to filter it out, forever blocking correct clients?
Comment 40 Jeff Layton 2009-11-23 14:39:23 UTC
This all comes down to what POSIX_SEMANTICS is supposed to indicate.
The SNIA spec says:

----------[snip]-----------
Indicates that the file is to be accessed according to POSIX rules.
This includes allowing multiple files with names differing only in
case, for file systems that support such naming. (Use care when
using this option because files created with this flag may not be
accessible by applications written for MS-DOS, Windows 3.x, or
Windows NT.)
----------[snip]-----------

...which seems to indicate that it's just intended to affect how the
server parses pathnames. I haven't found any references to the idea
that server should be treating the permission flags as open intent flags
when POSIX_SEMANTICS is set (then again, I haven't looked for that info
either).

If that's the case, should samba be changing how it deals with flags
when doing an open with the POSIX_SEMANTICS bit set?

Note that I'm OK with fixing the client clearing this bit, but I'd like to understand why you think that samba's behavior is correct here.
Comment 41 Volker Lendecke 2009-11-23 14:48:31 UTC
It's essentially over. 3.5 will not respect POSIX_SEMANTICS on ntcreate at all anymore.

Volker
Comment 42 Jeremy Allison 2009-11-23 15:51:04 UTC
Volker is correct - we need to allow FILE_FLAG_POSIX_SEMANTICS to change the pathname processing, but not the append behavior. Updated patches to follow.
Jeremy.
Comment 43 Jeremy Allison 2009-11-23 19:08:06 UTC
Created attachment 4987 [details]
git-am format patch for 3.4.4.
Comment 44 Jeremy Allison 2009-11-23 19:19:07 UTC
Comment on attachment 4984 [details]
git-am format patch for 3.3.10.

This patch isn't needed for 3.3.10 - it NtCreateX and the NTTrans varient already use a different open code path (create_file()) which strips off the FILE_FLAG_POSIX_SEMANTICS from file_attributes. The posix open and mkdir paths use a different call (open_file_ntcreate()) which doesn't remove FILE_FLAG_POSIX_SEMANTICS.
Jeremy.
Comment 45 Jeremy Allison 2009-11-23 19:19:33 UTC
Comment on attachment 4985 [details]
git-am format patch for 3.2.x

This patch isn't needed for 3.2.x - NtCreateX and the NTTrans varient already use a different open code path (create_file()) which strips off the FILE_FLAG_POSIX_SEMANTICS from file_attributes. The posix open and mkdir paths use a different call (open_file_ntcreate()) which doesn't remove FILE_FLAG_POSIX_SEMANTICS.
Jeremy.
Comment 46 Jeremy Allison 2009-11-23 19:20:38 UTC
Ok I think the only patch needed is the git-am patch for 3.4.4 in comment #43.
Jeremy.
Comment 47 Jeremy Allison 2009-11-23 21:22:31 UTC
Created attachment 4990 [details]
Simplified version of the 3.4.4 patch.

This is the more simple version of the patch (removed the trans2.c changes as they are not needed). This restores the behavior of 3.3.x and 3.2.x of masking off the FILE_FLAG_POSIX_SEMANTICS in NtCreateX and NTTrans-CreateX once the pathname has been processed.
Jeremy.
Comment 48 Steve French 2009-11-24 16:58:11 UTC
http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=4ff1c3f3f0cfa96e88c4bfef06897d95549ee1d9  checked into cifs-2.6.git (will request upstream merge soon if no objections) so will fix posix open on 2.6.29 through 2.6.31 to not set this flag (O_APPEND) - eventually can add it back in for the non-cached case ("forcedirectio" e.g.)
Comment 49 Volker Lendecke 2009-11-29 09:18:44 UTC
Comment on attachment 4990 [details]
Simplified version of the 3.4.4 patch.

Tested against POSIX-APPEND, works. Thanks!

Volker
Comment 50 Karolin Seeger 2009-12-01 02:36:07 UTC
(In reply to comment #49)
> (From update of attachment 4990 [details])
> Tested against POSIX-APPEND, works. Thanks!
> 
> Volker
> 

Pushed to v3-4-test.
Closing out bug report.

Thanks!