Bug 7048 - File content duplication on append
Summary: File content duplication on append
Status: RESOLVED FIXED
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 2.6
Hardware: x64 Linux
Importance: P3 major
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-19 06:50 UTC by john
Modified: 2010-02-22 15:11 UTC
Description john 2010-01-19 06:50:42 UTC
Hi,

We're still seeing a bug similar to #6898 (https://bugzilla.samba.org/show_bug.cgi?id=6898, marked RESOLVED FIXED), except that the simple test given in that report no longer demonstrates the problem.

It has caused us major data corruption, and for now we have had to suspend the use of Samba file services altogether.

We have 100 or so RHEL5.4 clients talking CIFS to a Solaris 10 server for remote home directories. The server has been updated to 3.4.4 since the fix, but we are still having append problems.

I'm having difficulty coming up with a minimal test case at the shell prompt, as trivial examples now seem to work ok, but for example:

-bash-3.2$ echo test1>>a
-bash-3.2$ echo test2>>a
-bash-3.2$ echo test3>>a
-bash-3.2$ cat a
test1
test2
test3
-bash-3.2$ echo test4>>a
-bash-3.2$ cat a
test1
test2
test3
test1
test2
test3
test4
-bash-3.2$ uname -a
Linux [...] 2.6.18-164.6.1.el5 #1 SMP Tue Oct 27 11:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

This reproduces the bug most of the time, though it seems to be affected by local caching.


I can, however, reproduce it consistently with a simple C program.

TEST:
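#include <stdio.h>

/* Open fn with fopen mode a, write the string l, and close the file again. */
static void wl(const char *fn, const char *a, const char *l) {
        FILE *f = fopen(fn, a);
        if (f == NULL) { perror(fn); return; }
        fputs(l, f);    /* fputs, so l is never interpreted as a format string */
        fclose(f);
}

int main(void) {
        wl("effort.txt", "w", "Line A\n"); wl("effort.txt", "a+", "Line B\n");
        wl("effort.txt", "a+", "Line C\n"); wl("effort.txt", "a+", "Line D\n");
        return 0;
}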

EXPECTED:
Line A
Line B
Line C
Line D

RECEIVED:
Line A
Line A
Line B
Line A
Line B
Line C
Line A
Line B
Line C
Line D

Regards,  John.
Comment 1 Volker Lendecke 2010-01-19 07:06:03 UTC
Can you please upload a network trace and straces of both the client and smbd? See http://wiki.samba.org/index.php/Capture_Packets for the network trace. For the strace of smbd, please mount the share, then use smbstatus to find which smbd pid is serving your client. Then run

strace -ttT -o /tmp/smbd.trace -p <smbd-pid>

Please upload the resulting /tmp/smbd.trace.

Thanks,

Volker
Comment 2 Jeremy Allison 2010-01-19 18:58:26 UTC
Ok, this small program easily reproduces the issue with the CIFS client shipped in Ubuntu 9.10. But I think the server is doing the right thing here.

The client is doing a POSIX open, and sending an access mask of 0x7, which maps to:

#define FILE_READ_DATA        0x00000001
#define FILE_WRITE_DATA       0x00000002
#define FILE_APPEND_DATA      0x00000004

As this is a POSIX open, the server deliberately passes the append request through to the system open() call as O_APPEND, which gives you the result we see in the output file: the client is still trying to write at offset zero, which it can't do once a file is opened with O_APPEND.

This is a client bug (IMHO), in that the client must not send FILE_APPEND_DATA in a POSIX open when it can't cope with this in the client buffer cache. I'm going to re-assign this bug to Jeff Layton for comment. Jeff, is this something that got fixed when the original discussion for bug #6898 occurred?

Bug #6898 was a different issue, where O_APPEND was being selected when the client was using NTCreateX, not POSIX open with the UNIX extensions. I'm also changing this to a CIFSFS bug.

Jeremy.
Comment 3 Jeremy Allison 2010-01-19 19:56:25 UTC
Ok, Kukks pointed out on IRC that this bug has been fixed in the kernel - see here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=cea62343956c24452700c06cf028b72414c58a74

for the patch. Apply this to your cifsfs kernel sources for RHEL5.4 and it will fix the problem.

Jeff is a Red Hat employee so he should be able to help you get this fix through official channels.

Jeremy.
Comment 4 Jeff Layton 2010-01-20 05:54:16 UTC
Yes, the fix is already slated for 5.5. If you need it sooner, please open a support case and either request a hotfix or make a case for getting this into an async errata release.
Comment 5 Jeff Layton 2010-02-22 15:11:58 UTC
I believe this is now fixed in recent kernels and can be closed.