Bug 7162 - high risk of corruption when writing a file with a lock on cifs
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Hardware: x86 Linux
Importance: P3 major
Assigned To: Jeff Layton
Reported: 2010-02-20 02:27 UTC by fdupoux
Modified: 2012-10-12 10:45 UTC

Program to reproduce the bug (5.17 KB, text/x-csrc)
2010-02-20 02:35 UTC, fdupoux

Description fdupoux 2010-02-20 02:27:42 UTC
If you create a file on a cifs filesystem and then lock it using lockf(fd, F_LOCK, 0), there is a very high risk of corruption when you write data to that file. The application is not aware of the problem, because write() still returns the number of bytes it was asked to write. The corruption only shows up as checksum errors when the application reads the file back.

When this happens we can see the following error in syslog:
CIFS VFS: Write2 ret -13, wrote 0

I am running linux- on Debian Lenny, but it seems that all recent linux versions are affected.

Linux debian 2.6.32-2-amd64 #1 SMP Fri Feb 12 00:01:47 UTC 2010 x86_64 GNU/Linux
ii  linux-image-2.6.32-2-amd64 - 2.6.32-8 (it's based on
ii  smbclient                  - 2:3.2.5-4lenny8 
ii  smbfs                      - 2:3.2.5-4lenny8
ii  samba                      - 2:3.2.5-4lenny8
ii  samba-common               - 2:3.2.5-4lenny8

My file server is Windows-XP SP3 32bit. I mounted the cifs share this way:
mount -t cifs //winxp/sharename /mnt/cifs
Comment 1 fdupoux 2010-02-20 02:35:02 UTC
Created attachment 5404
Program to reproduce the bug

I have written a small C program to reproduce the bug.
This program writes two copies of random data blocks to two files.
If one file is written to the local disk and the other on a cifs 
mounted directory, then there is a very high risk of corruption on
the file written to the cifs mounted directory. We expect the two
files to be strictly identical at the end, but this program
demonstrates that they sometimes differ. It may be necessary to
run this program multiple times to hit the bug, but in general it is
quite frequent and does not take long to reproduce.
This problem only happens when the files are locked using
lockf(fd, F_LOCK, 0). If we don't lock the files, the corruption
on cifs disappears and the two files are identical.
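
For readers without access to the attachment, the pattern described above can be sketched roughly as follows. This is not the attached program: locked_write_verify() is a hypothetical helper that locks a file with lockf(fd, F_LOCK, 0), writes a buffer, closes the file and reads it back for comparison. It does not retry on EINTR, and on a cifs mount the client cache may still hide a mismatch, so treat it as an illustration only.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* makes lockf() visible under strict -std= modes */
#endif
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not taken from attachment 5404.
 * Returns 0 if the file reads back identical to buf,
 * 1 on a mismatch or short file, -1 on an I/O error. */
int locked_write_verify(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Lock from the current offset (0) to end of file, as in the report. */
    if (lockf(fd, F_LOCK, 0) < 0) {
        close(fd);
        return -1;
    }

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0) {               /* error (or no progress) */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Closing releases the lock; a deferred write error can surface here. */
    if (close(fd) < 0)
        return -1;

    /* Reopen and compare what is stored with what was written. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int result = 0;
    size_t off = 0;
    char chunk[4096];
    while (off < len) {
        size_t want = len - off < sizeof chunk ? len - off : sizeof chunk;
        ssize_t n = read(fd, chunk, want);
        if (n < 0) { result = -1; break; }
        if (n == 0) { result = 1; break; }      /* file shorter than expected */
        if (memcmp(chunk, buf + off, (size_t)n) != 0) { result = 1; break; }
        off += (size_t)n;
    }
    close(fd);
    return result;
}
```

Calling it once with a path on the local disk and once with a path under the cifs mount (for example somewhere below /mnt/cifs) gives the same kind of comparison as the two-file test described above.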
Comment 2 Jeff Layton 2010-02-22 16:33:01 UTC
Thanks for the bug report. Very interesting problem.

Looks like a write is occasionally failing with an -EACCES (permission denied) error. Why a lockf would affect that, I'm not immediately sure. I'll take a look at the reproducer when I get a chance.
Comment 3 Jeff Layton 2012-05-05 10:51:13 UTC
Sorry for the long delay on this, but I think I sort of see the issue...

Windows uses mandatory locking, so if you have a range of a file locked, then
you can't write to it. It's perfectly legitimate for the kernel to buffer up
write requests and then try to issue them later. Unfortunately for your
testcase, those writes will fail because of the lock. The question here is whether you get back an error on close(). You should most likely see -EIO,
but your testcase doesn't check for errors on close (a common application
bug, unfortunately).
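
As an illustration of the error checking mentioned above, here is a minimal sketch of a write path that checks write(), fsync() and close(). The helper names write_all() and save_checked() are made up for this example and do not come from the testcase. The fsync() call is there so that an error on buffered data has a chance to be reported before close().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write, flush and close a file, reporting the
 * first failure. Returns 0 on success, -1 on error with errno set. */
int save_checked(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    int saved = 0;

    if (write_all(fd, buf, len) < 0) {
        rc = -1;
        saved = errno;
    }
    /* Ask the kernel to push buffered data out now, so a deferred
     * write failure is reported here instead of being lost. */
    if (rc == 0 && fsync(fd) < 0) {
        rc = -1;
        saved = errno;
    }
    /* Always close, and treat a close() failure as a failed save. */
    if (close(fd) < 0 && rc == 0) {
        rc = -1;
        saved = errno;
    }
    if (rc < 0)
        errno = saved;
    return rc;
}
```

An application that gets -1 back from save_checked() knows the data may not have reached the server and can report errno rather than discovering the problem later through a checksum mismatch.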

Still, while that's technically correct, an error on close is not
ideal...does this work any better if you use a more recent kernel
(something 3.x-ish) and mount with '-o strictcache' ?
Comment 4 Jeff Layton 2012-05-05 11:16:34 UTC
In fact, I just tested this program against a win2k8 server and it seems to
work just fine. I'm not exactly sure what has changed, but can you let me
know whether a more recent kernel behaves better here?
Comment 5 Jeff Layton 2012-10-12 10:45:45 UTC
No response in several months. Closing bug.