If you create a file on a cifs filesystem and then lock it using lockf(fd, F_LOCK, 0), there is a very high risk of corruption when you write data to that file. The application is not aware of the problem, since write() returns the number of bytes that should have been written. The corruption is only detected later, by checksum errors when the application reads the file back. When this happens, the following error appears in syslog:

CIFS VFS: Write2 ret -13, wrote 0

I am running linux-2.6.32.8 on Debian Lenny, but it seems that all recent Linux versions are affected.

Linux debian 2.6.32-2-amd64 #1 SMP Fri Feb 12 00:01:47 UTC 2010 x86_64 GNU/Linux
ii linux-image-2.6.32-2-amd64 - 2.6.32-8 (it's based on 2.6.32.8)
ii smbclient - 2:3.2.5-4lenny8
ii smbfs - 2:3.2.5-4lenny8
ii samba - 2:3.2.5-4lenny8
ii samba-common - 2:3.2.5-4lenny8

My file server is Windows XP SP3 32-bit. I mounted the cifs share this way:

mount -t cifs //winxp/sharename /mnt/cifs
Created attachment 5404 [details] Program to reproduce the bug

I have written a small C program to reproduce the bug. This program writes two copies of random data blocks to two files. If one file is written to the local disk and the other to a cifs-mounted directory, there is a very high risk of corruption in the file written to the cifs-mounted directory. We expect the two files to be strictly identical at the end, but this program demonstrates that they sometimes differ. It may be necessary to run this program multiple times to hit the bug, but in general it is quite frequent and does not take long to reproduce. The problem only happens when the files are locked using lockf(fd, F_LOCK, 0). If we don't lock the files, the corruption on cifs disappears and the two files are identical.
Thanks for the bug report. Very interesting problem. Looks like a write is occasionally failing with an -EACCES (permission denied) error. Why a lockf would affect that, I'm not immediately sure. I'll take a look at the reproducer when I get a chance.
Sorry for the long delay on this, but I think I sort of see the issue... Windows uses mandatory locking, so if you have a range of a file locked, you can't write to it. It's perfectly legitimate for the kernel to buffer up write requests and then try to issue them later. Unfortunately for your testcase, those deferred writes will fail because of the lock. The question here is whether you get back an error on close(). You should most likely see -EIO there, but your testcase doesn't check for errors on close (a common application bug, unfortunately). Still, while that's technically correct, an error on close is not ideal... Does this work any better if you use a more recent kernel (something 3.x-ish) and mount with '-o strictcache'?
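To illustrate the point about checking close(): since the kernel may defer the actual writes, a server-side failure can surface only as an error return from close(), after every write() has already reported success. A minimal sketch of error-checked write/close (the function name and path are hypothetical, not from the testcase):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a buffer to a file and close it, checking the return value of
 * both write() and close(). Returns 0 on full success, -1 on failure. */
static int write_and_close(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* On cifs, a deferred write failure (e.g. -EIO from flushing
     * buffered data the server rejected) can show up here. */
    if (close(fd) < 0) {
        fprintf(stderr, "close failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

An fsync(fd) before close() would also flush buffered data and report the error earlier, at the cost of an extra round trip.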
In fact, I just tested this program against a win2k8 server and it seems to work just fine. I'm not exactly sure what has changed, but can you let me know whether a more recent kernel behaves better here?
No response in several months. Closing bug.