Bug 9017 - file corrupted when copied from cifs mounted OS/2 share
Summary: file corrupted when copied from cifs mounted OS/2 share
Status: RESOLVED FIXED
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs (show other bugs)
Version: 2.6
Hardware: x64 Linux
: P5 critical
Target Milestone: ---
Assignee: Jeff Layton
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-29 17:56 UTC by David H. Durgee
Modified: 2012-12-04 11:48 UTC (History)
2 users (show)

See Also:


Attachments
debug data from both helena and maya (418.09 KB, application/zip)
2012-06-29 17:56 UTC, David H. Durgee
no flags Details
sample test file and corrupted file (16.14 KB, application/zip)
2012-06-29 17:57 UTC, David H. Durgee
no flags Details
patch -- when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize (2.18 KB, patch)
2012-06-30 11:37 UTC, Jeff Layton
no flags Details
syslog and kern.log documenting cifs problem (32.82 KB, application/zip)
2012-07-02 17:03 UTC, David H. Durgee
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David H. Durgee 2012-06-29 17:56:35 UTC
Created attachment 7677 [details]
debug data from both helena and maya

I recenty installed linux mint 13 maya - x64 edition on a multiboot system for testing. I recently found that files copied from a cifs mounted OS/2 share are corrupted in a particular pattern.

The first 4297 bytes are copied correctly, so any small files would not manifest the problem. The next 3895 bytes of the file are replaced with x'00' bytes. The next 12489 bytes are copied correctly, followed by another 3895 bytes of x'00' instead of the actual file contents. This pattern continues with 12489 bytes from the original file followed by 3895 x'00' bytes until the file size is reached.

The smbstatus command reports Samba version 3.6.3 is running in my copy of maya. I have tested this on my other two versions of mint that are running here and neither of them manifest this problem. One system is linux mint 11 katya - x64 edition, which reports Samba version 3.5.8 and the other is the current linux mint 8 helena - x64 edition on the same system I am testing maya on which reports Samba version 3.4.0 running.

An example fstab entry for one of the OS/2 shares to mount is:

//tp600e/HDRIVE /mnt/x_ cifs user=userid%password,servern=TP600E,sec=lanman,nocase,uid=1000,noserverino 0 0

I have done some additional testing to confirm this is not due to an installation error:

1) I booted a verified maya LiveDVD and was able to recreate the problem.

2) I booted a lisa LiveCD and the problem was not present.

From this I conclude that:

1) this is NOT an installation error, my maya install is indeed "working as designed"

2) this is a recent problem, as it is not present in lisa but is present in maya

I have tested this in both helena and maya with cifsFYI and traceSMB set and have attached this to the bug.

Dave
Comment 1 David H. Durgee 2012-06-29 17:57:31 UTC
Created attachment 7678 [details]
sample test file and corrupted file
Comment 2 Jeff Layton 2012-06-29 18:17:04 UTC
Thanks for the bug report. David sent me a capture privately that shows the read traffic between the client and server. One thing I notice is that the server is replying with short reads. Here's one pair, for instance:

    Read AndX Request, FID: 0x0000, 16384 bytes at offset 0
    Read AndX Response, 4297 bytes

...the one thing this capture does not contain is the traffic from the mount itself. The thing we need to see is whether the server is claiming that it supports larger read sizes than that or not.

Can you redo the capture, but start it up before you mount the share?
Comment 3 David H. Durgee 2012-06-29 18:23:21 UTC
(In reply to comment #2)
> Thanks for the bug report. David sent me a capture privately that shows the
> read traffic between the client and server. One thing I notice is that the
> server is replying with short reads. Here's one pair, for instance:
> 
>     Read AndX Request, FID: 0x0000, 16384 bytes at offset 0
>     Read AndX Response, 4297 bytes
> 
> ...the one thing this capture does not contain is the traffic from the mount
> itself. The thing we need to see is whether the server is claiming that it
> supports larger read sizes than that or not.
> 
> Can you redo the capture, but start it up before you mount the share?

This share is in the fstab and mounted automatically.  Will it suffice to umount it and then issue the mount command again, or should I remove it from the fstab and reboot before issuing the mount command?

Dave
Comment 4 Jeff Layton 2012-06-29 18:25:46 UTC
Unmounting and remounting it should be fine, but you'll need to unmount all shares that are being mounted from this server. cifs.ko will share the socket between multiple mounts, so we want to ensure that it has to reconnect fully to the server.

I mainly need to see the protocol negotiation packets that occur when the client first contacts the server.
Comment 5 Jeff Layton 2012-06-29 19:31:45 UTC
Ok, David sent me another capture that shows the negotiation. Of interest is this in the Negotiate Protocol response:

    Max Buffer Size: 4356

...unfortunately this is an area where the MS-CIFS spec gets a bit "fuzzy". The "Max Buffer Size" is supposed to govern how large a SMB request that the server can receive. In this case, I suspect that it also has some internal limitations about how large a request it can send too. We have some hacks in the cifs rsize negotiation code to deal with this for Win9x-era servers:

----------------[snip]--------------------
        /*
         * Set default value...
         *
         * HACK alert! Ancient servers have very small buffers. Even though
         * MS-CIFS indicates that servers are only limited by the client's
         * bufsize for reads, testing against win98se shows that it throws
         * INVALID_PARAMETER errors if you try to request too large a read.
         *
         * If the server advertises a MaxBufferSize of less than one page,
         * assume that it also can't satisfy reads larger than that either.
         *
         * FIXME: Is there a better heuristic for this?
         */
        if (tcon->unix_ext && (unix_cap & CIFS_UNIX_LARGE_READ_CAP))
                defsize = CIFS_DEFAULT_IOSIZE;
        else if (server->capabilities & CAP_LARGE_READ_X)
                defsize = CIFS_DEFAULT_NON_POSIX_RSIZE;
        else if (server->maxBuf >= PAGE_CACHE_SIZE)
                defsize = CIFSMaxBufSize;
        else
                defsize = server->maxBuf - sizeof(READ_RSP);
----------------[snip]--------------------

...so there, we assume that when the server advertises a receive buffer that's smaller than 1 page (4k), that we should cap the rsize at that value too. Unfortunately, it looks like OS/2 has some similar limitations though it doesn't throw an error. It's advertising a buffer that's larger than a page so we end up trying to request 16k reads (the size of a cifs.ko standard receive buffer).

The upshot here is that you can probably work around this by specifying rsize=4096 when mounting. I'll have to give some thought about how best to detect this situation though if we want rsize autonegotiation with OS/2 to work.

Steve, any thoughts?
Comment 6 David H. Durgee 2012-06-29 20:34:20 UTC
Is this a recent change to the code?  As I noted in the report, this problem does not manifest with any linux mint prior to maya.  The DebugData shows cifs as 1.76, where the lisa release shows cifs at 1.74 which works.
Comment 7 David H. Durgee 2012-06-29 21:22:05 UTC
This may simply be a naive thought on my part, but would there be any reason not to simply adjust the rsize down on a connection when you attempt a large read and get a smaller response than expected?  I would think that the size of the response you received would indicate the largest response that the server can give, assuming that it does not return an error condition of some sort instead.
Comment 8 Jeff Layton 2012-06-30 00:10:21 UTC
(In reply to comment #6)
> Is this a recent change to the code?  As I noted in the report, this problem
> does not manifest with any linux mint prior to maya.  The DebugData shows cifs
> as 1.76, where the lisa release shows cifs at 1.74 which works.

Yes. We've changed the code to make asynchronous reads on the wire for better performance. At the same time we added the ability for the code to support larger read sizes.

(In reply to comment #7)
> This may simply be a naive thought on my part, but would there be any reason
> not to simply adjust the rsize down on a connection when you attempt a large
> read and get a smaller response than expected?  I would think that the size of
> the response you received would indicate the largest response that the server
> can give, assuming that it does not return an error condition of some sort
> instead.

Much easier said than done.

When we're issuing async reads, we issue them all at once and just collect the replies as they come in, so you not only need to take down the rsize but fill in the "gaps" too.

The assumption we generally make when we get a short read is that the file isn't as big as we thought. So you'd need some way to distinguish between that situation and this one (where the server sent a short reply because it couldn't receive a buffer that large), and I don't see a way to do that.

Assuming you could though, then you'd just need to somehow reissue reads to fill in the gaps from an asynchronous error handling. Handling async reads is already pretty tricky however, without that added complexity.

Luckily most modern servers handle larger reads and writes properly, so this problem pretty much goes away on any server from this millenium.

I think the best solution here is to make sure we get the rsize correct from the get-go. One thing that you could do to help here is to confirm whether mounting with rsize=4096 works around the problem.
Comment 9 David H. Durgee 2012-06-30 00:27:37 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > Is this a recent change to the code?  As I noted in the report, this problem
> > does not manifest with any linux mint prior to maya.  The DebugData shows cifs
> > as 1.76, where the lisa release shows cifs at 1.74 which works.
> 
> Yes. We've changed the code to make asynchronous reads on the wire for better
> performance. At the same time we added the ability for the code to support
> larger read sizes.
> 
> (In reply to comment #7)
> > This may simply be a naive thought on my part, but would there be any reason
> > not to simply adjust the rsize down on a connection when you attempt a large
> > read and get a smaller response than expected?  I would think that the size of
> > the response you received would indicate the largest response that the server
> > can give, assuming that it does not return an error condition of some sort
> > instead.
> 
> Much easier said than done.
> 
> When we're issuing async reads, we issue them all at once and just collect the
> replies as they come in, so you not only need to take down the rsize but fill
> in the "gaps" too.
> 
> The assumption we generally make when we get a short read is that the file
> isn't as big as we thought. So you'd need some way to distinguish between that
> situation and this one (where the server sent a short reply because it couldn't
> receive a buffer that large), and I don't see a way to do that.
> 
> Assuming you could though, then you'd just need to somehow reissue reads to
> fill in the gaps from an asynchronous error handling. Handling async reads is
> already pretty tricky however, without that added complexity.
> 
> Luckily most modern servers handle larger reads and writes properly, so this
> problem pretty much goes away on any server from this millenium.
> 
> I think the best solution here is to make sure we get the rsize correct from
> the get-go. One thing that you could do to help here is to confirm whether
> mounting with rsize=4096 works around the problem.

I just tried the rsize=4096 and it appears to solve the problem, although I have not had time to exhaustively test it.  So assuming further testing confirms it this appears to provide a work-around.

Regarding the detection and adjustment of rsize, would it make sense to make the first read from a server a test of your rsize?  If you get the expected response, then start your async operations on that server.  If you get a short response adjust your rsize and try your next read to pick up where the first left off.  Once you get the expected response to your read you have your rsize right and can start async operations.

Dave
Comment 10 David H. Durgee 2012-06-30 00:35:55 UTC
Further thoughts on comment 8:

> The assumption we generally make when we get a short read is that the file
> isn't as big as we thought. So you'd need some way to distinguish between that
> situation and this one (where the server sent a short reply because it couldn't
> receive a buffer that large), and I don't see a way to do that.

The only time a short read indicates the file size was incorrect is if it comes at the end of the file, so if you get a short reply for a block in the middle of a file and have replies for a later block you know this is an rsize problem.

I am assuming here that servers maintain a constant rsize and don't start replying in smaller blocks when they get tight on buffers for some reason.  If any of them actually do this I would expect it to cause similar problems to what I am seeing as well.

Dave
Comment 11 Jeff Layton 2012-06-30 11:28:34 UTC
> Regarding the detection and adjustment of rsize, would it make sense to make
> the first read from a server a test of your rsize?  If you get the expected
> response, then start your async operations on that server.  If you get a short
> response adjust your rsize and try your next read to pick up where the first
> left off.  Once you get the expected response to your read you have your rsize
> right and can start async operations.
> 
[...]
> 
> The only time a short read indicates the file size was incorrect is if it comes
> at the end of the file, so if you get a short reply for a block in the middle
> of a file and have replies for a later block you know this is an rsize problem.
> 
> I am assuming here that servers maintain a constant rsize and don't start
> replying in smaller blocks when they get tight on buffers for some reason.  If
> any of them actually do this I would expect it to cause similar problems to
> what I am seeing as well.
> 

It's very difficult to predict what a server will do in this regard since the spec for CIFS was extremely unclear for years and there are a lot of crappy server implementations in the field.

You can't necessarily tell after the fact what a short read actually means because that's racy and there's no guarantee of atomicity. Suppose you go to check the size of the file and the server says it "big" so you then go to do your read and it comes back short. Does that mean that the server can't handle a read that big, or someone truncated the file out from under you?

That of course doesn't even consider the complexity of handling this automatically. Again, I think the best solution is make sure we get the rsize correct from the beginning.
Comment 12 Jeff Layton 2012-06-30 11:37:16 UTC
Created attachment 7679 [details]
patch -- when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize

I think this patch is probably what we'll need. When a server doesn't set the CAP_LARGE_READ_X bit, then it will default to an rsize that's equivalent to its MaxBufferSize minus the size of the read header. That's probably the best we can do.

David, could you test this and let me know if it does the right thing?
Comment 13 David H. Durgee 2012-06-30 13:23:55 UTC
(In reply to comment #12)
> Created attachment 7679 [details]
> patch -- when server doesn't set CAP_LARGE_READ_X, cap default rsize at
> MaxBufferSize
> 
> I think this patch is probably what we'll need. When a server doesn't set the
> CAP_LARGE_READ_X bit, then it will default to an rsize that's equivalent to its
> MaxBufferSize minus the size of the read header. That's probably the best we
> can do.
> 
> David, could you test this and let me know if it does the right thing?

I am not set up to apply the patch and compile, but I can certainly test setting the rsize value on the mount.  You note in comment 5 that the MaxBufferSize is 4356, so given this what rsize would be set by the patch?

Dave
Comment 14 David H. Durgee 2012-06-30 18:38:03 UTC
I took a look at /proc/mounts and saw wsize=4292, so I tried setting rsize=4292 as well on these shares.  I have done some testing by testing zip files on the mounted shares on each of the three shares and they seem to be working properly, so assuming that this is what rsize would default to with the patch I would think this fixes the problem.

Thank you for your assistance in this matter.  Given this is a recently released new feature I can see how I might be the first to hit it.  I am glad to be able to assist you in fixing it.

I assume that the patched cifs.ko will find its way into the updates to my distro, which is linux mint maya x64 edition.  If I can assist in testing it prior to release let me know.  As this is not an area I work in I would need direction from you on installing it to my system.

Dave
Comment 15 Jeff Layton 2012-07-02 11:18:32 UTC
server's maxBuf is 4356 and the READ_RSP header is 63 bytes:

    4356 - 63 = 4293

...so that's likely what the rsize will end up at. As a practical matter, we typically round down to a multiple of the page size in most cases, so that will mostly lead you to get 4k I/Os on the wire.

You'll probably still see a bit of a performance gain over earlier kernels though simply because you'll be doing those reads in parallel.

Once I send this patch to the mailing list, we'll debate it there and then it should make its way into stable kernels eventually assuming there are no objections.
Comment 16 David H. Durgee 2012-07-02 15:35:56 UTC
(In reply to comment #15)
> server's maxBuf is 4356 and the READ_RSP header is 63 bytes:
> 
>     4356 - 63 = 4293
> 
> ...so that's likely what the rsize will end up at. As a practical matter, we
> typically round down to a multiple of the page size in most cases, so that will
> mostly lead you to get 4k I/Os on the wire.
> 
> You'll probably still see a bit of a performance gain over earlier kernels
> though simply because you'll be doing those reads in parallel.
> 
> Once I send this patch to the mailing list, we'll debate it there and then it
> should make its way into stable kernels eventually assuming there are no
> objections.

Will that include the 3.2.0-23-generic that is part of the Linux Mint 13 maya x64 distro that I am running where I encountered the problem?  I would be happy to test the patch for you if you can provide me a binary of cifs.ko appropriate to replace the one currently present in my maya partition.  I see it located at
/lib/modules/3.2.0-23-generic/kernel/fs/cifs/cifs.ko although I have not checked for other copies in the system.

Dave
Comment 17 Jeff Layton 2012-07-02 15:47:22 UTC
(In reply to comment #16)

> Will that include the 3.2.0-23-generic that is part of the Linux Mint 13 maya
> x64 distro that I am running where I encountered the problem?  I would be happy
> to test the patch for you if you can provide me a binary of cifs.ko appropriate
> to replace the one currently present in my maya partition.  I see it located at
> /lib/modules/3.2.0-23-generic/kernel/fs/cifs/cifs.ko although I have not
> checked for other copies in the system.
> 
> Dave

I can't build a kernel module for you since I don't have a mint box and you really need to make sure you're using the same sources and toolchain. I'm fairly certain the patch will do the right thing since it's reasonably simple.

Eventually this will make its way into the mainline stable kernel series. I think 3.2.x was a "longterm" release so it should get this patch eventually. As to whether mint will pick it up, I have no idea. I suggest filing a bug with them with a pointer to this one once this is in mainline kernels.
Comment 18 David H. Durgee 2012-07-02 17:02:49 UTC
I don't know if this is related to this issue in particular, but I just hit something when I booted maya a few minutes ago:

Jul  2 12:52:22 DG41TY kernel: [   83.492042] CIFS VFS: could not allocate crypto hmacmd5
Jul  2 12:52:22 DG41TY kernel: [   83.492045] 
Jul  2 12:52:22 DG41TY kernel: [   83.492051] CIFS VFS: could not allocate crypto hmacmd5
Jul  2 12:52:22 DG41TY kernel: [   83.492057] 
Jul  2 12:52:22 DG41TY kernel: [   83.492060] CIFS VFS: could not setup hash structures rc -110
Jul  2 12:52:22 DG41TY kernel: [   83.492071] CIFS VFS: could not setup hash structures rc -110
Jul  2 12:52:22 DG41TY kernel: [   83.492102] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Jul  2 12:52:22 DG41TY kernel: [   83.492106] IP: [<ffffffff812dc33b>] crypto_larval_kill+0x2b/0x90
Jul  2 12:52:22 DG41TY kernel: [   83.492115] PGD 133c6f067 PUD 135439067 PMD 0 
Jul  2 12:52:22 DG41TY kernel: [   83.492119] Oops: 0002 [#1] SMP 
Jul  2 12:52:22 DG41TY kernel: [   83.492122] CPU 0 
Jul  2 12:52:22 DG41TY kernel: [   83.492124] Modules linked in: rfcomm bnep bluetooth nls_utf8 cifs binfmt_misc nls_iso8859_1 nls_cp437 vfat fat nvidia(P) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd psmouse soundcore ppdev mac_hid parport_pc snd_page_alloc serio_raw lp parport jfs floppy video sata_sil24 r8169
Jul  2 12:52:22 DG41TY kernel: [   83.492150] 
Jul  2 12:52:22 DG41TY kernel: [   83.492153] Pid: 747, comm: mount.cifs Tainted: P           O 3.2.0-23-generic #36-Ubuntu                  /DG41TY

I will upload a zip with syslog and kern.log for inspection in case you need it.

Dave
Comment 19 David H. Durgee 2012-07-02 17:03:44 UTC
Created attachment 7684 [details]
syslog and kern.log documenting cifs problem
Comment 20 Jeff Layton 2012-12-04 11:48:16 UTC
Closing this bug since the original problem was resolved.

As far as the later oops goes, that looks like a bug in the kernel's crypto API. I've seen a similar bug report against Fedora, but I don't think anyone has chased it down yet.

You may be best off reporting that on LKML or linux-crypto@vger.kernel.org.