10417 – CIFS VFS: Send error in Flush = -9

Bug 10417 - CIFS VFS: Send error in Flush = -9

Summary: CIFS VFS: Send error in Flush = -9

Status:	RESOLVED FIXED

Alias:	None

Product:	CifsVFS
Classification:	Unclassified
Component:	kernel fs (show other bugs)
Version:	2.6
Hardware:	x86 Linux

Importance:	P5 normal
Target Milestone:	---
Assignee:	Jeff Layton
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-02-05 09:10 UTC by spamact
Modified:	2014-02-06 05:32 UTC (History)
CC List:	4 users (show)

See Also:

Attachments
Wireshark trace between client and server (2.96 MB, application/octet-stream) 2014-02-05 09:10 UTC, spamact	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description spamact 2014-02-05 09:10:55 UTC

Created attachment 9640 [details]
Wireshark trace between client and server

Hi,

I'm getting a lot of CIFS errors in my system log:

Feb  4 23:13:09 RockyUbu kernel: [ 6827.119992] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:09 RockyUbu kernel: [ 6827.709338] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:10 RockyUbu kernel: [ 6828.764272] CIFS VFS: Send error in Flush = -9

Usually 10-20 messages will appear at once like this.  This had happened occasionally when I was on Ubuntu 12.04 LTS, but since upgrading to Ubuntu 13.10 they are appearing daily.

Modinfo cifs shows cifs.ko at version 2.01, and cifs-utils is at 6.0.

The share is mounted via fstab:
//rocky/share /media/tm cifs credentials=/home/jeff/.smbcredentials,_netdev,uid=jeff,gid=jeff,file_mode=0777,dir_mode=0777,iocharset=utf8,sec+ntlm 0 0

I was asked to create a wireshark trace -- it's my first time so if it was done incorrectly then please let me know and I'll redo it.  The trace is attached, and the times in which the "Flush = -9" error occurs are as follows:

Feb  4 23:13:09 RockyUbu kernel: [ 6827.119992] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:09 RockyUbu kernel: [ 6827.709338] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:10 RockyUbu kernel: [ 6828.764272] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:11 RockyUbu kernel: [ 6829.319361] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:11 RockyUbu kernel: [ 6829.321144] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:11 RockyUbu kernel: [ 6829.323277] CIFS VFS: Send error in Flush = -9
Feb  4 23:13:11 RockyUbu kernel: [ 6829.819286] CIFS VFS: Send error in Flush = -9

In case it's relevant, both the Linux and Windows installations are virtual machines on the same Hyper-V server host.

Thanks!

Comment 1 Jeff Layton 2014-02-05 10:56:10 UTC

Ok, so I think the problem here is related to the fact that older kernels used virtual circuits incorrectly:


--------------------------------------8<------------------------------------

4159 19.305071000 192.168.1.106 -> 192.168.1.105 SMB Write AndX Request, FID: 0x8010, 65536 bytes at offset 1466368
4166 19.305292000 192.168.1.105 -> 192.168.1.106 SMB Write AndX Response, FID: 0x8010, 65536 bytes
4205 19.308942000 192.168.1.106 -> 192.168.1.105 SMB Write AndX Request, FID: 0x8010, 65536 bytes at offset 1531904
4212 19.309332000 192.168.1.105 -> 192.168.1.106 SMB Write AndX Response, FID: 0x8010, 65536 bytes
4252 19.311887000 192.168.1.106 -> 192.168.1.105 SMB Write AndX Request, FID: 0x8010, 65536 bytes at offset 1597440
4253 19.313820000 192.168.1.105 -> 192.168.1.106 SMB [TCP ACKed unseen segment] Write AndX Response, FID: 0x8010, 65536 bytes

...above here, everything is happily chugging along...

4254 19.315560000 192.168.1.106 -> 192.168.1.105 SMB Negotiate Protocol Request
4255 19.318554000 192.168.1.105 -> 192.168.1.106 SMB Negotiate Protocol Response
4256 19.318676000 192.168.1.106 -> 192.168.1.105 SMB Session Setup AndX Request, User: \timemachine
4257 19.320162000 192.168.1.105 -> 192.168.1.106 SMB Session Setup AndX Response
4258 19.320223000 192.168.1.106 -> 192.168.1.105 SMB Tree Connect AndX Request, Path: \\rocky\timemachine
4259 19.322031000 192.168.1.105 -> 192.168.1.106 SMB Tree Connect AndX Response

...at this point, a new session gets set up on a different socket. I don't see the actual TCP handshake in the capture, so I'm unclear on when that socket was established and how. In any case, the session setup request has VCnum == 0.

4305 19.322999000 192.168.1.106 -> 192.168.1.105 SMB Write AndX Request, FID: 0x8010, 65536 bytes at offset 1728512
4318 19.324054000 192.168.1.105 -> 192.168.1.106 SMB Write AndX Response, Error: STATUS_INVALID_HANDLE

...and after that, the server starts (correctly) returning STATUS_INVALID_HANDLE to requests involving that FID.

What's not clear to me is why the client decided to use VCnum == 0 on this session setup, but the point is rather moot since current mainline kernels no longer do that. I think the solution is to ensure that all of your kernels that are using cifs mounts have this patch:

commit 9ae6cf606a33b0a762798df0fb742848bcc685b5
Author: Jeff Layton <jlayton@redhat.com>
Date:   Mon Sep 16 11:23:45 2013 -0400

    cifs: stop trying to use virtual circuits

I'm going to go ahead and close this as FIXED on the assumption that it will be after you apply that patch. Please do reopen the bug if that doesn't fix it and I'll take a closer look.

Comment 2 Jeff Layton 2014-02-05 11:04:33 UTC

...and I should note that it may not be sufficient to only patch the client that's seeing these errors. If you have several clients connecting to the server through NAT, then you'll need to patch all of them. See the patch description for details as to why that is.