Bug 4312 - CIFS VFS vs EMC Celerra. File Access hangs with RFC1001 size vis-a-vis SMB mismatches
Status: CLOSED FIXED
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 2.6
Hardware: x64 Linux
Importance: P3 critical
Target Milestone: ---
Assignee: Steve French
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-12-27 16:04 UTC by Pavel May
Modified: 2007-07-16 12:10 UTC

See Also:


Description Pavel May 2006-12-27 16:04:21 UTC
Upgraded to FC6's 2.6.18-1.2849. CIFS mounts are fine, but trying to access files on the CIFS-exported EMC Celerra yields these:

[108173.066923]  CIFS VFS: RFC1001 size 135 bigger than SMB for Mid=17
[108228.062878]  CIFS VFS: server not responding
[108228.062890]  CIFS VFS: No response for cmd 162 mid 17

Kernel buffer has these:

 CIFS VFS: RFC1001 size 135 bigger than SMB for Mid=4895
Bad SMB: : dump of 48 bytes of data at 0xc74a8900
 00000087 424d53ff 000000a2 c0018000 . . . . ??S M B ??. . . . . . ??
 00000000 00000000 00000000 3
 131f003f 0000ff2a 00430100 00000001 ? . . . * ??. . . . C . . . . .
 CIFS VFS: RFC1001 size 135 bigger than SMB for Mid=4897
Bad SMB: : dump of 48 bytes of data at 0xc74a8900
 00000087 424d53ff 000000a2 c0018000 . . . . ??S M B ??. . . . . . ??
 00000000 00000000 00000000 5
 1321003f 0000ff2a 00420000 00000001 ? . ! . * ??. . . . B . . . . .
 CIFS VFS: RFC1001 size 135 bigger than SMB for Mid=4896
Bad SMB: : dump of 48 bytes of data at 0xc74a8900
 00000087 424d53ff 000000a2 c0018000 . . . . ??S M B ??. . . . . . ??
 00000000 00000000 00000000 3
 1320003f 0000ff2a 00400000 00000001 ? .   . * ??. . . . @ . . . . .
 CIFS VFS: RFC1001 size 135 bigger than SMB for Mid=4898
Bad SMB: : dump of 48 bytes of data at 0xc74a8900
 00000087 424d53ff 000000a2 c0018000 . . . . ??S M B ??. . . . . . ??
 00000000 00000000 00000000 4
 1322003f 0000ff2a 00410000 00000001 ? . " . * ??. . . . A . . . . .
 CIFS VFS: RFC1001 size 135 bigger than SMB for Mid=4899
Bad SMB: : dump of 48 bytes of data at 0xc74a8900
 00000087 424d53ff 000000a2 c0018000 . . . . ??S M B ??. . . . . . ??
 00000000 00000000 00000000 5
 1323003f 0000ff2a 00440000 00000001 ? . # . * ??. . . . D . . . . .
 CIFS VFS: server not responding
 CIFS VFS: No response for cmd 162 mid 4894
 CIFS VFS: No response for cmd 162 mid 4895
 CIFS VFS: No response for cmd 162 mid 4896
 CIFS VFS: No response for cmd 162 mid 4897
 CIFS VFS: No response for cmd 162 mid 4898


Looking for further info on the errors' meanings and/or workarounds.
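
For anyone reading the dumps, here is a minimal Python sketch that decodes the fields visible in the first dump. It assumes the hex words are host-order 32-bit values printed on a little-endian machine, and that the first word (the RFC1001 length) had already been converted to host order before the dump was taken; both are assumptions inferred from the output, not confirmed from the kernel source. The helper names `words_to_bytes` and `decode` are made up for illustration. The second row is truncated in this report, so it is not decoded.

```python
import struct

# Words copied from rows 1 and 3 of the first dump (Mid=4895).
# Row 2 is truncated in the report and is deliberately left out.
ROW1 = "00000087 424d53ff 000000a2 c0018000"
ROW3 = "131f003f 0000ff2a 00430100 00000001"


def words_to_bytes(row):
    """Turn host-order hex words back into raw bytes (assumes a little-endian host)."""
    return b"".join(struct.pack("<I", int(word, 16)) for word in row.split())


def decode(row1, row3):
    """Pull out the fields that can be read from rows 1 and 3 of the dump."""
    b1 = words_to_bytes(row1)  # buffer offsets 0..15
    b3 = words_to_bytes(row3)  # buffer offsets 32..47
    return {
        # Assumed to be stored in host order already, so read the word as printed.
        "rfc1001_len": int(row1.split()[0], 16),
        # Offsets 4..7: SMB protocol signature.
        "signature": b1[4:8],
        # Offset 8: SMB command code.
        "command": b1[8],
        # Offsets 34..35: multiplex ID (little-endian on the wire).
        "mid": struct.unpack_from("<H", b3, 2)[0],
    }


fields = decode(ROW1, ROW3)
print(fields)
```

Under those assumptions the decoded length, command, and MID line up with the values in the "RFC1001 size 135" and "No response for cmd 162" messages above.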
Comment 1 Pavel May 2006-12-27 16:05:13 UTC
The issue is not exhibited by the 2.6.15 series of FC kernels.
Comment 2 Mark Komarinski 2007-03-30 13:56:27 UTC
We're seeing the same problem all the way up to the current FC6 kernel (2.6.20-1.2933) trying to access an EMC CIFS export.

Access sometimes works on the first file, but subsequent access to that file or other files just hangs.
Comment 3 Derek Spransy 2007-03-30 15:16:28 UTC
We have run into this same issue just this week.  Here is some additional information that I've gathered after doing some packet traces on affected and unaffected clients.

We've seen essentially the same errors noted in the original bug report.  When the same read operation is done against a Windows share, everything works as expected.  The MIDs mentioned in these errors match the MIDs for the read commands sent by the client.  Also, in each case the NetBIOS session header records a length of 135.

Making this connection, I then looked at the packets generating these errors.  Each time these errors are logged, the Celerra is sending the FID of the requested file to the client workstation.  The clients then freeze for a while, tear down the current connection, negotiate a new connection, and then attempt the same read operation.

In each case where the packet containing the FID causes a problem, it has 32 bytes of extra (seemingly random garbage) data tacked onto the end of the packet.  Wireshark (0.99.5) cannot decode this data, and it isn't present in FID packets sent by Windows servers.  The last field of an SMB command header is the byte count field, and these extra 32 bytes come after that field.  In each case where the read fails, the FID packet is 205 bytes in length.  In each case where the read succeeds, the FID packet is 173 bytes in length, so the failing packets are exactly 32 bytes longer.
	The Celerra also generates the following error messages when a client attempts to read a file:

2007-03-30 15:25:19: SMB: 3:  Client=10.0.0.1 OS='Linux version 2.6.20-1.2925.fc6', LM='CIFS VFS Client for Linux' not registered capa=0xd0dc (R=8/8) 
2007-03-30 15:25:19: SMB: 3:  Client=10.0.0.1 OS=Linux version 2.6.20-1.2925.fc6 LM=CIFS VFS Client for Linux Extra=- type=- (1) 
2007-03-30 15:25:19: SMB: 3:  Client=10.0.0.1 OS='Linux version 2.6.20-1.2925.fc6', LM='CIFS VFS Client for Linux' not registered capa=0xd0dc (R=8/8) 
2007-03-30 15:25:19: SMB: 3:  Client=10.0.0.1 OS=Linux version 2.6.20-1.2925.fc6 LM=CIFS VFS Client for Linux Extra=- type=- (1) 

	We've seen the following affected clients: Fedora Core 6 (2.6.20 Kernel, 1.47 CIFS) and SuSE 10.2 (2.6.18 Kernel, 1.45 CIFS).
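
To illustrate the length mismatch described in this comment, here is a rough Python sketch of the kind of consistency check that would flag such a packet. It is not the kernel's code. It builds a synthetic frame (the word count of 34 and the empty data area are arbitrary illustration values, and `build_frame` and `length_mismatch` are made-up names), then compares the length in the 4-byte session header with the length implied by the SMB word count and byte count fields.

```python
import struct

SMB_HEADER_LEN = 32  # fixed SMB1 header, not counting the 4-byte session header


def build_frame(word_count, data, trailing=b""):
    """Build a synthetic frame: session header + SMB header + parameters + data.

    `trailing` models extra bytes appended after the declared byte count.
    """
    header = b"\xffSMB" + bytes(SMB_HEADER_LEN - 4)
    body = bytes([word_count]) + bytes(2 * word_count)
    body += struct.pack("<H", len(data)) + data
    smb = header + body + trailing
    # Session header: big-endian length; the top byte (message type) stays 0.
    return struct.pack(">I", len(smb)) + smb


def length_mismatch(frame):
    """Return the session-header length minus the length implied by the SMB fields."""
    session_len = struct.unpack_from(">I", frame, 0)[0] & 0x00FFFFFF
    smb = frame[4:]
    word_count = smb[SMB_HEADER_LEN]
    byte_count_offset = SMB_HEADER_LEN + 1 + 2 * word_count
    byte_count = struct.unpack_from("<H", smb, byte_count_offset)[0]
    calculated = byte_count_offset + 2 + byte_count
    return session_len - calculated


good = build_frame(34, b"")
bad = build_frame(34, b"", trailing=bytes(32))  # 32 extra bytes after the byte count

print(length_mismatch(good))
print(length_mismatch(bad))
```

A frame whose fields account for every byte gives a difference of zero. A frame with bytes tacked on after the byte count field gives a positive difference, which is the situation the "RFC1001 size ... bigger than SMB" message appears to describe.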
Comment 4 Pavel May 2007-03-30 16:11:00 UTC
Attempts to get help from EMC resulted in EMC's statement of "We do not support CIFS/SMB access to the device by the Linux clients", followed by the generous offer to generate a quote for support via their professional services organization. Their offer was gracefully declined.
Comment 5 Derek Spransy 2007-04-13 09:34:17 UTC
According to EMC this particular issue has been fixed in DART version 5.5.27.5.  We have not tried the fix yet as our storage engineers want to wait until the next (non-patched) maintenance release (5.5.28).  When we update our Celerra I will report back on this bug report.  Does anyone else want to give 5.5.27.5 a shot?
Comment 6 Mark Komarinski 2007-05-07 09:07:44 UTC
Our storage group installed DART 5.5.28-1 on our EMC server, and we can now smbmount and view files on FC6 (2.6.20-1.2948.fc6) without issue.
Comment 7 Derek Spransy 2007-05-07 09:53:13 UTC
Our storage group will apply 5.5.28 during our June maintenance window.  I'll report back on our findings then.

(In reply to comment #6)
> Our storage group installed DART 5.5.28-1 on our EMC server we can now smbmount
> and view files on FC6 (2.6.20-1.2948.fc6) without issue.
> 

Comment 8 Derek Spransy 2007-07-16 09:38:01 UTC
We had a problem with the upgrade scripts in June, but the update has now been applied successfully.  It appears that 5.5.28 has also solved the problems that we were seeing.
Comment 9 Pavel May 2007-07-16 12:10:00 UTC
So it looks like a code upgrade to 5.5.27-1 does the job as well.
Marking the bug report as "FIXED" since upgrading EMC DART to 5.5.27-1 or later works for multiple reporters.