Bug 7381 - "Weird" (according to user) CIFS packet fragmentation
Status: NEW
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 2.6
Hardware: Other Linux
Importance: P3 normal
Target Milestone: ---
Assigned To: Steve French
URL: http://bugs.debian.org/578503
 
Reported: 2010-04-21 13:09 UTC by Debian samba package maintainers (PUBLIC MAILING LIST)
Modified: 2012-04-06 11:56 UTC (History)

Description Debian samba package maintainers (PUBLIC MAILING LIST) 2010-04-21 13:09:27 UTC
We got this bug report in Debian, against cifs-utils (actually against "smbfs" and more precisely mount.cifs). I'm unsure whether it belongs to user space tools or kernel fs...

User bug report:

I'm trying to optimize my CIFS network traffic for jumbo frames and either I'm mis-using TCPDUMP or CIFS is doing something funky.

My MTU is 7000 (due to other device limits on the network); the machine itself can do 9000.

Here are the mount options in use:

uid=1000,gid=34,file_mode=0660,dir_mode=0770,user=guest,password=password,dom=DOMAIN,rsize=56000,wsize=56000

When I copy a large file (the test file is about 250 MB), this is what I get from TCPDUMP (just looking at the traffic FROM my client).

12:03:56.511357 IP (tos 0x0, ttl 64, id 30413, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: . 
230432486:230481122(48636) ack 221076 win 143 <nop,nop,timestamp 93748678 89470853>
12:03:56.511365 IP (tos 0x0, ttl 64, id 30420, offset 0, flags [DF], proto TCP (6), length 4732) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: P 
230481122:230485802(4680) ack 221076 win 143 <nop,nop,timestamp 93748678 89470853>
12:03:56.512915 IP (tos 0x0, ttl 64, id 30421, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: . 
230485802:230534438(48636) ack 221127 win 143 <nop,nop,timestamp 93748678 89470855>
12:03:56.512923 IP (tos 0x0, ttl 64, id 30428, offset 0, flags [DF], proto TCP (6), length 4732) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: P 
230534438:230539118(4680) ack 221127 win 143 <nop,nop,timestamp 93748678 89470855>
12:03:56.514478 IP (tos 0x0, ttl 64, id 30429, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: . 
230539118:230587754(48636) ack 221178 win 143 <nop,nop,timestamp 93748678 89470856>
12:03:56.514488 IP (tos 0x0, ttl 64, id 30436, offset 0, flags [DF], proto TCP (6), length 4732) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: P 
230587754:230592434(4680) ack 221178 win 143 <nop,nop,timestamp 93748678 89470856>

It does that repeatedly: large packet, small packet, large packet, small packet. Each pair adds up to 53420 bytes (summing the IP lengths), which is around the figure I entered.
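
The pair totals can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it pulls the IP length and the TCP payload length out of the first two capture lines above and sums them. The `parse_lengths` helper and its regular expressions are illustrative assumptions, written for this tcpdump text format only.

```python
import re

# Two consecutive packets copied from the capture above (one large, one small).
CAPTURE = """\
12:03:56.511357 IP (tos 0x0, ttl 64, id 30413, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: .
230432486:230481122(48636) ack 221076 win 143 <nop,nop,timestamp 93748678 89470853>
12:03:56.511365 IP (tos 0x0, ttl 64, id 30420, offset 0, flags [DF], proto TCP (6), length 4732) mysql.here.lan.41013 > readynas1.here.lan.microsoft-ds: P
230481122:230485802(4680) ack 221076 win 143 <nop,nop,timestamp 93748678 89470853>
"""

def parse_lengths(text):
    """Return (ip_lengths, tcp_payloads) found in this tcpdump text (hypothetical helper)."""
    ip_lengths = [int(n) for n in re.findall(r"proto TCP \(6\), length (\d+)\)", text)]
    # Payload size is the number in parentheses after "first_seq:last_seq".
    tcp_payloads = [int(n) for n in re.findall(r"^\d+:\d+\((\d+)\)", text, flags=re.M)]
    return ip_lengths, tcp_payloads

ip_lengths, tcp_payloads = parse_lengths(CAPTURE)
ip_total = sum(ip_lengths)            # what the report adds up
payload_total = sum(tcp_payloads)     # data bytes actually carried
per_packet_overhead = [ip - tcp for ip, tcp in zip(ip_lengths, tcp_payloads)]

print(ip_total, payload_total, per_packet_overhead)
```

The IP lengths sum to 53420 and the TCP payloads to 53316. Each packet carries 52 bytes of headers, which matches a 20-byte IP header, a 20-byte TCP header and 12 bytes of TCP options (nop, nop, timestamp).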

So, trying to optimise the fragmentation, I adjust the buffer sizes, but I get exactly the same packet sizes for 55500, 55000, and 54000.

If I change the rsize and wsize options to this:

rsize=53000,wsize=53000

I get:


12:34:20.723729 IP (tos 0x0, ttl 64, id 7367, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.42583 > readynas1.here.lan.microsoft-ds: . 
163903334:163951970(48636) ack 170484 win 143 <nop,nop,timestamp 93931099 91295031>
12:34:20.723738 IP (tos 0x0, ttl 64, id 7374, offset 0, flags [DF], proto TCP (6), length 636) mysql.here.lan.42583 > readynas1.here.lan.microsoft-ds: P 
163951970:163952554(584) ack 170484 win 143 <nop,nop,timestamp 93931099 91295031>
12:34:20.725203 IP (tos 0x0, ttl 64, id 7375, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.42583 > readynas1.here.lan.microsoft-ds: . 
163952554:164001190(48636) ack 170535 win 143 <nop,nop,timestamp 93931099 91295033>
12:34:20.725211 IP (tos 0x0, ttl 64, id 7382, offset 0, flags [DF], proto TCP (6), length 636) mysql.here.lan.42583 > readynas1.here.lan.microsoft-ds: P 
164001190:164001774(584) ack 170535 win 143 <nop,nop,timestamp 93931099 91295033>
12:34:20.726919 IP (tos 0x0, ttl 64, id 7383, offset 0, flags [DF], proto TCP (6), length 48688) mysql.here.lan.42583 > readynas1.here.lan.microsoft-ds: . 
164001774:164050410(48636) ack 170586 win 143 <nop,nop,timestamp 93931100 91295034>
12:34:20.726928 IP (tos 0x0, ttl 64, id 7390, offset 0, flags [DF], proto TCP (6), length 636) mysql.here.lan.42583 > readynas1.here.lan.microsoft-ds: P 
164050410:164050994(584) ack 170586 win 143 <nop,nop,timestamp 93931100 91295034>

So it looks like it's close to optimal; another small tweak and the little fragment will go away? But no, I get the same for 52000.

But for 51000 I get:

12:37:26.101798 IP (tos 0x0, ttl 64, id 24548, offset 0, flags [DF], proto TCP (6), length 34792) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: . 
229316714:229351454(34740) ack 238263 win 143 <nop,nop,timestamp 93949637 91480406>
12:37:26.102380 IP (tos 0x0, ttl 64, id 24553, offset 0, flags [DF], proto TCP (6), length 13948) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: . 
229351454:229365350(13896) ack 238263 win 143 <nop,nop,timestamp 93949637 91480406>
12:37:26.102630 IP (tos 0x0, ttl 64, id 24555, offset 0, flags [DF], proto TCP (6), length 636) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: P 
229365350:229365934(584) ack 238263 win 143 <nop,nop,timestamp 93949637 91480407>
12:37:26.103304 IP (tos 0x0, ttl 64, id 24556, offset 0, flags [DF], proto TCP (6), length 34792) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: . 
229365934:229400674(34740) ack 238314 win 143 <nop,nop,timestamp 93949637 91480407>
12:37:26.103881 IP (tos 0x0, ttl 64, id 24561, offset 0, flags [DF], proto TCP (6), length 13948) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: . 
229400674:229414570(13896) ack 238314 win 143 <nop,nop,timestamp 93949637 91480408>
12:37:26.104119 IP (tos 0x0, ttl 64, id 24563, offset 0, flags [DF], proto TCP (6), length 636) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: P 
229414570:229415154(584) ack 238314 win 143 <nop,nop,timestamp 93949637 91480408>
12:37:26.104800 IP (tos 0x0, ttl 64, id 24564, offset 0, flags [DF], proto TCP (6), length 34792) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: . 
229415154:229449894(34740) ack 238365 win 143 <nop,nop,timestamp 93949637 91480409>
12:37:26.105378 IP (tos 0x0, ttl 64, id 24569, offset 0, flags [DF], proto TCP (6), length 13948) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: . 
229449894:229463790(13896) ack 238365 win 143 <nop,nop,timestamp 93949637 91480409>
12:37:26.105616 IP (tos 0x0, ttl 64, id 24571, offset 0, flags [DF], proto TCP (6), length 636) mysql.here.lan.45347 > readynas1.here.lan.microsoft-ds: P 
229463790:229464374(584) ack 238365 win 143 <nop,nop,timestamp 93949637 91480410>

Now it's getting really silly. Three different fragment sizes repeating; that just makes no sense at all.

Jumping a few bytes (I don't have all day)...

If I change the rsize and wsize options to this:

rsize=32768,wsize=32768

I get:

12:20:02.845407 IP (tos 0x0, ttl 64, id 55326, offset 0, flags [DF], proto TCP (6), length 27844) mysql.here.lan.49160 > readynas1.here.lan.microsoft-ds: . 
208279482:208307274(27792) ack 324147 win 143 <nop,nop,timestamp 93845311 90437169>
12:20:02.845414 IP (tos 0x0, ttl 64, id 55330, offset 0, flags [DF], proto TCP (6), length 5096) mysql.here.lan.49160 > readynas1.here.lan.microsoft-ds: P 
208307274:208312318(5044) ack 324147 win 143 <nop,nop,timestamp 93845311 90437169>
12:20:02.846528 IP (tos 0x0, ttl 64, id 55331, offset 0, flags [DF], proto TCP (6), length 27844) mysql.here.lan.49160 > readynas1.here.lan.microsoft-ds: . 
208312318:208340110(27792) ack 324198 win 143 <nop,nop,timestamp 93845312 90437170>
12:20:02.846536 IP (tos 0x0, ttl 64, id 55335, offset 0, flags [DF], proto TCP (6), length 5096) mysql.here.lan.49160 > readynas1.here.lan.microsoft-ds: P 
208340110:208345154(5044) ack 324198 win 143 <nop,nop,timestamp 93845312 90437170>
12:20:02.847664 IP (tos 0x0, ttl 64, id 55336, offset 0, flags [DF], proto TCP (6), length 27844) mysql.here.lan.49160 > readynas1.here.lan.microsoft-ds: . 
208345154:208372946(27792) ack 324249 win 143 <nop,nop,timestamp 93845312 90437171>
12:20:02.847671 IP (tos 0x0, ttl 64, id 55340, offset 0, flags [DF], proto TCP (6), length 5096) mysql.here.lan.49160 > readynas1.here.lan.microsoft-ds: P 
208372946:208377990(5044) ack 324249 win 143 <nop,nop,timestamp 93845312 90437171>

Still the same problem.

If I change the rsize and wsize options to this:

rsize=7000,wsize=7000

I get:

12:08:32.673994 IP (tos 0x0, ttl 64, id 62896, offset 0, flags [DF], proto TCP (6), length 4216) mysql.here.lan.53542 > readynas1.here.lan.microsoft-ds: P 
220118102:220122266(4164) ack 2696616 win 143 <nop,nop,timestamp 93776294 89747010>
12:08:32.673492 IP (tos 0x0, ttl 64, id 62895, offset 0, flags [DF], proto TCP (6), length 4216) mysql.here.lan.53542 > readynas1.here.lan.microsoft-ds: P
220122266:220126430(4164) ack 2696667 win 143 <nop,nop,timestamp 93776294 89747010>
12:08:32.673994 IP (tos 0x0, ttl 64, id 62896, offset 0, flags [DF], proto TCP (6), length 4216) mysql.here.lan.53542 > readynas1.here.lan.microsoft-ds: P
220126430:220130594(4164) ack 2696718 win 143 <nop,nop,timestamp 93776294 89747011>
12:08:32.674492 IP (tos 0x0, ttl 64, id 62897, offset 0, flags [DF], proto TCP (6), length 4216) mysql.here.lan.53542 > readynas1.here.lan.microsoft-ds: P
220130594:220134758(4164) ack 2696769 win 143 <nop,nop,timestamp 93776294 89747011>
12:08:32.674492 IP (tos 0x0, ttl 64, id 62897, offset 0, flags [DF], proto TCP (6), length 4216) mysql.here.lan.53542 > readynas1.here.lan.microsoft-ds: P 
220130594:220134758(4164) ack 2696769 win 143 <nop,nop,timestamp 93776294 89747011>

Nice consistent-sized packets, but much smaller than they could be, so is this the most efficient setting?
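
Looking across all the runs, the amount of data sent per burst (every packet up to and including the one flagged P) seems to follow a simple rule. The Python sketch below tests one guess: that the wsize value is rounded down to a whole number of 4096-byte pages and a fixed header is added. The model, the `predicted_burst` helper and the 68-byte overhead are assumptions inferred from the captures above, not taken from the CIFS source.

```python
PAGE = 4096

# wsize mount option -> TCP payload bytes per burst (sum of the packets up to
# and including the one flagged P), copied from the captures above
BURSTS = {
    56000: 48636 + 4680,
    53000: 48636 + 584,
    51000: 34740 + 13896 + 584,
    32768: 27792 + 5044,
    7000: 4164,
}

def predicted_burst(wsize, overhead):
    """Assumed model: wsize rounded down to whole pages, plus a fixed header."""
    return (wsize // PAGE) * PAGE + overhead

# The overhead is not stated anywhere in the report; infer it from the
# smallest run, where exactly one page fits.
overhead = BURSTS[7000] - PAGE

results = {w: (seen, predicted_burst(w, overhead)) for w, seen in BURSTS.items()}
for wsize, (seen, predicted) in results.items():
    print(wsize, seen, predicted, seen == predicted)
```

Under that assumption every run matches: 13 pages for 56000, 12 pages for 53000 and 51000, 8 pages for 32768 and 1 page for 7000, each plus 68 bytes. The same rounding would also explain why 55500, 55000 and 54000 behave like 56000, and why 52000 behaves like 53000. The 68 bytes would be consistent with a request header sent ahead of the data, but nothing in the report confirms that.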

I have two questions.  

1) Is it really sending massive packets, e.g. a 48688-byte packet? That makes no sense; my MTU is 7000. I think this must be me reading TCPDUMP wrongly.
2) My real bug (which I would like fixed) is that CIFS seems not to consider the MTU at all when choosing packet sizes. I have read that the internal buffers are all based on multiples of 4096. But surely it should consider the MTU of the interface it's listening on (there is only one in this instance; if there were more than one, it could make sure the buffers are a nice multiple of them all). Maybe the old behaviour is still optimal at an MTU of 1500, but if the MTU is larger perhaps it could detect that and fragment packets sensibly, so that it is at least possible to get optimal network throughput.
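
On question 1, a rough arithmetic check may help; it is an interpretation of the figures above, not something the capture proves. With an MTU of 7000 and 52 bytes of IP and TCP headers per packet (length 48688 minus payload 48636), at most 6948 payload bytes fit in one packet on the wire. The Python sketch below tests whether each captured payload is a whole number of such segments and compares that count with the jump in the IP `id` field to the next captured packet. The `OBSERVED` table and the names in it are illustrative; the numbers are copied from the traces.

```python
MTU = 7000                 # interface MTU stated in the report
HEADERS = 48688 - 48636    # IP + TCP header bytes per packet in the capture
MSS = MTU - HEADERS        # largest TCP payload per wire packet

# (tcp_payload, ip_id, ip_id_of_next_captured_packet), copied from the traces
OBSERVED = [
    (48636, 30413, 30420),   # wsize=56000, large packet
    (4680, 30420, 30421),    # wsize=56000, small packet
    (34740, 24548, 24553),   # wsize=51000
    (13896, 24553, 24555),   # wsize=51000
    (584, 24555, 24556),     # wsize=51000
    (27792, 55326, 55330),   # wsize=32768
    (5044, 55330, 55331),    # wsize=32768
    (4164, 62895, 62896),    # wsize=7000
]

rows = []
for payload, ip_id, next_id in OBSERVED:
    segments = -(-payload // MSS)          # ceiling division
    rows.append((payload, segments, payload % MSS == 0, next_id - ip_id))

for payload, segments, whole, id_gap in rows:
    print(f"{payload:>6} B  segments={segments}  whole_multiple={whole}  id_gap={id_gap}")
```

In every row the segment count matches the id gap, and each large payload is an exact multiple of 6948 bytes. That is what one would expect if the large captured packets are split into MTU-sized packets below the point where tcpdump sees them (for example by TCP segmentation offload), in which case nothing larger than the MTU would actually go out on the wire.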

Thanks.

Dave Fennell <dave at unluckypixie dotcom>
Comment 1 perwool 2011-08-31 08:10:10 UTC
As far as I can see, there has been no real discussion of this bug here: neither an apology nor a promise of a bug-free future.

I experimented with jumbo frames too. I found that the server setting has no noticeable influence (the server runs several services simultaneously: nfs, httpd, ..., and samba), but the client side running on Linux (Debian Squeeze and Wheezy, Fedora 15, 14 and 11 were tested) slows nearly to a freeze as soon as any MTU other than the default 1500 is used.

The Windows client (XP Professional was the only one available) shows no change when the MTU changes: it neither freezes nor gets any faster.

The most frequent error message appearing in dmesg is

CIFS VFS: Send error in read = -11 

I have no idea how to explain this behavior. The switch was originally suspected because it does not support jumbo frames, but the bug was also reproduced on a direct server-to-client cable connection.

perwool