I am seeing a significant performance hit when using CIFS vs. NFS. I suspect this is a bug, since the throughput difference is so extreme.
I know this is not a very good test to demonstrate it; however, it does show the same difference seen when copying files via CIFS instead of NFS.
Both of these mount points are on the same NetApp. The client is a RHEL4 U3 machine (a Dell 2850 with 4GB of RAM, two 3.2GHz Xeons, and two gigabit interfaces; the NetApp is on a local LAN).
time dd if=/dev/zero of=/nfs/.test bs=1024 count=250000 # real 0m27.267s
time dd if=/dev/zero of=/cifs/.test bs=1024 count=250000 # real 11m14.242s
I have tried smaller block sizes as well and get almost the same results, with about the same difference in times.
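A slightly fairer variant of that test forces the data to stable storage before dd exits, so client-side caching doesn't skew the timing; a sketch, writing to /tmp here as a stand-in for the /nfs and /cifs mount points:

```shell
# conv=fdatasync makes dd flush to stable storage before exiting, so the
# page cache can't flatter the result. /tmp is a placeholder target.
dd if=/dev/zero of=/tmp/.test bs=1M count=64 conv=fdatasync
rm -f /tmp/.test
```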
I have tried all combinations of rsize/wsize, and that does not appear to solve it. I call it a bug with some hesitation, since nobody else seems to be having this issue, or at least nobody is reporting it.
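For reference, the sort of rsize/wsize sweep mentioned above looks like this in fstab form; the filer name, export path, share name, and credentials are placeholders, and the sizes shown are just one point in the range tried:

```
filer:/vol/vol0  /nfs   nfs   rw,tcp,rsize=32768,wsize=32768          0 0
//filer/share    /cifs  cifs  username=test,rsize=16384,wsize=16384   0 0
```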
I am also opening a case with NetApp just in case this is a known issue with their CIFS code in combination with Linux clients.
We also see these performance differences. In general, we see about 4x (in some cases 14x) better performance with NFS than with mount.smbfs (and before you say it, I realize that's not mount.cifs), and about 10x better performance with mount.smbfs than with mount.cifs.
The CIFS share is on an EMC Celerra; the NFS share is on the same Celerra, against the same RAID group (at different times, obviously). Benchmarking was done with several bonnie++ runs.
The mount.cifs version we're using is 1.5, and the kernel is 2.6.9-34.ELsmp for x86_64 (CentOS 4.3).
We found that enabling the directio option improved performance significantly (which, after reading the manpage, makes some sense given our network setup). Performance was still worse than NFS, but better than with mount.smbfs.
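Concretely, that's just one extra mount option; a sketch, where the server and share names are placeholders:

```
mount -t cifs //celerra/share /cifs -o username=test,directio
```

directio disables client-side caching of file data on the mount, which is why it can help on a fast local network; the mount.cifs manpage describes the trade-offs.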
My suspicion is that this is mostly because the NFS RPC engine is fundamentally asynchronous whereas the CIFS client's is not: a single client task can only have one read or write operation on the wire at a time.
I suspect that changing that would help read/write performance significantly.
How do we address this bug?
Off the top of my head...
There needs to be a way to allow a single task to issue multiple writes in parallel. Steve mentioned something about using slow_work threads for that, but I think you can probably do this by having cifsd handle the write responses. That'll take care of most of the perf problems, I suspect.
There are other problems too: having a way to coalesce the async write requests would be good as well.
When writes complete, the pages should be marked unstable (as NFS does) and only considered fully clean after an over-the-wire fsync call.
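As a userspace illustration of the single-task parallelism point (not kernel code; the /tmp file names are placeholders), several writers can be put in flight at once rather than one at a time:

```shell
#!/bin/sh
# Four dd jobs run concurrently instead of sequentially; on a network
# filesystem this is roughly what per-task async writes would buy you.
for i in 1 2 3 4; do
    dd if=/dev/zero of=/tmp/.par$i bs=64k count=16 conv=fdatasync 2>/dev/null &
done
wait
rm -f /tmp/.par1 /tmp/.par2 /tmp/.par3 /tmp/.par4
```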
I just want to share my latest test results with newer Samba versions, as I am in the process of building a new server with Samba and LDAP authentication that should (at some point) replace our old, insecure NIS/NFS setup.
Samba version used: 3.5.3
I tested many smb.conf and client settings (vanilla and tweaked), but the top speed I could get out of cifs or smbclient (using kernels from 2.6.26 to 2.6.35-rc3) was around 35MB/s, versus 100+MB/s with NFS. Using a Windows client resulted in speeds of ~80MB/s, which would be acceptable.
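For completeness, this is the style of smb.conf tweaking tried; the values are examples only, and whether any of them help depends heavily on the network:

```
[global]
    socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072
    aio read size = 16384
    aio write size = 16384
    max xmit = 65535
```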
Read and write performance should be much improved in 3.3 and later kernels. CIFS now does asynchronous reads and writes, and they are done directly into and out of the pagecache, so we're no longer constrained by the standard cifs buffer sizes.
At this point, I'm going to go ahead and call this RESOLVED FIXED. Please reopen if you are still seeing poor performance.