Bug 5738 - Opening an OOo file on a CIFS share is very slow since 2.6.25+
Summary: Opening an OOo file on a CIFS share is very slow since 2.6.25+
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 2.6
Hardware: x86 Linux
Importance: P3 major
Target Milestone: ---
Assignee: Steve French
QA Contact:
Depends on:
Reported: 2008-09-05 08:58 UTC by Jan-Marek Glogowski
Modified: 2008-09-19 12:01 UTC
0 users

See Also:


Description Jan-Marek Glogowski 2008-09-05 08:58:29 UTC
I have a NetApp filer, which I access via CIFS from Linux.
The client OS is Debian Etch with quite a few backports, notably a custom kernel, KDE and OOo. The smbfs package is the one shipped with Etch: 3.0.24-6etch10.
I'm using OOo 2.4.1.

With my previous kernel, locking didn't work perfectly, but OOo showed an I/O error message if a second user opened the file (see the examples below). Sometimes OOo even showed the document as read-only, which is the correct handling.

I want (and need) to switch to the new kernel, but now opening the file from OOo on the CIFS share is very slow. The first testers thought OOo had hung completely, but if you wait long enough (about 20 minutes, I guess), OOo starts up correctly and shows the document.

The test document is only 6 KB: a single line of text.

As a workaround, I tested:

1. A forward port from 2.6.24 to 2.6.26 (CIFS v1.52); I only had to apply 4 patches:
  - cifs-no-iget-ce634ab28e7dbcc13ebe6e7bc5bc7de4f8def4c8.patch
  - cifs-pagecache-zeroing-eebd2aa355692afaf9906f62118620f1a1c19dbb.patch
  - cifs-remove-proc-root-fs-36a5aeb8787fbf92510ed20d806e229c55726f93.patch
  - cifs-sane-umount-begin-42faad99658eed7ca8bd328ffa4bcb7d78c9bcca.patch

  This compiles, but has the same problem.

2. A backport from 2.6.27-rc5-git6 (CIFS v1.54):
  - cifs-DFS-connects-inode-with-dfs-handling.patch
  - cifs-pass-path-to-do-add-mount.diff
  - cifs-drop-kmem-cache-argument-from-constructor.patch
  - cifs-sanitize-permission-prototype-patch
  - cifs-generic_llseek.diff
  - cifs-tryloc-page-rename.diff

  Same problem.

However, locking itself seems to work "correctly": if I open the document on the 2.6.26 kernel and then switch to the other computer running 2.6.24, I get the OOo I/O error message.

So my guess is that some kernel semantics have changed, since the old module shows the same problem with the new kernel.
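To check from each machine whether locking on the CIFS mount behaves as described without starting OOo at all, a minimal probe can be sketched in C. It assumes OOo relies on POSIX fcntl() advisory locks, which this report does not confirm, and the helper name probe_write_lock is hypothetical. It tries a non-blocking whole-file write lock and reports whether some other process already holds a conflicting one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try a non-blocking, whole-file POSIX write lock.
 * Returns 0 if the lock was acquired (and immediately released),
 *         1 if another process holds a conflicting lock,
 *        -1 if the file cannot be opened or fcntl() fails otherwise. */
int probe_write_lock(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 = up to end of file, i.e. the whole file */

    int result;
    if (fcntl(fd, F_SETLK, &fl) == 0) {
        /* We got the lock: release it again so the probe has no side effect. */
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        result = 0;
    } else if (errno == EACCES || errno == EAGAIN) {
        /* F_SETLK never blocks; these errors mean a conflicting lock exists. */
        result = 1;
    } else {
        result = -1;
    }

    /* Note: close() drops every fcntl() lock this process holds on the file. */
    close(fd);
    return result;
}
```

Run against the test document on the 2.6.24 machine while the 2.6.26 machine has it open in OOo, a result of 1 would suggest the lock is visible through the share. Because closing any descriptor releases all of a process's fcntl() locks on that file, the probe is best run in its own short-lived process.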

There is also a strange OOo phenomenon:

P1: Open document - ok
P2: Open document - partially fails: the OOo filter selection dialog appears. I guess OOo opens a zero-size document; after selecting a filter, you get the I/O error message.
P2: Open document again - see previous.

On the other hand:

P1: Open document - ok
P2: Open document - fails with filter dialog
P1: Close document
P2: Open document - ok
P1: Open document - ok, opened in read-only mode

With the share mounted, I get this debug info:

# cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
CIFS Version 1.52
Active VFS Requests: 0
1) Name:  Domain: TVC.MUENCHEN.DE Mounts: 1 OS: Windows 5.0
        NOS: Windows 2000 LAN Manager   Capability: 0xd3fd
        SMB session status: 1   TCP status: 1
        Local Users To Server: 1 SecMode: 0x3 Req On Wire: 0

1) \\netapp-fas250/\cifs Uses: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x4000f
PathComponentMax: 255 Status: 1 type: DISK

The module is compiled against the kernel config (make -C /usr/src/linux-headers-2.6.26-1-686 M=`pwd`).

All tested kernels have the following CIFS config:

# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_DEBUG2 is not set

I also tried enabling CONFIG_CIFS_EXPERIMENTAL, but this didn't help or change anything.

strace'ing OOo didn't help me understand what's going on. When blocked, soffice calls sched_yield very often, and every few thousand yields I can see a poll on the Unix socket to X time out. The log is more than 100 MB; I can send a compressed file or put it on some webspace, if needed.
Comment 1 Jan-Marek Glogowski 2008-09-05 09:45:11 UTC
From investigating the OOo strace, it seems that the process is actually waiting on a futex, which always returns ETIMEDOUT.
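A futex that keeps timing out only shows that some OOo thread is waiting; it does not say where the time goes. One way to separate the file system path from OOo is to time the underlying system calls directly. The C sketch below assumes the stall sits in open() or in the lock request rather than inside OOo itself; time_open_lock_ms and elapsed_ms are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1000.0
         + (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Hypothetical helper: time open() + a non-blocking fcntl() write-lock
 * attempt + close() on path. Returns the elapsed wall time in milliseconds,
 * or -1.0 if the file cannot be opened. A lock held by another process is
 * not treated as an error; only the latency is measured. */
double time_open_lock_ms(const char *path)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1.0;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: the whole file */
    if (fcntl(fd, F_SETLK, &fl) == 0) {
        /* Release the lock again so the measurement has no side effect. */
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
    }
    close(fd);

    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ms(&start, &end);
}
```

Comparing the value for the test document on the CIFS mount with and without dazuko loaded might show whether a component below OOo accounts for the delay; a result in the millisecond range would instead point back at OOo's own waiting.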
Comment 2 Jan-Marek Glogowski 2008-09-19 12:01:21 UTC
This is actually caused by the combination of:

1. the on-access virus scanner via dazuko
2. OOo locking

If I disable dazuko via the kernel command line, locking is fast again. The strange OOo lock handling is unchanged, but compared to the file system stalls it's a minor problem.