Bug 5103 - Samba read of files cause appearance of modify via NFS, plays havoc on concurrent 'make'
Status: RESOLVED INVALID
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.10
Hardware: All Linux
Importance: P3 normal
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
Reported: 2007-11-23 20:47 UTC by starlight
Modified: 2009-03-15 15:48 UTC

Attachments
tar file with traces (350.00 KB, application/octet-stream)
2007-11-24 02:09 UTC, starlight
tar file with traces (560.00 KB, application/octet-stream)
2007-11-24 03:29 UTC, starlight
file lease test program (705 bytes, text/plain)
2007-11-24 12:43 UTC, starlight
kernel-2.6.9-55_remove_lease_get_mtime.patch (3.48 KB, patch)
2007-11-27 21:59 UTC, starlight

Description starlight 2007-11-23 20:47:03 UTC
Submitted this bug to Red Hat Bugzilla for RHEL 4.5. Not sure if 
Samba code is implicated, but submitting it here too just in 
case.  Feel free to close it if it doesn't belong here.

https://bugzilla.redhat.com/show_bug.cgi?id=396281


Description of problem:

Reading data from files over Samba share causes them to 
temporarily appear as modified when same files are examined via 
NFS share.  Does not happen if cached 'ls' data is present.

Results in spurious rebuilding of output files when 'make' is 
run concurrently from Samba and NFS shares.  Note that in our 
environment objects for different platforms are kept in separate 
directories in a manner similar to the way 'gcc' builds work. 
The fictitious source file timestamp modifications are what
cause the problem.

Version-Release number of selected component (if applicable):

Samba+NFS server, x86_64
  samba-3.0.10-1.4E.12.2
  kernel-smp-2.6.9-55.EL

Samba client
  Windows 2003 R2 SP2 X64, current patches

NFS client, i686
  kernel-2.6.9-55.EL

How reproducible:

1) place a bunch of files in a directory
   (verified this with 'gzip' source)
2) issue 'touch time_mark'
3) create Samba share of directory
4) create NFSv3 share of directory
5) issue 'find . -type f -print >/dev/null'
   in directory via Samba client
   (alternately 'make -n' could be run)
6) issue 'find . -type f -newer time_mark -ls'
   in directory via NFS client
7) observe that all files are reported as modified!!!
8) repeat (6) after 'acregmax' interval and observe
   that files are no longer reported as modified
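
The check in step (6) boils down to an mtime comparison against
'time_mark', which is also the kind of comparison 'make' applies
to targets and prerequisites.  A minimal C sketch of that check
follows.  The function name is made up for illustration, and it
compares at whole-second resolution only for simplicity.

```c
#include <sys/stat.h>

/* Returns 1 if `path` has a later mtime than `marker`, 0 if not,
 * and -1 if either stat() call fails.  Whole seconds only. */
int is_newer_than(const char *path, const char *marker)
{
    struct stat sp, sm;

    if (stat(path, &sp) == -1 || stat(marker, &sm) == -1)
        return -1;
    return sp.st_mtime > sm.st_mtime ? 1 : 0;
}
```

Run on the NFS client against a file just read via Samba, a check
like this would be expected to return 1 until the cached attributes
expire, and 0 afterwards.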

Actual results:

In practice this has been causing 'make' to rebuild things
that it should not.  Putting it nicely, it's been driving
us nuts for many months.

Expected results:

Obviously, the files were never modified and should not
be reported as such.

Additional info:

If you run 'ls -l' in the directory via the NFS share, the 
problem is cured.  You may have to unmount and remount the share 
to bring back the bug.

Before we figured this out, tried NFSv4 for a while.  It seemed 
to fix the above issue, but that's not certain.  NFSv4 + PoPToP 
started hosing our system, so it's gone and I have no further 
patience or time for more experiments.  Determined that the problem
appears on both 'i686' and 'x86_64' servers, so it's
a portable bug.

Further details:

does not happen at top-level of NFSv3 mount, must be in subdirectory

if any directories are touched by the 'find' on Samba share, problem is suppressed

does not happen when files reside on 'ext2' file system

does happen when files reside on 'ext3' file system
  have 'noreservation' option active
  too lazy to try it with 'noreservation' turned off

tried mounting Samba share remotely with CIFS
  problem does not happen with 'find' or 'cat' run on CIFS mount

files accessed via Windows UNC path, not mapped drive

obviously CYGWIN is in use on the Windows system

can produce problem with MKS, but a 'cat' is also required:
  find . -type f -print | xargs cat >/dev/null

can also produce the problem using CMD shell and 'type \\server\...\file' command
   CMD 'dir' doesn't work, probably because directory file is examined

does happen with even tiny number of files

does happen with WXP SP2 32-bit Samba client
Comment 1 Jeremy Allison 2007-11-23 23:49:19 UTC
So if I read this correctly, using a Windows client and using a Cygwin mount to read files on a Samba share modifies the time? I'm guessing that we're only obeying the modification requests from the client, as we don't arbitrarily modify file timestamps on our own.

To debug - firstly use the latest released Samba (3.0.27a) so we know the codebase you're using, then run the command in a directory with only one file (to minimize the log info) with Samba running with debug level 10, and also attach a wireshark trace of this activity.

Jeremy.
Comment 2 starlight 2007-11-24 02:07:57 UTC
Not exactly.  What's happening is that reading a file over a Samba
share from a Windows system (with or without CYGWIN), via a UNC path,
will cause the file to *appear* to have been modified from the POV
of an NFS mount, but only for a few seconds until the NFS cache entry
times out.  It's convoluted--took months to finally figure out what
was causing the spurious 'make' target building.

Built and installed 3.0.27a, reproduced the problem and attached
the requested trace information.  The file of interest is 'djl0'.
First a Windows CMD 'type' command is used to access the file
at around packet 26.  Then a NFS remote 'find' is run on the
directory where 'djl0' is located.  The 'find' shows 'djl0' with
modify time of "Nov 24 02:46" even though the actual modify time
of the file is "Nov 24 02:20".
Comment 3 starlight 2007-11-24 02:09:48 UTC
Created attachment 2994
tar file with traces
Comment 4 starlight 2007-11-24 02:43:37 UTC
Looking at NFS trace, can't see the file names from
the 'find' command.  Perhaps NFS has some special
differential encoding or similar approach.

However the first "Regular File" packet is the one
that shows the target.  Packet 68 in the trace.
The 'mtime' shown is incorrect.
Comment 5 starlight 2007-11-24 03:29:28 UTC
Created attachment 2995
tar file with traces

Better trace here.  Remounted the NFS file share
so now the names appear instead of just the NFS handles.


# Windows 'type' access of file

04:00:41
SMB access of 'djl0' at packet 21
    Last Write: Nov 24, 2007 02:20:44.000000000

# first 'find' invocation

04:00:45
NFS access of 'djl0' at packet 97
    mtime: Nov 24, 2007 04:00:45.000000000

04:00:45
NFS access of 'djl0' at packet 107
    mtime: Nov 24, 2007 04:00:45.000000000

# second 'find' invocation

04:00:54
NFS access of 'djl0' at packet 176
    mtime: Nov 24, 2007 02:20:44.000000000
Comment 6 starlight 2007-11-24 12:43:21 UTC
Created attachment 2996
file lease test program

Examined the Samba traces and found the cause of the problem:

Samba issues

   fcntl(fd, F_SETLEASE, F_WRLCK);

on files that are opened for read access.  The Linux kernel
sets the modify time to the current time when the lease is
put in place and then restores the modify time once it's
released.  Reproduced this with the attached test case.

Seems to me this is a kernel bug.  File modify time should
not be altered until an actual modify occurs.
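
For readers without the attachment, here is a rough C sketch of
how a write lease can be taken on a file opened read-only.  It is
not the attached test program; the function names are invented
for illustration.  Per fcntl(2), an unprivileged process can only
lease regular files it owns, and a write lease is refused while
other descriptors are open on the file.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes F_SETLEASE in <fcntl.h> */
#endif
#include <fcntl.h>
#include <unistd.h>

#ifndef F_SETLEASE             /* fallback: Linux ABI value, in case   */
#define F_SETLEASE 1024        /* feature macros were already fixed    */
#endif

/* Open `path` read-only and try to take a lease of `lease_type`
 * (F_RDLCK or F_WRLCK).  Returns the open descriptor on success so
 * the caller can keep holding the lease, or -1 on any failure. */
int open_with_lease(const char *path, int lease_type)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fcntl(fd, F_SETLEASE, lease_type) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drop the lease and close the descriptor. */
void release_lease(int fd)
{
    fcntl(fd, F_SETLEASE, F_UNLCK);
    close(fd);
}
```

To observe the effect, one would call open_with_lease(file,
F_WRLCK), sleep for a while, and run 'ls -l' on the file from an
NFS client before calling release_lease().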
Comment 7 Volker Lendecke 2007-11-24 12:46:08 UTC
Wow. That's thorough analysis.

Volker
Comment 8 starlight 2007-11-24 12:59:34 UTC
Thank you.

The interesting bit is that the altered modify time is visible 
*only* via NFS.  On the local system the modify time of the file 
remains at the correct value for the duration that the lease is 
held.  This definitely is an NFS/kernel bug, so I'm closing the
report--Samba clearly is not the direct cause of the issue.
Comment 9 starlight 2007-11-26 14:47:58 UTC
The Plot Thickens
=================

It turns out the observed behavior is by design; full details at

http://bugzilla.kernel.org/show_bug.cgi?id=9454

I don't fully understand all the interactions, so I'm pasting 
comments from Bruce Fields, a 'nfsd' maintainer.  Also leaving 
the bug closed, as it seems more appropriate for the Samba team 
to determine the future status of this issue.

>------- Comment #5 From bfields@fieldses.org 2007-11-26 08:57:56 -------
>
>This is by design--see fs/locks.c:lease_get_mtime().
>
>The argument is: if somebody has a write lease on the file (are 
>you exporting this via Samba?  That'd be the typical user), then 
>they're caching writes--they're explicitly telling us that we do 
>not know whether the file is still the same, because they may have 
>modified it on a remote client and not told us about it.  So 
>lease_get_mtime() reports the current time as the mtime, 
>prompting you to actually try opening the file, at which point 
>the write lease gets broken and any cached writes get flushed 
>out.
>
>It's a terrible kludge, I agree, and maybe we should remove it.  
>But I'd first like to understand what circumstances prompted 
>somebody to add it originally, and talk to the Samba people 
>about how they're using these write leases.
>
>(In NFSv4, by the way, there's a callback to the client to allow 
>the server to find out the attributes of a file that the client 
>holding a write lease is caching, which solves this problem.  We 
>haven't implemented that yet; it seems likely to be an enormous 
>pain.  I wonder if Samba would want something similar?)
>
>
>------- Comment #6 From starlight@binnacle.cx 2007-11-26 10:17:08 -------
>
>Yes, using a Samba mount along with NFS mounts to 'make' build
>an application on multiple platforms simultaneously.  When
>re-building existing trees the result is spurious target 
>building, a real problem.  Took months of frustration to
>figure out what was going on.
>
>Actually reported this as a bug to Samba along with RH, and a
>request for info from the Samba team provoked the analysis that
>isolated the problem.  Closed the Samba bug but it's all still
>there:
>
>https://bugzilla.samba.org/show_bug.cgi?id=5103
>
>My impression from looking at the Samba code is that the leases
>are put in place so that files can be monitored for changes,
>but that's just a guess.  Perhaps Samba can use a read lease
>instead of a write lease.
>
>
>------- Comment #7 From bfields@fieldses.org 2007-11-26 10:55:17 -------
>
>Thanks for the pointer to the samba bug.  I understand the 
>frustration.
>
>My feeling is that: yes, it's suboptimal (but perhaps not a bug) 
>for samba to be requesting a write lease when a read lease 
>(sufficient to alert it to modifications of the file) would be 
>enough.
>
>It seems to me that the real bug is the incomplete lease 
>implementation--if the purpose of a lease is to allow a remote 
>client to cache writes to the file, then there's no way for us 
>to give a sensible answer to a stat call, unless we break the 
>lease first.
>
>Perhaps we could find out (in the samba case) what the 
>consequences would be of not updating mtime in this case?  I 
>suppose the worst case would be that a modification to a file 
>made on a samba client could be indefinitely delayed from being 
>flushed to the server.
>
>If we agree that that would be less of a problem than these 
>spurious bumps in the mtime, then the best solution for now may 
>just be to rip out lease_get_mtime().  I'll cook up a straw-man 
>patch....
>
>
>------- Comment #8 From bfields@fieldses.org 2007-11-26 10:56:16 -------
>
>Created an attachment (id=13763)
>remove lease_get_mtime
>
>Not sure if this is what we want to do, but it's at least 
>something you could test to confirm the source of the problem.
>
>
>------- Comment #9 From starlight@binnacle.cx 2007-11-26 11:13:01 -------
>
>Thanks!  Happen to have a kernel source tree set up for the 
>Centos/RH images running on the server, so I can try this out in 
>the next week or so.  Seems certain to work.
>
>In the interim perhaps you could start a dialog with Samba about 
>the reasoning behind the application of the F_WRLCK?  Or if you 
>wish I could do so using the Samba bug report cited above.
>
Comment 10 starlight 2007-11-27 21:59:32 UTC
Created attachment 3008
kernel-2.6.9-55_remove_lease_get_mtime.patch

Adapted patch for RHEL 4.5 2.6.9-55 kernel (attached).  Works as 
expected and eliminates the spurious target rebuilding when a 
concurrent Samba and NFS 'make' is run on the same source tree. 
Test case also shows expected behavior.

Obviously did not do any regression testing with other Samba 
file scenarios.  In our build trees the same files and 
directories are never written by more than one server.
Comment 11 starlight 2007-11-27 23:45:37 UTC
Concise description of patch rationale from original:

>From: J. Bruce Fields <>
>Date: Mon, 26 Nov 2007 13:48:34 -0500
>Subject: [PATCH] nfsd: stop incrementing mtime on presence of write lease
>
>The lease_get_mtime() function has the effect of setting the mtime of a
>file (as far as any nfs client is concerned) to the current time,
>whenever there is a write lease held on the file.
>
>The presence of a write lease may mean that some client (almost
>certainly a samba client) is caching writes to that file.  Thus
>increasing the mtime has the effect of making the nfs client read data
>from that file, thus opening the file for read, thus breaking the write
>lease, thus forcing the samba client to flush any cached writes.
>
>However Samba seems to be requesting write leases even when a read lease
>would do.  And unfortunately the consequences of the spurious mtime
>updates--make unnecessarily rebuilding, for example--may be worse than
>the consequences of no mtime update--modifications from a samba client
>taking longer to be noticed on the nfs client.
>
>So perhaps we should rip out lease_get_mtime().
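
The behaviour described above can be modelled in a few lines of
userspace C.  This is only a model of the decision as the patch
description states it, not the kernel source; both function names
are invented for illustration.

```c
#include <time.h>

/* Userspace model, not kernel code: the mtime an NFS client is told,
 * given the on-disk mtime, whether a write lease is held on the
 * file, and the current time. */
time_t reported_mtime(time_t inode_mtime, int write_lease_held, time_t now)
{
    return write_lease_held ? now : inode_mtime;
}

/* What 'find -newer time_mark' or 'make' then concludes from it. */
int looks_modified(time_t reported, time_t marker_mtime)
{
    return reported > marker_mtime;
}
```

With a write lease held, any marker older than "now" makes the file
look modified, even though the on-disk mtime never changed.  With
the patch applied, the model reduces to always returning
inode_mtime.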
Comment 12 starlight 2008-04-25 15:12:41 UTC
On suggestion of the RH NFS developer, tried
"kernel oplocks = no" in the Samba config
and it does work around the issue.
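
For reference, the workaround is a single smb.conf line.  It is
shown under [global] on the assumption that, as in the Samba 3.0
series, this is a global parameter; check the smb.conf man page
for the version in use.

```ini
[global]
    # Stop smbd from taking kernel leases (F_SETLEASE) for oplocks.
    kernel oplocks = no
```

The trade-off is that access from local processes or NFS no longer
breaks oplocks held by Windows clients, so this is only reasonable
when the same files are not written from both sides.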
Comment 13 s 2009-03-11 15:33:20 UTC
The NFSv3 problem only relates to F_WRLCK leases.  Non-exclusive F_RDLCK leases don't interfere with nfs mtime.

Samba currently always uses F_WRLCK and has no option for F_RDLCK.

The following patch makes samba reject exclusive oplocks, but accept level 2 oplocks and implement them using F_RDLCK.

This allows NFSv3 and Samba to both share the same filesystem without spurious mtime problems, without the performance penalty of completely disabling oplocks, and without the data integrity problems of disabling kernel oplocks.

Maybe there could be an option along these lines for samba?

diff -rup samba-3.2.5-orig/source/smbd/open.c samba-3.2.5/source/smbd/open.c
--- samba-3.2.5-orig/source/smbd/open.c 2009-03-10 12:34:51.000000000 +1300
+++ samba-3.2.5/source/smbd/open.c      2009-03-10 13:19:07.000000000 +1300
@@ -1842,7 +1842,16 @@ NTSTATUS open_file_ntcreate(connection_s
            (fsp->oplock_type != FAKE_LEVEL_II_OPLOCK)) {
                if (!set_file_oplock(fsp, fsp->oplock_type)) {
                        /* Could not get the kernel oplock */
-                       fsp->oplock_type = NO_OPLOCK;
+                       if (EXCLUSIVE_OPLOCK_TYPE(fsp->oplock_type)) {
+                               /* try readonly... */
+                               fsp->oplock_type = LEVEL_II_OPLOCK;
+                               if (!set_file_oplock(fsp, fsp->oplock_type)) {
+                                       fsp->oplock_type = NO_OPLOCK;
+                               }
+
+                       } else {
+                               fsp->oplock_type = NO_OPLOCK;
+                       }
                }
        }

diff -rup samba-3.2.5-orig/source/smbd/oplock_linux.c samba-3.2.5/source/smbd/oplock_linux.c
--- samba-3.2.5-orig/source/smbd/oplock_linux.c 2009-03-10 12:34:51.000000000 +1300
+++ samba-3.2.5/source/smbd/oplock_linux.c      2009-03-12 08:22:06.000000000 +1300
@@ -130,7 +130,12 @@ static files_struct *linux_oplock_receiv

 static bool linux_set_kernel_oplock(files_struct *fsp, int oplock_type)
 {
-       if ( SMB_VFS_LINUX_SETLEASE(fsp, F_WRLCK) == -1) {
+       if (EXCLUSIVE_OPLOCK_TYPE(oplock_type)) {
+               return False; /* don't grant exclusive/write locks as they
+                        interfere with mtime reporting through NFSv3 */
+
+       }
+       if ( SMB_VFS_LINUX_SETLEASE(fsp, F_RDLCK) == -1) {
                DEBUG(3,("linux_set_kernel_oplock: Refused oplock on file %s, "
                         "fd = %d, file_id = %s. (%s)\n",
                         fsp->fsp_name, fsp->fh->fd,
Comment 14 Volker Lendecke 2009-03-15 04:12:05 UTC
I think this patch should go a bit further: Couldn't we look at whether the client actually wants write access to the file and request an F_RDLCK lease if that is not the case? Would that make sense?

Volker
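
The suggestion above amounts to something like the following C
sketch.  It is not Samba code: the helper name and its boolean
argument are hypothetical, and how "wants write access" would be
derived from the client's open request is left out.

```c
#include <fcntl.h>

/* Hypothetical helper: choose the kernel lease type from the access
 * the client asked for.  Only opens that want write access get
 * F_WRLCK; read-only opens get the shared F_RDLCK lease. */
int lease_type_for_open(int client_wants_write)
{
    return client_wants_write ? F_WRLCK : F_RDLCK;
}
```

The result would be passed where the current code passes F_WRLCK
unconditionally to the setlease call.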
Comment 15 s 2009-03-15 15:35:56 UTC
I have a patch to do that, but it is completely untested and also over 200 lines - I don't know if there would be copyright issues?

Ultimately it doesn't solve the spurious NFS mtime issue, as Samba will still grant F_WRLCK when requested by the client, which triggers the problem.
Comment 16 Volker Lendecke 2009-03-15 15:48:10 UTC
Copyright issues? In what sense do you have issues with the copyright? Do you have reasons not to submit patches under your personal copyright? If you can submit patches on your own, it would be best if you sent a git format-patch formatted patch to samba-technical@samba.org, so that it's really publicly visible.

Second, I would like to see that patch. I think Tim Prouty should also comment here, he has done a lot of oplock work lately.

Third, if requesting F_RDLCK is a workaround for deficient kernel behaviour, I would like to see this in a VFS module and not in the core Samba code. This would require the oplock operations to be passed through the VFS though. But as the VFS has grown like mad lately, probably that doesn't matter anymore :-)

Volker