I'm actually unsure whether to send this bug to samba or to the kernel CIFS. From one of our users:
    $ cd /some/cifs/mount
    $ date --rfc-3339=ns; touch 1.touch; ls -la --full-time 1.touch;sleep 1; ls -la --full-time 1.touch
    2006-05-12 12:39:06.702395000+04:00
    -rw-r--r-- 1 srr srr 0 2006-05-12 12:39:06.705093120 +0400 1.touch
    -rw-r--r-- 1 srr srr 0 2006-05-12 12:39:07.000000000 +0400 1.touch
    $ cd /some/local/dir
    $ date --rfc-3339=ns; touch 1.touch; ls -la --full-time 1.touch;sleep 1; ls -la --full-time 1.touch
    2006-05-12 12:39:53.241027000+04:00
    -rw-r--r-- 1 srr srr 0 2006-05-12 12:39:53.000000000 +0400 1.touch
    -rw-r--r-- 1 srr srr 0 2006-05-12 12:39:53.000000000 +0400 1.touch
On the server side there is HPUX with samba:
    $ /usr/local/samba/bin/smbd -V
    Version 2.2.8a
Because of this rounding, editors report that files have been modified, which is annoying. I confirmed this with samba 3.0.22 on both sides, actually.
I have the same problem with Ubuntu 6.10 Edgy, kernels 2.6.17 and 2.6.20. I tried applying cifs 1.48 to my 2.6.17 kernel as described in Steve French's most recent comment here (https://bugzilla.samba.org/show_bug.cgi?id=4076), but it would not compile, so I upgraded to kernel 2.6.20, which I believe includes cifs 1.48. The problem I'm experiencing causes gedit, vim, emacs, and eclipse to throw errors complaining that the file I am trying to save has already been modified even though it has not. I tested this from a Linux cifs client and used Windows 2000, 2003, and XP as servers. Smbfs did not cause this problem. It appears from the output below that there is more than just a lack of rounding happening here. On my ext3 partition (containing /tmp), file modification times are rounded to the nearest second. When I mount with smbfs, these times are also rounded to the nearest second. Perhaps if cifs did this as well it would not be a problem. However, something else is going on: CIFS reports that the modification time has changed between two ls commands. Not only that, the time is rounded down (instead of up) on the second ls command and shows a time prior to that of the first ls command (15:21:42.508038200). Personally, I think this is a critical bug, as I cannot use any decent editor to edit files on a cifs mount in my Windows server environment.
    mleclair@tdc-mleclair2:~/mnt/testsrv$ date --rfc-3339=ns; touch 1.touch; ls -la --full-time 1.touch;sleep 1;ls -la --full-time 1.touch
    2007-03-28 15:21:42.331019000-04:00
    -rw-r--r-- 1 mleclair Domain Users 0 2007-03-28 15:21:42.508038281 -0400 1.touch
    -rw-r--r-- 1 mleclair Domain Users 0 2007-03-28 15:21:42.508038200 -0400 1.touch
    mleclair@tdc-mleclair2:~/mnt/testsrv$ cd /tmp
    mleclair@tdc-mleclair2:/tmp$ date --rfc-3339=ns; touch 1.touch; ls -la --full-time 1.touch;sleep 1;ls -la --full-time 1.touch
    2007-03-28 15:21:58.610199000-04:00
    -rw-r--r-- 1 mleclair Domain Users 0 2007-03-28 15:21:58.000000000 -0400 1.touch
    -rw-r--r-- 1 mleclair Domain Users 0 2007-03-28 15:21:58.000000000 -0400 1.touch
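The manual date/touch/ls sequence used in these reports can also be scripted. What follows is only a minimal sketch, assuming Python 3; the helper name mtime_samples is made up for illustration, and the temporary directory stands in for whatever mount point you want to test.

```python
import os
import tempfile
import time


def mtime_samples(directory, delay=1.0):
    """Touch a file in `directory`; return its mtime in ns now and after `delay` s."""
    path = os.path.join(directory, "1.touch")
    # Equivalent of `touch`: create the file and set atime/mtime to "now".
    with open(path, "a"):
        os.utime(path, None)
    first = os.stat(path).st_mtime_ns
    time.sleep(delay)
    second = os.stat(path).st_mtime_ns
    os.remove(path)
    return first, second


if __name__ == "__main__":
    # Replace the temporary directory with a CIFS mount point to test it.
    with tempfile.TemporaryDirectory() as d:
        first, second = mtime_samples(d, delay=0.1)
        if first == second:
            print("mtime stable")
        else:
            print("mtime changed: %d -> %d" % (first, second))
```

On a local filesystem the two samples should be equal. On an affected cifs mount the second sample would be expected to differ from the first, which is the condition the editors complain about.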
This may be hard to resolve if ext3 is doing the rounding of the time, since ext3 does not support fine-grained timestamps. Is there any chance that you could format a small test partition on the server as XFS (or JFS or ext4) and try the test case against a share on that partition? Those three file systems (unlike ext3) support accurate file time stamps (at least 100 nanosecond granularity).
I tried this on a jfs filesystem, no such problems:
    date --rfc-3339=ns; touch /mnt/smb_a/1.touch; ls -la --full-time /mnt/smb_a/1.touch;sleep 1; ls -la --full-time /mnt/smb_a/1.touch
    2009-11-09 14:09:50.154718417-06:00
    -rw-r--r-- 1 nobody nobody 0 2009-11-09 14:09:50.156501735 0600 /mnt/smb_a/1.touch
    -rw-r--r-- 1 nobody nobody 0 2009-11-09 14:09:50.156501000 -0600 /mnt/smb_a/1.touch
And on an ext4 filesystem:
    date --rfc-3339=ns; touch /mnt/smb_b/1.touch; ls -la --full-time /mnt/smb_b/1.touch;sleep 1; ls -la --full-time /mnt/smb_b/1.touch
    2009-11-09 18:18:25.391833219-06:00
    -rw-r--r-- 1 root root 0 2009-11-09 18:18:25.485500671 -0600 /mnt/smb_b/1.touch
    -rw-r--r-- 1 root root 0 2009-11-09 18:18:25.485500000 -0600 /mnt/smb_b/1.touch
This should be fine on xfs, jfs, ext4, btrfs and other filesystems with 100 nanosecond or better timestamps (or perhaps 1 microsecond, depending on how you are setting timestamps). Since cifs caches the timestamp it sets on the server, when the server is running on a filesystem which can't store accurate timestamps you can disable caching of stat info (echo 0 > /proc/fs/cifs/LookupCacheEnabled). Special casing ext2 and ext3 timestamps would help the client recognize this, as would an operation to get the superblock time mask (over the network), but turning off metadata caching is easier.
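To illustrate why a cached timestamp can disagree with what the server later reports, here is a small sketch. It assumes the server simply truncates to its own granularity (a server may round differently), and truncate_to_granularity is a hypothetical helper, not a cifs or kernel function.

```python
def truncate_to_granularity(nsec, gran_ns):
    """Round a nanosecond value down to a multiple of gran_ns."""
    return nsec - (nsec % gran_ns)


# Sub-second part of the mtime shown by the first ls in the 2007 transcript.
cached = 508_038_281

print(truncate_to_granularity(cached, 100))            # 100 ns units
print(truncate_to_granularity(cached, 1_000))          # 1 microsecond
print(truncate_to_granularity(cached, 1_000_000_000))  # whole seconds
```

With a 100 ns granularity the result is 508038200, which is the value the second ls printed in that transcript. The client had cached the value it sent, and the server returned the truncated one on the next lookup.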
(In reply to comment #5)
> an operation to get the superblock time mask (over the network)
Can/would this operation/query be a part of the cifs protocol? Regarding special casing filesystems such as ext2/ext3/(probably fat) which have coarser granularity, what would special casing do: set the granularity to 1 second while creating a file (in order to assign a suitably coarse timestamp to begin with), or disable the lookup cache?
I think we can't rely on the server to provide the granularity of the timestamp, so I would vote for special casing ext2/ext3/fat and setting the timestamps with 1 second granularity. Working on a patch.
One difficulty with special casing ext2/ext3 (e.g. checking fstype in SMB tconx response) is that the root of the share could be ext3 but directories underneath that could be ext4 or xfs (for example) which would allow accurate timestamps. We can (eventually) detect when we pass from one volume to another (on the server) in traversing a directory hierarchy by looking at the fsid in the QueryPathInfo and QueryFileInfo responses (and e.g. could do "implicit mounts" when we cross a volume boundary to make it easier to cache these volume properties for each server volume we encounter) - we may eventually need to do that to handle duplicate inode numbers for volumes mounted under one another on a share.
So the problem is really the inconsistency in the times? If so, then it sounds like the right answer is to just refetch inode attributes after a setattr call.
(In reply to comment #9) A couple of issues: with a refetch we will be adding some network traffic, and we should probably do the fetching selectively, for ext2/ext3/fat only and not for others, since those have coarser timestamp granularity than filesystems like ext4 and jfs.
(In reply to comment #10) Really, I think the simplest solution is probably the best one here. Setattr's aren't so common that the extra network traffic will likely be noticed. It will obviously have some cost, but that's the price of correctness. I also don't see how you can special case this either. How will you be determining whether the underlying filesystem on the server supports fine-grained timestamps? How do you intend to deal with shares that span multiple filesystems?
One way to minimize the impact would be to only refetch inode attrs when one of the ATTR_?TIME flags is set. That should make it only add the extra network traffic for utimes() type calls.
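As a rough model of that idea (not the actual patch), the sketch below invalidates cached attributes only when a setattr request carries one of the time flags. The flag values are modelled on the kernel's ATTR_* bits but should be treated as assumptions here, and CachedInode and after_setattr are hypothetical names.

```python
# Flag bits modelled on the kernel's ATTR_* setattr flags. The exact values
# are an assumption for this sketch; only the bit-mask test matters.
ATTR_MODE = 1 << 0
ATTR_SIZE = 1 << 3
ATTR_ATIME = 1 << 4
ATTR_MTIME = 1 << 5
ATTR_CTIME = 1 << 6

TIME_FLAGS = ATTR_ATIME | ATTR_MTIME | ATTR_CTIME


class CachedInode:
    """Hypothetical stand-in for a client inode with cached attributes."""

    def __init__(self):
        self.attrs_valid = True  # cached attributes may be trusted


def after_setattr(inode, ia_valid):
    """Invalidate cached attributes only if the setattr touched a timestamp."""
    if ia_valid & TIME_FLAGS:
        inode.attrs_valid = False  # next stat() must ask the server
    return inode.attrs_valid
```

In this model a chmod or truncate leaves the cache alone, while a utimes() style call forces the next stat to fetch the server's (possibly truncated) times.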
It would be helpful if we had a QueryFS type of call to query the time stamp "mask" (s_time_gran), but at a minimum we should be setting s_time_gran on the client to a higher value for LANMAN sessions (Win9x and OS/2). Can we tell in other ways what the filesystem type is, and simply special case ext2/ext3 (the common ones that have this problem, since most everything else these days supports 100 nanosecond or better)? We need to check what comes back in tcon for fstype.
Created attachment 5167: revalidate inode if any of the times (atime, ctime, mtime) on the inode are set
For those fs types which support unix extensions (such as ext2/ext3), we can mark the inode to get revalidated if any of the times for the inode get set. I think adding one more transaction only when times are set is probably cleaner than special casing fs types.
(In reply to comment #14)
> Created an attachment (id=5167)
> revalidate inode if any of the times (atime, ctime, mtime) on the inode are set
>
> For those fs types which support unix extensions (such as ext2/ext3), we can
> mark inode to get revalidated if any of the times for the inode get set.
>
> I think adding one more transaction only when times are set is probably cleaner
> than special casing fs types.
>
Agreed. Basic approach on the patch looks reasonable, but please fix the comment format to match other multi-line comments in the kernel. You also don't need that many parentheses in the if statement. Even better would be to change the existing "if (!rc)" to:
    if (rc)
        goto out;
...which would make that code more readable. Do we have confirmation that this patch fixes the issue?
    date --rfc-3339=ns; touch /mnt/smb_a/2.touch; ls -la --full-time /mnt/smb_a/2.touch; sleep 1; ls -la --full-time /mnt/smb_a/2.touch
    2010-01-13 14:13:48.966442951-06:00
    -rw-r--r-- 1 1002 1002 0 2010-01-13 14:13:48.000000000 -0600 /mnt/smb_a/2.touch
    -rw-r--r-- 1 1002 1002 0 2010-01-13 14:13:48.000000000 -0600 /mnt/smb_a/2.touch
Created attachment 5168: second version of the previous patch to fix problems due to coarser timestamp granularity
Changed comment style and simplified code as per suggestions by Jeff Layton in an earlier comment.
Looks good. Maybe post it to the list to allow others to comment, but I think it's probably a winner.
A reiserfs share shows up as ntfs, so I am not sure whether we can rely on the tree connect response to obtain the fs type reliably. Also, in smb2 the tree connect response is different: there are no Extra Byte Parameters containing the fs type.
Even an ext3 share exported by a Samba server shows up as ntfs fs type in the response to tree connect. So I am not sure we can special case ext2/ext3 fs types to set timestamp granularity somewhere.
As there are some mentions here that ext3 only supports one-second resolution: with large inodes (256 bytes or more) you have nanosecond resolution even with ext3.
It's been a while since I trawled through ext3 code, but I don't think that's the case. A quick grep shows this:
    $ grep tv_nsec fs/ext3/*
    fs/ext3/inode.c: inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
...at one point I considered doing a patch to add finer-grained timestamps to ext3, but it was a pretty messy thing to do and ext4 came along soon afterward...
In any case, it's pretty clear that we can't have the client reliably predict the timestamp granularity of the server. It may even be different for different inodes (if the share spans multiple filesystems). I think the best we can do here is something like the patch that Shirish has proposed.
I think we can close this bug now that the patch posted to fix it has been committed upstream.
Closing bug as requested.