Bug 7407 - Bug on umounting share served by Samba 3.5.2 unless noserverino is used (shrink_dcache_for_umount)
Summary: Bug on umounting share served by Samba 3.5.2 unless noserverino is used (shri...
Status: RESOLVED FIXED
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs (show other bugs)
Version: 2.6
Hardware: x64 Linux
: P3 normal
Target Milestone: ---
Assignee: Jeff Layton
QA Contact:
URL:
Keywords:
: 7405 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-05-04 04:07 UTC by Eike Hein
Modified: 2010-07-24 05:55 UTC (History)
3 users (show)

See Also:


Attachments
patch -- disable use of server inode numbers by default (1.77 KB, patch)
2010-05-04 11:40 UTC, Jeff Layton
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Eike Hein 2010-05-04 04:07:28 UTC
This bug report is an off-shoot from the following mailing list thread, as requested by Jeff Layton: http://lists.samba.org/archive/linux-cifs-client/2010-May/005951.html


--- The problem ---

Since about 2.6.31 or so I have a problem with unmounting a CIFS share that prevents my system from shutting down successfully.

The problem also occurs when unmounting a share while the system is up. In both cases, the share must have been used before trying to unmount it, e.g. using mplayer to play a video file found on the share, then quitting mplayer and then trying to unmount the share.

The shutdown failure means that the system stops during shutdown with a trace like the one below on screen and then won't power down.

If the share has not been used prior to unmounting/shuttind down, there is no problem.

Here is a dmesg snippet from unmounting the share during normal system operation:

BUG: Dentry ffff88001d8f95c8{i=2,n=/} still in use (2) [unmount of cifs
cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:670!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/dmi/id/bios_vendor
CPU 0
Pid: 2338, comm: umount Tainted: P           2.6.33.2-57.fc13.x86_64 #1
P5B-Deluxe/System Product Name
RIP: 0010:[<ffffffff81130f9c>]  [<ffffffff81130f9c>]
shrink_dcache_for_umount_subtree+0x205/0x2eb
RSP: 0018:ffff88001b699e08  EFLAGS: 00010292
RAX: 0000000000000054 RBX: ffff88001d8f95c8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000246
RBP: ffff88001b699e38 R08: 000000000000ac81 R09: ffffffff8107ca78
R10: ffffffff81a56ef8 R11: 0000000000000000 R12: ffff88001d8f95c8
R13: 0000000000000174 R14: ffff88001d86d668 R15: ffff88001b699f18
FS:  00007f68a7908740(0000) GS:ffff880004a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f2e775f4000 CR3: 000000001b672000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 2338, threadinfo ffff88001b698000, task
ffff880015828000)
Stack:
 ffff880079e8ae38 0000000000000246 ffff880079e8a968 ffffffffa0dcf690
<0> ffff880079e8a968 ffffffff81ba3ce0 ffff88001b699e58 ffffffff811310be
<0> ffff880079e8aa30 ffff880079e8a968 ffff88001b699e78 ffffffff81121f0d
Call Trace:
 [<ffffffff811310be>] shrink_dcache_for_umount+0x3c/0x4c
 [<ffffffff81121f0d>] generic_shutdown_super+0x1f/0xce
 [<ffffffff81122011>] kill_anon_super+0x16/0x55
 [<ffffffff811223dd>] deactivate_super+0x6d/0x82
 [<ffffffff81136cbf>] mntput_no_expire+0xbd/0xfa
 [<ffffffff811372a6>] sys_umount+0x2c0/0x2ef
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 d0 04 00 00
48 89 de 48 c7 c7 34 d4 78 81 48 89 04 24 31 c0 e8 af 6a 34 00 <0f> 0b
eb fe 4c 8b 63 68 4c 39 e3 75 05 45 31 e4 eb 05 f0 41 ff
RIP  [<ffffffff81130f9c>] shrink_dcache_for_umount_subtree+0x205/0x2eb
 RSP <ffff88001b699e08>
---[ end trace 15c19ec0d5874bcd ]---


--- The setup ---

My client system is a x64_64 Fedora 13 system running a kernel identifying itself as 2.6.33.3-72.fc13.x86_64. The cifs-utils package that contains the mount helper is version 4.3-2.fc13.

The server is a i686 Fedora 13 system - kernel 2.6.33.2-57.fc13.i686. The Samba server is version 3.5.2-59.fc13.

The fstab entry for the CIFS share in question is:
//ehs1/ehs1 /mnt/ehs1 cifs
file_mode=0644,dir_mode=0755,uid=500,gid=500,username=<username>,password=<password>
0 0

The shared folder on the server system, /ehs1, is the mount point of a distinct partition on the server's primary hard drive, next to the partition mounted as /.

It contains several folders which are the mount point for bind mounts (i.e. -o bind) from folders on other volumes mounted into the server system at other places. /ehs1/backup for example is a bind mount from /mnt/raid/backup, where /mnt/raid is the mount point for an MD raid1. There are several such bind mounts in the shared folder.

The video file I'm accessing with mplayer for the test is located directly in /ehs1, however, not in any of the bind-mounted folders.


--- The Workaround ---

As suggested by Jeff Layton in the mailing list thread linked at the top, the problem disappears when the share is mounted with "noserverino". When mounted with "noserverino", the share umounts cleanly and hitch-free, both during system operation as well as during shutdown.

However, if the bug has ever been triggered during system operation, the system will not shut down cleanly, even if the share is subsequently remounted with "noserverino". If left mounted (with noserverino, after previously triggering the bug by mounting without noserverino and umounting), shutdown will get stuck with "Unmounting CIFS filesystems: /mnt/ehs1 is not mounted". If unmounted prior to shutdown, it gets past "Unmounting CIFS filesystems" but gets stuck with a mount error 22 ("invalid argument").

The following two mails contain more detailed step-by-step test protocols for various scenarios:
http://lists.samba.org/archive/linux-cifs-client/2010-May/005955.html
http://lists.samba.org/archive/linux-cifs-client/2010-May/005956.html
Comment 1 Jeff Layton 2010-05-04 04:15:28 UTC
cc'ing Steve and Suresh. Suresh, did you ever make any progress on this? If so could you summarize what you found?
Comment 2 Suresh Jayaraman (mail address is dead) 2010-05-04 07:55:59 UTC
(In reply to comment #1)
> cc'ing Steve and Suresh. Suresh, did you ever make any progress on this? If so
> could you summarize what you found?
> 

I thought the recent page cache revalidation fixes would have fixed this. Apparently it does not seem to. From what I remember, the problem started appearing when we introduced autodisabling serverino change. But, I have not figured out why..

Since it has becomes easily reproducible now, I'll take a look.
Comment 3 Jeff Layton 2010-05-04 08:31:44 UTC
Ok, I think I understand what the problem is...

We have an inode number collision. As Eike says, he has 2 filesystems...one mounted on the other and a samba share that encompasses both filesystems. In most cases, the root inode on a ext3/4 filesystem is "2". So in this configuration, we end up with 2 inodes with the same inode number (both the root of the actual filesystem and the root of the submount).

I'm not sure what the right fix for this is yet. Is there a way to detect when we've crossed a mountpoint on the server? If so, we could conceivably do a new submount (similar to how we handle DFS) when we're crossing the mountpoint.
Comment 4 Jeff Layton 2010-05-04 11:16:36 UTC
*** Bug 7405 has been marked as a duplicate of this bug. ***
Comment 5 Jeff Layton 2010-05-04 11:20:08 UTC
Unfortunately, the QUERY_PATH_INFO buffer has fields for major/minor dev, but they are sort of intended for use with char/block device inodes. IOW, they correspond to the st_rdev field in a stat() call.

Even if the server did send adequate info to detect when we cross a mountpoint on the server, we still have the problem that a lot of servers in the field don't. I think the only reasonable fix we can do is to disable server inode numbers by default.
Comment 6 Jeff Layton 2010-05-04 11:40:40 UTC
Created attachment 5683 [details]
patch -- disable use of server inode numbers by default

This patch is probably what we need to do in the near term. I'd like to see us come up with a scheme that allows us to reenable this by default, but the immediate need is to prevent kernel panics.

I think this is actually a samba bug -- it should give us unique ID's that are actually unique over the scope of the share, but fixing it there is problematic and the nature of the fix there is still under discussion.
Comment 7 Suresh Jayaraman (mail address is dead) 2010-05-05 07:05:09 UTC
Good catch!

A few questions.. may be naive..
Does CIFS by protocol support mount point crossing? I guess, yes, because Samba I think can mount virtually anything (including NFS mounts). OTOH, NFS need not handle this as typically mount point crossing is not supported.

Making servers set appropriate major/minor numbers and handling at the client accordingly sounds like a real fix. Any idea what other servers are currently doing or not doing? I think windows cannot mount a filesystem inside a directory and we can safely rule out. Similarly if servers cannot do or already doing (if we can verify this with server implementations) we can safely fix Samba and the cifs client?
Comment 8 Jeff Layton 2010-05-05 10:05:47 UTC
(In reply to comment #7)
> Good catch!
> 
> A few questions.. may be naive..
> Does CIFS by protocol support mount point crossing? I guess, yes, because Samba
> I think can mount virtually anything (including NFS mounts). OTOH, NFS need not
> handle this as typically mount point crossing is not supported.
> 

The protocol doesn't have a clean way to indicate when you've crossed a mountpoint on the server. NFS does this by looking to see if the parent and child's fsid don't match. If they don't then they know a mountpoint on the server has been crossed.

> Making servers set appropriate major/minor numbers and handling at the client
> accordingly sounds like a real fix. Any idea what other servers are currently
> doing or not doing? I think windows cannot mount a filesystem inside a
> directory and we can safely rule out. Similarly if servers cannot do or already
> doing (if we can verify this with server implementations) we can safely fix
> Samba and the cifs client?
> 

According to the SNIA spec, the server is supposed to send a "uniqueid" in the field where it currently sends the inode number. The uniqueid is supposed to be unique across the scope of the share, which can span more than one server-side filesystem. Samba is currently broken in this regard and Jeremy has asked me to open a new bug for him to fix it.

Regardless though, all of the servers deployed in the field currently are broken in this regard and we'll have to deal with them some way. The simple fix is to simply default to "noserverino" again. Doing something more complex is also possible but I'm not sure that we should try to be fancy in -rc6.
Comment 9 Jeff Layton 2010-05-06 07:30:29 UTC
The samba bug to fix the uniqueid is bug 7410. 
Comment 10 Jeff Layton 2010-05-06 07:34:24 UTC
I've also proposed a patch upstream to disable serverino for now. I think it's the safest short term fix until we can deal with the possibility for inode number collisions and changes in a better way. 
Comment 11 Jeff Layton 2010-05-07 12:14:35 UTC
I've posted a less invasive patch today. It disables serverino whenever a problematic situation on the server is discovered. Patch and discussion are here. We'll probably also push this to stable:

http://lists.samba.org/archive/linux-cifs-client/2010-May/005981.html
Comment 12 Steve French 2010-05-11 12:14:55 UTC
Fix merged into cifs-2.6.git
Comment 13 Jeff Layton 2010-05-11 12:58:40 UTC
Now that I think about this, I wonder whether this patch may be too aggressive still. It's possible that the VM could shrink the dcache but leave the inode in the cache. In that event, we might find a directory inode that's not really hardlinked.

I wonder whether we should also check to see whether we have a non-empty i_dentry list? i.e. do something like this in cifs_find_inode instead of what's in the original patch:

   if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {

...we might still be fooled by this anyway, by renaming trickery on the server, but that should still help somewhat.
Comment 14 Suresh Jayaraman (mail address is dead) 2010-05-27 12:20:37 UTC
One perhaps important observation:

I'm able to reproduce this error only when I try running the reproducer with 'root' user on the server and a normal user on the 'client'. I could not reproduce this with same user on both the server and the client with Jeff's recent patches. So it might have to do something with the "permission denied" errors (from touch and echo) on the client.
Comment 15 Suresh Jayaraman (mail address is dead) 2010-05-28 03:32:57 UTC
Scratch my earlier comment (that was meant for a different bugzilla, sorry).
Comment 16 Jeff Layton 2010-07-24 05:55:50 UTC
Patches for this are now in mainline. Closing case.