Bug 5498 - CIFS 1.51 - Kernel BUG during umount, invalid opcode
Summary: CIFS 1.51 - Kernel BUG during umount, invalid opcode
Status: RESOLVED FIXED
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 2.6
Hardware: x64 Linux
Importance: P3 major
Target Milestone: ---
Assignee: Steve French
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-28 16:32 UTC by David Cardimino
Modified: 2008-08-27 09:44 UTC

Description David Cardimino 2008-05-28 16:32:55 UTC
I'm using a 2.6.18 (Fedora 5) kernel, modified to use the 1.50c CIFS code available at http://us1.samba.org/samba/Linux_CIFS_client.html (since FC5 shipped CIFS 1.45). While unmounting a Windows share, I got the following:

May 28 14:01:03 asteroids kernel: ----------- [cut here ] --------- [please bite here ] ---------
May 28 14:01:03 asteroids kernel: Kernel BUG at include/linux/mm.h:304
May 28 14:01:03 asteroids kernel: invalid opcode: 0000 [1] SMP
May 28 14:01:03 asteroids kernel: last sysfs file: /class/net/eth0/address
May 28 14:01:03 asteroids kernel: CPU 1
May 28 14:01:03 asteroids kernel: Modules linked in: cifs(U) nls_utf8 ipv6 video sbs i2c_ec button battery asus_acpi ac lp parport_pc parport uhci_hcd ehci_hcd sg tg3 serio_raw i2c_i801 i2c_core ide_cd cdrom shpchp pcspkr dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ata_piix libata sd_mod scsi_mod
May 28 14:01:03 asteroids kernel: Pid: 17599, comm: umount.cifs Tainted: PF     2.6.18-1.2258.fc5 #1
May 28 14:01:03 asteroids kernel: RIP: 0010:[<ffffffff8022de2e>]  [<ffffffff8022de2e>] __free_pages+0x7/0x2b
May 28 14:01:03 asteroids kernel: RSP: 0018:ffff81001cef5df0  EFLAGS: 00010246
May 28 14:01:03 asteroids kernel: RAX: 0000000000000000 RBX: ffff810021bef7d0 RCX: 0000000000000003
May 28 14:01:03 asteroids kernel: RDX: 0000000000711d80 RSI: 0000000000000001 RDI: ffff810001712d80
May 28 14:01:03 asteroids kernel: RBP: ffff81006ac89dc0 R08: ffff81001cef4000 R09: ffff81007ffbf080
May 28 14:01:03 asteroids kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000010
May 28 14:01:03 asteroids kernel: R13: ffff810021bef7d0 R14: ffff81007d5bc8c0 R15: 0000000000000000
May 28 14:01:03 asteroids kernel: FS:  00002aaaaaab1200(0000) GS:ffff81007ffba9c0(0000) knlGS:0000000000000000
May 28 14:01:03 asteroids kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 28 14:01:03 asteroids kernel: CR2: 00002aaaae1b2000 CR3: 0000000019483000 CR4: 00000000000006e0
May 28 14:01:03 asteroids kernel: Process umount.cifs (pid: 17599, threadinfo ffff81001cef4000, task ffff81001e0337d0)
May 28 14:01:03 asteroids kernel: Stack:  ffffffff8025a1ae ffff810021bef7d0 ffffffff802997fe 0000000000000000
May 28 14:01:03 asteroids kernel:  ffffffff882a449d ffff81006ac89dc0 ffff810041351800 ffff8100413518a0
May 28 14:01:03 asteroids kernel:  0000000000000000 00007fffe0356b25 ffffffff882983be 0000000000000000
May 28 14:01:03 asteroids kernel: Call Trace:
May 28 14:01:03 asteroids kernel:  [<ffffffff8025a1ae>] free_task+0x12/0x22
May 28 14:01:03 asteroids kernel:  [<ffffffff802997fe>] kthread_stop+0x4c/0x79
May 28 14:01:03 asteroids kernel:  [<ffffffff882a449d>] :cifs:cifs_umount+0x141/0x213
May 28 14:01:03 asteroids kernel:  [<ffffffff882983be>] :cifs:cifs_put_super+0x51/0x86
May 28 14:01:03 asteroids kernel:  [<ffffffff802d0481>] generic_shutdown_super+0x79/0xfd
May 28 14:01:03 asteroids kernel:  [<ffffffff802d0548>] kill_anon_super+0x9/0x36
May 28 14:01:03 asteroids kernel:  [<ffffffff802d05fc>] deactivate_super+0x6c/0x84
May 28 14:01:03 asteroids kernel:  [<ffffffff802d9059>] sys_umount+0x246/0x28a
May 28 14:01:03 asteroids kernel:  [<ffffffff8025c74e>] system_call+0x7e/0x83
May 28 14:01:03 asteroids kernel: DWARF2 unwinder stuck at system_call+0x7e/0x83
May 28 14:01:03 asteroids kernel: Leftover inexact backtrace:
May 28 14:01:03 asteroids kernel:
May 28 14:01:03 asteroids kernel:
May 28 14:01:03 asteroids kernel: Code: 0f 0b 68 bb c2 47 80 c2 30 01 f0 ff 4f 08 0f 94 c0 84 c0 74
May 28 14:01:03 asteroids kernel: RIP  [<ffffffff8022de2e>] __free_pages+0x7/0x2b
May 28 14:01:03 asteroids kernel:  RSP <ffff81001cef5df0>
May 28 14:01:03 asteroids kernel:  BUG: warning at kernel/exit.c:852/do_exit() (Tainted: PF    )
May 28 14:01:03 asteroids kernel:
May 28 14:01:03 asteroids kernel: Call Trace:
May 28 14:01:03 asteroids kernel:  [<ffffffff802698ed>] show_trace+0x34/0x47
May 28 14:01:03 asteroids kernel:  [<ffffffff80269912>] dump_stack+0x12/0x17
May 28 14:01:03 asteroids kernel:  [<ffffffff80214e8a>] do_exit+0x58/0x927
May 28 14:01:03 asteroids kernel:  [<ffffffff80269c05>] kernel_math_error+0x0/0x90
May 28 14:01:03 asteroids kernel:
----------------------------------------------------------------------

I noticed a similar problem in cifs_mount that was fixed in commit 28356a1679006b110215596e057f304ef3083922 (Fix oops on failed cifs mount, in kthread_stop). So I updated cifs_umount, and the problem seems to have gone away. However, this change is not in the latest version on git.kernel.org. Is there a better fix for this issue?

--- a/connect.c 2007-09-20 15:46:02.000000000 -0400
+++ b/connect.c 2008-05-28 17:16:33.000000000 -0400
@@ -3588,7 +3588,8 @@ cifs_umount(struct super_block *sb, stru
                                cFYI(1, ("Waking up socket by sending signal"));
                                if (cifsd_task) {
                                        force_sig(SIGKILL, cifsd_task);
-                                       kthread_stop(cifsd_task);
+                                       if (ses->server->tsk)
+                                               kthread_stop(ses->server->tsk);
                                }
                                rc = 0;
                        } /* else - we have an smb session
Comment 1 Jeff Layton 2008-08-26 15:55:50 UTC
I suspect that this is now fixed in current kernels since cifsd now waits for kthread_stop before exiting. Please let us know if you can reproduce this on something more recent.
Comment 2 Jeff Layton 2008-08-26 18:02:51 UTC
Ahh as to your question -- yes, there is a better fix in place. While the patch you have there helps, it's still a bit racy. It's possible to check the tsk var there and then have the thread exit on another CPU before we call kthread_stop on it.

A better fix is in place now. cifsd now goes to sleep until kthread_stop is called so we don't need to check that the tsk var is non-NULL.
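
For illustration only, the park-until-stopped idea can be modeled in userspace with pthreads. This is a minimal sketch under stated assumptions, not the CIFS code: struct worker, worker_main, worker_start and worker_stop are hypothetical names, should_stop stands in for kthread_should_stop(), and pthread_join stands in for the wait that kthread_stop performs.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the cifsd thread; not CIFS code. */
struct worker {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool should_stop;   /* plays the role of kthread_should_stop() */
    bool work_done;     /* set once the worker's real work has finished */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    /* ... the real thread would service the socket here ... */

    pthread_mutex_lock(&w->lock);
    w->work_done = true;
    /* Park instead of exiting: stay alive until a stop is requested. */
    while (!w->should_stop)
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static int worker_start(struct worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->should_stop = false;
    w->work_done = false;
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Rough analogue of kthread_stop(): request the stop, then wait for exit. */
static int worker_stop(struct worker *w)
{
    int rc;

    pthread_mutex_lock(&w->lock);
    w->should_stop = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);

    /* The thread is guaranteed to still exist here, so joining is safe. */
    rc = pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->cond);
    pthread_mutex_destroy(&w->lock);
    return rc;
}
```

Because the worker in this model never exits before the stop request arrives, the stopper does not need to test the thread handle first, so there is no window between a check and the stop call.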

I think we can probably close this case.
Comment 3 Jeff Layton 2008-08-27 09:44:28 UTC
Closing as FIXED, please reopen if it isn't...