Bug 8219 - SMB Panic from Windows 7 Client
Summary: SMB Panic from Windows 7 Client
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: SMB2 (show other bugs)
Version: 3.6.0rc2
Hardware: x64 Windows 7
: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 8279
  Show dependency treegraph
 
Reported: 2011-06-10 14:57 UTC by Eric Solheim
Modified: 2011-06-30 17:36 UTC (History)
2 users (show)

See Also:


Attachments
Log File for Windows 7 machine with smb_panic (538.87 KB, application/x-gzip-compressed)
2011-06-10 14:57 UTC, Eric Solheim
no flags Details
smb.conf file, configuration, and latest crash (19.72 KB, application/x-zip-compressed)
2011-06-14 12:47 UTC, Eric Solheim
no flags Details
Log File for latest backtrace (274.40 KB, application/octet-stream)
2011-06-27 16:15 UTC, Eric Solheim
no flags Details
Valgrind Output process manually killed (555.48 KB, text/plain)
2011-06-27 17:14 UTC, Eric Solheim
no flags Details
git-am fix for 3.6.0 (1.15 KB, patch)
2011-06-29 00:50 UTC, Jeremy Allison
metze: review+
Details
New logs after patch is applied (49.42 KB, application/x-gzip-compressed)
2011-06-29 14:27 UTC, Eric Solheim
no flags Details
Second part of the fix for this bug. git-am for 3.6.0 (1004 bytes, patch)
2011-06-29 16:52 UTC, Jeremy Allison
no flags Details
Corrected second part of the git-am patch for 3.6.0 (1.02 KB, patch)
2011-06-29 16:59 UTC, Jeremy Allison
metze: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Solheim 2011-06-10 14:57:57 UTC
Created attachment 6562 [details]
Log File for Windows 7 machine with smb_panic

Samba Server machine Running RHEL 2.6.18-238.9.1.el5
Client Machine Windows 7 Enterprise x64

The Windows7 client will connect then directory arrow starts flashing and loses each directory until it can no longer access the share.

Connection is unavailable in further attempts to access share.

Windows XP clients connect fine even after share dies for Windows 7 clients.

Attached Level 10 Debug, causes smb_panic.
Fault Report starts here 2011/06/10 10:00:26.237377
1) Samba was started Windows7 atttempted to connect.
2) Client connection crashed.
3) Attempted to connect again.

Bug exists in Both RC1 and RC2 and does not exist in version 3.5.6

(Using the samba checklist the failure occurs on the last step)
Comment 1 Jeremy Allison 2011-06-10 17:18:04 UTC
Can you add the line:

panic action = /bin/sleep 999999

then reproduce the crash, and attach with gdb to the parent process of the sleep and type "bt" to get a backtrace ?

This would help enormously in tracking this down.

Thanks !

Jeremy.
Comment 2 Jeremy Allison 2011-06-10 18:03:49 UTC
For anyone else examining this (Metze, Vl ?) here is the relevant part of the log.

--------------------------------------------------------------

2011/06/10 10:00:26.236558, 10] smbd/smb2_server.c:1087(smbd_smb2_request_dispatch)

  smbd_smb2_request_dispatch: opcode[SMB2_NOTIFY] mid = 20

[2011/06/10 10:00:26.236636,  4] smbd/uid.c:348(change_to_user)

  Skipping user change - already user

[2011/06/10 10:00:26.236681, 10] smbd/smb2_notify.c:220(smbd_smb2_notify_send)

  smbd_smb2_notify_send: file_id[0xFFFFFFFFFFFFFFFF]

[2011/06/10 10:00:26.236736,  3] smbd/smb2_notify.c:254(smbd_smb2_notify_send)

  smbd_smb2_notify_send: notify change called on gomalley, filter = FILE_NAME|DIR_NAME|ATTRIBUTES|LAST_WRITE, recursive = 0

[2011/06/10 10:00:26.236792, 10] lib/dbwrap_tdb.c:102(db_tdb_fetch_locked)

  Locking key 6E6F7469667920617272

[2011/06/10 10:00:26.236841, 10] lib/dbwrap_tdb.c:131(db_tdb_fetch_locked)

  Allocated locked data 0x0x87203d0

[2011/06/10 10:00:26.236930, 10] smbd/notify_inotify.c:416(inotify_watch)

  inotify_add_watch for /home/gomalley mask 210003c6 returned wd 1

[2011/06/10 10:00:26.236976, 10] lib/dbwrap_tdb.c:44(db_tdb_record_destr)

  Unlocking key 6E6F7469667920617272

[2011/06/10 10:00:26.237027, 10] smbd/notify.c:228(change_notify_add_request)

  change_notify_add_request: Adding request for gomalley: max_param = 32

  smbd_smb2_request_pending_queue: req->current_idx = 4

  	req->in.vector[0].iov_len = 4

  	req->in.vector[1].iov_len = 64

  	req->in.vector[2].iov_len = 56

  	req->in.vector[3].iov_len = 104

  	req->in.vector[4].iov_len = 64

  	req->in.vector[5].iov_len = 32

  	req->in.vector[6].iov_len = 0

  	req->out.vector[0].iov_len = 4

  	req->out.vector[1].iov_len = 64

  	req->out.vector[2].iov_len = 88

  	req->out.vector[3].iov_len = 88

  	req->out.vector[4].iov_len = 64

  	req->out.vector[5].iov_len = 8

  	req->out.vector[6].iov_len = 0

[2011/06/10 10:00:26.237324, 10] smbd/smb2_server.c:472(smb2_set_operation_credit)

  smb2_set_operation_credit: requested 1, granted 1, total granted 30

[2011/06/10 10:00:26.237377,  0] lib/fault.c:47(fault_report)

  ===============================================================

[2011/06/10 10:00:26.237420,  0] lib/fault.c:48(fault_report)

  INTERNAL ERROR: Signal 11 in pid 19980 (3.6.0rc2)

  Please read the Trouble-Shooting section of the Samba3-HOWTO

[2011/06/10 10:00:26.237478,  0] lib/fault.c:50(fault_report)

  

  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf

[2011/06/10 10:00:26.237533,  0] lib/fault.c:51(fault_report)

  ===============================================================

[2011/06/10 10:00:26.237579,  0] lib/util.c:1116(smb_panic)

  PANIC (pid 19980): internal error

[2011/06/10 10:00:26.240754,  0] lib/util.c:1220(log_stack_trace)

  BACKTRACE: 17 stack frames:

   #0 /opt/samba/sbin/smbd(log_stack_trace+0x2d) [0x873fb5]

   #1 /opt/samba/sbin/smbd(smb_panic+0x7c) [0x8740e7]

   #2 /opt/samba/sbin/smbd [0x86476e]

   #3 [0xdb9420]

   #4 /opt/samba/sbin/smbd(smbd_smb2_request_process_notify+0x6cc) [0x5cce6c]

   #5 /opt/samba/sbin/smbd(smbd_smb2_request_dispatch+0xe87) [0x5bd7d7]

   #6 /opt/samba/sbin/smbd(smbd_smb2_request_dispatch_immediate+0xbe) [0x5bf342]

   #7 /opt/samba/sbin/smbd(tevent_common_loop_immediate+0x111) [0x88525d]

   #8 /opt/samba/sbin/smbd(run_events_poll+0x3e) [0x883425]

   #9 /opt/samba/sbin/smbd(smbd_process+0x1313) [0x5aa83b]

   #10 /opt/samba/sbin/smbd [0xb402b1]

   #11 /opt/samba/sbin/smbd(run_events_poll+0x3f9) [0x8837e0]

   #12 /opt/samba/sbin/smbd [0x883c3d]

   #13 /opt/samba/sbin/smbd(_tevent_loop_once+0x9d) [0x88409d]

   #14 /opt/samba/sbin/smbd(main+0x16f5) [0xb41eba]

   #15 /lib/libc.so.6(__libc_start_main+0xdc) [0x25be9c]

   #16 /opt/samba/sbin/smbd [0x524e41]
Comment 3 Jeremy Allison 2011-06-10 18:21:35 UTC
Any information on how you reproduced this (smb.conf etc.) would also be very interesting.

Thanks !

Jeremy.
Comment 4 Jeremy Allison 2011-06-10 18:22:32 UTC
Re-assigning to Metze for this week as I'll be on vacation.

Thanks Metze.

Jeremy.
Comment 5 Eric Solheim 2011-06-10 18:27:32 UTC
[root@claridge samba]# ps -ef | grep mb
root     25526     1  0 14:15 ?        00:00:00 /opt/samba/sbin/smbd -D
root     25528 25526  0 14:15 ?        00:00:00 /opt/samba/sbin/smbd -D
root     25658     1  0 14:16 ?        00:00:00 /opt/samba/sbin/nmbd -D
root     25660     1  0 14:16 ?        00:00:00 /opt/samba/sbin/winbindd
root     25662 25660  0 14:16 ?        00:00:00 /opt/samba/sbin/winbindd
root     25701 25660  0 14:17 ?        00:00:00 /opt/samba/sbin/winbindd
root     25702 25660  0 14:17 ?        00:00:00 /opt/samba/sbin/winbindd
gomalley 25708 25526  0 14:17 ?        00:00:00 /opt/samba/sbin/smbd -D
root     25724 25255  0 14:18 pts/2    00:00:00 grep mb
[root@claridge samba]# gdb /opt/samba/sbin/smbd 25708
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/samba/sbin/smbd...(no debugging symbols
found)...done.
Attaching to program: /opt/samba/sbin/smbd, process 25708
Cannot access memory at address 0x89028c03
A program is being debugged already.  Kill it? (y or n) n
Program not killed.
(gdb) bt
#0  0x00110402 in smb_vfs_call_aio_cancel ()
#1  0x00d8e833 in ?? ()
#2  0x00e51ff4 in ?? ()
#3  0x00d3319b in ?? ()
#4  0x00006470 in ?? ()
#5  0xbfe8b1c8 in ?? ()
#6  0x00000000 in ?? ()
(gdb)
Comment 6 Jeremy Allison 2011-06-10 18:33:28 UTC
Can you load up the debug symbols as well please, we don't get much without them.
Thanks!

Jeremy.
Comment 7 Eric Solheim 2011-06-14 12:47:12 UTC
Created attachment 6583 [details]
smb.conf file, configuration, and latest crash
Comment 8 Eric Solheim 2011-06-14 12:48:58 UTC
(In reply to comment #7)
> Created attachment 6583 [details]
> smb.conf file, configuration, and latest crash

Below is the gdb output with symbols enabled, I have also attached the smb.conf file and configuration steps used.

[root@claridge samba]# ps -ef | grep sleep
root     26176 26172  0 08:33 ?        00:00:00 /bin/sleep 999999
root     26196 30989  0 08:34 pts/6    00:00:00 grep sleep
[root@claridge samba]# ps -ef | grep 26172
gomalley 26172 26143  0 08:33 ?        00:00:00 /opt/samba/sbin/smbd -D
root     26176 26172  0 08:33 ?        00:00:00 /bin/sleep 999999
root     26203 30989  0 08:34 pts/6    00:00:00 grep 26172
[root@claridge samba]# gdb /opt/samba/sbin/smbd 26172
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/samba/sbin/smbd...done.
Attaching to program: /opt/samba/sbin/smbd, process 26172
Cannot access memory at address 0x8b30508d
A program is being debugged already.  Kill it? (y or n) n
Program not killed.
(gdb) bt
#0  0x0012d402 in check_reduced_name (conn=Cannot access memory at address 0xd
) at smbd/vfs.c:1027
#1  0x002d8833 in own_ft_info (dom_info=Cannot access memory at address 0xd
) at rpc_server/lsa/srv_lsa_nt.c:3829
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Comment 9 Stefan Metzmacher 2011-06-15 14:42:11 UTC
Sorry, I'm busy with other stuff currently...
Comment 10 Jeremy Allison 2011-06-23 20:16:28 UTC
Unfortunately this is a corrupt stack, which makes it impossible to tell where we were :-(.

Can you reproduce and post the debug level 10 log from smbd when it dies, the gdb backtrace (maybe we'll get a more useful one). What would really help is to run smbd under valgrind, and then reproduce. That would tell us exactly what was being corrupted.

Jeremy.
Comment 11 Eric Solheim 2011-06-27 16:15:57 UTC
Created attachment 6635 [details]
Log File for latest backtrace

Thanks for checking into this here is a uncorrupted backtrace and log

Log for attached backtrace:
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x00237833 in __waitpid_nocancel () from /lib/libc.so.6
#2  0x001dc19b in do_system () from /lib/libc.so.6
#3  0x007ed8aa in smb_panic (why=0xd5fce8 "internal error") at lib/util.c:1122
#4  0x007db615 in fault_report (sig=11) at lib/fault.c:53
#5  0x007db626 in sig_fault (sig=11) at lib/fault.c:76
#6  <signal handler called>
#7  0x00480f5f in smb2_send_async_interim_response (req=0x9b8cc88)
    at smbd/smb2_server.c:739
#8  0x004812cd in smbd_smb2_request_pending_queue (req=0x9b8cc88, subreq=0x0)
    at smbd/smb2_server.c:845
#9  0x004960c0 in smbd_smb2_request_process_notify (req=0x9b8cc88)
    at smbd/smb2_notify.c:106
#10 0x00482a85 in smbd_smb2_request_dispatch (req=0x9b8cc88)
    at smbd/smb2_server.c:1377
#11 0x004831dd in smbd_smb2_request_dispatch_immediate (ctx=0x9b50ab0, im=0x0, 
    private_data=0x9b8cc88) at smbd/smb2_server.c:1539
#12 0x00803489 in tevent_common_loop_immediate (ev=0x9b50ab0)
    at ../lib/tevent/tevent_immediate.c:135
#13 0x008006aa in run_events_poll (ev=0x9b50ab0, pollrtn=0, pfds=0x0, 
    num_pfds=0) at lib/events.c:197
#14 0x004677c3 in smbd_server_connection_loop_once (conn=0x9b50b20)
    at smbd/process.c:994
---Type <return> to continue, or q <return> to quit---
#15 0x0046c411 in smbd_process (sconn=0x9b50b20) at smbd/process.c:3153
#16 0x00bfe7fe in smbd_accept_connection (ev=0x9b50ab0, fde=0x9b63928, 
    flags=1, private_data=0x9b6faf0) at smbd/server.c:496
#17 0x00800c85 in run_events_poll (ev=0x9b50ab0, pollrtn=1, pfds=0x9b5f750, 
    num_pfds=5) at lib/events.c:282
#18 0x00800f29 in s3_event_loop_once (ev=0x9b50ab0, 
    location=0xe097c1 "smbd/server.c:829") at lib/events.c:345
#19 0x00801ee8 in _tevent_loop_once (ev=0x9b50ab0, 
    location=0xe097c1 "smbd/server.c:829") at ../lib/tevent/tevent.c:494
#20 0x00bff56d in smbd_parent_loop (parent=0x9b692d0) at smbd/server.c:829
#21 0x00c00762 in main (argc=1, argv=0xbffc9694) at smbd/server.c:1311
(gdb) quit
Comment 12 Eric Solheim 2011-06-27 17:14:09 UTC
Created attachment 6638 [details]
Valgrind Output process manually killed

Here is the valgrind output the window continued to refresh indefinitely any attempt to click on a folder failed.
Process was manually killed. 

Let me know if there is anything else I can do.
Thanks
Comment 13 Jeremy Allison 2011-06-29 00:18:14 UTC
Marking this as a blocker for 3.6.0 as it's an uninitialized memory read that will kill the server in SMB2.

Patch to follow.

Jeremy.
Comment 14 Jeremy Allison 2011-06-29 00:29:29 UTC
FYI. This would only occur when signing was turned on, that's why I didn't see it before. Patch to follow.
Jeremy.
Comment 15 Jeremy Allison 2011-06-29 00:50:51 UTC
Created attachment 6643 [details]
git-am fix for 3.6.0

Metze please review. I've added this to master as it's pretty obviously correct (a missing field initialization when duplicating a structure leads to a NULL pointer reference).

Eric Solheim, if you could please review and test this patch to see if it fixes your problem asap I'd really appreciate it.

Thanks,

Jeremy.
Comment 16 Stefan Metzmacher 2011-06-29 07:16:29 UTC
Comment on attachment 6643 [details]
git-am fix for 3.6.0

Looks good
Comment 17 Stefan Metzmacher 2011-06-29 07:17:36 UTC
Karolin please pick for 3.6.0
Comment 18 Eric Solheim 2011-06-29 14:27:02 UTC
Created attachment 6647 [details]
New logs after patch is applied

The proposed patch by Jeremy fixed the panic but samba still refreshes and is not usable.

Attached is the log with and without Valgrind.
(Without Valgrind it gives up after many fast refreshes. 
With Valgrind it runs much slower so I don't know if it eventually gives up or not)

Let me know if you want me to submit this as a new bug since technically the panic is fixed.

Also the problem disappeared when I disabled the following Local Security Policy Microsoft Network client: Digitially sign communications (always)

So it can be tested by enabling this parameter.

As a side my environment also enables the following registry setting:
HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/services\LanmanWorkstation\Parameters\RequreSecuritySignature

See http://blogs.technet.com/b/josebda/archive/2010/12/01/the-basics-of-smb-signing-covering-both-smb1-and-smb2.aspx

Although it now appears to have an easy workaround, 
please look into this bug for 3.6.0 as I cannot dictate the Security Policy for the company.
Comment 19 Jeremy Allison 2011-06-29 16:40:48 UTC
The key is this snippet from the log, which keeps repeating:

[2011/06/29 09:36:55.930569, 10] smbd/smb2_server.c:472(smb2_set_operation_credit)
  smb2_set_operation_credit: requested 1, granted 1, total granted 30
[2011/06/29 09:36:55.930632,  5] smbd/smb2_signing.c:73(smb2_signing_sign_pdu)
  signed SMB2 message
  smb2_send_async_interim_response: nreq->current_idx = 4
  smb2_send_async_interim_response: returning 4 vectors
  	req->out.vector[0].iov_len = 4
  	req->out.vector[1].iov_len = 64
  	req->out.vector[2].iov_len = 88
  	req->out.vector[3].iov_len = 88
[2011/06/29 09:36:55.930794, 10] smbd/smb2_server.c:472(smb2_set_operation_credit)
  smb2_set_operation_credit: requested 1, granted 1, total granted 31
[2011/06/29 09:36:55.930842, 10] smbd/smb2_server.c:582(smbd_server_connection_terminate_ex)
  smbd_server_connection_terminate_ex: reason[NT_STATUS_INVALID_PARAMETER] at smbd/smb2_server.c:1542

I'm guessing this is coming from this code in smbd/smb2_signing.c:

27 NTSTATUS smb2_signing_sign_pdu(DATA_BLOB session_key,
 28                                struct iovec *vector,
 29                                int count)
 30 {
 31         uint8_t *hdr;
 32         uint64_t session_id;
 33         struct HMACSHA256Context m;
 34         uint8_t res[SHA256_DIGEST_LENGTH];
 35         int i;
 36 
 37         if (count < 2) {
 38                 return NT_STATUS_INVALID_PARAMETER;
 39         }
 40 
 41         if (vector[0].iov_len != SMB2_HDR_BODY) {
 42                 return NT_STATUS_INVALID_PARAMETER;
 43         }

As we always call smb2_signing_sign_pdu() with a count parameter of 3, I'm guessing it's the vector[0].iov_len != SMB2_HDR_BODY check that is triggering here.

I'm happy with you opening this as a new bug, as it's different from the crash bug, but it certainly seems that signing when sending an interim asynchronous response is failing here.

Jeremy.
Comment 20 Jeremy Allison 2011-06-29 16:45:24 UTC
Ok - hang on ! Don't open a new bug. I've spotted the error here. I have a patch. Eric Solheim please test asap and we'll get this into 3.6.0 and ensure signing works for 3.6.0.

Simple patch will follow.

Jeremy.
Comment 21 Jeremy Allison 2011-06-29 16:46:53 UTC
Re-assigning to metze for patch review.
Comment 22 Jeremy Allison 2011-06-29 16:52:37 UTC
Created attachment 6648 [details]
Second part of the fix for this bug. git-am for 3.6.0

Metze please review asap. We were passing vector[0] (which starts with the 4 byte packet length) to the signing function in the asynchronous reply case, instead of passing &vector[1] which is the start of the SMB2_HEADER to the signing function - causing the signing function to return NT_STATUS_INVALID_PARAMETER.

We need both these fixes for 3.6.0 to make SMB2 signing work with asnychronous interim responses.

Jeremy.
Comment 23 Jeremy Allison 2011-06-29 16:59:48 UTC
Created attachment 6650 [details]
Corrected second part of the git-am patch for 3.6.0

Corrected patch - forgot to reduce the vector count from 3 to 2 in previous patch. Now we're passing &vector[1] instead of just vector we're passing in only a 2 element vector, not 3.

Metze please check !

Jeremy.
Comment 24 Eric Solheim 2011-06-29 17:02:09 UTC
(In reply to comment #23)
> Created attachment 6650 [details]
> Corrected second part of the git-am patch for 3.6.0
> Corrected patch - forgot to reduce the vector count from 3 to 2 in previous
> patch. Now we're passing &vector[1] instead of just vector we're passing in
> only a 2 element vector, not 3.
> Metze please check !
> Jeremy.

Ok will test shortly please ignore this for now.
https://bugzilla.samba.org/show_bug.cgi?id=8279
Comment 25 Eric Solheim 2011-06-29 19:09:50 UTC
That fixed it thanks.

(In reply to comment #24)
> (In reply to comment #23)
> > Created attachment 6650 [details] [details]
> > Corrected second part of the git-am patch for 3.6.0
> > Corrected patch - forgot to reduce the vector count from 3 to 2 in previous
> > patch. Now we're passing &vector[1] instead of just vector we're passing in
> > only a 2 element vector, not 3.
> > Metze please check !
> > Jeremy.
> Ok will test shortly please ignore this for now.
> https://bugzilla.samba.org/show_bug.cgi?id=8279
Comment 26 Jeremy Allison 2011-06-29 20:58:09 UTC
Great ! Thanks for confirming Eric. Just awaiting review from metze and we'll get this into 3.6.0 final.

Cheers,

Jeremy.
Comment 27 Stefan Metzmacher 2011-06-30 07:18:50 UTC
Comment on attachment 6650 [details]
Corrected second part of the git-am patch for 3.6.0

Looks correct
Comment 28 Stefan Metzmacher 2011-06-30 07:19:35 UTC
Karolin, please pick for 3.6.0
Comment 29 Karolin Seeger 2011-06-30 17:36:45 UTC
Pushed both patches to v3-6-test.
Closing out bug report.

Thanks!