Created attachment 6562 [details] Log File for Windows 7 machine with smb_panic Samba Server machine Running RHEL 2.6.18-238.9.1.el5 Client Machine Windows 7 Enterprise x64 The Windows7 client will connect then directory arrow starts flashing and loses each directory until it can no longer access the share. Connection is unavailable in further attempts to access share. Windows XP clients connect fine even after share dies for Windows 7 clients. Attached Level 10 Debug, causes smb_panic. Fault Report starts here 2011/06/10 10:00:26.237377 1) Samba was started Windows7 atttempted to connect. 2) Client connection crashed. 3) Attempted to connect again. Bug exists in Both RC1 and RC2 and does not exist in version 3.5.6 (Using the samba checklist the failure occurs on the last step)
Can you add the line: panic action = /bin/sleep 999999 then reproduce the crash, and attach with gdb to the parent process of the sleep and type "bt" to get a backtrace ? This would help enormously in tracking this down. Thanks ! Jeremy.
For anyone else examining this (Metze, Vl ?) here is the relevant part of the log. -------------------------------------------------------------- 2011/06/10 10:00:26.236558, 10] smbd/smb2_server.c:1087(smbd_smb2_request_dispatch) smbd_smb2_request_dispatch: opcode[SMB2_NOTIFY] mid = 20 [2011/06/10 10:00:26.236636, 4] smbd/uid.c:348(change_to_user) Skipping user change - already user [2011/06/10 10:00:26.236681, 10] smbd/smb2_notify.c:220(smbd_smb2_notify_send) smbd_smb2_notify_send: file_id[0xFFFFFFFFFFFFFFFF] [2011/06/10 10:00:26.236736, 3] smbd/smb2_notify.c:254(smbd_smb2_notify_send) smbd_smb2_notify_send: notify change called on gomalley, filter = FILE_NAME|DIR_NAME|ATTRIBUTES|LAST_WRITE, recursive = 0 [2011/06/10 10:00:26.236792, 10] lib/dbwrap_tdb.c:102(db_tdb_fetch_locked) Locking key 6E6F7469667920617272 [2011/06/10 10:00:26.236841, 10] lib/dbwrap_tdb.c:131(db_tdb_fetch_locked) Allocated locked data 0x0x87203d0 [2011/06/10 10:00:26.236930, 10] smbd/notify_inotify.c:416(inotify_watch) inotify_add_watch for /home/gomalley mask 210003c6 returned wd 1 [2011/06/10 10:00:26.236976, 10] lib/dbwrap_tdb.c:44(db_tdb_record_destr) Unlocking key 6E6F7469667920617272 [2011/06/10 10:00:26.237027, 10] smbd/notify.c:228(change_notify_add_request) change_notify_add_request: Adding request for gomalley: max_param = 32 smbd_smb2_request_pending_queue: req->current_idx = 4 req->in.vector[0].iov_len = 4 req->in.vector[1].iov_len = 64 req->in.vector[2].iov_len = 56 req->in.vector[3].iov_len = 104 req->in.vector[4].iov_len = 64 req->in.vector[5].iov_len = 32 req->in.vector[6].iov_len = 0 req->out.vector[0].iov_len = 4 req->out.vector[1].iov_len = 64 req->out.vector[2].iov_len = 88 req->out.vector[3].iov_len = 88 req->out.vector[4].iov_len = 64 req->out.vector[5].iov_len = 8 req->out.vector[6].iov_len = 0 [2011/06/10 10:00:26.237324, 10] smbd/smb2_server.c:472(smb2_set_operation_credit) smb2_set_operation_credit: requested 1, granted 1, total granted 30 [2011/06/10 10:00:26.237377, 0] lib/fault.c:47(fault_report) =============================================================== [2011/06/10 10:00:26.237420, 0] lib/fault.c:48(fault_report) INTERNAL ERROR: Signal 11 in pid 19980 (3.6.0rc2) Please read the Trouble-Shooting section of the Samba3-HOWTO [2011/06/10 10:00:26.237478, 0] lib/fault.c:50(fault_report) From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf [2011/06/10 10:00:26.237533, 0] lib/fault.c:51(fault_report) =============================================================== [2011/06/10 10:00:26.237579, 0] lib/util.c:1116(smb_panic) PANIC (pid 19980): internal error [2011/06/10 10:00:26.240754, 0] lib/util.c:1220(log_stack_trace) BACKTRACE: 17 stack frames: #0 /opt/samba/sbin/smbd(log_stack_trace+0x2d) [0x873fb5] #1 /opt/samba/sbin/smbd(smb_panic+0x7c) [0x8740e7] #2 /opt/samba/sbin/smbd [0x86476e] #3 [0xdb9420] #4 /opt/samba/sbin/smbd(smbd_smb2_request_process_notify+0x6cc) [0x5cce6c] #5 /opt/samba/sbin/smbd(smbd_smb2_request_dispatch+0xe87) [0x5bd7d7] #6 /opt/samba/sbin/smbd(smbd_smb2_request_dispatch_immediate+0xbe) [0x5bf342] #7 /opt/samba/sbin/smbd(tevent_common_loop_immediate+0x111) [0x88525d] #8 /opt/samba/sbin/smbd(run_events_poll+0x3e) [0x883425] #9 /opt/samba/sbin/smbd(smbd_process+0x1313) [0x5aa83b] #10 /opt/samba/sbin/smbd [0xb402b1] #11 /opt/samba/sbin/smbd(run_events_poll+0x3f9) [0x8837e0] #12 /opt/samba/sbin/smbd [0x883c3d] #13 /opt/samba/sbin/smbd(_tevent_loop_once+0x9d) [0x88409d] #14 /opt/samba/sbin/smbd(main+0x16f5) [0xb41eba] #15 /lib/libc.so.6(__libc_start_main+0xdc) [0x25be9c] #16 /opt/samba/sbin/smbd [0x524e41]
Any information on how you reproduced this (smb.conf etc.) would also be very interesting. Thanks ! Jeremy.
Re-assigning to Metze for this week as I'll be on vacation. Thanks Metze. Jeremy.
[root@claridge samba]# ps -ef | grep mb root 25526 1 0 14:15 ? 00:00:00 /opt/samba/sbin/smbd -D root 25528 25526 0 14:15 ? 00:00:00 /opt/samba/sbin/smbd -D root 25658 1 0 14:16 ? 00:00:00 /opt/samba/sbin/nmbd -D root 25660 1 0 14:16 ? 00:00:00 /opt/samba/sbin/winbindd root 25662 25660 0 14:16 ? 00:00:00 /opt/samba/sbin/winbindd root 25701 25660 0 14:17 ? 00:00:00 /opt/samba/sbin/winbindd root 25702 25660 0 14:17 ? 00:00:00 /opt/samba/sbin/winbindd gomalley 25708 25526 0 14:17 ? 00:00:00 /opt/samba/sbin/smbd -D root 25724 25255 0 14:18 pts/2 00:00:00 grep mb [root@claridge samba]# gdb /opt/samba/sbin/smbd 25708 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/samba/sbin/smbd...(no debugging symbols found)...done. Attaching to program: /opt/samba/sbin/smbd, process 25708 Cannot access memory at address 0x89028c03 A program is being debugged already. Kill it? (y or n) n Program not killed. (gdb) bt #0 0x00110402 in smb_vfs_call_aio_cancel () #1 0x00d8e833 in ?? () #2 0x00e51ff4 in ?? () #3 0x00d3319b in ?? () #4 0x00006470 in ?? () #5 0xbfe8b1c8 in ?? () #6 0x00000000 in ?? () (gdb)
Can you load up the debug symbols as well please, we don't get much without them. Thanks! Jeremy.
Created attachment 6583 [details] smb.conf file, configuration, and latest crash
(In reply to comment #7) > Created attachment 6583 [details] > smb.conf file, configuration, and latest crash Below is the gdb output with symbols enabled, I have also attached the smb.conf file and configuration steps used. [root@claridge samba]# ps -ef | grep sleep root 26176 26172 0 08:33 ? 00:00:00 /bin/sleep 999999 root 26196 30989 0 08:34 pts/6 00:00:00 grep sleep [root@claridge samba]# ps -ef | grep 26172 gomalley 26172 26143 0 08:33 ? 00:00:00 /opt/samba/sbin/smbd -D root 26176 26172 0 08:33 ? 00:00:00 /bin/sleep 999999 root 26203 30989 0 08:34 pts/6 00:00:00 grep 26172 [root@claridge samba]# gdb /opt/samba/sbin/smbd 26172 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5_6.2) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/samba/sbin/smbd...done. Attaching to program: /opt/samba/sbin/smbd, process 26172 Cannot access memory at address 0x8b30508d A program is being debugged already. Kill it? (y or n) n Program not killed. (gdb) bt #0 0x0012d402 in check_reduced_name (conn=Cannot access memory at address 0xd ) at smbd/vfs.c:1027 #1 0x002d8833 in own_ft_info (dom_info=Cannot access memory at address 0xd ) at rpc_server/lsa/srv_lsa_nt.c:3829 Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Sorry, I'm busy with other stuff currently...
Unfortunately this is a corrupt stack, which makes it impossible to tell where we were :-(. Can you reproduce and post the debug level 10 log from smbd when it dies, the gdb backtrace (maybe we'll get a more useful one). What would really help is to run smbd under valgrind, and then reproduce. That would tell us exactly what was being corrupted. Jeremy.
Created attachment 6635 [details] Log File for latest backtrace Thanks for checking into this here is a uncorrupted backtrace and log Log for attached backtrace: (gdb) bt #0 0x00110402 in __kernel_vsyscall () #1 0x00237833 in __waitpid_nocancel () from /lib/libc.so.6 #2 0x001dc19b in do_system () from /lib/libc.so.6 #3 0x007ed8aa in smb_panic (why=0xd5fce8 "internal error") at lib/util.c:1122 #4 0x007db615 in fault_report (sig=11) at lib/fault.c:53 #5 0x007db626 in sig_fault (sig=11) at lib/fault.c:76 #6 <signal handler called> #7 0x00480f5f in smb2_send_async_interim_response (req=0x9b8cc88) at smbd/smb2_server.c:739 #8 0x004812cd in smbd_smb2_request_pending_queue (req=0x9b8cc88, subreq=0x0) at smbd/smb2_server.c:845 #9 0x004960c0 in smbd_smb2_request_process_notify (req=0x9b8cc88) at smbd/smb2_notify.c:106 #10 0x00482a85 in smbd_smb2_request_dispatch (req=0x9b8cc88) at smbd/smb2_server.c:1377 #11 0x004831dd in smbd_smb2_request_dispatch_immediate (ctx=0x9b50ab0, im=0x0, private_data=0x9b8cc88) at smbd/smb2_server.c:1539 #12 0x00803489 in tevent_common_loop_immediate (ev=0x9b50ab0) at ../lib/tevent/tevent_immediate.c:135 #13 0x008006aa in run_events_poll (ev=0x9b50ab0, pollrtn=0, pfds=0x0, num_pfds=0) at lib/events.c:197 #14 0x004677c3 in smbd_server_connection_loop_once (conn=0x9b50b20) at smbd/process.c:994 ---Type <return> to continue, or q <return> to quit--- #15 0x0046c411 in smbd_process (sconn=0x9b50b20) at smbd/process.c:3153 #16 0x00bfe7fe in smbd_accept_connection (ev=0x9b50ab0, fde=0x9b63928, flags=1, private_data=0x9b6faf0) at smbd/server.c:496 #17 0x00800c85 in run_events_poll (ev=0x9b50ab0, pollrtn=1, pfds=0x9b5f750, num_pfds=5) at lib/events.c:282 #18 0x00800f29 in s3_event_loop_once (ev=0x9b50ab0, location=0xe097c1 "smbd/server.c:829") at lib/events.c:345 #19 0x00801ee8 in _tevent_loop_once (ev=0x9b50ab0, location=0xe097c1 "smbd/server.c:829") at ../lib/tevent/tevent.c:494 #20 0x00bff56d in smbd_parent_loop (parent=0x9b692d0) at smbd/server.c:829 #21 0x00c00762 in main (argc=1, argv=0xbffc9694) at smbd/server.c:1311 (gdb) quit
Created attachment 6638 [details] Valgrind Output process manually killed Here is the valgrind output the window continued to refresh indefinitely any attempt to click on a folder failed. Process was manually killed. Let me know if there is anything else I can do. Thanks
Marking this as a blocker for 3.6.0 as it's an uninitialized memory read that will kill the server in SMB2. Patch to follow. Jeremy.
FYI. This would only occur when signing was turned on, that's why I didn't see it before. Patch to follow. Jeremy.
Created attachment 6643 [details] git-am fix for 3.6.0 Metze please review. I've added this to master as it's pretty obviously correct (a missing field initialization when duplicating a structure leads to a NULL pointer reference). Eric Solheim, if you could please review and test this patch to see if it fixes your problem asap I'd really appreciate it. Thanks, Jeremy.
Comment on attachment 6643 [details] git-am fix for 3.6.0 Looks good
Karolin please pick for 3.6.0
Created attachment 6647 [details] New logs after patch is applied The proposed patch by Jeremy fixed the panic but samba still refreshes and is not usable. Attached is the log with and without Valgrind. (Without Valgrind it gives up after many fast refreshes. With Valgrind it runs much slower so I don't know if it eventually gives up or not) Let me know if you want me to submit this as a new bug since technically the panic is fixed. Also the problem disappeared when I disabled the following Local Security Policy Microsoft Network client: Digitially sign communications (always) So it can be tested by enabling this parameter. As a side my environment also enables the following registry setting: HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/services\LanmanWorkstation\Parameters\RequreSecuritySignature See http://blogs.technet.com/b/josebda/archive/2010/12/01/the-basics-of-smb-signing-covering-both-smb1-and-smb2.aspx Although it now appears to have an easy workaround, please look into this bug for 3.6.0 as I cannot dictate the Security Policy for the company.
The key is this snippet from the log, which keeps repeating: [2011/06/29 09:36:55.930569, 10] smbd/smb2_server.c:472(smb2_set_operation_credit) smb2_set_operation_credit: requested 1, granted 1, total granted 30 [2011/06/29 09:36:55.930632, 5] smbd/smb2_signing.c:73(smb2_signing_sign_pdu) signed SMB2 message smb2_send_async_interim_response: nreq->current_idx = 4 smb2_send_async_interim_response: returning 4 vectors req->out.vector[0].iov_len = 4 req->out.vector[1].iov_len = 64 req->out.vector[2].iov_len = 88 req->out.vector[3].iov_len = 88 [2011/06/29 09:36:55.930794, 10] smbd/smb2_server.c:472(smb2_set_operation_credit) smb2_set_operation_credit: requested 1, granted 1, total granted 31 [2011/06/29 09:36:55.930842, 10] smbd/smb2_server.c:582(smbd_server_connection_terminate_ex) smbd_server_connection_terminate_ex: reason[NT_STATUS_INVALID_PARAMETER] at smbd/smb2_server.c:1542 I'm guessing this is coming from this code in smbd/smb2_signing.c: 27 NTSTATUS smb2_signing_sign_pdu(DATA_BLOB session_key, 28 struct iovec *vector, 29 int count) 30 { 31 uint8_t *hdr; 32 uint64_t session_id; 33 struct HMACSHA256Context m; 34 uint8_t res[SHA256_DIGEST_LENGTH]; 35 int i; 36 37 if (count < 2) { 38 return NT_STATUS_INVALID_PARAMETER; 39 } 40 41 if (vector[0].iov_len != SMB2_HDR_BODY) { 42 return NT_STATUS_INVALID_PARAMETER; 43 } As we always call smb2_signing_sign_pdu() with a count parameter of 3, I'm guessing it's the vector[0].iov_len != SMB2_HDR_BODY check that is triggering here. I'm happy with you opening this as a new bug, as it's different from the crash bug, but it certainly seems that signing when sending an interim asynchronous response is failing here. Jeremy.
Ok - hang on ! Don't open a new bug. I've spotted the error here. I have a patch. Eric Solheim please test asap and we'll get this into 3.6.0 and ensure signing works for 3.6.0. Simple patch will follow. Jeremy.
Re-assigning to metze for patch review.
Created attachment 6648 [details] Second part of the fix for this bug. git-am for 3.6.0 Metze please review asap. We were passing vector[0] (which starts with the 4 byte packet length) to the signing function in the asynchronous reply case, instead of passing &vector[1] which is the start of the SMB2_HEADER to the signing function - causing the signing function to return NT_STATUS_INVALID_PARAMETER. We need both these fixes for 3.6.0 to make SMB2 signing work with asnychronous interim responses. Jeremy.
Created attachment 6650 [details] Corrected second part of the git-am patch for 3.6.0 Corrected patch - forgot to reduce the vector count from 3 to 2 in previous patch. Now we're passing &vector[1] instead of just vector we're passing in only a 2 element vector, not 3. Metze please check ! Jeremy.
(In reply to comment #23) > Created attachment 6650 [details] > Corrected second part of the git-am patch for 3.6.0 > Corrected patch - forgot to reduce the vector count from 3 to 2 in previous > patch. Now we're passing &vector[1] instead of just vector we're passing in > only a 2 element vector, not 3. > Metze please check ! > Jeremy. Ok will test shortly please ignore this for now. https://bugzilla.samba.org/show_bug.cgi?id=8279
That fixed it thanks. (In reply to comment #24) > (In reply to comment #23) > > Created attachment 6650 [details] [details] > > Corrected second part of the git-am patch for 3.6.0 > > Corrected patch - forgot to reduce the vector count from 3 to 2 in previous > > patch. Now we're passing &vector[1] instead of just vector we're passing in > > only a 2 element vector, not 3. > > Metze please check ! > > Jeremy. > Ok will test shortly please ignore this for now. > https://bugzilla.samba.org/show_bug.cgi?id=8279
Great ! Thanks for confirming Eric. Just awaiting review from metze and we'll get this into 3.6.0 final. Cheers, Jeremy.
Comment on attachment 6650 [details] Corrected second part of the git-am patch for 3.6.0 Looks correct
Karolin, please pick for 3.6.0
Pushed both patches to v3-6-test. Closing out bug report. Thanks!