I recently installed (compiled from sources) Samba 3.4.5 on Ubuntu 8.04 64 bit and had to downgrad again because of massive core panics: I can't really tell you what caused that (maybe while printing via samba-cups) - can somebody give me a hint or tell me what information you need to analyze that please? [2010/01/25 13:46:52, 0] lib/fault.c:46(fault_report) =============================================================== [2010/01/25 13:46:52, 0] lib/fault.c:47(fault_report) INTERNAL ERROR: Signal 11 in pid 7553 (3.4.5) Please read the Trouble-Shooting section of the Samba3-HOWTO [2010/01/25 13:46:52, 0] lib/fault.c:49(fault_report) From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf [2010/01/25 13:46:52, 0] lib/fault.c:50(fault_report) =============================================================== [2010/01/25 13:46:52, 0] lib/util.c:1480(smb_panic) PANIC (pid 7553): internal error [2010/01/25 13:46:52, 0] lib/util.c:1584(log_stack_trace) BACKTRACE: 25 stack frames: #0 /opt/samba345/sbin/smbd(log_stack_trace+0x1a) [0x6bc42b] #1 /opt/samba345/sbin/smbd(smb_panic+0x55) [0x6bc52a] #2 /opt/samba345/sbin/smbd [0x6ad1ae] #3 /lib/libpthread.so.0 [0x7fa381cb77d0] #4 /etc/samba/libtalloc.so.1 [0x7fa380da9761] #5 /etc/samba/libtalloc.so.1(talloc_free+0x1cc) [0x7fa380dab92e] #6 /opt/samba345/sbin/smbd [0x6249ca] #7 /opt/samba345/sbin/smbd(api_pipe_request+0x3ac) [0x656ef5] #8 /opt/samba345/sbin/smbd [0x650b46] #9 /opt/samba345/sbin/smbd(np_write_send+0x912) [0x651a13] #10 /opt/samba345/sbin/smbd [0x4a7456] #11 /opt/samba345/sbin/smbd [0x4a79d9] #12 /opt/samba345/sbin/smbd(reply_trans+0x6b7) [0x4a85af] #13 /opt/samba345/sbin/smbd [0x4fee33] #14 /opt/samba345/sbin/smbd [0x5012e8] #15 /opt/samba345/sbin/smbd [0x501a6c] #16 /opt/samba345/sbin/smbd(run_events+0x131) [0x6cb350] #17 /opt/samba345/sbin/smbd(smbd_process+0x790) [0x500cb7] #18 /opt/samba345/sbin/smbd [0x947e15] #19 /opt/samba345/sbin/smbd(run_events+0x131) [0x6cb350] #20 /opt/samba345/sbin/smbd [0x6cb5b0] #21 /opt/samba345/sbin/smbd(_tevent_loop_once+0x84) [0x6cb929] #22 /opt/samba345/sbin/smbd(main+0xf17) [0x947b99] #23 /lib/libc.so.6(__libc_start_main+0xf4) [0x7fa3806481c4] #24 /opt/samba345/sbin/smbd [0x48e629] [2010/01/25 13:46:52, 0] lib/fault.c:326(dump_core) dumping core in /var/log/samba/cores/smbd I let my samba run for about an hour until smb_panic occured - I have enclosed the whole /var/log/samba/ directory and put the smb.conf in it. Hope you find something to that topic regards Martin
As the compressed logs are to big - http://195.202.144.10/~nsc/varlogsamba.rar
Thanks, that helps a bit. This is in the reply of spoolss_EnumPrinterData. Probably an impossible request for you, but it would still be VERY helpful if you could run the smbd under valgrind. Ubuntu should have valgrind packages, on 9.04 I can install them with sudo aptitude install valgrind It should be possible to find something similar under 8.04. When installed, for a test could you shutdown smbd and run sudo valgrind --tool=memcheck --num-callers=20 --trace-children=yes smbd -d 0 > /tmp/valgrind.out 2>&1 & This will slowdown smbd a LOT, but /tmp/valgrind.out will contain very valuable information for us. Thanks, Volker
I have done the valgrind debugging (altough it didn't generated that much data) - please find today's testing on: /tmp/ http://195.202.144.10/~nsc/tmp.rar /var/log/samba/ http://195.202.144.10/~nsc/varlogsamba.rar regards Martin
==17767== Invalid write of size 8 ==17767== at 0x714A761: _talloc_free (in /etc/samba/libtalloc.so.1) ==17767== by 0x714C92D: talloc_free (in /etc/samba/libtalloc.so.1) ==17767== by 0x6249C9: api_spoolss_EnumPrinterData (in /opt/samba345/sbin/smbd) ==17767== by 0x656EF4: api_pipe_request (in /opt/samba345/sbin/smbd) ==17767== by 0x650B45: process_complete_pdu (in /opt/samba345/sbin/smbd) ==17767== by 0x651A12: np_write_send (in /opt/samba345/sbin/smbd) ==17767== by 0x4A7455: api_fd_reply (in /opt/samba345/sbin/smbd) ==17767== by 0x4A79D8: handle_trans (in /opt/samba345/sbin/smbd) ==17767== by 0x4A85AE: reply_trans (in /opt/samba345/sbin/smbd) ==17767== by 0x4FEE32: switch_message (in /opt/samba345/sbin/smbd) ==17767== by 0x5012E7: process_smb (in /opt/samba345/sbin/smbd) ==17767== by 0x501A6B: smbd_server_connection_handler (in /opt/samba345/sbin/smbd) ==17767== by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd) ==17767== by 0x500CB6: smbd_process (in /opt/samba345/sbin/smbd) ==17767== by 0x947E14: smbd_accept_connection (in /opt/samba345/sbin/smbd) ==17767== by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd) ==17767== by 0x6CB5AF: s3_event_loop_once (in /opt/samba345/sbin/smbd) ==17767== by 0x6CB928: _tevent_loop_once (in /opt/samba345/sbin/smbd) ==17767== by 0x947B98: main (in /opt/samba345/sbin/smbd) ==17767== Address 0xca is not stack'd, malloc'd or (recently) free'd That is the crash. It is a pity that your compiler is broken and does not do any debug symbols, but we probably have to live with that as it is. Günther, does this ring any bell for you? Volker
(In reply to comment #4) > ==17767== Invalid write of size 8 > ==17767== at 0x714A761: _talloc_free (in /etc/samba/libtalloc.so.1) > ==17767== by 0x714C92D: talloc_free (in /etc/samba/libtalloc.so.1) > ==17767== by 0x6249C9: api_spoolss_EnumPrinterData (in > /opt/samba345/sbin/smbd) > ==17767== by 0x656EF4: api_pipe_request (in /opt/samba345/sbin/smbd) > ==17767== by 0x650B45: process_complete_pdu (in /opt/samba345/sbin/smbd) > ==17767== by 0x651A12: np_write_send (in /opt/samba345/sbin/smbd) > ==17767== by 0x4A7455: api_fd_reply (in /opt/samba345/sbin/smbd) > ==17767== by 0x4A79D8: handle_trans (in /opt/samba345/sbin/smbd) > ==17767== by 0x4A85AE: reply_trans (in /opt/samba345/sbin/smbd) > ==17767== by 0x4FEE32: switch_message (in /opt/samba345/sbin/smbd) > ==17767== by 0x5012E7: process_smb (in /opt/samba345/sbin/smbd) > ==17767== by 0x501A6B: smbd_server_connection_handler (in > /opt/samba345/sbin/smbd) > ==17767== by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd) > ==17767== by 0x500CB6: smbd_process (in /opt/samba345/sbin/smbd) > ==17767== by 0x947E14: smbd_accept_connection (in /opt/samba345/sbin/smbd) > ==17767== by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd) > ==17767== by 0x6CB5AF: s3_event_loop_once (in /opt/samba345/sbin/smbd) > ==17767== by 0x6CB928: _tevent_loop_once (in /opt/samba345/sbin/smbd) > ==17767== by 0x947B98: main (in /opt/samba345/sbin/smbd) > ==17767== Address 0xca is not stack'd, malloc'd or (recently) free'd > > > That is the crash. It is a pity that your compiler is broken and does not do > any debug symbols, but we probably have to live with that as it is. > > Günther, does this ring any bell for you? > > Volker > Can you analyze that data so far to find the root cause auf this or do you need more to find a solution ... or "do I have to live with that" ? :) regards martin
I am sorry, but I am very busy with paying customer's requests. I spend about 13 hours yesterday doing that. I have to sleep a bit, so this will have to wait. Volker
Volker, take care about the issue when you have the time to - it is absolutly not urgent as I run samba 3.028 in parallel. I just wanted to know if you have a point to get into it (sometimes) thank you martin
I seem to be seeing the same thing. 3.4.5 on CentOS 5.4 32-bit. (gdb) bt #0 0x00110402 in __kernel_vsyscall () #1 0x00f93df0 in raise () from /lib/libc.so.6 #2 0x00f95701 in abort () from /lib/libc.so.6 #3 0x0091e0ec in dump_core () at lib/fault.c:337 #4 0x00930c4a in smb_panic (why=0xc89c1e "internal error") at lib/util.c:1496 #5 0x0091e6de in sig_fault (sig=11) at lib/fault.c:52 #6 <signal handler called> #7 0x0022b2d0 in _talloc_free (ptr=0x832d888) at ../lib/talloc/talloc.c:539 #8 0x0022e0d3 in talloc_free (ptr=0x832d508) at ../lib/talloc/talloc.c:562 #9 0x00888b5b in api_spoolss_EnumPrinterData (p=0x83c6140) at ../librpc/gen_ndr/srv_spoolss.c:5706 #10 0x008b5d46 in api_pipe_request (p=0x83c6140) at rpc_server/srv_pipe.c:2329 #11 0x008b13ba in process_complete_pdu (p=0x83c6140) at rpc_server/srv_pipe_hnd.c:480 #12 0x008b1aec in np_write_send (mem_ctx=0x832d308, ev=0x82c18d0, handle=0x83d9160, data=0x832d348 "\005", len=40) at rpc_server/srv_pipe_hnd.c:741 #13 0x006d8f90 in api_fd_reply (conn=0x83c5538, vuid=100, req=0x832d288, setup=0x83285c0, data=0x84abc80 "\005", params=0x0, suwcnt=2, tdscnt=56, tpscnt=0, mdrcnt=3696, mprcnt=0) at smbd/ipc.c:250 #14 0x006d94d9 in handle_trans (conn=0x83c5538, req=0x832d288, state=0x847d650) at smbd/ipc.c:513 #15 0x006da330 in reply_trans (req=0x832d288) at smbd/ipc.c:755 #16 0x0073b9be in switch_message (type=37 '%', req=0x832d288, size=144) at smbd/process.c:1377 #17 0x0073e570 in process_smb (conn=0x83db208, inbuf=<value optimized out>, nread=144, unread_bytes=0, encrypted=false, deferred_pcd=0x0) at smbd/process.c:1408 #18 0x0073e98e in smbd_server_connection_handler (ev=0x82c18d0, fde=0x8326c30, flags=1, private_data=0x83db208) at smbd/process.c:1887 #19 0x00941a47 in run_events (ev=0x82c18d0, selrtn=1, read_fds=0xbff8e74c, write_fds=0xbff8e6cc) at lib/events.c:126 #20 0x0073de06 in smbd_process () at smbd/process.c:820 #21 0x00c06175 in smbd_accept_connection (ev=0x82c18d0, fde=0x83dafc0, flags=1, private_data=0x83da9f0) at smbd/server.c:395 #22 0x00941a47 in run_events (ev=0x82c18d0, selrtn=2, read_fds=0xbff8ea18, write_fds=0xbff8e998) at lib/events.c:126 #23 0x00941cd9 in s3_event_loop_once (ev=0x82c18d0, location=0xd0c878 "smbd/server.c:681") at lib/events.c:185 #24 0x00942306 in _tevent_loop_once (ev=0x82c18d0, location=0xd0c878 "smbd/server.c:681") at ../lib/tevent/tevent.c:490 #25 0x00c05cf8 in main (argc=0, argv=0xbff8ee24) at smbd/server.c:681 (gdb) up 7 #7 0x0022b2d0 in _talloc_free (ptr=0x832d888) at ../lib/talloc/talloc.c:539 539 _TLIST_REMOVE(tc->parent->child, tc); #9 0x00888b5b in api_spoolss_EnumPrinterData (p=0x83c6140) at ../librpc/gen_ndr/srv_spoolss.c:5706 5706 talloc_free(r); (gdb) print p $3 = (pipes_struct *) 0x83c6140 (gdb) print *p $4 = {next = 0x0, prev = 0x0, client_address = "192.168.0.99", '\0' <repeats 33 times>, server_info = 0x83c6348, syntax = {uuid = {time_low = 305419896, time_mid = 4660, time_hi_and_version = 43981, clock_seq = "�", node = "\001#Eg\211�"}, if_version = 1}, contexts = 0x83c6820, hdr = {major = 5 '\005', minor = 0 '\0', pkt_type = 0 '\0', flags = 3 '\003', pack_type = "\020\000\000", frag_len = 56, auth_len = 0, call_id = 238}, hdr_req = { alloc_hint = 32, context_id = 0, opnum = 72}, auth = {auth_type = PIPE_AUTH_TYPE_NONE, auth_level = PIPE_AUTH_LEVEL_NONE, a_u = {schannel_auth = 0x0, auth_ntlmssp_state = 0x0}, auth_data_free_func = 0}, dc = 0x0, pipe_bound = true, fault_state = false, bad_handle_fault_state = false, rng_fault_state = false, endian = false, in_data = { current_in_pdu = 0x847d358 "\005", pdu_needed_len = 0, pdu_received_len = 56, data = {io = true, bigendian_data = false, align = 4 '\004', is_dynamic = true, data_offset = 32, buffer_size = 32, grow_size = 3692, data_p = 0x847f500 "", mem_ctx = 0x848e4d0, sess_key = 0x0}}, out_data = {rdata = {io = false, bigendian_data = false, align = 4 '\004', is_dynamic = true, data_offset = 152, buffer_size = 256, grow_size = 152, data_p = 0x83d5450 "@", mem_ctx = 0x848e4d0, sess_key = 0x0}, data_sent_length = 0, frag = { io = false, bigendian_data = false, align = 4 '\004', is_dynamic = false, data_offset = 0, buffer_size = 0, grow_size = 48, data_p = 0x0, mem_ctx = 0x848e4d0, sess_key = 0x0}, current_pdu_sent = 0}, mem_ctx = 0x848e4d0, pipe_handles = 0x83d3400} (gdb) print r No symbol "r" in current context. #12 0x008b1aec in np_write_send (mem_ctx=0x832d308, ev=0x82c18d0, handle=0x83d9160, data=0x832d348 "\005", len=40) at rpc_server/srv_pipe_hnd.c:741 741 process_complete_pdu(p); (gdb) print *ev $6 = {ops = 0xd8e4a0, fd_events = 0x8326c30, timer_events = 0x832aac0, immediate_events = 0x0, signal_events = 0x83d4ae8, additional_data = 0x0, pipe_fde = 0x83db070, pipe_fds = {6, 7}, debug_ops = {debug = 0x9415c0 <s3_event_debug>, context = 0x0}, nesting = {allowed = false, level = 1, hook_fn = 0, hook_private = 0x0}} (gdb) print *handle $7 = {type = FAKE_FILE_TYPE_NAMED_PIPE, private_data = 0x83c6140} Started seeing after making the following changes: < old, > new 32d31 < printcap name = /etc/printcap 39c38,40 < use client driver = yes --- > printcap name = cups > # Why? > use client driver = no
Reverted to original smb.conf with no effect. Other change was to upload a new PS driver for our Ricoh Aficio CL7200 to the driver area and try printing using that.
Martin - what version did you downgrade to? 3.0.28?
(gdb) print *(struct talloc_chunk *)(ptr-48) $5 = {next = 0x6e, prev = 0x0, parent = 0x95f88d0, child = 0x0, refs = 0x0, destructor = 0, name = 0x7b85f1 "uint32_t", size = 4, flags = 3893685368, pool = 0x95f8550} Next doesn't look valid.
Indeed - this looks like memory corruption. Can you run this under valgrind with : valgrind --trace-children=yes /usr/sbin/smbd Jeremy.
It fails to run under valgrind for me, or valgrind exits early. Are Martin's valgrind logs of any use?
well, with "memory corruption" you mean one of the 4x1 GB RAM banks of this HP Proliant DL360 is corrupt? I run the new valgrind on monday because I need proper activity on this PDC @Orion I have the Ubuntu 8.04 packages (3.028a) running in parallel with 3.4.5 self-compiled from the source regards Martin
Additonal - I did the same thing on 3.4.5 as Orion - I tried the "point and click" on a windows client and it was not working any more on the 3.4.5 pdc as windows asked for the drivers (altough they are uploaded on the proper samba share and the operation is working with 3.0.28a) - so I uploaded the cups PCL driver again and my people are using those drivers to print over the pdc ... just to mention that regards (I do have a samba 3.4.5 running on another ubuntu 8.04 64 bit but without printing support)
(In reply to comment #14) > well, with "memory corruption" you mean one of the 4x1 GB RAM banks of this HP > Proliant DL360 is corrupt? No - it's a program error.
Would compiling with: #if defined(DEVELOPER) && defined(VALGRIND_MAKE_MEM_NOACCESS) be of any help?
Orion Poplawski - are you using --valgrind-trace-children=yes ? "Exiting early" usually just means you're seeing the parent daemon die, and need to wait for the child to print errors. I'll take a look at Martin's vargrind logs, but we really need debug symbols to help isolate this and his logs don't contain any. It's a simple as compiling with -g or including the "samba-dbg" pacakge. Jeremy.
So, i did as requested - i compiled with the -g option and started valgrind as Jeremy told me. I tried to follow the debug help from the samba documentation (see at the bottom if I catched the correct smbd) and I hope that I have done anything correctly .... var/log/samba http://195.202.144.10/~nsc/varlogsamba.rar valgrind.out http://195.202.144.10/~nsc/valgrind.out Hope it helps regards Martin ---------------------------- root 30370 1.0 1.3 181300 55316 ? Ss 09:46 0:02 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30383 1.6 1.4 181936 59600 ? S 09:46 0:03 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30392 0.0 0.0 57812 1996 ? Ss 09:46 0:00 /opt/samba345/sbin/nmbd -D root 30394 1.7 1.5 183992 62004 ? S 09:46 0:03 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 10459 30395 4.4 1.9 201820 78616 ? S 09:46 0:08 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30396 6.5 1.9 195580 76844 ? S 09:46 0:12 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30398 1.1 1.6 188684 64096 ? S 09:46 0:02 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30427 2.0 1.6 189964 65220 ? S 09:47 0:03 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30428 2.7 1.6 189548 65360 ? S 09:47 0:04 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 root 30429 1.3 1.4 183956 59732 ? S 09:47 0:02 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0 10459 30528 0.9 0.5 76180 21660 ? S 09:49 0:00 valgrind.bin --trace-children=yes /bin/sh -c /bin/sleep 90000 10459 30529 1.2 0.5 81692 22348 ? S 09:49 0:00 valgrind.bin --trace-children=yes /bin/sleep 90000 nagios 30561 0.0 0.0 93196 616 ? S 09:50 0:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg nagios 30562 0.0 0.0 3944 564 ? S 09:50 0:00 sh -c /opt/nagios/libexec/check_disk_smb -H 192.168.1.100 -W RK -s Box -u Manager -p plk78tdz -w 80 -c 95 nagios 30564 0.0 0.1 22000 4140 ? R 09:50 0:00 /usr/bin/perl -w /opt/nagios/libexec/check_disk_smb -H 192.168.1.100 -W RK -s Box -u Manager -p plk78tdz -w 80 -c 95 root 30568 0.0 0.0 50500 2112 pts/0 R+ 09:50 0:00 ps aux root@rk01:/var/log# gdb GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". (gdb) Quit (gdb) gdQuit (gdb) quit root@rk01:/var/log# gdb /opt/samba345/sbin/smbd 30395 GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu"... Attaching to program: /opt/samba345/sbin/smbd, process 30395 0x0000000038066fc1 in ?? () (gdb) bt #0 0x0000000038066fc1 in ?? () #1 0x0000000000000008 in ?? () #2 0x00000004049bbe50 in ?? () #3 0x00000004049bbe70 in ?? () #4 0x0000000038d2a0f0 in ?? () #5 0x000000000000003d in ?? () #6 0x00000004049bbe70 in ?? () #7 0x000000000000003d in ?? () #8 0x0000000038d2a0f0 in ?? () #9 0x0000000000000001 in ?? () #10 0x0000000038d2a0e0 in ?? () #11 0x0000000038eed8d8 in ?? () #12 0x00000000380353fa in ?? () #13 0x0000000402144fd8 in ?? () #14 0x0000000402144fd8 in ?? () #15 0x0000000000000057 in ?? () #16 0x00000007feffe618 in ?? () #17 0x0000000038d2a0f0 in ?? () #18 0x0000000038eed968 in ?? () #19 0x0000000000000000 in ?? ()
and as I see in the process list - I have to change my passwords .... (stupid mistake)
Some -d 10 info: [2010/02/02 12:01:14, 4] rpc_server/srv_pipe.c:2297(api_rpcTNP) api_rpcTNP: \spoolss op 0x48 - api_rpcTNP: rpc command: SPOOLSS_ENUMPRINTERDATA [2010/02/02 12:01:14, 6] rpc_server/srv_pipe.c:2327(api_rpcTNP) api_rpc_cmds[72].fn == 0x8f2730 [2010/02/02 12:01:14, 1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug) spoolss_EnumPrinterData: struct spoolss_EnumPrinterData in: struct spoolss_EnumPrinterData handle : * handle: struct policy_handle handle_type : 0x00000000 (0) uuid : 00000003-0000-0000-684b-7a76b1420000 enum_index : 0x00000000 (0) value_offered : 0x00000080 (128) data_offered : 0x00000000 (0) [2010/02/02 12:01:14, 4] rpc_server/srv_lsa_hnd.c:180(find_policy_by_hnd_internal) Kzv [0010] B1 42 00 00 .B.. [2010/02/02 12:01:14, 5] rpc_server/srv_spoolss_nt.c:7951(_spoolss_EnumPrinterData) _spoolss_EnumPrinterData [2010/02/02 12:01:14, 4] rpc_server/srv_lsa_hnd.c:180(find_policy_by_hnd_internal) Kzv [0010] B1 42 00 00 .B.. [2010/02/02 12:01:14, 4] rpc_server/srv_spoolss_nt.c:377(get_printer_snum) short name:dali [2010/02/02 12:01:14, 10] printing/nt_printing.c:4630(get_a_printer_internal) get_a_printer: [dali] level 2 [2010/02/02 12:01:14, 5] printing/print_cups.c:1634(cups_pull_comment_location) pulling dali location [2010/02/02 12:01:14, 10] printing/print_cups.c:87(cups_connect) connecting to cups server /var/run/cups/cups.sock:631 [2010/02/02 12:01:14, 8] printing/nt_printing.c:2783(unpack_devicemode) Unpacked devicemode [\\EARTH\dali](Letter) [2010/02/02 12:01:14, 10] printing/nt_printing.c:2822(add_new_printer_key) add_new_printer_key: Inserted new data key [PrinterDriverData] [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:TrayFormSize], len: 4 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:TrayFormTable], len: 110 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:TrayFormMapSize], len: 4 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:TrayFormMap], len: 39 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:TrayFormKeywordSize], len: 4 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:TrayFormKeyword], len: 26 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:AutoConfigMode], len: 1 [2010/02/02 12:01:14, 8] printing/nt_printing.c:3732(unpack_values) specific: [PrinterDriverData:UiHideCapability], len: 4 [2010/02/02 12:01:14, 9] printing/nt_printing.c:4092(get_a_printer_2) Unpacked printer [dali] name [\\EARTH\dali] running driver [RICOH Aficio CL7200 PS] [2010/02/02 12:01:14, 1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug) spoolss_EnumPrinterData: struct spoolss_EnumPrinterData out: struct spoolss_EnumPrinterData value_name : 'TrayFormSize' value_needed : * value_needed : 0x0000000c (12) type : * type : REG_DWORD (4) data : * data: ARRAY(0) data_needed : * data_needed : 0x00000004 (4) result : WERR_OK [2010/02/02 12:01:14, 0] lib/fault.c:46(fault_report) It pretty reliably faults in api_spoolss_EnumPrinterData trying to free memory it allocates at the start of that routine, so the corruption probably occurs somewhere in that routine.
After srv_spoolss_nt.c:8059: r->out.value_name = talloc_strdup(r, regval_name(val)); Then: 2: *(struct talloc_chunk *) (void *) (r->out.value_name - 48) = {next = 0x82d4848, prev = 0x0, parent = 0x82d44c8, child = 0x0, refs = 0x0, destructor = 0, name = 0x82d4c48 "TrayFormSize", size = 13, flags = 3893685368, pool = 0x82d4148} 1: *r = {in = {handle = 0x82d4618, enum_index = 0, value_offered = 128, data_offered = 0}, out = {value_name = 0x82d4c48 "TrayFormSize", value_needed = 0x82d47c8, type = 0x82d4808, data = 0x82d4848 "\030H-\b\030L-\b", data_needed = 0x82d4878, result = {v = 0}}} Notice that the r->out.data pointer is the same as *(struct talloc_chunk *) (void *) (r->out.value_name - 48) -> next. So when we write to data: 8077: memcpy(r->out.data, regval_data_p(val), regval_size(val)); we end up stomping on the talloc list pointers. No idea why this happens. Don't really know anything about how talloc works.
Looks like data is getting allocated as a uint8_t: r->out.data = talloc_zero_array(r, uint8_t, r->in.data_offered); But we're writing 4 bytes: 8077 memcpy(r->out.data, regval_data_p(val), regval_size(val)); (gdb) print regval_size(val) $9 = 4 (gdb) print regval_data_p(val) $10 = (uint8_t *) 0x844de50 "n"
(gdb) print val $12 = (REGISTRY_VALUE *) 0x844dd10 (gdb) print *val $13 = { valuename = "TrayFormSize\000_d��\027r;\000\002\000\000\000\002$\000\f\000\017\020\001\005\000\000\000\000\000\005\025\000\000\000�B\237\220\224_d��\027r;\000\002\000\000\000\t\030\000\f\000\017\020\001\002\000\000\000\000\000\005 \000\000\000 \002\000\000\000\002\030\000\f\000\017\020\001\002\000\000\000\000\000\005 \000\000\000 \002\000\000\001\002\000\000\000\000\000\005 \000\000\000 \002\000\000\001\002\000\000\000\000\000\005 \000\000\000\020\001\000\000@", '\0' <repeats 11 times>, "h�D\b", '\0' <repeats 12 times>, "��D\b\n\000\000\000s�\024�\000\000\000\000\b\001\000\000H\000\000\000/bin/bash\000E"..., type = 4, size = 4, data_p = 0x844de50 "n"} (gdb) print val->data_p $14 = (uint8_t *) 0x844de50 "n" (gdb) print val->data_p[1] $15 = 0 '\0' (gdb) print val->data_p[2] $16 = 0 '\0' (gdb) print val->data_p[3] $17 = 0 '\0'
So, r->out.data is allocated to be the size of r->in.data_offered, which in this case is 0.
I don't see how you know how big to allocate r->out.data until you look up the registry value and see how big it is. At the very least we need a check that we don't copy more data than is allocated.
Is it true that is we simply don't allocate r->out.data, that the talloc_strdup() call will allocate it for us? Should that be done throughout srv_spoolss.c?
talloc_strdup is an allocating call. I'll take a look carefully at the internals of _spoolss_EnumPrinterData() to see if I can find the error. Thanks ! Jeremy.
Created attachment 5259 [details] Proposed patch for 3.4.x, 3.5.0 and master Can you try this patch please ? I think it should fix the crash. Jeremy.
Created attachment 5260 [details] Proposed patch for 3.4.x, 3.5.0 and master (slightly better) Slightly better patch, protects against val->data_p being null as well. Jeremy.
One thought: If r->in.data_offered < regval_size(val), we still return with r->out.data_needed = regval_size(val). Client may not like that. No idea how these things really work though.
I think that's correct. r->out.data_needed is telling the client how much data we want to return. I'll look at the 3.3.x code to make sure. Does the patch work in your environment ?
Appears to be working fine. No crashes. Duplex printing issues, but...
Thanks. Let me do a little more research as to whether we should return regval_size(val) or MIN(regval_size(val),r->in.data_offered) and I'll update the patch and commit. Thanks a *LOT* for tracking this down for me - much appreciated ! Jeremy.
Ok, I checked the MS-RPRN spec. 3.1.4.1.7 Dynamically Typed Query Parameters: It says: -------------------------------------------------------------- nSize or cbData: This parameter MUST be the size, in bytes, of the buffer. The value of nSize or cbData MAY be larger than the required size for the requested dynamically typed values. pcbNeeded or pcbData: This parameter MUST be a non-null pointer to a variable into which the server MUST copy the actual number of bytes between the start of the buffer and the last byte written by the server into the buffer, both inclusive, or the required size of the buffer in bytes if the value of the cbBuf parameter is smaller than the actual size of the data to return to the caller. -------------------------------------------------------------- In our code: cbData == r->in.data_offered pcbNeeded == r->out.data_needed I think the text here is using "cbBuf" instead of "cbData" (I'll raise a CAR on this), but the meaning is that pcbNeeded should be filled with the actual size of the data we would have *liked* to return - which in our case is regval_size(val), so the patch is correct as is. The client knows how much the maximum data is could have received back, as that was what it sent in r->in.data_offered.
In the latest version of MS-RPRN the section is : 3.1.4.1.2 Dynamically Typed Query Parameters.
Created attachment 5261 [details] git-am fix for 3.5.0.
Created attachment 5262 [details] git-am fix for 3.4.6.
Confirmed fixed by submitter. Re-assigning to Karolin for inclusion in 3.5.0 final and 3.4.6. Karolin, as this one is a crash bug it's important to get in for 3.5.0 final. Thanks, Jeremy.
Great work folks!! I have the newer samba 3.4.5 (including the patch) running for 3 hours now without any problems or smbd_panics - formerly I had smbd_panics almost within an hour when I tried it so for me the problem seems to be solved. Thank you very much! regards Martin
*** Bug 7073 has been marked as a duplicate of this bug. ***
Raising to "blocker", as we really must get this into 3.5.0 final. Looks like this bug got closed by mistake - it needs to be in the "open" state until the fixes get pushed to 3.5.0 and 3.4.6. Jeremy.
(In reply to comment #42) > Raising to "blocker", as we really must get this into 3.5.0 final. Looks like > this bug got closed by mistake - it needs to be in the "open" state until the > fixes get pushed to 3.5.0 and 3.4.6. You are right. Not sure why I thought I pushed the patches... Sorry! Now, both patches have been pushed. Closing out bug report. Thanks for verifying, Jeremy!
Sorry, re-opening to cover second worrying use of memcpy. Jeremy.
Created attachment 5303 [details] git-am fix for 3.5.0 and 3.4.6. Guenther, please review and ask Karolin to push if you're ok. Safely checks the other memcpy inside rpc_spoolss_nt.c Jeremy.
Comment on attachment 5303 [details] git-am fix for 3.5.0 and 3.4.6. yep, looks fine
Karolin, please include that last patch as well.
(In reply to comment #47) > Karolin, please include that last patch as well. > Done. Closing out bug report. Thanks!