Bug 7063 - Samba 3.4.5 on ubuntu 8.04 64 bit - Core dumps
Samba 3.4.5 on ubuntu 8.04 64 bit - Core dumps
Status: RESOLVED FIXED
Product: Samba 3.4
Classification: Unclassified
Component: File services
3.4.5
x64 Linux
: P3 regression
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
: 7073 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-01-26 08:10 UTC by Martin Hochreiter
Modified: 2010-02-10 02:51 UTC (History)
5 users (show)

See Also:


Attachments
Proposed patch for 3.4.x, 3.5.0 and master (620 bytes, patch)
2010-02-02 16:43 UTC, Jeremy Allison
no flags Details
Proposed patch for 3.4.x, 3.5.0 and master (slightly better) (648 bytes, patch)
2010-02-02 16:50 UTC, Jeremy Allison
jra: review? (gd)
Details
git-am fix for 3.5.0. (1.28 KB, patch)
2010-02-02 18:36 UTC, Jeremy Allison
jra: review? (gd)
Details
git-am fix for 3.4.6. (1.21 KB, text/x-diff)
2010-02-02 18:40 UTC, Jeremy Allison
jra: review? (gd)
Details
git-am fix for 3.5.0 and 3.4.6. (1.95 KB, patch)
2010-02-09 17:01 UTC, Jeremy Allison
gd: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Hochreiter 2010-01-26 08:10:34 UTC
I recently installed (compiled from sources) Samba 3.4.5 on Ubuntu 8.04 64 bit and
had to downgrad again because of massive core panics:

I can't really tell you what caused that (maybe while printing via samba-cups) - can
somebody give me a hint or tell me what information you need to analyze that please?

[2010/01/25 13:46:52,  0] lib/fault.c:46(fault_report)
  ===============================================================
[2010/01/25 13:46:52,  0] lib/fault.c:47(fault_report)
  INTERNAL ERROR: Signal 11 in pid 7553 (3.4.5)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2010/01/25 13:46:52,  0] lib/fault.c:49(fault_report)

  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2010/01/25 13:46:52,  0] lib/fault.c:50(fault_report)
  ===============================================================
[2010/01/25 13:46:52,  0] lib/util.c:1480(smb_panic)
  PANIC (pid 7553): internal error
[2010/01/25 13:46:52,  0] lib/util.c:1584(log_stack_trace)
  BACKTRACE: 25 stack frames:
   #0 /opt/samba345/sbin/smbd(log_stack_trace+0x1a) [0x6bc42b]
   #1 /opt/samba345/sbin/smbd(smb_panic+0x55) [0x6bc52a]
   #2 /opt/samba345/sbin/smbd [0x6ad1ae]
   #3 /lib/libpthread.so.0 [0x7fa381cb77d0]
   #4 /etc/samba/libtalloc.so.1 [0x7fa380da9761]
   #5 /etc/samba/libtalloc.so.1(talloc_free+0x1cc) [0x7fa380dab92e]
   #6 /opt/samba345/sbin/smbd [0x6249ca]
   #7 /opt/samba345/sbin/smbd(api_pipe_request+0x3ac) [0x656ef5]
   #8 /opt/samba345/sbin/smbd [0x650b46]
   #9 /opt/samba345/sbin/smbd(np_write_send+0x912) [0x651a13]
   #10 /opt/samba345/sbin/smbd [0x4a7456]
   #11 /opt/samba345/sbin/smbd [0x4a79d9]
   #12 /opt/samba345/sbin/smbd(reply_trans+0x6b7) [0x4a85af]
   #13 /opt/samba345/sbin/smbd [0x4fee33]
   #14 /opt/samba345/sbin/smbd [0x5012e8]
   #15 /opt/samba345/sbin/smbd [0x501a6c]
   #16 /opt/samba345/sbin/smbd(run_events+0x131) [0x6cb350]
   #17 /opt/samba345/sbin/smbd(smbd_process+0x790) [0x500cb7]
   #18 /opt/samba345/sbin/smbd [0x947e15]
   #19 /opt/samba345/sbin/smbd(run_events+0x131) [0x6cb350]
   #20 /opt/samba345/sbin/smbd [0x6cb5b0]
   #21 /opt/samba345/sbin/smbd(_tevent_loop_once+0x84) [0x6cb929]
   #22 /opt/samba345/sbin/smbd(main+0xf17) [0x947b99]
   #23 /lib/libc.so.6(__libc_start_main+0xf4) [0x7fa3806481c4]
   #24 /opt/samba345/sbin/smbd [0x48e629]
[2010/01/25 13:46:52,  0] lib/fault.c:326(dump_core)
  dumping core in /var/log/samba/cores/smbd 

I let my samba run for about an hour until smb_panic occured - 
I have enclosed the whole /var/log/samba/ directory and put the
smb.conf in it.

Hope you find something to that topic

regards
Martin
Comment 1 Martin Hochreiter 2010-01-26 09:03:35 UTC
As the compressed logs are to big - 

http://195.202.144.10/~nsc/varlogsamba.rar
Comment 2 Volker Lendecke 2010-01-26 12:21:52 UTC
Thanks, that helps a bit. This is in the reply of spoolss_EnumPrinterData. Probably an impossible request for you, but it would still be VERY helpful if you could run the smbd under valgrind. Ubuntu should have valgrind packages, on 9.04 I can install them with

sudo aptitude install valgrind

It should be possible to find something similar under 8.04. When installed, for a test could you shutdown smbd and run

sudo valgrind --tool=memcheck --num-callers=20 --trace-children=yes smbd -d 0 > /tmp/valgrind.out 2>&1 &

This will slowdown smbd a LOT, but /tmp/valgrind.out will contain very valuable information for us.

Thanks,

Volker
Comment 3 Martin Hochreiter 2010-01-27 02:35:07 UTC
I have done the valgrind debugging (altough it didn't generated that much data) - 
please find today's testing on:

/tmp/ http://195.202.144.10/~nsc/tmp.rar

/var/log/samba/ http://195.202.144.10/~nsc/varlogsamba.rar

regards
Martin
Comment 4 Volker Lendecke 2010-01-27 04:25:49 UTC
==17767== Invalid write of size 8
==17767==    at 0x714A761: _talloc_free (in /etc/samba/libtalloc.so.1)
==17767==    by 0x714C92D: talloc_free (in /etc/samba/libtalloc.so.1)
==17767==    by 0x6249C9: api_spoolss_EnumPrinterData (in /opt/samba345/sbin/smbd)
==17767==    by 0x656EF4: api_pipe_request (in /opt/samba345/sbin/smbd)
==17767==    by 0x650B45: process_complete_pdu (in /opt/samba345/sbin/smbd)
==17767==    by 0x651A12: np_write_send (in /opt/samba345/sbin/smbd)
==17767==    by 0x4A7455: api_fd_reply (in /opt/samba345/sbin/smbd)
==17767==    by 0x4A79D8: handle_trans (in /opt/samba345/sbin/smbd)
==17767==    by 0x4A85AE: reply_trans (in /opt/samba345/sbin/smbd)
==17767==    by 0x4FEE32: switch_message (in /opt/samba345/sbin/smbd)
==17767==    by 0x5012E7: process_smb (in /opt/samba345/sbin/smbd)
==17767==    by 0x501A6B: smbd_server_connection_handler (in /opt/samba345/sbin/smbd)
==17767==    by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd)
==17767==    by 0x500CB6: smbd_process (in /opt/samba345/sbin/smbd)
==17767==    by 0x947E14: smbd_accept_connection (in /opt/samba345/sbin/smbd)
==17767==    by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd)
==17767==    by 0x6CB5AF: s3_event_loop_once (in /opt/samba345/sbin/smbd)
==17767==    by 0x6CB928: _tevent_loop_once (in /opt/samba345/sbin/smbd)
==17767==    by 0x947B98: main (in /opt/samba345/sbin/smbd)
==17767==  Address 0xca is not stack'd, malloc'd or (recently) free'd


That is the crash. It is a pity that your compiler is broken and does not do any debug symbols, but we probably have to live with that as it is.

Günther, does this ring any bell for you?

Volker
Comment 5 Martin Hochreiter 2010-01-28 05:33:14 UTC
(In reply to comment #4)
> ==17767== Invalid write of size 8
> ==17767==    at 0x714A761: _talloc_free (in /etc/samba/libtalloc.so.1)
> ==17767==    by 0x714C92D: talloc_free (in /etc/samba/libtalloc.so.1)
> ==17767==    by 0x6249C9: api_spoolss_EnumPrinterData (in
> /opt/samba345/sbin/smbd)
> ==17767==    by 0x656EF4: api_pipe_request (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x650B45: process_complete_pdu (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x651A12: np_write_send (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x4A7455: api_fd_reply (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x4A79D8: handle_trans (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x4A85AE: reply_trans (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x4FEE32: switch_message (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x5012E7: process_smb (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x501A6B: smbd_server_connection_handler (in
> /opt/samba345/sbin/smbd)
> ==17767==    by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x500CB6: smbd_process (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x947E14: smbd_accept_connection (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x6CB34F: run_events (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x6CB5AF: s3_event_loop_once (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x6CB928: _tevent_loop_once (in /opt/samba345/sbin/smbd)
> ==17767==    by 0x947B98: main (in /opt/samba345/sbin/smbd)
> ==17767==  Address 0xca is not stack'd, malloc'd or (recently) free'd
> 
> 
> That is the crash. It is a pity that your compiler is broken and does not do
> any debug symbols, but we probably have to live with that as it is.
> 
> Günther, does this ring any bell for you?
> 
> Volker
> 

Can you analyze that data so far to find the root cause auf this or do you need more to find a solution ... or "do I have to live with that" ? :)

regards
martin
Comment 6 Volker Lendecke 2010-01-28 05:38:45 UTC
I am sorry, but I am very busy with paying customer's requests. I spend about 13 hours yesterday doing that. I have to sleep a bit, so this will have to wait.

Volker
Comment 7 Martin Hochreiter 2010-01-28 05:54:36 UTC
Volker, take care about the issue when you have the time to - it is absolutly not urgent as I run samba 3.028 in parallel.
I just wanted to know if you have a point to get into it (sometimes)

thank you
martin
Comment 8 Orion Poplawski 2010-01-28 16:30:43 UTC
I seem to be seeing the same thing.  3.4.5 on CentOS 5.4 32-bit.

(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x00f93df0 in raise () from /lib/libc.so.6
#2  0x00f95701 in abort () from /lib/libc.so.6
#3  0x0091e0ec in dump_core () at lib/fault.c:337
#4  0x00930c4a in smb_panic (why=0xc89c1e "internal error") at lib/util.c:1496
#5  0x0091e6de in sig_fault (sig=11) at lib/fault.c:52
#6  <signal handler called>
#7  0x0022b2d0 in _talloc_free (ptr=0x832d888) at ../lib/talloc/talloc.c:539
#8  0x0022e0d3 in talloc_free (ptr=0x832d508) at ../lib/talloc/talloc.c:562
#9  0x00888b5b in api_spoolss_EnumPrinterData (p=0x83c6140) at ../librpc/gen_ndr/srv_spoolss.c:5706
#10 0x008b5d46 in api_pipe_request (p=0x83c6140) at rpc_server/srv_pipe.c:2329
#11 0x008b13ba in process_complete_pdu (p=0x83c6140) at rpc_server/srv_pipe_hnd.c:480
#12 0x008b1aec in np_write_send (mem_ctx=0x832d308, ev=0x82c18d0, handle=0x83d9160, 
    data=0x832d348 "\005", len=40) at rpc_server/srv_pipe_hnd.c:741
#13 0x006d8f90 in api_fd_reply (conn=0x83c5538, vuid=100, req=0x832d288, setup=0x83285c0, 
    data=0x84abc80 "\005", params=0x0, suwcnt=2, tdscnt=56, tpscnt=0, mdrcnt=3696, mprcnt=0)
    at smbd/ipc.c:250
#14 0x006d94d9 in handle_trans (conn=0x83c5538, req=0x832d288, state=0x847d650) at smbd/ipc.c:513
#15 0x006da330 in reply_trans (req=0x832d288) at smbd/ipc.c:755
#16 0x0073b9be in switch_message (type=37 '%', req=0x832d288, size=144) at smbd/process.c:1377
#17 0x0073e570 in process_smb (conn=0x83db208, inbuf=<value optimized out>, nread=144, 
    unread_bytes=0, encrypted=false, deferred_pcd=0x0) at smbd/process.c:1408
#18 0x0073e98e in smbd_server_connection_handler (ev=0x82c18d0, fde=0x8326c30, flags=1, 
    private_data=0x83db208) at smbd/process.c:1887
#19 0x00941a47 in run_events (ev=0x82c18d0, selrtn=1, read_fds=0xbff8e74c, write_fds=0xbff8e6cc)
    at lib/events.c:126
#20 0x0073de06 in smbd_process () at smbd/process.c:820
#21 0x00c06175 in smbd_accept_connection (ev=0x82c18d0, fde=0x83dafc0, flags=1, 
    private_data=0x83da9f0) at smbd/server.c:395
#22 0x00941a47 in run_events (ev=0x82c18d0, selrtn=2, read_fds=0xbff8ea18, write_fds=0xbff8e998)
    at lib/events.c:126
#23 0x00941cd9 in s3_event_loop_once (ev=0x82c18d0, location=0xd0c878 "smbd/server.c:681")
    at lib/events.c:185
#24 0x00942306 in _tevent_loop_once (ev=0x82c18d0, location=0xd0c878 "smbd/server.c:681")
    at ../lib/tevent/tevent.c:490
#25 0x00c05cf8 in main (argc=0, argv=0xbff8ee24) at smbd/server.c:681
(gdb) up 7
#7  0x0022b2d0 in _talloc_free (ptr=0x832d888) at ../lib/talloc/talloc.c:539
539                     _TLIST_REMOVE(tc->parent->child, tc);


#9  0x00888b5b in api_spoolss_EnumPrinterData (p=0x83c6140) at ../librpc/gen_ndr/srv_spoolss.c:5706
5706            talloc_free(r);
(gdb) print p
$3 = (pipes_struct *) 0x83c6140
(gdb) print *p
$4 = {next = 0x0, prev = 0x0, client_address = "192.168.0.99", '\0' <repeats 33 times>, 
  server_info = 0x83c6348, syntax = {uuid = {time_low = 305419896, time_mid = 4660, 
   time_hi_and_version = 43981, clock_seq = "�", node = "\001#Eg\211�"}, if_version = 1}, 
  contexts = 0x83c6820, hdr = {major = 5 '\005', minor = 0 '\0', pkt_type = 0 '\0', flags = 3 '\003', 
    pack_type = "\020\000\000", frag_len = 56, auth_len = 0, call_id = 238}, hdr_req = {
    alloc_hint = 32, context_id = 0, opnum = 72}, auth = {auth_type = PIPE_AUTH_TYPE_NONE, 
    auth_level = PIPE_AUTH_LEVEL_NONE, a_u = {schannel_auth = 0x0, auth_ntlmssp_state = 0x0}, 
    auth_data_free_func = 0}, dc = 0x0, pipe_bound = true, fault_state = false, 
  bad_handle_fault_state = false, rng_fault_state = false, endian = false, in_data = {
    current_in_pdu = 0x847d358 "\005", pdu_needed_len = 0, pdu_received_len = 56, data = {io = true, 
      bigendian_data = false, align = 4 '\004', is_dynamic = true, data_offset = 32, 
      buffer_size = 32, grow_size = 3692, data_p = 0x847f500 "", mem_ctx = 0x848e4d0, 
      sess_key = 0x0}}, out_data = {rdata = {io = false, bigendian_data = false, align = 4 '\004', 
      is_dynamic = true, data_offset = 152, buffer_size = 256, grow_size = 152, 
      data_p = 0x83d5450 "@", mem_ctx = 0x848e4d0, sess_key = 0x0}, data_sent_length = 0, frag = {
      io = false, bigendian_data = false, align = 4 '\004', is_dynamic = false, data_offset = 0, 
      buffer_size = 0, grow_size = 48, data_p = 0x0, mem_ctx = 0x848e4d0, sess_key = 0x0}, 
    current_pdu_sent = 0}, mem_ctx = 0x848e4d0, pipe_handles = 0x83d3400}
(gdb) print r
No symbol "r" in current context.

#12 0x008b1aec in np_write_send (mem_ctx=0x832d308, ev=0x82c18d0, handle=0x83d9160, 
    data=0x832d348 "\005", len=40) at rpc_server/srv_pipe_hnd.c:741
741                     process_complete_pdu(p);

(gdb) print *ev
$6 = {ops = 0xd8e4a0, fd_events = 0x8326c30, timer_events = 0x832aac0, immediate_events = 0x0, 
  signal_events = 0x83d4ae8, additional_data = 0x0, pipe_fde = 0x83db070, pipe_fds = {6, 7}, 
  debug_ops = {debug = 0x9415c0 <s3_event_debug>, context = 0x0}, nesting = {allowed = false, 
    level = 1, hook_fn = 0, hook_private = 0x0}}
(gdb) print *handle
$7 = {type = FAKE_FILE_TYPE_NAMED_PIPE, private_data = 0x83c6140}


Started seeing after making the following changes:
< old, > new
32d31
<    printcap name = /etc/printcap
39c38,40
<    use client driver = yes
---
>    printcap name = cups
> # Why?
>    use client driver = no
Comment 9 Orion Poplawski 2010-01-28 16:37:18 UTC
Reverted to original smb.conf with no effect.  Other change was to upload a new PS driver for our Ricoh Aficio CL7200 to the driver area and try printing using that.
Comment 10 Orion Poplawski 2010-01-28 16:38:20 UTC
Martin - what version did you downgrade to?  3.0.28?
Comment 11 Orion Poplawski 2010-01-29 10:24:23 UTC
(gdb) print *(struct talloc_chunk *)(ptr-48)
$5 = {next = 0x6e, prev = 0x0, parent = 0x95f88d0, child = 0x0, refs = 0x0, destructor = 0, 
  name = 0x7b85f1 "uint32_t", size = 4, flags = 3893685368, pool = 0x95f8550}

Next doesn't look valid.
Comment 12 Jeremy Allison 2010-01-29 10:37:31 UTC
Indeed - this looks like memory corruption. Can you run this under valgrind with :

valgrind --trace-children=yes /usr/sbin/smbd

Jeremy.
Comment 13 Orion Poplawski 2010-01-29 11:09:47 UTC
It fails to run under valgrind for me, or valgrind exits early.  Are Martin's valgrind logs of any use?
Comment 14 Martin Hochreiter 2010-01-29 11:13:36 UTC
well, with "memory corruption" you mean one of the 4x1 GB RAM banks of this HP
Proliant DL360 is corrupt?

I run the new valgrind on monday because I need proper activity on this PDC

@Orion

I have the Ubuntu 8.04 packages (3.028a) running in parallel with 3.4.5 self-compiled from the source

regards
Martin
Comment 15 Martin Hochreiter 2010-01-29 11:17:07 UTC
Additonal - 

I did the same thing on 3.4.5 as Orion - 

I tried the "point and click" on a windows client and it was not working any more on the 3.4.5 pdc as windows asked for the drivers (altough they are uploaded on the proper samba share and the operation is working with 3.0.28a) - so I uploaded
the cups PCL driver again and my people are using those drivers to print over the pdc ...

just to mention that

regards
(I do have a samba 3.4.5 running on another ubuntu 8.04 64 bit but without printing support)
Comment 16 Orion Poplawski 2010-01-29 11:23:01 UTC
(In reply to comment #14)
> well, with "memory corruption" you mean one of the 4x1 GB RAM banks of this HP
> Proliant DL360 is corrupt?

No - it's a program error.
Comment 17 Orion Poplawski 2010-01-29 11:27:11 UTC
Would compiling with:

#if defined(DEVELOPER) && defined(VALGRIND_MAKE_MEM_NOACCESS)                                  

be of any help?
Comment 18 Jeremy Allison 2010-01-29 12:21:02 UTC
Orion Poplawski - are you using --valgrind-trace-children=yes ?
"Exiting early" usually just means you're seeing the parent daemon die, and need to wait for the child to print errors.
I'll take a look at Martin's vargrind logs, but we really need debug symbols to help isolate this and his logs don't contain any. It's a simple as compiling with -g or including the "samba-dbg" pacakge.
Jeremy.
Comment 19 Martin Hochreiter 2010-02-01 02:57:17 UTC
So, i did as requested - i compiled with the -g option and started valgrind as Jeremy told me.
I tried to follow the debug help from the samba documentation (see at the bottom if I catched the correct smbd) and I hope that I have done anything correctly .... 

var/log/samba http://195.202.144.10/~nsc/varlogsamba.rar
valgrind.out http://195.202.144.10/~nsc/valgrind.out

Hope it helps
regards
Martin

----------------------------

root     30370  1.0  1.3 181300 55316 ?        Ss   09:46   0:02 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30383  1.6  1.4 181936 59600 ?        S    09:46   0:03 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30392  0.0  0.0  57812  1996 ?        Ss   09:46   0:00 /opt/samba345/sbin/nmbd -D
root     30394  1.7  1.5 183992 62004 ?        S    09:46   0:03 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
10459    30395  4.4  1.9 201820 78616 ?        S    09:46   0:08 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30396  6.5  1.9 195580 76844 ?        S    09:46   0:12 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30398  1.1  1.6 188684 64096 ?        S    09:46   0:02 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30427  2.0  1.6 189964 65220 ?        S    09:47   0:03 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30428  2.7  1.6 189548 65360 ?        S    09:47   0:04 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
root     30429  1.3  1.4 183956 59732 ?        S    09:47   0:02 /usr/bin/valgrind.bin --trace-children=yes /opt/samba345/sbin/smbd -d 0
10459    30528  0.9  0.5  76180 21660 ?        S    09:49   0:00 valgrind.bin --trace-children=yes /bin/sh -c /bin/sleep 90000
10459    30529  1.2  0.5  81692 22348 ?        S    09:49   0:00 valgrind.bin --trace-children=yes /bin/sleep 90000
nagios   30561  0.0  0.0  93196   616 ?        S    09:50   0:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios   30562  0.0  0.0   3944   564 ?        S    09:50   0:00 sh -c /opt/nagios/libexec/check_disk_smb -H 192.168.1.100 -W RK -s Box -u Manager -p plk78tdz -w 80 -c 95
nagios   30564  0.0  0.1  22000  4140 ?        R    09:50   0:00 /usr/bin/perl -w /opt/nagios/libexec/check_disk_smb -H 192.168.1.100 -W RK -s Box -u Manager -p plk78tdz -w 80 -c 95
root     30568  0.0  0.0  50500  2112 pts/0    R+   09:50   0:00 ps aux
root@rk01:/var/log# gdb  
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
(gdb) Quit
(gdb) gdQuit
(gdb) quit
root@rk01:/var/log# gdb /opt/samba345/sbin/smbd 30395 
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
Attaching to program: /opt/samba345/sbin/smbd, process 30395
0x0000000038066fc1 in ?? ()
(gdb) bt
#0  0x0000000038066fc1 in ?? ()
#1  0x0000000000000008 in ?? ()
#2  0x00000004049bbe50 in ?? ()
#3  0x00000004049bbe70 in ?? ()
#4  0x0000000038d2a0f0 in ?? ()
#5  0x000000000000003d in ?? ()
#6  0x00000004049bbe70 in ?? ()
#7  0x000000000000003d in ?? ()
#8  0x0000000038d2a0f0 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000038d2a0e0 in ?? ()
#11 0x0000000038eed8d8 in ?? ()
#12 0x00000000380353fa in ?? ()
#13 0x0000000402144fd8 in ?? ()
#14 0x0000000402144fd8 in ?? ()
#15 0x0000000000000057 in ?? ()
#16 0x00000007feffe618 in ?? ()
#17 0x0000000038d2a0f0 in ?? ()
#18 0x0000000038eed968 in ?? ()
#19 0x0000000000000000 in ?? ()
Comment 20 Martin Hochreiter 2010-02-01 02:59:16 UTC
and as I see in the process list - I have to change my passwords ....
(stupid mistake)
Comment 21 Orion Poplawski 2010-02-02 14:13:39 UTC
Some -d 10 info:

[2010/02/02 12:01:14,  4] rpc_server/srv_pipe.c:2297(api_rpcTNP)
  api_rpcTNP: \spoolss op 0x48 - api_rpcTNP: rpc command: SPOOLSS_ENUMPRINTERDATA
[2010/02/02 12:01:14,  6] rpc_server/srv_pipe.c:2327(api_rpcTNP)
  api_rpc_cmds[72].fn == 0x8f2730
[2010/02/02 12:01:14,  1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug)
       spoolss_EnumPrinterData: struct spoolss_EnumPrinterData
          in: struct spoolss_EnumPrinterData
              handle                   : *
                  handle: struct policy_handle
                      handle_type              : 0x00000000 (0)
                      uuid                     : 00000003-0000-0000-684b-7a76b1420000
              enum_index               : 0x00000000 (0)
              value_offered            : 0x00000080 (128)
              data_offered             : 0x00000000 (0)
[2010/02/02 12:01:14,  4] rpc_server/srv_lsa_hnd.c:180(find_policy_by_hnd_internal)
Kzv
  [0010] B1 42 00 00                                       .B.. 
[2010/02/02 12:01:14,  5] rpc_server/srv_spoolss_nt.c:7951(_spoolss_EnumPrinterData)
  _spoolss_EnumPrinterData
[2010/02/02 12:01:14,  4] rpc_server/srv_lsa_hnd.c:180(find_policy_by_hnd_internal)
Kzv
  [0010] B1 42 00 00                                       .B.. 
[2010/02/02 12:01:14,  4] rpc_server/srv_spoolss_nt.c:377(get_printer_snum)
  short name:dali
[2010/02/02 12:01:14, 10] printing/nt_printing.c:4630(get_a_printer_internal)
  get_a_printer: [dali] level 2
[2010/02/02 12:01:14,  5] printing/print_cups.c:1634(cups_pull_comment_location)
  pulling dali location
[2010/02/02 12:01:14, 10] printing/print_cups.c:87(cups_connect)
  connecting to cups server /var/run/cups/cups.sock:631
[2010/02/02 12:01:14,  8] printing/nt_printing.c:2783(unpack_devicemode)
  Unpacked devicemode [\\EARTH\dali](Letter)
[2010/02/02 12:01:14, 10] printing/nt_printing.c:2822(add_new_printer_key)
  add_new_printer_key: Inserted new data key [PrinterDriverData]
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:TrayFormSize], len: 4
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:TrayFormTable], len: 110
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:TrayFormMapSize], len: 4
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:TrayFormMap], len: 39
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:TrayFormKeywordSize], len: 4
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:TrayFormKeyword], len: 26
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:AutoConfigMode], len: 1
[2010/02/02 12:01:14,  8] printing/nt_printing.c:3732(unpack_values)
  specific: [PrinterDriverData:UiHideCapability], len: 4
[2010/02/02 12:01:14,  9] printing/nt_printing.c:4092(get_a_printer_2)
  Unpacked printer [dali] name [\\EARTH\dali] running driver [RICOH Aficio CL7200 PS]
[2010/02/02 12:01:14,  1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug)
       spoolss_EnumPrinterData: struct spoolss_EnumPrinterData
          out: struct spoolss_EnumPrinterData
              value_name               : 'TrayFormSize'
              value_needed             : *
                  value_needed             : 0x0000000c (12)
              type                     : *
                  type                     : REG_DWORD (4)
              data                     : *
                  data: ARRAY(0)
              data_needed              : *
                  data_needed              : 0x00000004 (4)
              result                   : WERR_OK
[2010/02/02 12:01:14,  0] lib/fault.c:46(fault_report)

It pretty reliably faults in api_spoolss_EnumPrinterData trying to free memory it allocates at the start of that routine, so the corruption probably occurs somewhere in that routine.
Comment 22 Orion Poplawski 2010-02-02 15:14:07 UTC
After srv_spoolss_nt.c:8059:

   r->out.value_name = talloc_strdup(r, regval_name(val));

Then:

2: *(struct talloc_chunk *) (void *) (r->out.value_name - 48) = {next = 0x82d4848, 
  prev = 0x0, parent = 0x82d44c8, child = 0x0, refs = 0x0, destructor = 0, 
  name = 0x82d4c48 "TrayFormSize", size = 13, flags = 3893685368, pool = 0x82d4148}
1: *r = {in = {handle = 0x82d4618, enum_index = 0, value_offered = 128, data_offered = 0}, 
  out = {value_name = 0x82d4c48 "TrayFormSize", value_needed = 0x82d47c8, type = 0x82d4808, 
    data = 0x82d4848 "\030H-\b\030L-\b", data_needed = 0x82d4878, result = {v = 0}}}

Notice that the r->out.data pointer is the same as *(struct talloc_chunk *) (void *) (r->out.value_name - 48) -> next.  So when we write to data:

8077:    memcpy(r->out.data, regval_data_p(val), regval_size(val));

we end up stomping on the talloc list pointers.

No idea why this happens.  Don't really know anything about how talloc works.
Comment 23 Orion Poplawski 2010-02-02 15:21:28 UTC
Looks like data is getting allocated as a uint8_t:

        r->out.data = talloc_zero_array(r, uint8_t, r->in.data_offered);

But we're writing 4 bytes:

8077            memcpy(r->out.data, regval_data_p(val), regval_size(val));
(gdb) print regval_size(val)
$9 = 4
(gdb) print regval_data_p(val)
$10 = (uint8_t *) 0x844de50 "n"




Comment 24 Orion Poplawski 2010-02-02 15:25:48 UTC
(gdb) print val
$12 = (REGISTRY_VALUE *) 0x844dd10
(gdb) print *val
$13 = {
  valuename = "TrayFormSize\000_d��\027r;\000\002\000\000\000\002$\000\f\000\017\020\001\005\000\000\000\000\000\005\025\000\000\000�B\237\220\224_d��\027r;\000\002\000\000\000\t\030\000\f\000\017\020\001\002\000\000\000\000\000\005 \000\000\000 \002\000\000\000\002\030\000\f\000\017\020\001\002\000\000\000\000\000\005 \000\000\000 \002\000\000\001\002\000\000\000\000\000\005 \000\000\000 \002\000\000\001\002\000\000\000\000\000\005 \000\000\000\020\001\000\000@", '\0' <repeats 11 times>, "h�D\b", '\0' <repeats 12 times>, "��D\b\n\000\000\000s�\024�\000\000\000\000\b\001\000\000H\000\000\000/bin/bash\000E"..., type = 4, size = 4, data_p = 0x844de50 "n"}
(gdb) print val->data_p
$14 = (uint8_t *) 0x844de50 "n"
(gdb) print val->data_p[1]
$15 = 0 '\0'
(gdb) print val->data_p[2]
$16 = 0 '\0'
(gdb) print val->data_p[3]
$17 = 0 '\0'
Comment 25 Orion Poplawski 2010-02-02 15:28:18 UTC
So, r->out.data is allocated to be the size of r->in.data_offered, which in this case is 0.
Comment 26 Orion Poplawski 2010-02-02 15:38:24 UTC
I don't see how you know how big to allocate r->out.data until you look up the registry value and see how big it is.  At the very least we need a check that we don't copy more data than is allocated.
Comment 27 Orion Poplawski 2010-02-02 16:17:48 UTC
Is it true that is we simply don't allocate r->out.data, that the talloc_strdup() call will allocate it for us?  Should that be done throughout srv_spoolss.c?
Comment 28 Jeremy Allison 2010-02-02 16:32:17 UTC
talloc_strdup is an allocating call. I'll take a look carefully at the internals of _spoolss_EnumPrinterData() to see if I can find the error.
Thanks !
Jeremy.
Comment 29 Jeremy Allison 2010-02-02 16:43:59 UTC
Created attachment 5259 [details]
Proposed patch for 3.4.x, 3.5.0 and master

Can you try this patch please ? I think it should fix the crash.
Jeremy.
Comment 30 Jeremy Allison 2010-02-02 16:50:15 UTC
Created attachment 5260 [details]
Proposed patch for 3.4.x, 3.5.0 and master (slightly better)

Slightly better patch, protects against val->data_p being null as well.
Jeremy.
Comment 31 Orion Poplawski 2010-02-02 17:21:59 UTC
One thought:

If r->in.data_offered < regval_size(val), we still return with r->out.data_needed = regval_size(val).  Client may not like that.  No idea how these things really work though.
Comment 32 Jeremy Allison 2010-02-02 17:29:29 UTC
I think that's correct. r->out.data_needed is telling the client how much data we want to return. I'll look at the 3.3.x code to make sure. Does the patch work in your environment ?
Comment 33 Orion Poplawski 2010-02-02 17:43:36 UTC
Appears to be working fine.  No crashes.  Duplex printing issues, but...
Comment 34 Jeremy Allison 2010-02-02 17:48:38 UTC
Thanks. Let me do a little more research as to whether we should return regval_size(val) or MIN(regval_size(val),r->in.data_offered) and I'll update the patch and commit.

Thanks a *LOT* for tracking this down for me - much appreciated !

Jeremy.
Comment 35 Jeremy Allison 2010-02-02 18:13:53 UTC
Ok, I checked the MS-RPRN spec. 3.1.4.1.7 Dynamically Typed Query Parameters:

It says:

--------------------------------------------------------------
  nSize or cbData: This parameter MUST be the size, in bytes, of the buffer. The value of nSize or cbData MAY be larger than the required size for the requested dynamically typed values.

  pcbNeeded or pcbData: This parameter MUST be a non-null pointer to a variable into which the server MUST copy the actual number of bytes between the start of the buffer and the last byte written by the server into the buffer, both inclusive, or the required size of the buffer in bytes if the value of the cbBuf parameter is smaller than the actual size of the data to return to the  caller.
--------------------------------------------------------------

In our code:

cbData == r->in.data_offered
pcbNeeded == r->out.data_needed

I think the text here is using "cbBuf" instead of "cbData" (I'll raise a CAR on this), but the meaning is that pcbNeeded should be filled with the actual size of the data we would have *liked* to return - which in our case is regval_size(val), so the patch is correct as is. The client knows how much the maximum data is could have received back, as that was what it sent in r->in.data_offered.
Comment 36 Jeremy Allison 2010-02-02 18:19:01 UTC
In the latest version of MS-RPRN the section is : 3.1.4.1.2 Dynamically Typed Query Parameters.
Comment 37 Jeremy Allison 2010-02-02 18:36:40 UTC
Created attachment 5261 [details]
git-am fix for 3.5.0.
Comment 38 Jeremy Allison 2010-02-02 18:40:15 UTC
Created attachment 5262 [details]
git-am fix for 3.4.6.
Comment 39 Jeremy Allison 2010-02-02 18:41:13 UTC
Confirmed fixed by submitter. Re-assigning to Karolin for inclusion in 3.5.0 final and 3.4.6. Karolin, as this one is a crash bug it's important to get in for 3.5.0 final.

Thanks,

Jeremy.
Comment 40 Martin Hochreiter 2010-02-03 04:03:53 UTC
Great work folks!!

I have the newer samba 3.4.5 (including the patch) running for 3 hours now without any problems or smbd_panics - formerly I had smbd_panics almost within an hour when I tried it so for me the problem seems to be solved.

Thank you very much!

regards
Martin
Comment 41 Guenther Deschner 2010-02-08 05:41:13 UTC
*** Bug 7073 has been marked as a duplicate of this bug. ***
Comment 42 Jeremy Allison 2010-02-08 16:39:04 UTC
Raising to "blocker", as we really must get this into 3.5.0 final. Looks like this bug got closed by mistake - it needs to be in the "open" state until the fixes get pushed to 3.5.0 and 3.4.6.

Jeremy.
Comment 43 Karolin Seeger 2010-02-09 02:19:15 UTC
(In reply to comment #42)
> Raising to "blocker", as we really must get this into 3.5.0 final. Looks like
> this bug got closed by mistake - it needs to be in the "open" state until the
> fixes get pushed to 3.5.0 and 3.4.6.

You are right. Not sure why I thought I pushed the patches... Sorry!
Now, both patches have been pushed.
Closing out bug report.

Thanks for verifying, Jeremy!
Comment 44 Jeremy Allison 2010-02-09 16:59:05 UTC
Sorry, re-opening to cover second worrying use of memcpy.
Jeremy.
Comment 45 Jeremy Allison 2010-02-09 17:01:05 UTC
Created attachment 5303 [details]
git-am fix for 3.5.0 and 3.4.6.

Guenther, please review and ask Karolin to push if you're ok. Safely checks the other memcpy inside rpc_spoolss_nt.c

Jeremy.
Comment 46 Guenther Deschner 2010-02-09 18:30:59 UTC
Comment on attachment 5303 [details]
git-am fix for 3.5.0 and 3.4.6.

yep, looks fine
Comment 47 Guenther Deschner 2010-02-09 18:32:08 UTC
Karolin, please include that last patch as well.
Comment 48 Karolin Seeger 2010-02-10 02:51:39 UTC
(In reply to comment #47)
> Karolin, please include that last patch as well.
> 

Done.
Closing out bug report.

Thanks!