Bug 7237 - Many segfaults in Samba after switching to 3.5.1
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.5
Classification: Unclassified
Component: File services
Version: 3.5.1
Hardware: x64 Windows XP
Importance: P3 critical
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-10 15:39 UTC by Marc Muehlfeld
Modified: 2010-03-15 03:24 UTC

See Also:


Attachments
Level 10 debug file (267.99 KB, application/octet-stream)
2010-03-11 15:25 UTC, Marc Muehlfeld
Network trace with tcpdump (16.16 KB, application/octet-stream)
2010-03-11 15:26 UTC, Marc Muehlfeld
smb.conf (4.52 KB, text/plain)
2010-03-11 15:26 UTC, Marc Muehlfeld
patch for 3.5 (2.27 KB, patch)
2010-03-11 17:41 UTC, Guenther Deschner
jra: review+

Description Marc Muehlfeld 2010-03-10 15:39:03 UTC
I upgraded from 3.3.11 to 3.5.1. Everything seemed to be fine for about 15-30 minutes. After that I got hundreds of segfaults of smbd. For example, one of the computers (XP SP3 with all updates) certainly wasn't in use during that time. It was just turned on and a user was logged in. No program was open.

In the log I got hundreds of:

[2010/03/10 22:10:15.238815,  0] lib/fault.c:46(fault_report)
  ===============================================================
[2010/03/10 22:10:15.260745,  0] lib/fault.c:47(fault_report)
  INTERNAL ERROR: Signal 11 in pid 6097 (3.5.1)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2010/03/10 22:10:15.260779,  0] lib/fault.c:49(fault_report)
  
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2010/03/10 22:10:15.260796,  0] lib/fault.c:50(fault_report)
  ===============================================================
[2010/03/10 22:10:15.260811,  0] lib/util.c:1465(smb_panic)
  PANIC (pid 6097): internal error
[2010/03/10 22:10:15.368860,  0] lib/util.c:1569(log_stack_trace)
  BACKTRACE: 30 stack frames:
   #0 /usr/sbin/smbd(log_stack_trace+0x1a) [0x2ab01e00ab8b]
   #1 /usr/sbin/smbd(smb_panic+0x55) [0x2ab01e00ac8f]
   #2 /usr/sbin/smbd [0x2ab01dffba81]
   #3 /lib64/libc.so.6 [0x2ab01fae02d0]
   #4 /lib64/libc.so.6(strlen+0x30) [0x2ab01fb296f0]
   #5 /lib64/libc.so.6(__strdup+0x16) [0x2ab01fb29426]
   #6 /usr/sbin/smbd [0x2ab01e059f9e]
   #7 /usr/sbin/smbd(make_user_info_map+0x211) [0x2ab01e05a48e]
   #8 /usr/sbin/smbd(make_user_info_netlogon_network+0xe5) [0x2ab01e05ab11]
   #9 /usr/sbin/smbd [0x2ab01df3840e]
   #10 /usr/sbin/smbd(_netr_LogonSamLogonWithFlags+0x9e) [0x2ab01df38a33]
   #11 /usr/sbin/smbd(_netr_LogonSamLogon+0x7f) [0x2ab01df38ad3]
   #12 /usr/sbin/smbd [0x2ab01df419f2]
   #13 /usr/sbin/smbd(api_pipe_request+0x416) [0x2ab01dfa5aef]
   #14 /usr/sbin/smbd [0x2ab01df9efb6]
   #15 /usr/sbin/smbd(np_write_send+0xa0c) [0x2ab01dfa0040]
   #16 /usr/sbin/smbd(reply_pipe_write_and_X+0x23e) [0x2ab01ddccb23]
   #17 /usr/sbin/smbd(reply_write_and_X+0x197) [0x2ab01ddd271d]
   #18 /usr/sbin/smbd [0x2ab01de11321]
   #19 /usr/sbin/smbd [0x2ab01de13df1]
   #20 /usr/sbin/smbd [0x2ab01de145e8]
   #21 /usr/sbin/smbd(run_events+0x146) [0x2ab01e019528]
   #22 /usr/sbin/smbd(smbd_process+0x86f) [0x2ab01de136c6]
   #23 /usr/sbin/smbd [0x2ab01e305c2a]
   #24 /usr/sbin/smbd(run_events+0x146) [0x2ab01e019528]
   #25 /usr/sbin/smbd [0x2ab01e019797]
   #26 /usr/sbin/smbd(_tevent_loop_once+0x84) [0x2ab01e019b19]
   #27 /usr/sbin/smbd(main+0xf79) [0x2ab01e30598b]
   #28 /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ab01facd994]
   #29 /usr/sbin/smbd [0x2ab01dd9cb89]
[2010/03/10 22:10:15.369047,  0] lib/util.c:1470(smb_panic)
  smb_panic(): calling panic action [/usr/local/bin/panic-action 6097]
[2010/03/10 22:10:16.544779,  0] lib/util.c:1478(smb_panic)
  smb_panic(): action returned status 0
[2010/03/10 22:10:16.557822,  0] lib/fault.c:326(dump_core)
  dumping core in /var/log/samba//cores/smbd
[2010/03/10 22:10:16.671561,  1] smbd/service.c:1069(make_connection_snum)
  it-01 (10.1.0.254) connect to service muehlfeld initially as user muehlfeld
(uid=1061, gid=513) (pid 6343)




The panic-action script also sent me a mail for every crash, with the following content:

(no debugging symbols found)
0x00002ab01fb49c85 in waitpid () from /lib64/libc.so.6
#0  0x00002ab01fb49c85 in waitpid () from /lib64/libc.so.6
#1  0x00002ab01faec331 in do_system () from /lib64/libc.so.6
#2  0x00002ab01e00acf5 in smb_panic () from /usr/sbin/smbd
#3  0x00002ab01dffba81 in sig_fault () from /usr/sbin/smbd
#4  <signal handler called>
#5  0x00002ab01fb296f0 in strlen () from /lib64/libc.so.6
#6  0x00002ab01fb29426 in strdup () from /lib64/libc.so.6
#7  0x00002ab01e059f9e in make_user_info () from /usr/sbin/smbd
#8  0x00002ab01e05a48e in make_user_info_map () from /usr/sbin/smbd
#9  0x00002ab01e05ab11 in make_user_info_netlogon_network ()
   from /usr/sbin/smbd
#10 0x00002ab01df3840e in _netr_LogonSamLogon_base () from /usr/sbin/smbd
#11 0x00002ab01df38a33 in _netr_LogonSamLogonWithFlags () from /usr/sbin/smbd
#12 0x00002ab01df38ad3 in _netr_LogonSamLogon () from /usr/sbin/smbd
#13 0x00002ab01df419f2 in api_netr_LogonSamLogon () from /usr/sbin/smbd
#14 0x00002ab01dfa5aef in api_pipe_request () from /usr/sbin/smbd
#15 0x00002ab01df9efb6 in process_complete_pdu () from /usr/sbin/smbd
#16 0x00002ab01dfa0040 in np_write_send () from /usr/sbin/smbd
#17 0x00002ab01ddccb23 in reply_pipe_write_and_X () from /usr/sbin/smbd
#18 0x00002ab01ddd271d in reply_write_and_X () from /usr/sbin/smbd
#19 0x00002ab01de11321 in switch_message () from /usr/sbin/smbd
#20 0x00002ab01de13df1 in process_smb () from /usr/sbin/smbd
#21 0x00002ab01de145e8 in smbd_server_connection_handler () from /usr/sbin/smbd
#22 0x00002ab01e019528 in run_events () from /usr/sbin/smbd
#23 0x00002ab01de136c6 in smbd_process () from /usr/sbin/smbd
#24 0x00002ab01e305c2a in smbd_accept_connection () from /usr/sbin/smbd
#25 0x00002ab01e019528 in run_events () from /usr/sbin/smbd
#26 0x00002ab01e019797 in s3_event_loop_once () from /usr/sbin/smbd
#27 0x00002ab01e019b19 in _tevent_loop_once () from /usr/sbin/smbd
#28 0x00002ab01e30598b in main () from /usr/sbin/smbd
The program is running.  Quit anyway (and detach it)? (y or n) [answered Y; input
not from terminal]




Please let me know if there is anything I can provide that would help you fix the problem.
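
Both backtraces above end in strlen() called from strdup() inside make_user_info(). That pattern is typical of a NULL pointer being handed to strdup(), although the report does not say which argument was involved, and the attached patch is not reproduced here. The following is only a minimal sketch of the general defensive pattern, assuming a NULL input should be treated as an empty string. The names dup_or_empty and dup_or_empty_demo are hypothetical and are not Samba functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Samba): duplicate a string, treating
 * a NULL input as the empty string. Passing NULL to strdup()/strlen()
 * is undefined behaviour and commonly crashes with SIGSEGV.
 * Returns NULL only if the allocation fails; the caller frees the result.
 */
char *dup_or_empty(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL) {
        s = "";
    }
    len = strlen(s) + 1;          /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Small self-check illustrating the intended behaviour. */
void dup_or_empty_demo(void)
{
    char *from_null = dup_or_empty(NULL);
    char *from_name = dup_or_empty("it-01");

    assert(from_null != NULL && from_null[0] == '\0');
    assert(from_name != NULL && strcmp(from_name, "it-01") == 0);

    free(from_null);
    free(from_name);
}
```

Whether substituting an empty string is the right behaviour depends on the caller. In an authentication path it may be safer to reject the request with an error than to continue with an empty field.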
Comment 1 Guenther Deschner 2010-03-10 15:57:04 UTC
looking into this
Comment 2 Guenther Deschner 2010-03-10 17:49:08 UTC
Hm, I cannot reproduce that with either master or 3.5.1 and a joined Windows XP 32-bit SP3 (including all fixes) workstation. Login, wait, no crash.

Can you post your smb.conf and, if possible, some more information about that incoming SamLogon call (a network trace or a high debug level logfile)?

Thanks.
Comment 3 Marc Muehlfeld 2010-03-11 15:25:46 UTC
Created attachment 5490
Level 10 debug file
Comment 4 Marc Muehlfeld 2010-03-11 15:26:06 UTC
Created attachment 5491
Network trace with tcpdump
Comment 5 Marc Muehlfeld 2010-03-11 15:26:21 UTC
Created attachment 5492
smb.conf
Comment 6 Marc Muehlfeld 2010-03-11 15:27:57 UTC
I added a level 10 debug log and a network trace that contain the core dump messages.
Comment 7 Guenther Deschner 2010-03-11 15:39:31 UTC
OK, I see what fails. Thanks for the logs. Expect a fix to be available soon.
Comment 8 Guenther Deschner 2010-03-11 17:41:46 UTC
Created attachment 5493
patch for 3.5

This fixes it for me (the RPC-NETLOGON-S3 torture test also checks this every time now).
Comment 9 Guenther Deschner 2010-03-11 17:42:33 UTC
Marc, could you please test that patch? Thanks!
Comment 10 Jeremy Allison 2010-03-11 19:02:09 UTC
Patch is correct. Re-assigning to Karolin for inclusion in 3.5.2.

Thanks!

Jeremy.
Comment 11 Marc Muehlfeld 2010-03-12 15:10:24 UTC
The patch seems to fix the issue.

Thanks.
Comment 12 Guenther Deschner 2010-03-12 17:31:33 UTC
Thanks, Marc, for verifying. Karolin, please proceed. This absolutely needs to be in 3.5.2.
Comment 13 Karolin Seeger 2010-03-15 03:24:17 UTC
Pushed to v3-5-test.
Closing out bug report.

Thanks!