Bug 11964 - smb_panic: forked spoolssd crashes if CUPS becomes unavailable
Status: CLOSED WORKSFORME
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Printing
Version: 4.3.9
Hardware: All
OS: Linux
Importance: P5 normal
Target Milestone: ---
Assignee: printing-maintainers
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-10 18:59 UTC by Alex K
Modified: 2021-01-08 01:22 UTC
CC List: 1 user

See Also:


Description Alex K 2016-06-10 18:59:45 UTC
I use spoolss in fork mode to provide printing for Windows clients: 

rpc_server:spoolss = external
rpc_daemon:spoolssd = fork

After upgrading from Samba 4.1.22 to 4.3.9, I noticed that the printing subsystem sometimes becomes unavailable until I restart Samba. In log.spoolssd I see this stack trace:

[2016/06/09 13:31:13.930374,  0] ../source3/printing/load.c:68(load_printers)
  PANIC: assert failed at ../source3/printing/load.c(68): pcap_cache_loaded(NULL)
[2016/06/09 13:31:13.974370,  0] ../source3/lib/util.c:789(smb_panic_s3)
  PANIC (pid 3626): assert failed: pcap_cache_loaded(NULL)
[2016/06/09 13:31:14.003288,  0] ../source3/lib/util.c:900(log_stack_trace)
  BACKTRACE: 16 stack frames:
   #0 /usr/lib/x86_64-linux-gnu/samba/libsmbregistry.so.0(log_stack_trace+0x1a) [0x7f43b25cd14a]
   #1 /usr/lib/x86_64-linux-gnu/samba/libsmbregistry.so.0(smb_panic_s3+0x20) [0x7f43b25cd220]
   #2 /usr/lib/x86_64-linux-gnu/libsamba-util.so.0(smb_panic+0x2f) [0x7f43b33448df]
   #3 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(load_printers+0x17d) [0x7f43b2e9f93d]
   #4 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(+0x1216f9) [0x7f43b2f1c6f9]
   #5 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(+0x12180d) [0x7f43b2f1c80d]
   #6 /usr/lib/x86_64-linux-gnu/libtevent.so.0(tevent_common_check_signal+0x257) [0x7f43afcbb967]
   #7 /usr/lib/x86_64-linux-gnu/libsmbconf.so.0(run_events_poll+0x24) [0x7f43b12658c4]
   #8 /usr/lib/x86_64-linux-gnu/libsmbconf.so.0(+0x25c60) [0x7f43b1265c60]
   #9 /usr/lib/x86_64-linux-gnu/libtevent.so.0(_tevent_loop_once+0x8d) [0x7f43afcb7d5d]
   #10 /usr/lib/x86_64-linux-gnu/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7f43afcb7efb]
   #11 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(start_spoolssd+0x49a) [0x7f43b2f1d1da]
   #12 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(printing_subsystem_init+0xb3) [0x7f43b2e85e23]
   #13 smbd(main+0x11ba) [0x7f43b39dc30a]
   #14 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f43af910f45]
   #15 smbd(+0x7ad6) [0x7f43b39dcad6]

This happens when CUPS is restarting for whatever reason; it kills only the forked spoolssd and leaves the other smbd processes intact, which makes it hard to detect the failure and restart the whole service.

After the crash, log.smbd fills up with these messages:

[2016/06/09 13:43:13.492344,  0] ../source3/smbd/lanman.c:842(api_DosPrintQGetInfo)
  api_DosPrintQGetInfo: could not connect to spoolss: NT_STATUS_UNSUCCESSFUL


It would be nice if spoolssd detected the crash and restarted automatically in this case, or, better yet, didn't crash at all.
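
In the meantime I may add an external health check. A probe along these lines (credentials are placeholders) should fail while spoolssd is down, presumably with the same NT_STATUS_UNSUCCESSFUL as the lanman call above, and could drive a restart script:

rpcclient -U 'user%password' -c 'enumprinters' 127.0.0.1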


Samba 4.3.9, Ubuntu 14.04 x64, CUPS 2.1.3. 
I don't think I saw this issue with Samba 4.1.22, but I'm not 100% sure.
Comment 1 Jeremy Allison 2016-06-10 22:48:38 UTC
Hmm. pcap_cache_loaded() only returns an error if printer_list_get_last_refresh() fails, which usually means a tdb parse error.

Can you instrument printer_list_get_last_refresh() to find out exactly what is failing here?
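
Something like this would do; a paraphrased sketch of pcap_cache_loaded() in source3/printing/pcap.c with an extra DEBUG, not an exact patch against the 4.3 tree:

bool pcap_cache_loaded(time_t *_last_change)
{
    NTSTATUS status;
    time_t last_change;

    status = printer_list_get_last_refresh(&last_change);
    if (!NT_STATUS_IS_OK(status)) {
        /* added: log the exact NTSTATUS instead of silently
         * returning false */
        DEBUG(0, ("pcap_cache_loaded: "
                  "printer_list_get_last_refresh failed: %s\n",
                  nt_errstr(status)));
        return false;
    }
    if (_last_change != NULL) {
        *_last_change = last_change;
    }
    return true;
}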
Comment 2 Alex K 2016-06-16 15:56:26 UTC
I remember seeing a "corrupted gencache.tdb" message on the console, but I thought it was no big deal because it was just a cache. Btw, it only occurred with print servers that have lots of printers (800+).

I wonder if this happens because spoolssd was enumerating the printers and writing them to the cache when CUPS restarted, and so could not finish, corrupting the cache.

The issue is currently mostly gone; I can't catch it and can't reproduce it at will. I'll update the bug once I have more info.
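
If it comes back, I'll also run an integrity check on the cache tdb right after a crash, along these lines (the path is an assumption, it differs between distros):

tdbtool /var/cache/samba/gencache.tdb check

and the same for printer_list.tdb, since that appears to be what printer_list_get_last_refresh() reads.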