Bug 14144 - Samba periodically loses printer drivers
Summary: Samba periodically loses printer drivers
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Printing (show other bugs)
Version: 4.9.9
Hardware: All Linux
: P5 normal (vote)
Target Milestone: ---
Assignee: printing-maintainers
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-27 16:38 UTC by Alex K
Modified: 2020-06-02 13:52 UTC (History)
2 users (show)

See Also:


Attachments
testparm.txt (11.54 KB, text/plain)
2019-09-27 16:38 UTC, Alex K
no flags Details
logs-for-1-hour-spoolssd.zip (1.76 MB, application/zip)
2019-09-27 16:40 UTC, Alex K
no flags Details
logs-for-1-hour-other.zip (359.16 KB, application/zip)
2019-09-27 16:41 UTC, Alex K
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alex K 2019-09-27 16:38:28 UTC
I have ~100 printservers that allow Windows machines to print. Every day one of them is losing the drivers associated with all the printers. Specifically, 

"rpcclient localhost -c enumprinters"
returns 

	name:[\\LOCALHOST\tars-plotter]
	description:[\\LOCALHOST\tars-plotter,,DESIGNJET Z5200 POSTSCRIPT]

instead of 

	name:[\\LOCALHOST\tars-plotter]
	description:[\\LOCALHOST\tars-plotter,HP Designjet Z5200ps PS3,DESIGNJET Z5200 POSTSCRIPT]

Note the "HP Designjet Z5200ps PS3" driver association missing in the first output. 

In other words, a server will be working for a couple of months normally, and then one day all the driver associations are gone. The drivers are still there, registry seems to have all the records for the drivers, but the drivers are no longer associated with corresponding printers. 

Windows clients will report "No driver is set for this printer" when connecting to such a server. 

This happens to every server: high load ones serving many users, idle standby ones, testing ones where not a single user ever connects. This been happening for years since Samba 3.x, but I couldn't find a way to collect level10 logs until now when it finally happened on a testing server with level = 10 debug logs enabled. 


I now have a monitoring script that checks driver assignments every 15 minutes and alerts me when they disappear. 

Here, the script alerted me on [2019/09/21 18:01:00]. 
I collected all the logs for the previous hour (the script should have checked the drivers at least two times before that and found them in place), and attached in a zip file here. 

I'm also attaching testparm -sv output. 
Please help catching this elusive bug.
Comment 1 Alex K 2019-09-27 16:38:48 UTC
Created attachment 15495 [details]
testparm.txt
Comment 2 Alex K 2019-09-27 16:40:46 UTC
Created attachment 15496 [details]
logs-for-1-hour-spoolssd.zip
Comment 3 Alex K 2019-09-27 16:41:01 UTC
Created attachment 15497 [details]
logs-for-1-hour-other.zip
Comment 4 David Disseldorp 2019-09-28 11:54:44 UTC
Thanks for the detailed description + logs.

The bug sounds eerily familiar, I think we've come across it in the past.
Looking at the delete_and_reload_printers_full() invocations, it appears that all printers remain present from the start of the log at 17:00 through to ~17:48, at which point we see:
log.spoolssd.bq-[2019/09/21 17:48:05.957943,  6, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/printer_list.c:104(printer_list_get_printer)
log.spoolssd.bq:  Failed to fetch record! The printer database is empty?
log.spoolssd.bq-[2019/09/21 17:48:05.957973,  3, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:100(delete_and_reload_printers_full)
log.spoolssd.bq:  removing stale printer irongiant-color
--
log.spoolssd.bq-[2019/09/21 17:48:06.129206,  6, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/printer_list.c:104(printer_list_get_printer)
log.spoolssd.bq:  Failed to fetch record! The printer database is empty?
log.spoolssd.bq-[2019/09/21 17:48:06.129227,  3, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:100(delete_and_reload_printers_full)
log.spoolssd.bq:  removing stale printer duke-color
...

At which point all printers get purged, but are again added shortly afterwards following a subsequent printcap cache update:

log.spoolssd.bq-[2019/09/21 17:50:06.120887,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [irongiant-color], if it doesn't exist.
--
log.spoolssd.bq-[2019/09/21 17:50:06.123434,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [duke-color], if it doesn't exist.
--
log.spoolssd.bq-[2019/09/21 17:50:06.125685,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [calculon-color], if it doesn't exist.
--
log.spoolssd.bq-[2019/09/21 17:50:06.127878,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [marvin-color], if it doesn't exist.
--
log.spoolssd.bq-[2019/09/21 17:50:06.130212,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [dublin-color], if it doesn't exist.
--
log.spoolssd.bq-[2019/09/21 17:50:06.132409,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [00000000-default], if it doesn't exist.
--
log.spoolssd.bq-[2019/09/21 17:50:06.134771,  8, pid=133646, effective(0, 0), real(0, 0)] ../source3/printing/queue_process.c:119(delete_and_reload_printers_full)
log.spoolssd.bq:  Adding default registry entry for printer [dalek-color], if it doesn't exist.


At this staged I'd suppose that delete_and_reload_printers_full() is running against an invalid printer_list.tdb, either due to cups providing an incorrect CUPS_GET_PRINTERS response, or (more likely) something racy in Samba's cups printcap cache update codepath.
Comment 5 Alex K 2019-09-30 14:33:58 UTC
Thank you for looking into this!

> cups providing an incorrect CUPS_GET_PRINTERS response

Luckily, I have debug logs for CUPS as well. With this server being idle, it's easily to match the connection in the logs. 

Here is 2019/09/21 17:48:06 (recorded by CUPS as 17:48:05):

D [21/Sep/2019:17:48:05 +0000] [Client 11024] Server address is "[v1.::1]".
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Accepted from localhost:45834 (IPv6)
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Waiting for request.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] POST / HTTP/1.1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=200, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] No authentication data provided.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=100, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] 2.0 CUPS-Get-Printers 1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Returning IPP successful-ok for CUPS-Get-Printers (no URI) from localhost.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Content-Length: 109505
D [21/Sep/2019:17:48:05 +0000] [Client 11024] cupsdSendHeader: code=200, type="application/ipp", auth_type=0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] con->http=0x564a83e852d0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] cupsdWriteClient error=0, used=0, state=HTTP_STATE_POST_SEND, data_encoding=HTTP_ENCODING_LENGTH, data_remaining=109505, response=0x564a83e6fcd0(IPP_STATE_DATA), pipe_pid=0, file=-1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Writing IPP response, ipp_state=IPP_STATE_ATTRIBUTE, old wused=0, new wused=0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] bytes=1, http_state=7, data_remaining=109505
D [21/Sep/2019:17:48:05 +0000] [Client 11024] con->http=0x564a83e852d0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] cupsdWriteClient error=0, used=0, state=HTTP_STATE_POST_SEND, data_encoding=HTTP_ENCODING_LENGTH, data_remaining=109505, response=0x564a83e6fcd0(IPP_STATE_DATA), pipe_pid=0, file=-1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Writing IPP response, ipp_state=IPP_STATE_DATA, old wused=0, new wused=0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] bytes=0, http_state=0, data_remaining=109505
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Flushing write buffer.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] New state is HTTP_STATE_WAITING
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Waiting for request.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] POST / HTTP/1.1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Read: status=200, state=6
D [21/Sep/2019:17:48:05 +0000] [Client 11024] No authentication data provided.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] 2.0 CUPS-Get-Classes 1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Returning IPP successful-ok for CUPS-Get-Classes (no URI) from localhost.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Content-Length: 75
D [21/Sep/2019:17:48:05 +0000] [Client 11024] cupsdSendHeader: code=200, type="application/ipp", auth_type=0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] con->http=0x564a83e852d0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] cupsdWriteClient error=0, used=0, state=HTTP_STATE_POST_SEND, data_encoding=HTTP_ENCODING_LENGTH, data_remaining=75, response=0x564a83dfa010(IPP_STATE_DATA), pipe_pid=0, file=-1
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Writing IPP response, ipp_state=IPP_STATE_DATA, old wused=0, new wused=0
D [21/Sep/2019:17:48:05 +0000] [Client 11024] bytes=0, http_state=0, data_remaining=75
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Flushing write buffer.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] New state is HTTP_STATE_WAITING
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Waiting for request.
D [21/Sep/2019:17:48:05 +0000] [Client 11024] HTTP_STATE_WAITING Closing for error 32 (Broken pipe)
D [21/Sep/2019:17:48:05 +0000] [Client 11024] Closing connection.


I see that CUPS replied with 109505 bytes of data and success status. The output doesn't differ from 2019/09/21 17:50:06 that you mentioned later, when Samba asked for the list of printers again. Same 109505 bytes were transferred. 

D [21/Sep/2019:17:50:06 +0000] [Client 11043] Server address is "[v1.::1]".
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Accepted from localhost:45980 (IPv6)
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Waiting for request.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] POST / HTTP/1.1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=200, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] No authentication data provided.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] 2.0 CUPS-Get-Printers 1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Returning IPP successful-ok for CUPS-Get-Printers (no URI) from localhost.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Content-Length: 109505
D [21/Sep/2019:17:50:06 +0000] [Client 11043] cupsdSendHeader: code=200, type="application/ipp", auth_type=0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] con->http=0x564a83e852d0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] cupsdWriteClient error=0, used=0, state=HTTP_STATE_POST_SEND, data_encoding=HTTP_ENCODING_LENGTH, data_remaining=109505, response=0x564a83e9c880(IPP_STATE_DATA), pipe_pid=0, file=-1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Writing IPP response, ipp_state=IPP_STATE_ATTRIBUTE, old wused=0, new wused=0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] bytes=1, http_state=7, data_remaining=109505
D [21/Sep/2019:17:50:06 +0000] [Client 11043] con->http=0x564a83e852d0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] cupsdWriteClient error=0, used=0, state=HTTP_STATE_POST_SEND, data_encoding=HTTP_ENCODING_LENGTH, data_remaining=109505, response=0x564a83e9c880(IPP_STATE_DATA), pipe_pid=0, file=-1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Writing IPP response, ipp_state=IPP_STATE_DATA, old wused=0, new wused=0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] bytes=0, http_state=0, data_remaining=109505
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Flushing write buffer.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] New state is HTTP_STATE_WAITING
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Waiting for request.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] POST / HTTP/1.1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=200, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] No authentication data provided.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Read: status=100, state=6
D [21/Sep/2019:17:50:06 +0000] [Client 11043] 2.0 CUPS-Get-Classes 1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Returning IPP successful-ok for CUPS-Get-Classes (no URI) from localhost.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Content-Length: 75
D [21/Sep/2019:17:50:06 +0000] [Client 11043] cupsdSendHeader: code=200, type="application/ipp", auth_type=0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] con->http=0x564a83e852d0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] cupsdWriteClient error=0, used=0, state=HTTP_STATE_POST_SEND, data_encoding=HTTP_ENCODING_LENGTH, data_remaining=75, response=0x564a83e32d90(IPP_STATE_DATA), pipe_pid=0, file=-1
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Writing IPP response, ipp_state=IPP_STATE_DATA, old wused=0, new wused=0
D [21/Sep/2019:17:50:06 +0000] [Client 11043] bytes=0, http_state=0, data_remaining=75
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Flushing write buffer.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] New state is HTTP_STATE_WAITING
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Waiting for request.
D [21/Sep/2019:17:50:06 +0000] [Client 11043] HTTP_STATE_WAITING Closing for error 32 (Broken pipe)
D [21/Sep/2019:17:50:06 +0000] [Client 11043] Closing connection.


What's puzzling for me here is that I have 27Gb of debug logs going back for a few weeks, but this is one and only invocation of printer_list_get_printer in the logs. Shouldn't it happen every 60 seconds?
Comment 6 Alex K 2020-06-02 13:52:14 UTC
Quick note: we upgraded to 4.11.6 since then, and it keeps happening.