Hi team,

After I upgraded my print servers from Samba 3 to Samba 4, their CPU load spiked dramatically. Every 60 seconds (the printcap cache time) all Samba processes re-read the printcap configuration. With a couple of hundred processes doing this at exactly the same moment, the load on the server is huge. Why does Samba 4 cause more load when re-reading printcap info? Perhaps you could make just one smbd process re-read the printcap? Or at least have each process do it at a random moment instead of in the same second as every other process? I'm attaching testparm output, strace output of an idle smbd process, and a graph of CPU/memory load over the last three months (before and after the upgrade).
Created attachment 10008 [details] testparm.txt - testparm output
Created attachment 10009 [details] strace of idle smbd
Created attachment 10010 [details] CPU load before and after upgrade from 3 to 4
Hmmm. Are you sure it's reloading the printcap that's causing this ? There are other 60-second cycles in Samba.
No, I hadn't thought about that. Maybe it is something else; I was hoping strace would provide some clue. Here is what is defined in the configuration for the various timeouts:

$ testparm -sv | grep time.*[0-9]
	passwd chat timeout = 2
	deadtime = 15
	lpq cache time = 30
	name cache timeout = 660
	ctdb timeout = 0
	ctdb locktime warn threshold = 0
	printcap cache time = 60
	cups connection timeout = 30
	machine password timeout = 604800
	username map cache time = 0
	lock spin time = 200
	oplock break wait time = 0
	ldap timeout = 15
	ldap connection timeout = 2
	afs token lifetime = 604800
	idmap cache time = 604800
	idmap negative cache time = 120
	winbind cache time = 300
	dfree cache time = 0
Yeah, it probably *is* the printcap cache time though :-). Can you get an strace from an smbd when the load is spiking ? Jeremy.
Looks like it reloads the printers. I'm attaching the strace and log.smbd (debug level 7) of the spiking process. I changed printcap cache time to 3 minutes, but it's not clear whether Samba honours this interval correctly. From the logs (two processes are running) it looks like one process is refreshing its cache every 3 minutes, and another every 4 minutes:

[2014/06/12 19:41:43.497343, 5] ..(cups_cache_reload_async)
[2014/06/12 19:42:45.560515, 5] ..(cups_cache_reload_async)
[2014/06/12 19:45:45.670602, 5] ..(cups_cache_reload_async)
[2014/06/12 19:48:45.800344, 5] ..(cups_cache_reload_async)
[2014/06/12 19:49:49.453206, 5] ..(cups_cache_reload_async)
[2014/06/12 19:51:45.971307, 5] ..(cups_cache_reload_async)
[2014/06/12 19:53:52.395327, 5] ..(cups_cache_reload_async)
[2014/06/12 19:54:46.141277, 5] ..(cups_cache_reload_async)
...
[2014/06/12 20:10:04.214391, 5] ..(cups_cache_reload_async)
[2014/06/12 20:12:46.938543, 5] ..(cups_cache_reload_async)
[2014/06/12 20:14:07.182487, 5] ..(cups_cache_reload_async)
[2014/06/12 20:15:47.038809, 5] ..(cups_cache_reload_async)
[2014/06/12 20:18:10.145206, 5] ..(cups_cache_reload_async)
[2014/06/12 20:18:47.145976, 5] ..(cups_cache_reload_async)

print_queue_housekeeping happens every 60 seconds and doesn't seem to be configurable via smb.conf:

[2014/06/12 20:19:47.197872, 5] ..(print_queue_housekeeping)
[2014/06/12 20:20:47.209912, 5] ..(print_queue_housekeeping)
[2014/06/12 20:21:47.267869, 5] ..(print_queue_housekeeping)
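For reference, printcap cache time is a global smb.conf parameter measured in seconds, so the 3-minute interval mentioned above corresponds to a setting like this:

[global]
	printcap cache time = 180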
Created attachment 10026 [details] log.smbd of a spiking process
Created attachment 10027 [details] strace.log of a spiking process
Created attachment 10028 [details] log.smbd of a spike
Created attachment 10029 [details] strace.log.gz of a spike
I'm looking at a problem similar to this at the moment on a similar 3.6 code-base:

Upon pulling down a list of printers from the printing backend (CUPS in this case), the background printer process writes this data to printer_list.tdb. It then notifies all smbd processes, whereupon each one traverses printer_list.tdb and updates its share listings. During traversal, the printer_list.tdb database is locked _exclusively_ by every smbd process.

With one smbd process per client connection, contention only becomes a problem when the numbers of clients and printers reach the hundreds or thousands.

Using traverse_read() for the per-client smbd printer_list.tdb traversal does reduce lock contention, and appears to improve the responsiveness of smbd. However, with all smbds now able to perform the traversal in parallel, there is a large spike in CPU usage across the period from the printcap update until all processes have finished the traversal.

I'm now looking at implementing a change-notification based method for printcap updates, such that the background printing process identifies added and removed printers and only tells the per-client smbds about these changes, saving the cost of processing the full printer list on every printcap update.

@Jeremy, does this approach sound sane to you?
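For context on the two traversal flavours: tdb_traverse() takes an exclusive lock for the duration of the walk, whereas tdb_traverse_read() takes only read locks, so concurrent traversals no longer serialize against each other. A minimal sketch of a read-only walk, assuming an illustrative database path and counting callback (this is not Samba's actual printer-list code):

#include <fcntl.h>
#include <stdio.h>
#include <tdb.h>

/* callback invoked once per record; returning non-zero stops the walk */
static int count_record(struct tdb_context *tdb, TDB_DATA key,
			TDB_DATA value, void *private_data)
{
	unsigned *count = private_data;
	(*count)++;
	return 0;
}

int main(void)
{
	/* illustrative path; the real file lives under Samba's lock directory */
	struct tdb_context *tdb = tdb_open("/var/lock/samba/printer_list.tdb",
					   0, TDB_DEFAULT, O_RDONLY, 0);
	unsigned count = 0;

	if (tdb == NULL) {
		return 1;
	}
	/* read locks only: other processes may traverse at the same time */
	tdb_traverse_read(tdb, count_record, &count);
	printf("%u records\n", count);
	tdb_close(tdb);
	return 0;
}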
I can confirm that I only see this problem on servers where there are 100+ printers and 2000+ users.
(In reply to comment #12)
> I'm looking at a problem similar to this at the moment on a similar 3.6
> code-base:
>
> Upon pulling down a list of printers from the printing backend (CUPS in this
> case), the background printer process writes this data to printer_list.tdb.
> It then notifies all smbd processes, whereupon each one traverses
> printer_list.tdb and updates its share listings. During traversal, the
> printer_list.tdb database is locked _exclusively_ by every smbd process.
>
> With one smbd process per client connection, contention only becomes a
> problem when the numbers of clients and printers reach the hundreds or
> thousands.
>
> Using traverse_read() for the per-client smbd printer_list.tdb traversal does
> reduce lock contention, and appears to improve the responsiveness of smbd.
> However, with all smbds now able to perform the traversal in parallel, there
> is a large spike in CPU usage across the period from the printcap update
> until all processes have finished the traversal.
>
> I'm now looking at implementing a change-notification based method for
> printcap updates, such that the background printing process identifies added
> and removed printers and only tells the per-client smbds about these changes,
> saving the cost of processing the full printer list on every printcap update.
>
> @Jeremy, does this approach sound sane to you?

I'm not Jeremy, but I'm still uttering thoughts here :-)

First, we might be sitting on the fcntl thundering herd problem that eventually led to the tdb mutex work.

Then, a broader question: Why does smbd have to have an up-to-date printer list in memory at all? When a client issues an enumeration RPC, sure, that's the last moment at which this list should be assembled, but such calls come in at arbitrary, random times. smbd only needs to know that its view is no longer current, so that *when* such an enumeration RPC comes in it can update. This can be done very cheaply with the TDB_SEQNUM trick. Other calls always go to specific printer names. If smbd is missing one, it can look at a specific tdb key without doing a traverse.

Unfortunately I haven't looked at the printing subsystem in a while, so I don't know where I'm wrong, but if there's one thing I've learned in the past, it is that enumeration of *anything* should be avoided at high cost :-)
(In reply to comment #14)
...
> I'm not Jeremy, but I'm still uttering thoughts here :-)

Thanks for the feedback anyhow :-)

> First, we might be sitting on the fcntl thundering herd problem that
> eventually led to the tdb mutex work.

With traverse_read() the fcntl thundering herd doesn't appear to be an issue; it's the sheer number of processes doing the traversal + processing in parallel that contributes to the high CPU usage. FWIW, the (3.6 based) traverse_read changes I'm talking about can be seen at http://git.samba.org/?p=ddiss/samba.git;a=commitdiff;h=56dbb02308ce54dbe141ab5bc951aa8e40066252

> Then, a broader question: Why does smbd have to have an up-to-date printer
> list in memory at all? When a client issues an enumeration RPC, sure, that's
> the last moment at which this list should be assembled, but such calls come
> in at arbitrary, random times. smbd only needs to know that its view is no
> longer current, so that *when* such an enumeration RPC comes in it can
> update. This can be done very cheaply with the TDB_SEQNUM trick. Other calls
> always go to specific printer names. If smbd is missing one, it can look at a
> specific tdb key without doing a traverse.
>
> Unfortunately I haven't looked at the printing subsystem in a while, so I
> don't know where I'm wrong, but if there's one thing I've learned in the
> past, it is that enumeration of *anything* should be avoided at high cost :-)

Updating on client enumeration sounds like a reasonable approach, but it would be much more intrusive than the existing push-based model. IIUC, we'd have a workflow similar to:
- the background print process polls the printer backend (e.g. CUPS) at the printcap cache time interval, fills printer_list.tdb (including a TDB_SEQNUM update), handles AD unpublishing of deleted printers, and awaits MSG_PRINTER_PCAP messages.
- forked per-client smbd processes await enumeration requests. On receiving such a request, they compare the printer_list.tdb TDB_SEQNUM with their in-memory value (initially zero).
- if the TDB_SEQNUM doesn't match, they process the current list of printers and update their in-memory seqnum value (see the sketch below).

I'm not so sure I see how this removes the traversal, but it would reduce the number of processes hitting printer_list.tdb to only the ones that are interested.

One reason why I initially favoured the background process change-notification approach is that it would potentially also allow us to convert the CUPS printing backend to accept asynchronous CUPS notifications via the subscriptions API - http://www.cups.org/documentation.php/ref-subscriptions-conf.html .
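A rough sketch of the per-client seqnum check described in the workflow above. tdb_get_seqnum() is the real tdb call (the database must be opened with the TDB_SEQNUM flag for the counter to be maintained); reload_printer_shares() is a hypothetical stand-in for the share-list rebuild:

#include <tdb.h>

/* hypothetical helper: traverse printer_list.tdb (read-only) and rebuild
 * this smbd's in-memory printer share list */
extern void reload_printer_shares(struct tdb_context *printer_list);

/* called from the share/printer enumeration handler, not on a timer */
void maybe_reload_printers(struct tdb_context *printer_list)
{
	static int cached_seqnum = -1;	/* per-process cache, initially invalid */
	int seqnum = tdb_get_seqnum(printer_list);

	if (seqnum == cached_seqnum) {
		return;	/* no printcap update since our last reload */
	}
	reload_printer_shares(printer_list);
	cached_seqnum = seqnum;
}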
(In reply to comment #15) > FWIW, the (3.6 based) traverse_read changes I'm talking about can be seen at > http://git.samba.org/?p=ddiss/samba.git;a=commitdiff;h=56dbb02308ce54dbe141ab5bc951aa8e40066252 This change looks sensible to me. Does it also apply to 4.0.x/4.1.x (or can it be easily forward-ported) ?
(In reply to comment #15)
> Updating on client enumeration sounds like a reasonable approach, but it
> would be much more intrusive than the existing push-based model. IIUC, we'd
> have a workflow similar to:
> - the background print process polls the printer backend (e.g. CUPS) at the
>   printcap cache time interval, fills printer_list.tdb (including a
>   TDB_SEQNUM update), handles AD unpublishing of deleted printers, and awaits
>   MSG_PRINTER_PCAP messages.
> - forked per-client smbd processes await enumeration requests. On receiving
>   such a request, they compare the printer_list.tdb TDB_SEQNUM with their
>   in-memory value (initially zero).
> - if the TDB_SEQNUM doesn't match, they process the current list of printers
>   and update their in-memory seqnum value.

Yes, this seems to be the most scalable way of doing things I think. Nice summation of the logic needed !

> I'm not so sure I see how this removes the traversal, but it would reduce the
> number of processes hitting printer_list.tdb to only the ones that are
> interested.

It doesn't remove the traversal, but it makes it read-only and only at the point required, which should make it much more efficient.

> One reason why I initially favoured the background process change-notification
> approach is that it would potentially also allow us to convert the CUPS
> printing backend to accept asynchronous CUPS notifications via the
> subscriptions API -

A background process change-notification approach would also fit nicely into the above refactoring wouldn't it ? It makes the background process printer list processing much more efficient.

Jeremy.
(In reply to comment #17)
> (In reply to comment #15)
> > I'm not so sure I see how this removes the traversal, but it would reduce
> > the number of processes hitting printer_list.tdb to only the ones that are
> > interested.
>
> It doesn't remove the traversal, but it makes it read-only and only
> at the point required, which should make it much more efficient.

Well, not necessarily more efficient as such, but hopefully more distributed over time, so we avoid these heavy spikes.

Volker
Created attachment 10097 [details] 24 CPU server with 1k printers & 400 clients doing an exclusive traverse
Created attachment 10098 [details] 24 CPU server with 1k printers & 400 clients doing a traverse read
(In reply to comment #20)
> Created attachment 10098 [details]
> 24 CPU server with 1k printers & 400 clients doing a traverse read

Wow, traverse read looks a LOT worse than exclusive traverse. Maybe much more parallelism kicking in?
(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 10098 [details]
> > 24 CPU server with 1k printers & 400 clients doing a traverse read
>
> Wow, traverse read looks a LOT worse than exclusive traverse. Maybe much more
> parallelism kicking in?

I wouldn't say it's worse - smbd is much more responsive, as the lock contention is mostly gone. That said, the traverse_read change alone doesn't appear to be a viable solution for the high CPU usage, which is why I plan to couple it with your suggested update-on-enumerate change.
(In reply to comment #16)
> (In reply to comment #15)
> > FWIW, the (3.6 based) traverse_read changes I'm talking about can be seen at
> > http://git.samba.org/?p=ddiss/samba.git;a=commitdiff;h=56dbb02308ce54dbe141ab5bc951aa8e40066252
>
> This change looks sensible to me. Does it also apply to 4.0.x/4.1.x (or can it
> be easily forward-ported) ?

It should be easy to forward-port, but as mentioned in comment#12, converting to printer_list.tdb traverse_read alone doesn't really fix the high CPU usage.

With exclusive traverse (attachment#10097 [details]), the printer_list.tdb lock contention means that the printcap update takes an eternity to be propagated to all client smbd processes, but CPU usage is only moderate due to the lock-step behaviour.

With traverse_read (attachment#10098 [details]), the printer_list.tdb lock contention is gone, but that means that (currently) all smbds process printer_list.tdb in parallel, resulting in a relatively short but large spike in CPU usage.
I was on vacation last week. Am now back and testing a 3.6 patch-set based on the traverse-only-if-changed + traverse-on-client-enum modifications suggested by Volker:

The following changes since commit 260ab8a34c076e98868a2c8de2a7dc14138d3947:

  VERSION: Bump version up to 3.6.25. (2014-06-23 11:26:19 +0200)

are available in the git repository at:

  git://git.samba.org/ddiss/samba.git printer_list_traverse_read_when_changed_36s

for you to fetch changes up to 2d5a47a9ce35e5105674db7343242c8f82211834:

  smbd: only reprocess printer_list.tdb if it changed (2014-07-23 15:37:18 +0200)

----------------------------------------------------------------
David Disseldorp (11):
      printing: traverse_read the printer list for share updates
      printing: only reload printer shares on client enum
      printing: reload printer_list.tdb from in memory list
      printing: expose printer_list.tdb transaction helpers
      printing: perform atomic printer_list.tdb updates
      printing: remove unused pcap_cache_add()
      printing: add printer_list_clean_all() function
      printing: use printer_list_clean_all() to replace pcap cache
      printing: remove unused printer_list_clean_old()
      printing: return last change time with pcap_cache_loaded()
      smbd: only reprocess printer_list.tdb if it changed

Volker Lendecke (1):
      smbd: Avoid double-free in get_print_db_byname

 source3/printing/load.c                     |   4 ++--
 source3/printing/pcap.c                     |  84 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
 source3/printing/pcap.h                     |  13 ++++++------7
 source3/printing/print_aix.c                |  17 ++++++++++++++---
 source3/printing/print_iprint.c             |  16 ++++++++++------
 source3/printing/print_standard.c           |   8 ++++++--
 source3/printing/print_svid.c               |  11 +++++++----
 source3/printing/printer_list.c             | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------
 source3/printing/printer_list.h             |  16 ++++++++++------
 source3/printing/printing_db.c              |   4 ++--
 source3/printing/spoolssd.c                 |  17 +----------------
 source3/rpc_server/spoolss/srv_spoolss_nt.c |   8 ++++++++
 source3/rpc_server/srvsvc/srv_srvsvc_nt.c   |   1 +
 source3/smbd/server.c                       |  27 +++++----------------------
 source3/smbd/server_reload.c                |  21 +++++++++++++++++++++
 source3/web/swat.c                          |   4 ++--
 16 files changed, 220 insertions(+), 163 deletions(-)

I wouldn't say it's ready for review yet. Will perform some new benchmarks and report back.
Created attachment 10149 [details] 24 CPU server with 1k printers & 400 idle clients doing a traverse read on enumeration only

Note that (as with the previous graphs) none of the clients are performing share enumeration, and as per-client smbd processes now only update their printer share list on client enumeration, there is very little stress on the system.
(In reply to comment #24)
> I was on vacation last week. Am now back and testing a 3.6 patch-set based on
> the traverse-only-if-changed + traverse-on-client-enum modifications
> suggested by Volker:
>
> The following changes since commit 260ab8a34c076e98868a2c8de2a7dc14138d3947:

Took a look nevertheless, this looks really, really interesting!

Two thoughts come to my mind from a quick glance (did not do a real review yet):

We already have a good printer_list_get_last_refresh to avoid traverses. This could be tuned further by only accessing the LAST_REFRESH record if the database seqnum changed. This would involve two static variables inside printer_list_get_last_refresh: the current seqnum and the LAST_REFRESH value. If the seqnum did not change, just return the last refresh value from memory.

The other comment involves the database transactions. In current master dbwrap (have not looked at 3.6), transactions are only possible on persistent databases. I'm not sure the printer list is really as persistent as passdb.tdb or secrets.tdb is.
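A sketch of the suggested tuning, assuming a simplified signature for printer_list_get_last_refresh() (the real Samba prototype differs) and a hypothetical fetch_last_refresh_record() helper; dbwrap_get_seqnum() is dbwrap's seqnum accessor:

#include <stdbool.h>
#include <time.h>

struct db_context;

/* dbwrap API: the seqnum bumps whenever the underlying tdb is modified */
extern int dbwrap_get_seqnum(struct db_context *db);

/* hypothetical helper that reads the LAST_REFRESH record from the db */
extern bool fetch_last_refresh_record(struct db_context *db, time_t *stamp);

bool printer_list_get_last_refresh(struct db_context *db, time_t *refresh)
{
	static int cached_seqnum = -1;
	static time_t cached_refresh;
	int seqnum = dbwrap_get_seqnum(db);

	if (seqnum == cached_seqnum) {
		/* no write since the last call: skip the database fetch */
		*refresh = cached_refresh;
		return true;
	}
	if (!fetch_last_refresh_record(db, &cached_refresh)) {
		return false;
	}
	cached_seqnum = seqnum;
	*refresh = cached_refresh;
	return true;
}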
Created attachment 10150 [details] 24 CPU server with 1k printers, 400 idle clients, and 40 netshareenum rpcclients, doing a traverse read on enumeration only

For this graph, an additional 40 clients are added to the existing 400 idle client load. These extra 40 clients are issuing netshareenum(level=1) requests at regular ~2s intervals. The short, moderate spikes in CPU usage occur when the background print process updates printer_list.tdb (printcap cache time = 120). On subsequent enumeration the per-client smbd processes notice the printer list change and reload their printer share inventories. The per-client smbd reload only occurs on the first enumeration after the printer_list.tdb change.
Thanks for the feedback Volker!

(In reply to comment #26)
> Two thoughts come to my mind from a quick glance (did not do a real review
> yet):
>
> We already have a good printer_list_get_last_refresh to avoid traverses. This
> could be tuned further by only accessing the LAST_REFRESH record if the
> database seqnum changed. This would involve two static variables inside
> printer_list_get_last_refresh: the current seqnum and the LAST_REFRESH value.
> If the seqnum did not change, just return the last refresh value from memory.

For change detection I used the printer_list_get_last_refresh record only. I could change it over to use the seqnum, but wouldn't expect much in terms of a performance gain. I don't quite see how using both fields would be necessary.

> The other comment involves the database transactions. In current master
> dbwrap (have not looked at 3.6), transactions are only possible on persistent
> databases. I'm not sure the printer list is really as persistent as
> passdb.tdb or secrets.tdb is.

Oh, I missed this gem. printer_list.tdb is not persistent, so I'll have to convert it over to use locking. It would be good if dbwrap/TDB transaction_start() returned an error against non-persistent databases.
(In reply to comment #28)
> For change detection I used the printer_list_get_last_refresh record only. I
> could change it over to use the seqnum, but wouldn't expect much in terms of
> a performance gain. I don't quite see how using both fields would be
> necessary.

True, possibly we would not gain much, in particular as it's unlikely that many smbds come in really concurrently trying to read that record. But I've learned the hard way that anything like that can become a bottleneck. As this is a simple local optimization, it's perfectly fine to leave it for a later day, should it turn out to be a problem.

Volker
(In reply to comment #28)
...
> > The other comment involves the database transactions. In current master
> > dbwrap (have not looked at 3.6), transactions are only possible on
> > persistent databases. I'm not sure the printer list is really as persistent
> > as passdb.tdb or secrets.tdb is.
>
> Oh, I missed this gem. printer_list.tdb is not persistent, so I'll have to
> convert it over to use locking. It would be good if dbwrap/TDB
> transaction_start() returned an error against non-persistent databases.

OK, looks like this is a new restriction, and it is only there for the CTDB case...

int dbwrap_transaction_start(struct db_context *db)
{
	if (!db->persistent) {
		/*
		 * dbwrap_ctdb has two different data models for persistent
		 * and non-persistent databases. Transactions are supported
		 * only for the persistent databases. This check is here to
		 * prevent breakages of the cluster case, autobuild at this
		 * point only tests non-clustered Samba. Before removing this
		 * check, please make sure that this facility has also been
		 * added to dbwrap_ctdb.
		 *
		 * Thanks, vl
		 */
		DEBUG(1, ("transactions not supported on non-persistent "
			  "database %s\n", db->name));
		return -1;
	}

Locking logic is going to make this much uglier :(.
(In reply to comment #30)
> int dbwrap_transaction_start(struct db_context *db)
> {
> 	if (!db->persistent) {
> 		/*
> 		 * dbwrap_ctdb has two different data models for persistent
> 		 * and non-persistent databases. Transactions are supported
> 		 * only for the persistent databases. This check is here to
> 		 * prevent breakages of the cluster case, autobuild at this
> 		 * point only tests non-clustered Samba. Before removing this
> 		 * check, please make sure that this facility has also been
> 		 * added to dbwrap_ctdb.
> 		 *
> 		 * Thanks, vl
> 		 */
> 		DEBUG(1, ("transactions not supported on non-persistent "
> 			  "database %s\n", db->name));
> 		return -1;
> 	}
>
> Locking logic is going to make this much uglier :(.

Why do partial lists hurt badly? The individual record should be fine I guess. And with a fresh enumeration everything should clean up eventually.
(In reply to comment #29)
...
> > Locking logic is going to make this much uglier :(.
>
> Why do partial lists hurt badly? The individual record should be fine I
> guess. And with a fresh enumeration everything should clean up eventually.

If a client smbd reloads from a partial list, then it will remove (valid) printers that haven't been added to the list yet.

Instead of locking, I could go back to the old procedure of:
- write timestamp X
- populate new entries from the replacement pcap cache
- weed out any old entries with a timestamp less than X
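A sketch of that lock-free procedure; the struct and helper names are hypothetical (the real code lives in source3/printing/printer_list.c), but the ordering is the point: readers racing with the update only ever see individually valid records, and the final pass weeds out printers the backend no longer reports:

#include <stddef.h>
#include <time.h>

struct pcap_printer {
	const char *name;
	const char *comment;
};

/* hypothetical per-record helpers operating on printer_list.tdb */
extern void write_last_refresh(time_t stamp);
extern void store_printer(const struct pcap_printer *p, time_t stamp);
extern void delete_printers_older_than(time_t stamp);

void refresh_printer_list(const struct pcap_printer *printers, size_t count)
{
	time_t stamp = time(NULL);
	size_t i;

	/* 1. record the timestamp of this refresh */
	write_last_refresh(stamp);

	/* 2. add/overwrite entries from the freshly fetched pcap cache,
	 *    tagging each with the new timestamp */
	for (i = 0; i < count; i++) {
		store_printer(&printers[i], stamp);
	}

	/* 3. weed out stale entries: anything still carrying an older
	 *    timestamp was not reported by the backend this time around */
	delete_printers_older_than(stamp);
}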
(In reply to comment #32)
> If a client smbd reloads from a partial list, then it will remove (valid)
> printers that haven't been added to the list yet.
>
> Instead of locking, I could go back to the old procedure of:
> - write timestamp X
> - populate new entries from the replacement pcap cache
> - weed out any old entries with a timestamp less than X

Ok. I had not thought about the possibility of first deleting all printers and then re-adding the current list. That would indeed be disturbing to users. Your new approach certainly looks better and is of course less disruptive.
David - I also looked over this code. Looking good to me (other than the transaction issue). Ping me when you're done and I'll help review to get it into master + back ported for 4.x.x. Cheers, Jeremy.
Created attachment 10173 [details] printcap update correctness test
Created attachment 10174 [details] simple smbclient and rpcclient load generator
I plan to submit the following upstream today...

The following changes since commit fb9d8c402614556d7a36f9e9efb72b3f4afe838a:

  lib: Remove unused nstrcpy (2014-08-04 09:58:16 +0200)

are available in the git repository at:

  git://git.samba.org/ddiss/samba.git printer_list_traverse_read_when_changed_notrans_master

for you to fetch changes up to 0b823e038e1c807b2ff9f13256a4b1b7b33c7fbd:

  smbd: split printer reload processing (2014-08-04 14:11:21 +0200)

----------------------------------------------------------------
David Disseldorp (7):
      printing: traverse_read the printer list for share updates
      printing: only reload printer shares on client enum
      printing: reload printer_list.tdb from in memory list
      printing: remove pcap_cache_add()
      printing: return last change time with pcap_cache_loaded()
      smbd: only reprocess printer_list.tdb if it changed
      smbd: split printer reload processing

 source3/printing/load.c                     |  4 ++--
 source3/printing/pcap.c                     | 54 ++++++++++++++++++++++++++----------------------------
 source3/printing/pcap.h                     | 13 ++++++-------
 source3/printing/print_aix.c                | 17 ++++++++++++++---
 source3/printing/print_iprint.c             | 16 ++++++++++------
 source3/printing/print_standard.c           |  8 ++++++--
 source3/printing/print_svid.c               | 11 +++++++----
 source3/printing/printer_list.c             | 17 +++++++++++------
 source3/printing/printer_list.h             |  4 ++--
 source3/printing/queue_process.c            | 10 ++++++----
 source3/printing/spoolssd.c                 | 38 +++++++------------------------------
 source3/rpc_server/spoolss/srv_spoolss_nt.c | 11 ++++++++++-
 source3/rpc_server/srvsvc/srv_srvsvc_nt.c   |  1 +
 source3/smbd/proto.h                        |  2 ++
 source3/smbd/server.c                       | 20 --------------------
 source3/smbd/server_reload.c                | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 16 files changed, 194 insertions(+), 127 deletions(-)
Created attachment 10187 [details] patch series for v4-1-test branch, cherry-picked from master

One very slight change from the master patch-set, to handle the absence of 4286e147a62af784 ("loadparm: rename lp[cfg]_print_ok to lp[cfg]_printable for consistency with docs").
Created attachment 10190 [details] patch series for v4-0-test branch

Mostly the same as the 4.1 patch-set, with SWAT and minor context changes.
Created attachment 10191 [details] patch series for v3-6 branch
Comment on attachment 10187 [details] patch series for v4-1-test branch, cherry-picked from master LGTM.
Comment on attachment 10190 [details] patch series for v4-0-test branch LGTM.
Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next.
Thanks for the review Jeremy!
Pushed to autobuild-v4-[0|1]-test.
Pushed to both branches. Closing out bug report. Thanks!
Created attachment 10295 [details] screenshot-after.png

Just to show you guys how awesome you are (look at the blue line -- CPU). Thanks, much appreciated!