Since upgrading to 2.2.8 I have noticed several incidents where an smbd
daemon is sucking down large amounts of CPU pinging back and forth with a
Windows box which seems to be doing printer status queries. This happens mostly
during times of moderate-to-heavy load.
Sometimes killing the daemon breaks the cycle. Sometimes it just continues
with another daemon.
I _never_ saw this with 2.2.5.
I can find the client IP address using lsof. Looking at the network traffic
between the client and the server using tcpdump, I see a whole lot of very small
packets. Looking at the packets, I see what looks like status queries.
This is a real killer since I now have to monitor the server constantly and kill
these little buggers whenever they pop up. They each tend to eat half a CPU on
a very hefty machine. My load average is hitting 30 when it should never exceed
5. Since I have about 10,000 users scattered all over Campus, and since this is
a random, intermittent problem, I really don't know where to start looking.
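The manual monitoring described above could be partly scripted. Below is a hypothetical Python helper, not part of Samba. It assumes you feed it the text output of the POSIX `ps -e -o pid,pcpu,comm` command, and it uses an arbitrary 40% threshold. It only reports candidate PIDs and does not kill anything. How `ps` computes %CPU varies between platforms, so the threshold would need tuning on AIX.

```python
# Hypothetical helper (not part of Samba): pick out smbd processes whose %CPU
# exceeds a threshold, given the text output of `ps -e -o pid,pcpu,comm`.
from typing import List


def runaway_smbd_pids(ps_output: str, cpu_threshold: float = 40.0) -> List[int]:
    """Return PIDs of smbd processes using more than cpu_threshold percent CPU."""
    pids = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # ignore malformed lines
        pid_text, pcpu_text, comm = fields
        try:
            pid, pcpu = int(pid_text), float(pcpu_text)
        except ValueError:
            continue  # ignore lines whose numeric columns don't parse
        # comm may be a bare name or a full path, depending on the platform
        if comm.strip().rsplit("/", 1)[-1] == "smbd" and pcpu > cpu_threshold:
            pids.append(pid)
    return pids


# Example with made-up ps output:
sample = """  PID  %CPU COMMAND
  101   0.1 smbd
  202  48.7 smbd
  303  55.0 httpd
"""
print(runaway_smbd_pids(sample))  # [202]
```

Keeping the parsing separate from any `kill` call means the candidate list can be reviewed before acting on it.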
Since I have no explicit "lpq cache time" directive in smb.conf, I should be
getting caching with a ten-second cache time. Since my "lpq command" is being
invoked many times per second, it appears that caching is not working. In fact,
I can't find any place in the Samba code where caching is done!
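The behavior a ten-second "lpq cache time" implies can be shown with a short hypothetical Python sketch. It is not Samba's code, and it makes no claim about how or whether Samba 2.2.8 implements the cache. The class name `LpqCache`, the `run_lpq` callback and the injectable clock are invented for illustration. Repeated status queries for the same printer within the cache window reuse one stored result, and only a stale or missing entry causes the "lpq command" to run again.

```python
# Hypothetical sketch of a time-based queue-status cache, illustrating what a
# ten-second "lpq cache time" should mean. Not Samba's implementation.
import time
from typing import Callable, Dict, Tuple


class LpqCache:
    def __init__(self, run_lpq: Callable[[str], str], cache_time: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._run_lpq = run_lpq        # stands in for running the "lpq command"
        self._cache_time = cache_time  # seconds a result stays valid
        self._clock = clock            # injectable so the example is testable
        self._entries: Dict[str, Tuple[float, str]] = {}  # printer -> (time, output)

    def status(self, printer: str) -> str:
        now = self._clock()
        entry = self._entries.get(printer)
        if entry is not None and now - entry[0] < self._cache_time:
            return entry[1]              # fresh enough: reuse the cached output
        output = self._run_lpq(printer)  # stale or missing: run the command again
        self._entries[printer] = (now, output)
        return output


# Example: a client polling fifty times at once triggers only one lpq run.
calls = []

def fake_lpq(printer: str) -> str:
    calls.append(printer)
    return f"{printer}: no entries"

fake_now = [0.0]
cache = LpqCache(fake_lpq, cache_time=10.0, clock=lambda: fake_now[0])
for _ in range(50):
    cache.status("lab1")
fake_now[0] = 10.5  # the cache time has now expired
cache.status("lab1")
print(len(calls))  # 2
```

With a working cache of this kind, the command should run at most about once per printer every ten seconds, rather than many times per second as reported.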
One possibly relevant detail is that I am using "disable spoolss = yes" because
I have found the printer driver database to be unreliable when I tried it about
a year ago.
The printer driver stuff has had a lot of work put into it between 2.2.5 and
2.2.8. If you have the time, you might like to investigate turning on spoolss.
Otherwise, some level 10 debug logs and other details (client OSes used, printer
drivers installed) would be great.
Thanks for the quick reply!
I may be able to play with spoolss support some time in the future. The main
problems with it are:
- Drivers must be installed via a Windows box. This cannot be automated.
- If the Samba driver database gets corrupted, all the drivers have to be
  re-installed pointy-clicky manually.
- If it is possible to restore the Samba driver database from backup, I
  haven't seen anything about it.
But this is not my big problem at the moment.
I can appreciate your desire for level 10 debug logs. I will try to get some.
With a load average of 30, I'm wondering what effect setting the debug level to 10
will have on my server.
I'm also guessing that the log will get rather large. What mechanism would you
like me to use to send you the log?
As I was beginning this note, my print server melted down, disrupting printing
service for the entire Campus for half an hour.
Created attachment 157
Samba config file
You asked for some more details (client OSes used, printer drivers installed):
The clients are mostly Win 2K/XP with some 98/ME.
I use the Adobe PostScript driver.
I have attached my smb.conf.
Created attachment 158
level 10 debug log
I hope this is enough of a snapshot.
The following item in WHATSNEW.txt in 3.0.0rc4 is potentially relevant:
25) Fix coherency bug in print handle/printer object caching code
that could cause XP clients to infinitely loop while updating
their local printer cache.
If I could build 3.0.0rc4 successfully under AIX 4.3.3, my problem might be
solved. But see Bugzilla Bug 526.
I would be happy to patch 2.2.8a, but I can't figure out how to relate the above
"fix" to a change in a particular file, even using CVS.
I hope I'm not being a pest, but I'm dying here.
We're into our heavy end-of-semester printing season, and this problem is
rearing its ugly head again. I've noticed a couple of additional clues.
First, whenever an smbd process goes wacky, it requires 'kill -9' to get rid of it.
Second, the problem builds up slowly over a period of days. My current survival
strategy is to a) tell everyone to use lpr if at all possible, and b) thoroughly
kill every smbd process and restart Samba once per day.
I've seen this before with 'disable spoolss = yes' (which
I noticed you have set in your smb.conf). Check the network
traffic when the smbd starts sucking up CPU time and see
if the client is actually sending data. From your log, it
looks like an XP client is just continually polling the
printer. You really should revisit the spoolss support in 3.0.1rc2.
Sorry, but 2.2 is no longer under development.
If you can reproduce this bug against the latest 3.0 release,
please reopen this bug and change the version in the report.