Since upgrading to 2.2.8 I have noticed several incidents where an smbd daemon is sucking down large amounts of CPU pinging back and forth with a Windows box which seems to be doing printer status queries. This happens mostly during times of moderate-to-heavy load. Sometimes killing the daemon breaks the cycle. Sometimes it just continues with another daemon. I _never_ saw this with 2.2.5. I can find the client IP address using lsof. Looking at the network traffic between the client and the server using tcpdump, I see a whole lot of very small packets. Looking at the packets, I see what looks like status queries. This is a real killer since I now have to monitor the server constantly and kill these little buggers whenever they pop up. They each tend to eat half a CPU on a very hefty machine. My load average is hitting 30 when it should never exceed 5. Since I have about 10,000 users scattered all over Campus, and since this is a random, intermittent problem, I really don't know where to start looking. Since I have no explicit "lpq cache time" directive in smb.conf, I should be getting caching with a ten second cache time. Since my "lpq command" is being invoked many times per second, it appears that caching is not working. In fact, I can't find any place in the Samba code where caching is done! One possibly relevant detail is that I am using "disable spoolss = yes" because I found the printer driver database to be unreliable when I tried it about a year ago.
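For readers unfamiliar with what an "lpq cache time" is expected to do, here is a minimal sketch of time-based caching around a queue-status command. It is only an illustration of the behaviour described in the comment above (one real lpq call per printer per cache window), not Samba's implementation. The names LpqCache, run_lpq and clock are hypothetical, and the ten-second default is taken from the comment.

```python
import time
from typing import Callable, Dict, List, Tuple


class LpqCache:
    """Illustrative cache: reuse a printer's queue listing for cache_time seconds."""

    def __init__(self,
                 run_lpq: Callable[[str], List[str]],
                 cache_time: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_lpq = run_lpq          # hypothetical callable wrapping the lpq command
        self._cache_time = cache_time    # seconds a listing stays valid
        self._clock = clock              # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def status(self, printer: str) -> List[str]:
        """Return the queue listing, running lpq only if the cached copy expired."""
        now = self._clock()
        hit = self._entries.get(printer)
        if hit is not None and now - hit[0] < self._cache_time:
            return hit[1]                # still fresh: no external command is run
        listing = self._run_lpq(printer)
        self._entries[printer] = (now, listing)
        return listing
```

With a cache like this in front of the command, a client polling many times per second would trigger at most one lpq invocation per printer every ten seconds, which is why repeated invocations suggest the cache is being bypassed.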
The printer driver stuff has had a lot of work put into it between 2.2.5 and 2.2.8. If you have the time you might like to investigate turning on spoolss support again. Otherwise, some level 10 debug logs and other details (client OS's used, printer drivers installed) would be great.
Thanks for the quick reply! I may be able to play with spoolss support some time in the future. The main problems with it are: Drivers must be installed via a Windows box. This cannot be automated. If the Samba driver database gets corrupted, all the drivers have to be re-installed pointy-clicky manually. If it is possible to restore the Samba driver database from backup, I haven't seen anything about it. But this is not my big problem at the moment. I can appreciate your desire for level 10 debug logs. I will try to get some. With a load average of 30, I'm wondering what the effect on my server will be of setting the debug level to 10. I'm also guessing that the log will get rather large. What mechanism would you like me to use to send you the log? As I was beginning this note, my print server melted down, disrupting printing service for the entire Campus for half an hour.
Created attachment 157: Samba config file
You asked for some more details (client OS's used, printer drivers installed): The clients are mostly Win 2K/XP with some 98/ME. I use the Adobe PostScript driver. I have attached my smb.conf.
Created attachment 158: level 10 debug log. I hope this is enough of a snapshot.
The following item in WHATSNEW.txt in 3.0.0rc4 is potentially relevant: 25) Fix coherency bug in print handle/printer object caching code that could cause XP clients to infinitely loop while updating their local printer cache. If I could build 3.0.0rc4 successfully under AIX 4.3.3, my problem might be solved. But see Bugzilla Bug 526. I would be happy to patch 2.2.8a, but I can't figure out how to relate the above "fix" to a change in a particular file. Even using CVS. I hope I'm not being a pest, but I'm dying here.
We're into our heavy end-of-semester printing season, and this problem is rearing its ugly head again. I've noticed a couple of additional clues. First, whenever an smbd process goes whacky, it requires 'kill -9' to get rid of it. Second, the problem builds up slowly over a period of days. My current survival strategy is to a) tell everyone to use lpr if at all possible, and b) thoroughly kill every smbd process and restart Samba once per day.
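The manual monitoring described above could be partly scripted. The following is a rough sketch, assuming process data is captured as text in a "PID %CPU COMMAND" layout, for example from something like `ps -eo pid,pcpu,comm` (whether that exact invocation behaves the same on a given platform is an assumption). The function name busy_smbd and the 40% threshold are hypothetical; the threshold is chosen only because the runaway daemons are reported to eat about half a CPU. It only reports candidates and does not kill anything.

```python
from typing import List, Tuple


def busy_smbd(ps_output: str, threshold: float = 40.0) -> List[Tuple[int, float]]:
    """Return (pid, cpu_percent) for smbd processes at or above threshold, busiest first."""
    busy = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue                      # blank or truncated line
        pid_text, cpu_text, command = fields
        try:
            pid, cpu = int(pid_text), float(cpu_text)
        except ValueError:
            continue                      # header row or malformed entry
        # Compare only the program name, ignoring any leading path or arguments.
        name = command.split()[0].rsplit("/", 1)[-1]
        if name == "smbd" and cpu >= threshold:
            busy.append((pid, cpu))
    return sorted(busy, key=lambda item: item[1], reverse=True)
```

The returned PIDs could then be matched to client addresses with lsof, as in the original report, before deciding whether to kill the process.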
I've seen this before with 'disable spoolss = yes' (which I noticed you have set in your smb.conf). Check the network traffic when the smbd starts sucking up CPU time and see if the client is actually sending data. From your log, it looks like an XP client is just continually polling the printer. You really should revisit the spoolss support in 3.0.1rc2.
Sorry, but the 2.2 branch is not under development any longer. If you can reproduce this bug against the latest 3.0 release, please reopen this bug and change the version in the report. Thanks.
database cleanup