Hello, I have a problem on our AIX system with samba-3.0.7 and 3.0.8-pre2. Summary: I get weird jobid numbers from the queue command in smbclient using that AIX server. I cannot delete anything. Long story: when I login with smbclient to a printers share for the first time and I run the queue command, I see the job IDs starting from 1 and enumerating to the last one. This looks okay. When I repeat the queue command, I see weird numbers. They beginn with 256 and increase in steps of 256 (not 1). Sometimes, when a fresh job is comming in, I see an entry with the expected number (say, for 101 job, I see 245, 512, ..., 25600 and then 101). When I try to delete any of this jobs, either with the number reported by smbclient, or the one I get by division of the reported one by 256, it always says Job FOO deleted. But nothing is deleted, it is still in the queue (visible with the queue command and lpq on the command line). BTW, I know this behaviour from my Linux server using lprng. There, it always says "Job FOO deleted" when nothing has been deleted, but "Error cancelling job FOO : NT_STATUS_OK" if it was successful. Only once I was able to do delete a job on the AIX server, when I set cache time to 30 and entered "cancel nr" very quickly after the _first_ run of "queue". I was not able to reproduce it again, no matter what I used as cache time. To make sure that it is not our fault, I added some log commans to the "lprm command", "lpr command" and "lpq command" vars - the last ones do work as expected, lprm command does not work. Not executed at all. I compared the output of our lpq command with the example in lpq_parse.c, the format is identical. With debugging, the following messages (grep'ed for "job") appear: [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(84) pjobid_to_rap: jobid 10750 maps to RAP jobid 38400 [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(64) pjobid_to_rap: called. [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(84) pjobid_to_rap: jobid 10748 maps to RAP jobid 38144 [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(64) pjobid_to_rap: called. [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(84) pjobid_to_rap: jobid 58 maps to RAP jobid 38656 [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(64) pjobid_to_rap: called. [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(84) pjobid_to_rap: jobid 5668 maps to RAP jobid 38912 [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(64) pjobid_to_rap: called. [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(84) pjobid_to_rap: jobid 10763 maps to RAP jobid 39168 [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(64) pjobid_to_rap: called. [2004/11/04 20:30:50, 10] printing/printing.c:pjobid_to_rap(84) pjobid_to_rap: jobid 5674 maps to RAP jobid 39424 [2004/11/04 20:30:52, 10] printing/printing.c:rap_to_pjobid(106) rap_to_pjobid called. [2004/11/04 20:30:52, 10] printing/printing.c:rap_to_pjobid(126) rap_to_pjobid: Failed to lookup RAP jobid 38656 And now, I really confused. What are those "jobids" assigned to RAP entries? They are not real jobids (that would be somewhere between 500 and 800) and they are not lines in lpq output or something like that. And why there is something not beeing resolved that has been stored before (38656)? Thanks for your work, Eduard.
Might be realted to BUG 1941
(In reply to comment #1) > Might be realted to BUG 1941 Maybe. But that one is marked as fixed, but this bug is not. I get the same results with 3.0.8.
Yeah./ I mentioned bug 1941 based on the subject (related to lprng) but then I looked at the details and realized that the lprng issue reported was just a misconfiguration. I'm still looking into this one.
*** This bug has been marked as a duplicate of 2170 ***
not fixed. Reopening....
*** Bug 2298 has been marked as a duplicate of this bug. ***
could you test 3.0.11 and let me know how things go. Lots of printing changes in that release.
The problems with jobids are still present in the 3.0.11 release. And it's even worse: when I kill a job (assuming that the real jobid is the showed/256) the job gets killed ok. But then, when I do a queue command the killed job appears with a new jobid (the highest jobid of the queue+256). My samba server is running on a Solaris 9 sparc and the clients are Debian Linux.
ok. I didnt' believe that this was fixed in 3.0.11 but wanted to start frrom the latest code. I'll work on this some more. Thanks for the feedback.
I cannot reproduce this one. Can you attach a gzipped level 10 debug log with this report? If I understand correctly, all of this surrounds using smbclient to cancel print jobs. Do WinNT/2k/xp clients work ok ? What about Windows 9x ?
could you attach your smb.conf and the output from `lpq -P<printername>`. Thanks.
(In reply to comment #10) > I cannot reproduce this one. Can you attach a gzipped level > 10 debug log with this report? > > If I understand correctly, all of this surrounds using > smbclient to cancel print jobs. Do WinNT/2k/xp clients > work ok ? What about Windows 9x ? I can reproduce the described behavior exact. My system is a Samba-Printserver at Solaris 8 in a domain. smbclient run at the same system.When I start a print job at a Windows-XP-client and I do the que-command twice at smbclient I see the following result: Domain=[BIFAB] OS=[Unix] Server=[Samba 3.0.11] smb: \> que 1 115760 Testseite smb: \> que 256 115760 Testseite I think the canceling-behavior is a result of this. I will create an attachment for this test with loglevel 10.
Created attachment 1033 [details] log.smbd (comment #12)
(In reply to comment #13) > Created an attachment (id=1033) [edit] > log.smbd (comment #12) > Because nobody is interested in fixing this bug, I did it myself. The following patch for samba-3.0.15pre2 fix the failure in behavior of the the que-command (get weird jobid numbers from the queue command in smbclient) and the failure in the behavior of the cancel command in smbclient. The reason for the bug is a byte order problem, that means we have this bug at all BIG-ENDIAN machines. In the functions pjobid_to_rap, rap_to_pjobid, rap_jobid_delete an uint16 is extracted with SVAL from a tdb-file. At BIG-ENDIAN machines this switch the byte oder. The programmer forgot to put the uint16 with SSVAL in the tdb-file. An simple cast work only on LITTLE-ENDIAN machines. For example the functions get_updating_pid and set_updating_pid work in the right way.
Created attachment 1174 [details] This is the fix for this Bug
Olaf, Thanks very much for the fix. I knew it was a byte ordering issue when storing and retreiving the job id's. But the actual 32-bit job id are ok. It didn't realize it was the 16-bit RAP job ids. And with a dead sparc, I could never reproduce it. Wonderful work. Patch applied. WIll be on the next 3.0.15 snapshot.
sorry for the same, cleaning up the database to prevent unecessary reopens of bugs.