Bug 7233 - print fails with jobs >4GB from Win7 clients
print fails with jobs >4GB from Win7 clients
Status: ASSIGNED
Product: Samba 3.4
Classification: Unclassified
Component: Printing
3.4.6
x86 Linux
: P3 major
: ---
Assigned To: Jeremy Allison
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-03-10 07:04 UTC by Sebastian Kloska
Modified: 2010-03-30 17:30 UTC (History)
0 users

See Also:


Attachments
Proposed patch for 3.4.x (1.52 KB, patch)
2010-03-10 19:45 UTC, Jeremy Allison
no flags Details
level 10 log for the >4GB bug (599.48 KB, text/plain)
2010-03-15 18:03 UTC, Sebastian Kloska
no flags Details
Wireshark Screenshot (160.63 KB, image/png)
2010-03-23 06:16 UTC, Sebastian Kloska
no flags Details
Patch for 3.5.x and master (5.38 KB, patch)
2010-03-23 15:47 UTC, Jeremy Allison
no flags Details
Patch for 3.4.x. (5.23 KB, patch)
2010-03-23 16:02 UTC, Jeremy Allison
no flags Details
git-am format patch for 3.5.2. (5.88 KB, patch)
2010-03-24 14:59 UTC, Jeremy Allison
no flags Details
git-am patch for 3.5.2. (7.53 KB, patch)
2010-03-24 15:09 UTC, Jeremy Allison
no flags Details
git-am patch for 3.4.x. (7.38 KB, patch)
2010-03-24 15:14 UTC, Jeremy Allison
no flags Details
Fixed git-am patch for 3.4.8. (7.48 KB, patch)
2010-03-25 13:04 UTC, Jeremy Allison
no flags Details
patch against samba 3.4.7 that tries to solve the 4GB problem via offset calculation relative to the current 4GB chunk (6.25 KB, patch)
2010-03-29 08:33 UTC, Sebastian Kloska
no flags Details
git-am fix for 3.5.2. (8.29 KB, patch)
2010-03-29 18:30 UTC, Jeremy Allison
no flags Details
git-am fix for 3.4.x. (8.23 KB, patch)
2010-03-29 18:41 UTC, Jeremy Allison
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastian Kloska 2010-03-10 07:04:55 UTC
Printing from jobs with sizes above 4GB failes with the newest Win7 client. The spool file grows up to 4.1GB and stops growing afterwards. No errors are reported by the windows client and the windows spooler shows full progress. log log level 2
the samba server reports the following.


[2010/03/10 12:06:04,  1] smbd/service.c:1063(make_connection_snum)
  w7-dgraf (::ffff:192.168.102.97) connect to service Hold initially as user guest (uid=2009, gid=7000) (pid 20847)
[2010/03/10 12:19:17,  1] smbd/service.c:1240(close_cnum)
  w7-dgraf (::ffff:192.168.102.97) closed connection to service Hold
[2010/03/10 12:19:53,  0] lib/util_sock.c:539(read_fd_with_timeout)
[2010/03/10 12:19:53,  0] lib/util_sock.c:1491(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  read_fd_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
http://fixunix.com/hewlett-packard/506247-re-samba-creation-smbpasswd-file.html
Comment 1 Sebastian Kloska 2010-03-10 07:14:19 UTC
> http://fixunix.com/hewlett-packard/506247-re-samba-creation-smbpasswd-file.html
> 

 Ups -- this is a cut and paste artefact.
 Forgot to mention that the Win7 client is 64bit

Comment 2 Jeremy Allison 2010-03-10 19:16:29 UTC
I'll need more than a log level2 I'm afraid. Can you get me a wireshark trace of the Win7 client sending a <4Gb file to a Windows printserver, plus a wireshark trace of the same thing against smbd ? I don't need the full log, just the part around the 4gb mark.

Thanks,

Jeremy.
Comment 3 Jeremy Allison 2010-03-10 19:45:01 UTC
Created attachment 5487 [details]
Proposed patch for 3.4.x

Oooh. I think this may be our fault and an obvious flaw.... (sorry).

Can you try this patch. It should apply cleanly to 3.4.x. If you can confirm it works I'll get this into all current branches for the next release.

Thanks !

Jeremy.
Comment 4 Sebastian Kloska 2010-03-11 05:06:54 UTC
(In reply to comment #3)
That didn't seem to do the trick. I'll generate a more verbose log. It'd take a while until I find the time to install wireshark on my winbox. Is the log of use without the wireshark protocol ?
Comment 5 Jeremy Allison 2010-03-11 20:01:10 UTC
So long as it's a debug level 10 around the place where the Win7 client cancels the job, yes.

Jeremy.
Comment 6 Sebastian Kloska 2010-03-15 18:03:59 UTC
Created attachment 5500 [details]
level 10 log for the >4GB bug
Comment 7 Jeremy Allison 2010-03-15 19:47:36 UTC
This log doesn't seem to cover the case I'm interested in. It stops way before the 0xFFFFFFFF offset position which is the interesting problem.

How are you able to reproduce creating a 4+Gb print job ? If you can let me know how you do this in a way I can repeat I should be able to track down and fix this.

Jeremy.
Comment 8 Sebastian Kloska 2010-03-16 05:18:39 UTC
The whole log is much larger. I just posted the part around the point where the spool file stopped growing. Didn't know how large an attachment can be. I could post the whole log (356M) or place it on my web server. Or you may tell me which values to look for so I could cut out the interesting part.

The 7.1 GB job is the output of a pdf which is rather small in file size (4MB) but large in pages with one bitmap object referenced on every second page. Unfortunately I'm not allowed to hand out the pdf. The printer driver also seems to be involved in this problem, since not all printer drivers produce this error although our printer driver works fine when connected to a win server. I'll try to give you a setup in which to reproduce this error.
Comment 9 Sebastian Kloska 2010-03-16 11:10:56 UTC
I've placed a 1.5 M pdf file on http://www.oncaphillis.net/luke.pdf that generates >5GB PS when using Acobat Reader 9 (be patient it's a DSL up channel).

The printer drive we are using is the bizhub PRO 1050 3.0.28 which can be found under 
http://onyxftp.mykonicaminolta.com/download/SearchResults.aspx?productid=402
Comment 10 Sebastian Kloska 2010-03-22 13:48:40 UTC
So now I actually did a WireShark trace which is quite large although I've started at around 3.8 GB, If someone could tell me how to remove the smb.file_data field from the WriteRequest package it would be significantly larger.   
Comment 11 Sebastian Kloska 2010-03-23 06:16:40 UTC
Created attachment 5525 [details]
Wireshark Screenshot

Wireshark screenshot for two SMB packages on the 4GB border
Comment 12 Sebastian Kloska 2010-03-23 06:18:32 UTC
So I started to work with wireshark and found out that there is indeed an File Offset overflow. See screenshot. Makes me wonder how 4GB file transfer should work at all when the offset value is 4 byte only.
 
Comment 13 Jeremy Allison 2010-03-23 13:23:29 UTC
Thanks - that's *great* information. What I expected. The reason it works with Windows is that Windows is ignoring the file offset when it's writing to a print file. I'll make us do the same.

Give me a little time to work on this.

Jeremy.
Comment 14 Sebastian Kloska 2010-03-23 13:59:58 UTC
That's great news

Thank you


Comment 15 Jeremy Allison 2010-03-23 15:47:23 UTC
Created attachment 5527 [details]
Patch for 3.5.x and master

If running 3.5.x, please try this patch. Should fix it. I'll post a follow-up patch for 3.4.x.
Jeremy.
Comment 16 Jeremy Allison 2010-03-23 16:02:20 UTC
Created attachment 5528 [details]
Patch for 3.4.x.

Please test.

Jeremy.
Comment 17 Sebastian Kloska 2010-03-24 07:34:06 UTC
Unfortunately that didn't do the trick. The spool file of a patched version 3.4.7stops growing exactly at 0x100000000 and I couldn't even get the unpatched version 3.5.1 up and running on my up-to-date fedora 12 it aborts with

 #0 /home/kloska/smb351/sbin/smbd(log_stack_trace+0x2d) [0x4e4aff]
 #1 /home/kloska/smb351/sbin/smbd(smb_panic+0x78) [0x4e4c4c]
 #2 /home/kloska/smb351/sbin/smbd [0x4d4282]
 #3 [0xc53400]
 #4 /lib/libc.so.6(abort+0x17a) [0xc8104a]
 #5 /usr/lib/libtalloc.so.1 [0xbc931c]
 #6 /usr/lib/libtalloc.so.1(talloc_free+0x21b) [0xbcdfcb]
 #7 /usr/lib/libwbclient.so.0(wbcFreeMemory+0x21) [0xb06b21]
 #8 /usr/lib/libwbclient.so.0(wbcSidToGid+0xde) [0xb0943e]
 #9 /home/kloska/smb351/sbin/smbd(winbind_sid_to_gid+0x39) [0x4b7a94]
 #10 /home/kloska/smb351/sbin/smbd(sid_to_gid+0x117) [0x4959f0]
 #11 /home/kloska/smb351/sbin/smbd(create_local_nt_token+0x321) [0x53d9bd]
 #12 /home/kloska/smb351/sbin/smbd(get_root_nt_token+0x153) [0x53df7d]
 #13 /home/kloska/smb351/sbin/smbd(svcctl_init_keys+0x2e) [0x3f5118]
 #14 /home/kloska/smb351/sbin/smbd(registry_init_full+0x8b) [0x8177cb]
 #15 /home/kloska/smb351/sbin/smbd(main+0xa50) [0x8469d1]
 #16 /lib/libc.so.6(__libc_start_main+0xe6) [0xc6aa86]
 #17 /home/kloska/smb351/sbin/smbd [0x244421]

Comment 18 Volker Lendecke 2010-03-24 07:38:22 UTC
Please replace libwbclient with the just compiled version.

Volker
Comment 19 Jeremy Allison 2010-03-24 11:41:28 UTC
Are you also running with the patch in:

https://bugzilla.samba.org/attachment.cgi?id=5487

I think you'll need both for 3.4.x.

Jeremy.
Comment 20 Sebastian Kloska 2010-03-24 12:14:59 UTC
Oh - shame !! Installing at a non standard position and not setting LD_LIBRARY_PATH. You may now officially call me an idiot. The patched 3.5.1 seems to handle files above 4GB. I didn't apply the previous patch to the 3.4.6 -- so I'll give it a try tomorrow.

Sebastian
 

Comment 21 Jeremy Allison 2010-03-24 14:59:53 UTC
Created attachment 5534 [details]
git-am format patch for 3.5.2.

Fix tested and passed by reporter.

Jeremy.
Comment 22 Jeremy Allison 2010-03-24 15:09:39 UTC
Created attachment 5535 [details]
git-am patch for 3.5.2.

Contains both 66b7fcc8ef6b43c219dcae43a68d861cf71b80fa and    59545276f2bd6df8ee5e75e96e707fd86ff44350 from master.

Replaces previous version that didn't contain both fixes.

Jeremy.
Comment 23 Jeremy Allison 2010-03-24 15:14:24 UTC
Created attachment 5536 [details]
git-am patch for 3.4.x.

Fix for 3.4.x.

Jeremy.
Comment 24 Jeremy Allison 2010-03-24 15:14:51 UTC
Re-assigning to Karolin for inclusion once reviewed.

Jeremy.
Comment 25 Sebastian Kloska 2010-03-25 08:45:16 UTC
unfortunately applying both patches to 3.4.x does not solve the problem.

Sebastian
Comment 26 Jeremy Allison 2010-03-25 13:04:04 UTC
Created attachment 5542 [details]
Fixed git-am patch for 3.4.8.

Sebastian, here is a corrected fix for 3.4.8, which should apply cleanly on top of the 3.4.7 code.

Sorry for the mistake in the previous patch, there was a chunk missing that didn't correctly set the offset in the SMBwrite() reply call. This patch should work for 3.4.x. Please test and confirm and thanks a *lot* for testing these fixes.

Jeremy.
Comment 27 Sebastian Kloska 2010-03-26 05:04:52 UTC
Still not fixed. Samba 3.4.7 succeeds in breaking the 4GB barrier, but it seems to corrupt the stream since the PS interpreter chokes about offending commands 
that seem to be raw data. Looking at the amount of pages that it prints it looks like its around the 2GB area. Could be a faulty offset  calculation with a signed number.
For 3.5.x I only checked that it succeeded in transferring more than 4GB, so I can not confirm the 3.5.x fix too. I try the 3.5.x version and come back to you.
Comment 28 Jeremy Allison 2010-03-26 11:23:18 UTC
Ok, I'm putting this on hold until we fix the source of the file corruption. I'm going to revert the "always write at EOF offset" changes to master until we know what's wrong. Can you test the patch on 3.5.x please ? I'm guessing it will also fail due to the 2GB corruption but it would help a lot to know.

Jeremy.
Comment 29 Jeremy Allison 2010-03-26 11:35:20 UTC
What would be really helpful is a wireshark trace - only capturing the packet SMB headers so it's not too huge - that allows me to track the offsets + write_amount sent on each print write. That way I can determine the algorithm that Windows uses when the offset wraps at 32-bits.

Jeremy.
Comment 30 Sebastian Kloska 2010-03-26 19:13:40 UTC
I had a look at the 3.5.x patch and it seems to work. No data stream corruption At least none that killed the PS interpreter. I'm just pulling the faulty spool file generated under 3.4.x and compare them with the version from 3.5.x to see where in the stream the difference appears. A new filtered wire shark protocol might take some time, but the screen shot I've uploaded gives an overview. These are to packages exactly on the 4GB border. Before that the client transferred data chunks of 16583 bytes and places the last chunk exactly on the 4GB limit. There was nothing in the flags or somewhere else which (as far as my minimal knowledge about the SMB protocol goes) distinguished the packages before.

Sebastian


Comment 31 Jeremy Allison 2010-03-26 19:32:12 UTC
Ok I'm really puzzled. The 3.4.x patch is functionally identical to the 3.5.2 patch. Are you sure you applied the 3.4.x patch here ?:

https://bugzilla.samba.org/attachment.cgi?id=5542&action=view

All writes to the print spool file should be being done at EOF with this patch (which is what the 3.5.2 patch does). Any chance you can help me track down the difference in the 3.5.x and 3.4.x versions because right now I just can't see it ?

Jeremy
Comment 32 Sebastian Kloska 2010-03-29 08:30:48 UTC
Ok -- I've double-checked and -- yes -- I did apply the right patch. I dived a little bit into the 3.4.7 code and found out that the windows client IS actually seeking within the print stream. That seems to happen more often if a file to  print is already available e.g. printing via

 copy file \\server\printer

but it also happens from the print spooler. You may easily validate it by printing out the startpos in reply_write. I don't have the slightest idea why it seemed to work with the 3.5.x patch. I attach  a patch against a clean 3.4.7  which also includes your stuff. I've only worked on reply_write not reply_writeraw. The basic idea is that although the client may seek around it may only do that within the current 4GB chunk. So we check via fstat64 how large the current file is and interprede startpos as an offset to the current 4GB chunk. This is all very dirty code since I'm not aware about the samba coding style. But you may use it for tseting. It would be nice if someone else could validate this approach.

Sebastian
  
Comment 33 Sebastian Kloska 2010-03-29 08:33:32 UTC
Created attachment 5558 [details]
patch against samba 3.4.7 that tries to solve the 4GB problem via offset calculation relative to the current 4GB chunk
Comment 34 Sebastian Kloska 2010-03-29 14:54:48 UTC
right now I've only tested it with Vista and successfully transferred 15GB -- no corruption at all. I'll have a look at Win7 tomorrow

Sebastian

Comment 35 Jeremy Allison 2010-03-29 16:34:11 UTC
Hmmm. This is a really interesting fix.... I'll look into generating a new patch containing this. Thanks a *lot* for your help !

Jeremy.


Comment 36 Jeremy Allison 2010-03-29 18:30:43 UTC
Created attachment 5563 [details]
git-am fix for 3.5.2.

Latest version of this patch for 3.5.x. Sebastian can you please test ? I'll also create a 3.4.x version as well.

Thanks for all the help on this.

Jeremy.
Comment 37 Jeremy Allison 2010-03-29 18:41:27 UTC
Created attachment 5564 [details]
git-am fix for 3.4.x.

Latest version of this patch for 3.4.x. Sebastian can you please test ?

Thanks for all the help on this.

Jeremy.
Comment 38 Jeremy Allison 2010-03-29 19:40:29 UTC
FYI. I have an idea on how to create an smbtorture test so we can probe Windows behavior on this.

Jeremy.
Comment 39 Sebastian Kloska 2010-03-30 09:07:12 UTC
both patches seem to work find

Sebastian
 

Comment 40 Jeremy Allison 2010-03-30 17:30:02 UTC
Ok, pushed to master. I have a smbtorture test that does not reproduce this with a regular <4gb file - not a print spool file. So I think it's correct to do this only with print files.

I'm not going to recommend this for 3.4.x or 3.5.x at the moment. I want to let it cook a bit in master before recommending applying these patches. Hope that's ok for you guys (you can always hand patch for the time being).

Jeremy.