Bug 12717 - Printing from Windows to Samba very slow, while copying is fast
Summary: Printing from Windows to Samba very slow, while copying is fast
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Printing (show other bugs)
Version: 4.5.7
Hardware: All Linux
: P5 normal (vote)
Target Milestone: ---
Assignee: printing-maintainers
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-23 22:26 UTC by Alex K
Modified: 2017-06-22 15:37 UTC (History)
0 users

See Also:


Attachments
server-machine.dump.zip (3.87 MB, application/zip)
2017-04-12 16:18 UTC, Alex K
no flags Details
printing.dump.zip (4.16 MB, application/zip)
2017-04-28 17:50 UTC, Alex K
no flags Details
copying.dump.zip (4.35 MB, application/zip)
2017-04-28 17:57 UTC, Alex K
no flags Details
request latency stats across print job (38.19 KB, image/png)
2017-04-28 22:18 UTC, David Disseldorp
no flags Details
request latency stats across copy (4.35 MB, application/zip)
2017-04-28 22:18 UTC, David Disseldorp
no flags Details
request latency stats across copy (fixed) (46.08 KB, image/png)
2017-04-28 22:30 UTC, David Disseldorp
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alex K 2017-03-23 22:26:42 UTC
I have Samba 4.5.7 running on Ubuntu 14.04 x64 print servers, with Windows 7 and Windows 10 being their printing clients. 

There seem to be an issue with Windows spooling a print job to Samba. The actual spooling process is extremely slow: 50kbps or so. Due to that, 10Mb PostScript job takes 5+ minutes to print. 

At the same time, if I simply copy a 10Mb file from Windows to Samba's print$ share, it gets copied right away, instantly. 

Debugging it with log level 10 I noticed that api_rpcTNP complains about every segment of the spooled document:

rpc input buffer underflow (parse error?)

Looking at the log, I see that Samba is receiving the file in 256Kb segments, and complains about each segment. For some reason, first segment is only 78Kb. 

For my 15Mb testing print job it made ~60 segments, and there is parse error after each one of them. According to the source code, the error would appear when pkt->frag_length is less than p->in_data.data.length. I'm not a developer and not sure, but it feels like Windows is passing wrong size of the packet fragment? 

My smb.conf limits max protocol version to SMB2.

Here is the smallest log excerpt that I can paste: 

[2017/03/22 11:57:03.536591,  5, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/srv_pipe.c:1404(api_pipe_request)
  Requested spoolss rpc service
[2017/03/22 11:57:03.536607,  4, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/srv_pipe.c:1429(api_rpcTNP)
  api_rpcTNP: spoolss op 0x13 - api_rpcTNP: rpc command: SPOOLSS_WRITEPRINTER
[2017/03/22 11:57:03.536627,  6, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/srv_pipe.c:1469(api_rpcTNP)
  api_rpc_cmds[19].fn == 0x7f8d3e1f00f0
[2017/03/22 11:57:03.536679,  1, pid=21933, effective(54604, 5000), real(54604, 0)] ../librpc/ndr/ndr.c:450(ndr_print_function_debug)
       spoolss_WritePrinter: struct spoolss_WritePrinter
          in: struct spoolss_WritePrinter
              handle                   : *
                  handle: struct policy_handle
                      handle_type              : 0x00000000 (0)
                      uuid                     : 00000010-0000-0000-d258-fdc8ad550000
              data                     : DATA_BLOB length=75949
  [0000] 25 21 50 53 2D 41 64 6F   62 65 2D 33 2E 30 0A 25   %!PS-Ado be-3.0.%
  [0010] 63 75 70 73 4A 6F 62 54   69 63 6B 65 74 3A 20 6A   cupsJobT icket: j
  [0020] 6F 62 2D 68 6F 6C 64 2D   75 6E 74 69 6C 3D 6E 6F   ob-hold- until=no
<snip 0x1280 lines>
  [12870] 31 20 5D 20 64 65 66 20   0D 0A 25 25 45 6E 64 53   1 ] def  ..%%EndS
  [12880] 65 74 75 70 0D 0A 0D 0A   75 73 65 72 64 69 63 74   etup.... userdict
  [12890] 20 62 65 67 69 6E 20 2F   65 68 73 61 76 65 20 73    begin / ehsave s
  [128A0] 61 76 65 20 64 65 66 20   65 6E 64 0D 0A            ave def  end..
              _data_size               : 0x000128ad (75949)

[2017/03/22 11:57:03.734747,  6, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/rpc_handles.c:339(find_policy_by_hnd_internal)
  Found policy hnd[1] [0000] 00 00 00 00 10 00 00 00   00 00 00 00 D2 58 FD C8   ........ .....X..
  [0010] AD 55 00 00                                        .U.. 
[2017/03/22 11:57:03.734802,  6, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/rpc_handles.c:339(find_policy_by_hnd_internal)
  Found policy hnd[1] [0000] 00 00 00 00 10 00 00 00   00 00 00 00 D2 58 FD C8   ........ .....X..
  [0010] AD 55 00 00                                        .U.. 
[2017/03/22 11:57:03.734859,  4, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/spoolss/srv_spoolss_nt.c:490(get_printer_snum)
  short name:dalek-color
[2017/03/22 11:57:03.734890, 10, pid=21933, effective(54604, 5000), real(54604, 0)] ../source3/printing/printing.c:469(print_job_find)
  print_job_find: looking up job 1 for share dalek-color
[2017/03/22 11:57:03.734930, 10, pid=21933, effective(54604, 5000), real(54604, 0)] ../source3/printing/printing.c:497(print_job_find)
  print_job_find: returning system job -1 for jobid 1.
[2017/03/22 11:57:03.735092,  5, pid=21933, effective(54604, 5000), real(54604, 0)] ../source3/printing/notify.c:327(send_spoolss_notify2_msg)
  send_spoolss_notify2_msg: appending message 0x01/0x16 for printer dalek-color to notify_queue_head
[2017/03/22 11:57:03.735136,  1, pid=21933, effective(54604, 5000), real(54604, 0)] ../librpc/ndr/ndr.c:450(ndr_print_function_debug)
       spoolss_WritePrinter: struct spoolss_WritePrinter
          out: struct spoolss_WritePrinter
              num_written              : *
                  num_written              : 0x000128ad (75949)
              result                   : WERR_OK
[2017/03/22 11:57:03.735233,  5, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/srv_pipe.c:1496(api_rpcTNP)
  api_rpcTNP: called spoolss successfully
[2017/03/22 11:57:03.735261, 10, pid=21933, effective(54604, 5000), real(54604, 0), class=rpc_srv] ../source3/rpc_server/srv_pipe.c:1501(api_rpcTNP)
  api_rpcTNP: rpc input buffer underflow (parse error?)
[2017/03/22 11:57:03.735278, 10, pid=21933, effective(54604, 5000), real(54604, 0)] ../lib/util/util.c:555(dump_data)
  [0000] 76 0D 0A 63 76 69 7B 6D   6F 76 65 74 6F 20 73 68   v..cvi{m oveto sh
  [0010] 6F 77 7D 72 65 70 65 61   74 20 73 68 6F 77 70 61   ow}repea t showpa
  [0020] 67 65 20 71 75 69 74 7D   7B 63 6C 65 61 72 74 6F   ge quit} {clearto
<snip 0x11A0 lines>
  [11A00] 31 20 73 65 74 6C 69 6E   65 6A 6F 69 6E 0D 0A 2F   1 setlin ejoin../
  [11A10] 6D 79 73 65 74 75 70 20   5B 20 30 20 2D 37 32 20   mysetup  [ 0 -72 
  [11A20] 36 30 30 20 56 20 2D 37   32 20 36 30 30 20 56 20   600 V -7 2 600 V 
  [11A30] 30 20 36 30 30 2E 30 30   30 39 31 20 37 38 30 2E   0 600.00 091 780.
  [11A40] 30 30 30 39 31 20 5D 20   64 65 66 20 0D 0A 25 25   00091 ]  def ..%%
  [11A50] 45 6E 64 53 65 74 75 70   0D 0A 0D 0A 75 73 65 72   EndSetup ....user
  [11A60] 64 69 63 74 20 62 65 67   69 6E 20 2F 65 68 73 61   dict beg in /ehsa
  [11A70] 76 65 20 73 61 76 65 20   64 65 66 20 65 6E 64 0D   ve save  def end.
  [11A80] 0A 00 00 00 AD 28 01 00                             .....(.. 
[2017/03/22 11:57:03.887327,  4, pid=21933, effective(54604, 5000), real(54604, 0)] ../source3/smbd/sec_ctx.c:439(pop_sec_ctx)
  pop_sec_ctx (54604, 5000) - sec_ctx_stack_ndx = 0
[2017/03/22 11:57:03.887375,  1, pid=21933, effective(54604, 5000), real(54604, 0)] ../librpc/ndr/ndr.c:413(ndr_print_debug)
       &r: struct ncacn_packet
          rpc_vers                 : 0x05 (5)
          rpc_vers_minor           : 0x00 (0)
          ptype                    : DCERPC_PKT_RESPONSE (2)
          pfc_flags                : 0x03 (3)
                 1: DCERPC_PFC_FLAG_FIRST    
                 1: DCERPC_PFC_FLAG_LAST     
                 0: DCERPC_PFC_FLAG_PENDING_CANCEL_OR_HDR_SIGNING
                 0: DCERPC_PFC_FLAG_CONC_MPX 
                 0: DCERPC_PFC_FLAG_DID_NOT_EXECUTE
                 0: DCERPC_PFC_FLAG_MAYBE    
                 0: DCERPC_PFC_FLAG_OBJECT_UUID
          drep: ARRAY(4)
              [0]                      : 0x10 (16)
              [1]                      : 0x00 (0)
              [2]                      : 0x00 (0)
              [3]                      : 0x00 (0)
          frag_length              : 0x0020 (32)
          auth_length              : 0x0000 (0)
          call_id                  : 0x00000005 (5)
          u                        : union dcerpc_payload(case 2)
          response: struct dcerpc_response
              alloc_hint               : 0x00000008 (8)
              context_id               : 0x0000 (0)
              cancel_count             : 0x00 (0)
              _pad                     : DATA_BLOB length=0
              stub_and_verifier        : DATA_BLOB length=8
  [0000] AD 28 01 00 00 00 00 00                             .(...... 
[2017/03/22 11:57:03.887566, 10, pid=21933, effective(54604, 5000), real(54604, 0)] ../source3/rpc_server/rpc_server.c:509(named_pipe_packet_process)
  Sending 1 fragments in a total of 8 bytes


I tested this also with Samba 4.3.12, and with Windows 7 and Windows 10, no difference. I hope you can reproduce it in your environment or maybe advise on which logs I should submit and how. Files of log level 10 size are huge. 

testparm -sv

# Global parameters
[global]
	bind interfaces only = No
	config backend = file
	dos charset = CP850
	enable core files = Yes
	interfaces = 
	multicast dns register = Yes
	netbios aliases = 
	netbios name = PRINT-SANE2-HOT
	netbios scope = 
	realm = AD.CORP.ACME.COM
	server services = s3fs, rpc, nbt, wrepl, ldap, cldap, kdc, drepl, winbindd, ntp_signd, kcc, dnsupdate, dns
	server string = Samba 4.5.6-Debian
	share backend = classic
	unix charset = UTF-8
	workgroup = ACME
	browse list = Yes
	domain master = No
	enhanced browsing = Yes
	lm announce = Auto
	lm interval = 60
	local master = No
	os level = 20
	preferred master = Auto
	allow dns updates = secure only
	dns forwarder = 
	dns update command = /usr/sbin/samba_dnsupdate
	machine password timeout = 604800
	nsupdate command = /usr/bin/nsupdate -g
	rndc command = /usr/sbin/rndc
	spn update command = /usr/sbin/samba_spnupdate
	mangle prefix = 1
	mangling method = hash2
	max stat cache size = 256
	stat cache = Yes
	client ldap sasl wrapping = sign
	ldap admin dn = 
	ldap connection timeout = 2
	ldap delete dn = No
	ldap deref = auto
	ldap follow referral = Auto
	ldap group suffix = 
	ldap idmap suffix = 
	ldap machine suffix = 
	ldap page size = 1000
	ldap passwd sync = no
	ldap replication sleep = 1000
	ldap server require strong auth = allow_sasl_over_tls
	ldap ssl = start tls
	ldap ssl ads = No
	ldap suffix = 
	ldap timeout = 15
	ldap user suffix = 
	lock spin time = 200
	oplock break wait time = 0
	smb2 leases = Yes
	debug class = No
	debug hires timestamp = Yes
	debug pid = No
	debug prefix timestamp = No
	debug uid = No
	ldap debug level = 0
	ldap debug threshold = 10
	log file = 
	logging = 
	log level = 2
	max log size = 20000
	syslog = 1
	syslog only = No
	timestamp logs = Yes
	abort shutdown script = 
	add group script = 
	add machine script = 
	add user script = 
	add user to group script = 
	allow nt4 crypto = No
	delete group script = 
	delete user from group script = 
	delete user script = 
	domain logons = No
	enable privileges = Yes
	init logon delay = 100
	init logon delayed hosts = 
	logon drive = 
	logon home = \\%N\%U
	logon path = \\%N\%U\profile
	logon script = 
	reject md5 clients = No
	set primary group script = 
	shutdown script = 
	add share command = 
	afs token lifetime = 604800
	afs username map = 
	allow insecure wide links = No
	async smb echo handler = No
	auto services = 
	cache directory = /var/cache/samba
	change notify = Yes
	change share command = 
	cluster addresses = 
	clustering = No
	config file = 
	ctdbd socket = 
	ctdb locktime warn threshold = 0
	ctdb timeout = 0
	default service = 
	delete share command = 
	homedir map = auto.home
	kernel change notify = Yes
	lock directory = /var/run/samba
	log writeable files on exit = No
	message command = 
	nbt client socket address = 0.0.0.0
	ncalrpc dir = /var/run/samba/ncalrpc
	NIS homedir = No
	nmbd bind explicit broadcast = Yes
	panic action = 
	perfcount module = 
	pid directory = /var/run/samba
	registry shares = No
	remote announce = 
	remote browse sync = 
	reset on zero vc = No
	smbd profiling level = off
	state directory = /var/lib/samba
	usershare allow guests = No
	usershare max shares = 100
	usershare owner only = Yes
	usershare path = /var/lib/samba/usershares
	usershare prefix allow list = 
	usershare prefix deny list = 
	usershare template share = 
	utmp = No
	utmp directory = 
	wtmp directory = 
	addport command = 
	addprinter command = 
	cups connection timeout = 30
	cups encrypt = No
	cups server = localhost:631
	deleteprinter command = 
	disable spoolss = No
	enumports command = 
	iprint server = 
	load printers = Yes
	lpq cache time = 30
	os2 driver map = 
	printcap cache time = 60
	printcap name = cups
	show add printer wizard = Yes
	cldap port = 389
	client ipc max protocol = default
	client ipc min protocol = default
	client max protocol = default
	client min protocol = CORE
	client use spnego = Yes
	dcerpc endpoint servers = epmapper, wkssvc, rpcecho, samr, netlogon, lsarpc, drsuapi, dssetup, unixinfo, browser, eventlog6, backupkey, dnsserver
	defer sharing violations = Yes
	dgram port = 138
	disable netbios = Yes
	enable asu support = No
	eventlog list = 
	large readwrite = Yes
	max mux = 50
	max ttl = 259200
	max wins ttl = 518400
	max xmit = 16644
	min receivefile size = 0
	min wins ttl = 21600
	name resolve order = lmhosts wins host bcast
	nbt port = 137
	nt pipe support = Yes
	nt status support = Yes
	read raw = Yes
	rpc big endian = No
	server max protocol = SMB2
	server min protocol = LANMAN1
	server multi channel support = No
	smb2 max credits = 8192
	smb2 max read = 8388608
	smb2 max trans = 8388608
	smb2 max write = 8388608
	smb ports = 445 139
	svcctl list = 
	time server = No
	unicode = Yes
	unix extensions = No
	use spnego = Yes
	web port = 901
	write raw = Yes
	algorithmic rid base = 1000
	allow dcerpc auth level connect = No
	allow trusted domains = Yes
	auth methods = 
	check password script = 
	client ipc signing = default
	client lanman auth = No
	client NTLMv2 auth = Yes
	client plaintext auth = No
	client schannel = Auto
	client signing = if_required
	client use spnego principal = No
	dedicated keytab file = 
	encrypt passwords = Yes
	guest account = nobody
	kerberos method = default
	kpasswd port = 464
	krb5 port = 88
	lanman auth = No
	log nt token command = 
	map to guest = Bad Uid
	map untrusted to domain = No
	ntlm auth = No
	ntp signd socket directory = /var/lib/samba/ntp_signd
	null passwords = No
	obey pam restrictions = No
	old password allowed period = 60
	pam password change = No
	passdb backend = tdbsam
	passdb expand explicit = No
	passwd chat = *new*password* %n\n *new*password* %n\n *changed*
	passwd chat debug = No
	passwd chat timeout = 2
	passwd program = 
	password hash gpg key ids = 
	password server = *
	preload modules = 
	private dir = /var/lib/samba/private
	raw NTLMv2 auth = No
	rename user script = 
	restrict anonymous = 2
	root directory = 
	samba kcc command = /usr/sbin/samba_kcc
	security = ADS
	server role = auto
	server schannel = Auto
	server signing = if_required
	smb passwd file = /etc/samba/smbpasswd
	tls cafile = tls/ca.pem
	tls certfile = tls/cert.pem
	tls crlfile = 
	tls dh params file = 
	tls enabled = Yes
	tls keyfile = tls/key.pem
	tls priority = NORMAL:-VERS-SSL3.0
	tls verify peer = as_strict_as_possible
	unix password sync = No
	username level = 0
	username map = 
	username map cache time = 0
	username map script = 
	aio max threads = 100
	deadtime = 15
	getwd cache = Yes
	hostname lookups = No
	keepalive = 300
	max disk size = 0
	max open files = 16424
	max smbd processes = 0
	name cache timeout = 660
	socket options = TCP_NODELAY
	use mmap = Yes
	get quota command = 
	host msdfs = Yes
	set quota command = 
	create krb5 conf = Yes
	idmap backend = tdb
	idmap cache time = 604800
	idmap gid = 
	idmap negative cache time = 120
	idmap uid = 
	include system krb5 conf = Yes
	neutralize nt4 emulation = No
	reject md5 servers = No
	require strong key = Yes
	template homedir = /home/%D/%U
	template shell = /bin/false
	winbind cache time = 300
	winbindd privileged socket directory = /var/lib/samba/winbindd_privileged
	winbindd socket directory = /var/run/samba/winbindd
	winbind enum groups = No
	winbind enum users = No
	winbind expand groups = 0
	winbind max clients = 200
	winbind max domain connections = 1
	winbind nested groups = Yes
	winbind normalize names = No
	winbind nss info = template
	winbind offline logon = No
	winbind reconnect delay = 30
	winbind refresh tickets = No
	winbind request timeout = 60
	winbind rpc only = No
	winbind sealed pipes = Yes
	winbind separator = \
	winbind trusted domains only = No
	winbind use default domain = No
	dns proxy = Yes
	wins hook = 
	wins proxy = No
	wins server = 
	wins support = No
	spoolssd:prefork_child_min_life = 120
	spoolssd:prefork_min_children = 5
	rpc_daemon:spoolssd = fork
	rpc_server:spoolss = external
	spoolss: architecture = Windows x64
	idmap config * : backend = tdb
	comment = 
	path = 
	administrative share = No
	browseable = Yes
	case sensitive = Auto
	default case = lower
	delete veto files = No
	hide dot files = Yes
	hide files = 
	hide special files = No
	hide unreadable = No
	hide unwriteable files = No
	mangled names = Yes
	mangling char = ~
	map archive = Yes
	map hidden = No
	map readonly = yes
	map system = No
	preserve case = Yes
	short preserve case = Yes
	store dos attributes = No
	veto files = 
	veto oplock files = 
	blocking locks = Yes
	csc policy = manual
	fake oplocks = Yes
	kernel oplocks = No
	kernel share modes = Yes
	level2 oplocks = Yes
	locking = Yes
	oplock contention limit = 2
	oplocks = Yes
	posix locking = Yes
	strict locking = Auto
	afs share = No
	available = Yes
	copy = 
	delete readonly = No
	dfree cache time = 0
	dfree command = 
	directory name cache size = 100
	dmapi support = No
	dont descend = 
	dos filemode = No
	dos filetime resolution = No
	dos filetimes = Yes
	fake directory create times = No
	follow symlinks = Yes
	fstype = NTFS
	include = /etc/samba/print.conf
	magic output = 
	magic script = 
	postexec = 
	preexec = 
	preexec close = No
	root postexec = 
	root preexec = /usr/bin/renice +18 -p %d
	root preexec close = No
	spotlight = No
	volume = 
	wide links = No
	cups options = raw
	default devmode = Yes
	force printername = Yes
	lppause command = 
	lpq command = %p
	lpresume command = 
	lprm command = 
	max print jobs = 1000
	max reported print jobs = 0
	printable = No
	print command = 
	printer name = 
	printing = cups
	printjob username = %U
	print notify backchannel = No
	queuepause command = 
	queueresume command = 
	use client driver = No
	acl allow execute always = No
	acl check permissions = Yes
	acl map full control = Yes
	durable handles = Yes
	ea support = No
	map acl inherit = No
	nt acl support = Yes
	profile acls = No
	access based share enum = No
	acl group control = No
	admin users = "@domain admins"
	create mask = 0744
	directory mask = 0755
	force create mode = 0000
	force directory mode = 0000
	force group = 
	force unknown acl user = No
	force user = 
	guest ok = No
	guest only = No
	hosts allow = 
	hosts deny = 
	inherit acls = Yes
	inherit owner = No
	inherit permissions = No
	invalid users = 
	read list = 
	read only = Yes
	smb encrypt = default
	valid users = 
	write list = 
	aio read size = 0
	aio write behind = 
	aio write size = 0
	allocation roundup size = 1048576
	block size = 1024
	max connections = 0
	min print space = 0
	strict allocate = No
	strict rename = No
	strict sync = No
	sync always = No
	use sendfile = No
	write cache size = 0
	msdfs proxy = 
	msdfs root = No
	msdfs shuffle referrals = No
	ntvfs handler = unixuid, default
	vfs objects = 


[printers]
	comment = All Printers
	path = /var/spool/samba
	browseable = No
	printable = Yes
	admin users = "@domain admins" 
	guest ok = Yes


[print$]
	comment = Printer Drivers
	path = /var/lib/samba/printers
	admin users = "@domain admins" 
	create mask = 02777
	directory mask = 02777
	force create mode = 02775
	force directory mode = 02775
	write list = "@domain admins" 


[admin$]
	comment = Admin Share
	path = /vol/admin
	browseable = No
	include = /etc/samba/usershare.conf
	guest ok = Yes
	read only = No
Comment 1 David Disseldorp 2017-04-11 16:14:18 UTC
Sorry about the slow response here, Alex. Please provide a network trace captured across the print attempt, and also a separate capture across the manual print$ I/O.
Comment 2 Alex K 2017-04-12 16:18:16 UTC
Created attachment 13151 [details]
server-machine.dump.zip

Here is a TCP dump collected on the server (zipped). I press "Print" at about 15:53, the resulting 7Mb print job finishes spooling by 15:57. 

Please note lots of tiny fragments. 

For comparison, here is a dump of 8Mb file copy operation to the same server, under print$\x64 share. It takes 2 seconds. 

I can't attach it due to 5Mb attach limit, so here is a Drive link:
https://docs.google.com/uc?export=download&id=0B-vV7Qx5rjpELW5hRnVwRDdEbjQ
Comment 3 David Disseldorp 2017-04-12 22:35:47 UTC
Thanks. Will hopefully get a chance to look at this next week.
Comment 4 David Disseldorp 2017-04-28 17:08:40 UTC
(In reply to Alex K from comment #2)
> Created attachment 13151 [details]
> server-machine.dump.zip
> 
> Here is a TCP dump collected on the server (zipped). I press "Print" at
> about 15:53, the resulting 7Mb print job finishes spooling by 15:57. 
> 
> Please note lots of tiny fragments. 
> 
> For comparison, here is a dump of 8Mb file copy operation to the same
> server, under print$\x64 share. It takes 2 seconds. 
> 
> I can't attach it due to 5Mb attach limit, so here is a Drive link:
> https://docs.google.com/uc?export=download&id=0B-vV7Qx5rjpELW5hRnVwRDdEbjQ

Both network traces appear to only capture client->server traffic, so it's difficult to work out what's going on....

One thing that does stand out is, as you've pointed out, under spoolss the print job is being transferred as a batch of 4K requests, whereas the regular copy uses (perhaps pipelined) 64K SMB2 I/Os.

Aside from that there appear to be some delays around the  RpcRemoteFindFirstPrinterChangeNotificationEx requests. You're (gladly) running with "print notify backchannel" disabled, which should ensure that these requests are handled quickly on the server, but perhaps it triggers a slow error path on the client.

Could you please provide captures which include both client->server *and* server->client traffic, this will make it a lot easier to see where the high latencies are occurring.
Comment 5 Alex K 2017-04-28 17:50:36 UTC
Created attachment 13181 [details]
printing.dump.zip

Sorry, don't know how I didn't notice that. 
Here is a dump of a full session. Same file printed, same hosts.
Comment 6 Alex K 2017-04-28 17:57:06 UTC
Created attachment 13182 [details]
copying.dump.zip

Here is a dump of a copy process. I'm copying a 2Mb file so that I could attach it to Bugzilla. It was over in less than 2 seconds.
Comment 7 David Disseldorp 2017-04-28 22:18:33 UTC
Created attachment 13184 [details]
request latency stats across print job
Comment 8 David Disseldorp 2017-04-28 22:18:58 UTC
Created attachment 13185 [details]
request latency stats across copy
Comment 9 David Disseldorp 2017-04-28 22:29:52 UTC
At a high level, the request latency stats appear pretty consistent. There don't appear to be any huge outliers for the print job.
What does stand out is the sheer number of write requests (1940) for the print job. Of those, I count 1884 write requests of size 4280 (includes DCE/RPC + spoolss overhead), which would imply that the client is transferring ~7MB to the print server.

@Andreas / Günther: are your aware of any mechanisms that might be available to tell a spoolss client to use larger print job I/Os?
Comment 10 David Disseldorp 2017-04-28 22:30:51 UTC
Created attachment 13186 [details]
request latency stats across copy (fixed)
Comment 11 Jeremy Allison 2017-04-29 03:53:52 UTC
Oh hang on a minute. If it's transferring the data using DCE/RPC PDU's then 4280 bytes is the max PDU size DCE will use:

In the Samba source we have:

#define TSTREAM_SMBXCLI_NP_MAX_BUF_SIZE 4280

So this is a Windows client restriction - the only way to fix this would be for it to not transfer the print data over DCE/RPC. I don't know how to make it do that.
Comment 12 Alex K 2017-06-22 15:37:53 UTC
I've found some old discussion about this on the jcifs mailing list, but I can't tell if it is related at all or can be used at all. 

https://lists.samba.org/archive/jcifs/2010-April/009202.html , this quote in particular:

> I've dug in and it's looks to me that there is a significant performance
win to using AndX blocks to send large chunks of data.  A printer opened
via OpenPrinterEx() from a Windows system subsequently writes data in
64KB chunks using "Write AndX".  I believe this is a performance win
because the remote system itself is allowed to fill up its own local
buffer and then do a "slow" transaction with a larger chunk of data. 
The amount of data going over the wire is the same, but since heavy
processing is done only after 64KB is sent (vs. about 4KB with normal
Write) there is a net improvement.