Bug 6606 - cli_read_andx_done() not handling short reads
Summary: cli_read_andx_done() not handling short reads
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: Client Tools (show other bugs)
Version: unspecified
Hardware: Other All
: P3 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
: 6744 (view as bug list)
Depends on:
Reported: 2009-08-04 01:33 UTC by Andrew Tridgell
Modified: 2009-11-30 13:56 UTC (History)
2 users (show)

See Also:

Patch for 3.3 (4.86 KB, patch)
2009-09-18 13:12 UTC, Volker Lendecke
no flags Details
Patch for 3.3 (4.86 KB, patch)
2009-09-18 20:09 UTC, Volker Lendecke
no flags Details
Next attempt for 3.3 (8.38 KB, patch)
2009-09-21 03:06 UTC, Volker Lendecke
no flags Details
Patch for 3.4 (8.29 KB, patch)
2009-10-12 04:25 UTC, Volker Lendecke
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Tridgell 2009-08-04 01:33:34 UTC
Hi Volker,

cli_read_andx_done() handles oversized reads, but doesn't handle short reads.
The server is free to return less than asked for (down to mincnt for example)
and may do that especially if the read is larger than 64k. Currently get in 
s3 smbclient will result in a short file if the server returns less than
the full maxcnt number of bytes.

The fix could be a little tricky because of the multiple outstanding calls. I
think you'll need to issue more readx requests for the bytes that are missing.

I've fixed the bug in s4 that resulted in the short reads, but I think
this can affect other servers. I think early windows versions would
return less than 64k if you asked for more than 64k.

Cheers, Tridge
Comment 1 Andrew Tridgell 2009-08-04 21:06:50 UTC
I think the clireadwrite.c code should also handle the case where the server
returns 0 for a readx call larger than 64k. Windows servers do this, and
Samba4 did this until recently. I think the client should check for a zero
return and retry the operation with a read size less than 64k.
Comment 2 Volker Lendecke 2009-09-14 00:15:48 UTC
I think I fixed the case for short reads with 1f34ffa0caae5e3a3ca843.

Tridge, can you check that?


Comment 3 Volker Lendecke 2009-09-18 13:12:56 UTC
Created attachment 4720 [details]
Patch for 3.3

This needs review. It worked fine in my tests, but I'd like the reporter of 6744 to ack this, together with Jeremy.

Comment 4 Volker Lendecke 2009-09-18 13:14:16 UTC
1f34ffa0caae5e3a and 24309bdb2efca3 need to go into 3.4.2 after review. Jeremy, do you have time to review those?


Comment 5 Volker Lendecke 2009-09-18 19:50:55 UTC
Ok, just found that the patch for 3.3 still has a bug. working on it.

Comment 6 Volker Lendecke 2009-09-18 20:09:04 UTC
Created attachment 4722 [details]
Patch for 3.3

Ok, I know how this wrong patch ended up here. I had already fixed that bug, but during my commit I just typed in "git commit --amend" instead of "git commit -a --amend". Sorry for that.

Comment 7 Volker Lendecke 2009-09-19 00:41:14 UTC
Karolin, sorry to say that, but this together with 6744 justifies another 3.2 release. Other opinions?

Comment 8 Volker Lendecke 2009-09-19 10:18:53 UTC
*** Bug 6744 has been marked as a duplicate of this bug. ***
Comment 9 Chris Allen 2009-09-20 12:58:53 UTC
G'day Volker,

After applying the patch, I get a segv (added "-g" and removed "-pie" to get backtrace):

$ LD_LIBRARY_PATH=/var/tmp/sernet-src/samba-3.3.7/source/bin:/usr/lib/debug gdb --args /var/tmp/sernet-src/samba-3.3.7/source/bin/smbclient \\\\galah\\py24test -c 'get test_datetime.py'
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /var/tmp/sernet-src/samba-3.3.7/source/bin/smbclient \\\\galah\\py24test -c get\ test_datetime.py
[Thread debugging using libthread_db enabled]
[New Thread 0xb7acb6d0 (LWP 7931)]
Domain=[MATRIX_SCIENCE] OS=[Windows NT 4.0] Server=[NT LAN Manager 4.0]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7acb6d0 (LWP 7931)]
talloc_get_name (ptr=0x0) at lib/talloc/talloc.c:156
156		if (unlikely((tc->flags & (TALLOC_FLAG_FREE | ~0xF)) != TALLOC_MAGIC)) { 
(gdb) where
#0  talloc_get_name (ptr=0x0) at lib/talloc/talloc.c:156
#1  0x081cc203 in talloc_check_name_abort (ptr=0x0, 
    name=0x83e7fc6 "struct cli_readall_state") at lib/util.c:2993
#2  0x080ed524 in cli_readall_done (subreq=0x8963e20)
    at libsmb/clireadwrite.c:206
#3  0x081a1562 in async_req_done (req=0x1) at lib/async_req.c:80
#4  0x081008e4 in cli_state_handler (event_ctx=0x88eeaa8, event=0x8963c10, 
    flags=1, p=0x8942bc0) at libsmb/async_smb.c:299
#5  0x081ddd3f in run_events (event_ctx=0x88eeaa8, selrtn=1, 
    read_fds=0xbfab699c, write_fds=0xbfab691c) at lib/events.c:255
#6  0x081ddf8a in event_loop_once (ev=0x88eeaa8) at lib/events.c:312
#7  0x080ecfcd in cli_pull (cli=0x8942bc0, fnum=<value optimized out>, 
    start_offset=0, size=133244, window_size=524288, 
    sink=0x809e1d8 <writefile_sink>, priv=0xbfab6b24, received=0xbfab6b08)
    at libsmb/clireadwrite.c:566
#8  0x0809dddd in do_get (rname=0x8963a98 "\\test_datetime.py", 
    lname_in=<value optimized out>, reget=false) at client/client.c:1048
#9  0x0809e1c4 in cmd_get () at client/client.c:1124
#10 0x080a258c in process_command_string (cmd_in=<value optimized out>)
    at client/client.c:4082
#11 0x080a3f2c in main (argc=8, argv=0xbfab7714) at client/client.c:4529

btw, the second version of the attached patch appears to be exactly the same as the first?

$ md5sum attachment.cgi\?id\=472*
8bf9be9d7c900e749101e97973991964  attachment.cgi?id=4720
8bf9be9d7c900e749101e97973991964  attachment.cgi?id=4722

Comment 10 Volker Lendecke 2009-09-21 03:06:04 UTC
Created attachment 4724 [details]
Next attempt for 3.3

Now you see me confused. Can you try again? This patch should fix exactly that segfault.

Comment 11 Chris Allen 2009-09-21 06:12:06 UTC
OK, new patch appears to fix the problem (and still works with Win2K).  Thanks Volker!

$ LD_LIBRARY_PATH=/var/tmp/sernet-src/samba-3.3.7/source/bin /var/tmp/sernet-src/samba-3.3.7/source/bin/smbclient \\\\galah\\py24test -c 'get test_datetime.py'
Domain=[MATRIX_SCIENCE] OS=[Windows NT 4.0] Server=[NT LAN Manager 4.0]
getting file \test_datetime.py of size 133244 as test_datetime.py (9294.3 kb/s) (average 9294.4 kb/s)
$ cksum test_datetime_orig.py test_datetime.py
2759204463 133244 test_datetime_orig.py
2759204463 133244 test_datetime.py
Comment 12 Jeremy Allison 2009-10-08 23:03:01 UTC
Wow, this is complex stuff. My initial concern was with integer wrap in the offset and size calculations in cli_readall_done(), but I think it's ok due to the unsigned type used in cli_read_andx_recv(). Let me just do some tests with it under valgrind tomorrow at work and then I'll pass it over to Karolin for inclusion in 3.3.9.
Thanks !


Comment 13 Jeremy Allison 2009-10-09 12:51:47 UTC
+ 1 on the testing. Re-assigning to Karolin for inclusion in 3.3.9.
Comment 14 Karolin Seeger 2009-10-12 03:45:49 UTC
Pushed to v3-3-test.
Re-assigning to Volker (waiting for a patch for 3.4).
Comment 15 Volker Lendecke 2009-10-12 04:25:36 UTC
Created attachment 4830 [details]
Patch for 3.4
Comment 16 Volker Lendecke 2009-10-12 14:42:48 UTC
The attached patch for 3.4 is pretty much the same as the one for 3.3 (at least I hope so), adapted to the 3.4 async API. Not sure if this needs another round for review. Karo, up to you :-)

Comment 17 Jeremy Allison 2009-10-12 16:07:06 UTC
I only have one question about this, and it's the talloc heirarchy with the short read code. For a normal (full) read we have:

state->buf = buf;

where buf is talloc'ed off the struct cli_request struct which is a talloc child of cli.

In the short read case we have:

state->buf = talloc_array(state, uint8_t, state->size);

where buf is a talloc child of state.

Ah, I think I see - everything gets copied out in the cli_read_sink() call so the difference in talloc parents doesn't matter. Vl can you confirm my understanding, then I'll +1 and re-assign to Karolin for 3.4.3 ?


Comment 18 Volker Lendecke 2009-10-14 03:51:40 UTC
Indeed the talloc hierarchy is a bit different in the short read case. But as you noticed correctly, it is kept until after the sink function was called.

For me it did survive valgrind against a smbd deliberately sending short reads after random timeouts (see the BUILD_FARM_HACKS in aio_fork). So I'd say this piece is definitely ok.

Comment 19 Karolin Seeger 2009-10-14 04:42:55 UTC
Jeremy, can I pick the patch now?
Comment 20 Jeremy Allison 2009-10-14 11:38:10 UTC
Yes please do, sorry for the confusion.
Comment 21 Karolin Seeger 2009-10-15 02:05:29 UTC
Pushed to v3-4-test.
Closing out bug report.