Hi Volker, cli_read_andx_done() handles oversized reads, but doesn't handle short reads. The server is free to return less than asked for (down to mincnt for example) and may do that especially if the read is larger than 64k. Currently get in s3 smbclient will result in a short file if the server returns less than the full maxcnt number of bytes. The fix could be a little tricky because of the multiple outstanding calls. I think you'll need to issue more readx requests for the bytes that are missing. I've fixed the bug in s4 that resulted in the short reads, but I think this can affect other servers. I think early windows versions would return less than 64k if you asked for more than 64k. Cheers, Tridge
I think the clireadwrite.c code should also handle the case where the server returns 0 for a readx call larger than 64k. Windows servers do this, and Samba4 did this until recently. I think the client should check for a zero return and retry the operation with a read size less than 64k.
I think I fixed the case for short reads with 1f34ffa0caae5e3a3ca843. Tridge, can you check that? Thanks, Volker
Created attachment 4720 [details] Patch for 3.3 This needs review. It worked fine in my tests, but I'd like the reporter of 6744 to ack this, together with Jeremy. Volker
1f34ffa0caae5e3a and 24309bdb2efca3 need to go into 3.4.2 after review. Jeremy, do you have time to review those? Thanks, Volker
Ok, just found that the patch for 3.3 still has a bug. working on it. Volker
Created attachment 4722 [details] Patch for 3.3 Ok, I know how this wrong patch ended up here. I had already fixed that bug, but during my commit I just typed in "git commit --amend" instead of "git commit -a --amend". Sorry for that. Volker
Karolin, sorry to say that, but this together with 6744 justifies another 3.2 release. Other opinions? Volker
*** Bug 6744 has been marked as a duplicate of this bug. ***
G'day Volker, After applying the patch, I get a segv (added "-g" and removed "-pie" to get backtrace): $ LD_LIBRARY_PATH=/var/tmp/sernet-src/samba-3.3.7/source/bin:/usr/lib/debug gdb --args /var/tmp/sernet-src/samba-3.3.7/source/bin/smbclient \\\\galah\\py24test -c 'get test_datetime.py' GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... (gdb) run Starting program: /var/tmp/sernet-src/samba-3.3.7/source/bin/smbclient \\\\galah\\py24test -c get\ test_datetime.py [Thread debugging using libthread_db enabled] [New Thread 0xb7acb6d0 (LWP 7931)] Domain=[MATRIX_SCIENCE] OS=[Windows NT 4.0] Server=[NT LAN Manager 4.0] Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xb7acb6d0 (LWP 7931)] talloc_get_name (ptr=0x0) at lib/talloc/talloc.c:156 156 if (unlikely((tc->flags & (TALLOC_FLAG_FREE | ~0xF)) != TALLOC_MAGIC)) { (gdb) where #0 talloc_get_name (ptr=0x0) at lib/talloc/talloc.c:156 #1 0x081cc203 in talloc_check_name_abort (ptr=0x0, name=0x83e7fc6 "struct cli_readall_state") at lib/util.c:2993 #2 0x080ed524 in cli_readall_done (subreq=0x8963e20) at libsmb/clireadwrite.c:206 #3 0x081a1562 in async_req_done (req=0x1) at lib/async_req.c:80 #4 0x081008e4 in cli_state_handler (event_ctx=0x88eeaa8, event=0x8963c10, flags=1, p=0x8942bc0) at libsmb/async_smb.c:299 #5 0x081ddd3f in run_events (event_ctx=0x88eeaa8, selrtn=1, read_fds=0xbfab699c, write_fds=0xbfab691c) at lib/events.c:255 #6 0x081ddf8a in event_loop_once (ev=0x88eeaa8) at lib/events.c:312 #7 0x080ecfcd in cli_pull (cli=0x8942bc0, fnum=<value optimized out>, start_offset=0, size=133244, window_size=524288, sink=0x809e1d8 <writefile_sink>, priv=0xbfab6b24, received=0xbfab6b08) at libsmb/clireadwrite.c:566 #8 0x0809dddd in do_get (rname=0x8963a98 "\\test_datetime.py", lname_in=<value optimized out>, reget=false) at client/client.c:1048 #9 0x0809e1c4 in cmd_get () at client/client.c:1124 #10 0x080a258c in process_command_string (cmd_in=<value optimized out>) at client/client.c:4082 #11 0x080a3f2c in main (argc=8, argv=0xbfab7714) at client/client.c:4529 (gdb) btw, the second version of the attached patch appears to be exactly the same as the first? $ md5sum attachment.cgi\?id\=472* 8bf9be9d7c900e749101e97973991964 attachment.cgi?id=4720 8bf9be9d7c900e749101e97973991964 attachment.cgi?id=4722 Regards, Chris
Created attachment 4724 [details] Next attempt for 3.3 Now you see me confused. Can you try again? This patch should fix exactly that segfault. Volker
OK, new patch appears to fix the problem (and still works with Win2K). Thanks Volker! $ LD_LIBRARY_PATH=/var/tmp/sernet-src/samba-3.3.7/source/bin /var/tmp/sernet-src/samba-3.3.7/source/bin/smbclient \\\\galah\\py24test -c 'get test_datetime.py' Domain=[MATRIX_SCIENCE] OS=[Windows NT 4.0] Server=[NT LAN Manager 4.0] getting file \test_datetime.py of size 133244 as test_datetime.py (9294.3 kb/s) (average 9294.4 kb/s) $ cksum test_datetime_orig.py test_datetime.py 2759204463 133244 test_datetime_orig.py 2759204463 133244 test_datetime.py
Wow, this is complex stuff. My initial concern was with integer wrap in the offset and size calculations in cli_readall_done(), but I think it's ok due to the unsigned type used in cli_read_andx_recv(). Let me just do some tests with it under valgrind tomorrow at work and then I'll pass it over to Karolin for inclusion in 3.3.9. Thanks ! Jeremy.
+ 1 on the testing. Re-assigning to Karolin for inclusion in 3.3.9. Jeremy.
Pushed to v3-3-test. Re-assigning to Volker (waiting for a patch for 3.4).
Created attachment 4830 [details] Patch for 3.4
The attached patch for 3.4 is pretty much the same as the one for 3.3 (at least I hope so), adapted to the 3.4 async API. Not sure if this needs another round for review. Karo, up to you :-) Volker
I only have one question about this, and it's the talloc heirarchy with the short read code. For a normal (full) read we have: state->buf = buf; where buf is talloc'ed off the struct cli_request struct which is a talloc child of cli. In the short read case we have: state->buf = talloc_array(state, uint8_t, state->size); where buf is a talloc child of state. Ah, I think I see - everything gets copied out in the cli_read_sink() call so the difference in talloc parents doesn't matter. Vl can you confirm my understanding, then I'll +1 and re-assign to Karolin for 3.4.3 ? Jeremy.
Indeed the talloc hierarchy is a bit different in the short read case. But as you noticed correctly, it is kept until after the sink function was called. For me it did survive valgrind against a smbd deliberately sending short reads after random timeouts (see the BUILD_FARM_HACKS in aio_fork). So I'd say this piece is definitely ok. Volker
Jeremy, can I pick the patch now?
Yes please do, sorry for the confusion. Jeremy.
Pushed to v3-4-test. Closing out bug report. Thanks!