Bug 8666 - --debug=all9 fail
Summary: --debug=all9 fail
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 3.1.0
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-16 13:08 UTC by Chris Dunlop
Modified: 2013-05-27 00:21 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dunlop 2011-12-16 13:08:56 UTC
There are at least 2 recursive calls when running --debug=all9:

(gdb) bt
#1  0x0000000000434996 in perform_io (needed=60, flags=4) at ../rsync/io.c:608
#2  0x0000000000435b71 in send_msg (code=MSG_INFO, buf=0x7fffa60c7da0 "[receiver] perform_io(60, msgroom) needs to flush 33\n", len=53, convert=0) at ../rsync/io.c:955
#3  0x0000000000429840 in rwrite (code=FINFO, buf=0x7fffa60c7da0 "[receiver] perform_io(60, msgroom) needs to flush 33\n", len=53, is_utf8=0) at ../rsync/log.c:277
#4  0x0000000000429f37 in rprintf (code=FINFO, format=0x46c8c8 "[%s] perform_io(%ld, msgroom) needs to flush %ld\n") at ../rsync/log.c:433
#5  0x0000000000434996 in perform_io (needed=60, flags=4) at ../rsync/io.c:608

...and so on ad-infinitum. If the rprintf() at io.c:608 is patched out, you get this instead:

(gdb) bt
#0  0x0000000000429a11 in rwrite (code=FINFO, buf=0x7fff3006b8e0 "[generator] reduced size of iobuf.msg (-3)\n", len=43, is_utf8=0) at ../rsync/log.c:321
#1  0x0000000000429f37 in rprintf (code=FINFO, format=0x46c6e8 "[%s] reduced size of %s (-%d)\n") at ../rsync/log.c:433
#2  0x00000000004345a5 in reduce_iobuf_size (out=0x6839c0, new_size=32765) at ../rsync/io.c:488
#3  0x0000000000435b46 in send_msg (code=MSG_INFO, buf=0x7fff3006d770 "[generator] reduced size of iobuf.msg (-3)\n", len=43, convert=0) at ../rsync/io.c:964
#4  0x0000000000429a16 in rwrite (code=FINFO, buf=0x7fff3006d770 "[generator] reduced size of iobuf.msg (-3)\n", len=43, is_utf8=0) at ../rsync/log.c:321

...and so on ad-infinitum. If the rprintf() at io.c:488 is patched out you no longer get these recursive debug calls (but might there be others lurking?).

For what it's worth, this was whilst running:

/tmp/rsync-testing --rsync-path=/tmp/rsync-testing -aHXX --numeric-ids --debug=all9 --link-dest=../A --link-dest=../B C/ server:$PWD/C/

...where the directory hierarchies are quite substantial.

Cheers!

Chris.
Comment 1 Chris Dunlop 2011-12-16 13:40:01 UTC
(In reply to comment #0)
> There are at least 2 recursive calls when running --debug=all9:
> 
> (gdb) bt
> #1  0x0000000000434996 in perform_io (needed=60, flags=4) at ../rsync/io.c:608
> #2  0x0000000000435b71 in send_msg (code=MSG_INFO, buf=0x7fffa60c7da0
> "[receiver] perform_io(60, msgroom) needs to flush 33\n", len=53, convert=0) at
> ../rsync/io.c:955
> #3  0x0000000000429840 in rwrite (code=FINFO, buf=0x7fffa60c7da0 "[receiver]
> perform_io(60, msgroom) needs to flush 33\n", len=53, is_utf8=0) at
> ../rsync/log.c:277
> #4  0x0000000000429f37 in rprintf (code=FINFO, format=0x46c8c8 "[%s]
> perform_io(%ld, msgroom) needs to flush %ld\n") at ../rsync/log.c:433
> #5  0x0000000000434996 in perform_io (needed=60, flags=4) at ../rsync/io.c:608
> 
> ...and so on ad-infinitum. If the rprintf() at io.c:608 is patched out, you get
> this instead:
> 
> (gdb) bt
> #0  0x0000000000429a11 in rwrite (code=FINFO, buf=0x7fff3006b8e0 "[generator]
> reduced size of iobuf.msg (-3)\n", len=43, is_utf8=0) at ../rsync/log.c:321
> #1  0x0000000000429f37 in rprintf (code=FINFO, format=0x46c6e8 "[%s] reduced
> size of %s (-%d)\n") at ../rsync/log.c:433
> #2  0x00000000004345a5 in reduce_iobuf_size (out=0x6839c0, new_size=32765) at
> ../rsync/io.c:488
> #3  0x0000000000435b46 in send_msg (code=MSG_INFO, buf=0x7fff3006d770
> "[generator] reduced size of iobuf.msg (-3)\n", len=43, convert=0) at
> ../rsync/io.c:964
> #4  0x0000000000429a16 in rwrite (code=FINFO, buf=0x7fff3006d770 "[generator]
> reduced size of iobuf.msg (-3)\n", len=43, is_utf8=0) at ../rsync/log.c:321
> 
> ...and so on ad-infinitum. If the rprintf() at io.c:488 is patched out you no
> longer get these recursive debug calls (but might there be others lurking?).
>
> For what it's worth, this was whilst running:
> 
> /tmp/rsync-testing --rsync-path=/tmp/rsync-testing -aHXX --numeric-ids
> --debug=all9 --link-dest=../A --link-dest=../B C/ server:$PWD/C/
> 
> ...where the directory hierarchies are quite substantial.

After those calls are patched out but still with --debug=all9, I get a 29K line debug output that ends in:

unexpected tag 95 [sender/inc]
[sender] _exit_cleanup(code=12, file=../rsync/io.c, line=1615): entered
rsync error: error in rsync protocol data stream (code 12) at ../rsync/io.c(1615) [sender=3.1.0dev]
[sender] _exit_cleanup(code=12, file=../rsync/io.c, line=1615): about to call exit(12)

...and the receiving side got only as far as creating the top level directory. Same
thing down to --debug=all4. Even down at --debug=all1 I get an aborted transfer:

[sender] send_msg(0, 65532)
rsync: [sender] write error: Broken pipe (32)
[sender] got msg=2, len=31
[generator] send_msg(0, 22907)
rsync error: error in rsync protocol data stream (code 12) at ../rsync/io.c(823) [sender=3.1.0dev]
[sender] _exit_cleanup(code=10, file=../rsync/io.c, line=823): about to call exit(12)

Looks like something in --debug=all is writing into the io stream?  Without --debug=all the transfer progresses fine.


But that doesn't help me debug my problems with the new --link-by-hash stuff (all previous tests were on unpatched master).  At least I now know most of the problems I thought I was seeing with --link-by-hash were actually because of --debug=all!  Oh well, I guess I'll have to start using explicit debug flags.

Cheers,

Chris
Comment 2 Chris Dunlop 2011-12-17 02:07:13 UTC
Note: This is all with git master a3b62ff4 and link-by-hash.diff from 0b614560, with the two io.c rprintf() calls mentioned previously patched out. I'm pretty sure link-by-hash.diff isn't the cause of the problem.

Running with -vv works, running with -vvv fails with an apparently out of sync IO stream, e.g. this sort of thing showing up:

2011/12/17 12:08:04 [2039] recv_generator(x/x xxxxxxxxxxx xxxxxxxx xxxxxxxx/xx - xxx xxxxx xxxx/xx_xxxxxxx xxxxxxxx/xx xxxxxxxx/(xx-xx) xx xxxxxxxx - xxx x.pdf,408)
2011/12/17 12:08:04 [2039] [generator] receiving flist for dir 1964904705
2011/12/17 12:08:04 [2039] recv_file_name(x/x xxxxxxxxxxx xxxxxxxx xxxxxxxx/xx - xxx xxxxx xxxx/xx_xxxxxxx xxxxxxxx/xx xxxxxxxx/xx xxxxxxxxx xx.rsync.%stat)
2011/12/17 12:08:04 [2039] receive_xattr: xa index 57 out of range for x/x xxxxxxxxxxx xxxxxxxx xxxxxxxx/xx - xxx xxxxx xxxx/xx_xxxxxxx xxxxxxxx/xx xxxxxxxx/xx xxxxxxxxx x^A<B0><D2>I!<B3>^R9>^R<86>^xxx-xxxx) xxx x.pdf^A^eI7T;9>^R<8B>^Mxx) xxx x.pdf^AgaI]P;9>^R<8A>^N125) xxx x.pdf^A<EE>cI<D9>T;9>^R<85>^SBR2-3011)
2011/12/17 12:08:04 [2039] rsync error: error in rsync protocol data stream (code 12) at xattrs.c(669) [generator=3.1.0dev]
2011/12/17 12:08:04 [2039] [generator] _exit_cleanup(code=12, file=xattrs.c, line=669): about to call exit(12)
2011/12/17 12:08:04 [2040] rsync: [receiver] write error: Broken pipe (32)

Note the 'rsync.%stat' in the 3rd line above - the io stream has somehow gotten out of sync such that an xattr is turning up in the file name. I'm also not sure about that very large "flist for dir" on the 2nd line above, that looks out of the ordinary.

I have:

rsync-test --info=help
Options added for each increase in verbose level:
1) COPY,DEL,FLIST,MISC,NAME,STATS,SYMSAFE
2) BACKUP,MISC2,MOUNT,NAME2,REMOVE,SKIP

rsync-test --debug=help
Options added for each increase in verbose level:
2) BIND,CONNECT,CMD,DEL,DELTASUM,DUP,FILTER,FLIST,ICONV
3) ACL,BACKUP,CONNECT2,DEL2,DELTASUM2,EXIT,FILTER2,FLIST2,FUZZY,GENR,OWN,RECV,SEND,TIME
4) CMD2,DEL3,DELTASUM3,EXIT2,FLIST3,ICONV2,OWN2,PROTO,TIME2
5) CHDIR,DELTASUM4,FLIST4,FUZZY2,HASH,HASHLINK,HLINK

...and running with "-vv --debug=ACL,BACKUP,CONNECT2,DEL2,DELTASUM2,EXIT,FILTER2,FLIST2,GENR,SEND" works,
but adding in any of the other level 3 options (FUZZY,OWN,RECV,TIME) makes it stop working.

Note also that it doesn't seem to be entirely consistent as some of the option combinations fail whereas individual components of that combination might work, e.g.:

-vv --debug=FILTER2,FLIST2,FUZZY  fails
-vv --debug=FILTER2,FLIST2        ok
-vv --debug=FUZZY                 ok


Cheers,

Chris.
Comment 3 Wayne Davison 2011-12-21 22:08:18 UTC
I fixed several issues that arose with large amounts of verbosity.  Please give it a try and see if there are remaining issues.
Comment 4 Chris Dunlop 2011-12-29 01:41:48 UTC
Better, in that it gets further, but my test case (which succeeds without debugging) still fails with --debug=all9:

/tmp/rsync-testing --rsync-path=/tmp/rsync-testing -aHXX --numeric-ids --debug=all9 --link-dest=../A --link-dest=../B C/ server:$PWD/C/
...
rsync-testing: ../rsync/hlink.c:132: match_gnums: Assertion `gnum >= hlink_flist->ndx_start' failed.
rsync: connection unexpectedly closed (8144338 bytes received so far) [sender]
[sender] _exit_cleanup(code=12, file=../rsync/io.c, line=224): entered
rsync error: error in rsync protocol data stream (code 12) at ../rsync/io.c(224) [sender=3.1.0dev-9510fa9]
[sender] _exit_cleanup(code=12, file=../rsync/io.c, line=224): about to call exit(12)
Comment 5 Wayne Davison 2013-05-27 00:21:46 UTC
That's some kind of hard-link bug, so the recursion is fixed.