Hi. I sent two mails to the list on this date or the day before entitled... Regression: sockets for 3.0.9pre1 ? Rather than include them here, please refer to the list. Due to the nature of this bug seeming to be doing odd things with lengths, devices, etc, I've marked it as a P1 blocker. Feel free to adjust as appropriate. Thanks :)
There's no reason rsync would be doing anything differently from prior versions, so check what's wrong with your system.
Hi :) I replied on list since everything ended up over there first. If you prefer here, I can copy it all in. Thanks.
https://lists.samba.org/archive/rsync/2011-June/026506.html If you want to help solve this, you could do some experiments with a single socket or special file, find the shortest path length at which the problems start to occur, and use a system call tracer to see what is going wrong.
> https://lists.samba.org/archive/rsync/2011-June/026506.html > > If you want to help solve this, you could do some experiments with a single > socket or special file, find the shortest path length at which the problems > start to occur, and use a system call tracer to see what is going wrong. Hi :) That was the first note. I've done more testing since then. In the above spirit of listing email links, here are the rest of them to date... the last one includes the testing you speak of. I'm happy to do more checks, just tell me what you want me to dig up. https://lists.samba.org/archive/rsync/2011-June/026506.html https://lists.samba.org/archive/rsync/2011-June/026507.html https://lists.samba.org/archive/rsync/2011-June/026510.html https://lists.samba.org/archive/rsync/2011-June/026511.html https://lists.samba.org/archive/rsync/2011-June/026513.html https://lists.samba.org/archive/rsync/2011-June/026522.html
Noting exact version... 3.0.8 release tarball, plus this compilation patch to flist.c: http://gitweb.samba.org/?p=rsync.git;a=commitdiff;h=8686d3abba4c5b4baf443871cd19c673056a3392
Created attachment 6626 [details] Add debug output for the rdev major corruption Let's keep the rest of the discussion on this bug. I saw all the messages on the list, but you tried so many different cases that I overlooked the one with the short path where the rdev major number gets corrupted from 9 to 78. Since that case seems relatively simple, let's look at it first. Please apply the attached patch and run the case with -vvv to enable the existing mknod debug output so we can try to see where in the process the corruption occurs. Also see if --no-i-r makes the problem go away.
Hi :) Patch 6397 and 6626, configure, compiles, and exec's ok. Here's the build log... rsync.c:378: warning: unused parameter `fnamecmp' generator.c:216: warning: `desc' might be used uninitialized in this function generator.c:622: warning: unused parameter `fname' generator.c:665: warning: unused parameter `fnamecmp' generator.c:1112: warning: `type' might be used uninitialized in this function sender.c:143: warning: unused parameter `fname' sender.c:143: warning: unused parameter `file' io.c:470: warning: unused parameter `convert' I remade the entire real /dev to make sure nothing crept in over the years, rebooted, still broken. Also ran a few tar's of / | sha1 to make sure that all the kernel stuff is reliably readable, it is. Not sure if the patch is ok, rsync stalls in some different cases and needs ctrl-c. Search stall below. Empty is an empty dir, full is a freshly MAKEDEV'd dir. ### With patch 6397 and 6626, both cases stall. # Additionally, in both cases, --no-i-r stalls. # DEBUG is without --no-i-r. # rsync -nHaxi --delete /dev/ ./empty/ ... [sender] i=4302 /dev zs0 mode=020600 len=0 uid=0 gid=0 flags=0 [sender] i=4303 /dev fd/ mode=040555 len=1024 uid=0 gid=0 flags=4 send_file_list done file list sent ^C[sender] _exit_cleanup(code=20, file=rsync.c, line=549): entered rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [sender=3.0.8] [sender] _exit_cleanup(code=20, file=rsync.c, line=549): about to call exit(20) rsync: writefd_unbuffered failed to write 96 bytes to socket [Receiver]: Broken pipe (32) DEBUG 9: make_file(./fd1.1440): major(tmp_rdev) = 9 DEBUG 1: send_file_entry(fd1.1440): major(rdev) = 9 DEBUG 3: send_file_entry(fd1.1440): XMIT_SAME_RDEV_MAJOR DEBUG 9: make_file(./rfd1.1440): major(tmp_rdev) = 9 DEBUG 1: send_file_entry(rfd1.1440): major(rdev) = 9 DEBUG 6: recv_file_entry(fd1.1440): XMIT_SAME_RDEV_MAJOR, rdev_major was 3 DEBUG 7: recv_file_entry(fd1.1440): after assembly, major(rdev) = 3 DEBUG 8: recv_file_entry(fd1.1440): DEV_MAJOR(F_RDEV_P(file)) = 3 DEBUG 4: recv_file_entry(rfd1.1440): took major(rdev) = 3 from previous file fd1.1440 DEBUG 8: recv_file_entry(rfd1.1440): DEV_MAJOR(F_RDEV_P(file)) = 3 # rsync -nHaxi --delete /dev/ ./full/ ... [sender] i=4302 /dev zs0 mode=020600 len=0 uid=0 gid=0 flags=0 [sender] i=4303 /dev fd/ mode=040555 len=1024 uid=0 gid=0 flags=4 send_file_list done file list sent ^C[sender] _exit_cleanup(code=20, file=rsync.c, line=549): entered rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [sender=3.0.8] [sender] _exit_cleanup(code=20, file=rsync.c, line=549): about to call exit(20) rsync: writefd_unbuffered failed to write 96 bytes to socket [Receiver]: Broken pipe (32) DEBUG 9: make_file(./fd1.1440): major(tmp_rdev) = 9 DEBUG 1: send_file_entry(fd1.1440): major(rdev) = 9 DEBUG 3: send_file_entry(fd1.1440): XMIT_SAME_RDEV_MAJOR DEBUG 9: make_file(./rfd1.1440): major(tmp_rdev) = 9 DEBUG 1: send_file_entry(rfd1.1440): major(rdev) = 9 DEBUG 6: recv_file_entry(fd1.1440): XMIT_SAME_RDEV_MAJOR, rdev_major was 3 DEBUG 7: recv_file_entry(fd1.1440): after assembly, major(rdev) = 3 DEBUG 8: recv_file_entry(fd1.1440): DEV_MAJOR(F_RDEV_P(file)) = 3 DEBUG 4: recv_file_entry(rfd1.1440): took major(rdev) = 3 from previous file fd1.1440 DEBUG 8: recv_file_entry(rfd1.1440): DEV_MAJOR(F_RDEV_P(file)) = 3 ### With only patch 6397, stalls on empty case. # In both cases, --no-i-r does not stall. # rsync -nHaxi --delete /dev/ ./empty/ ... recv_files(fd/8) recv_files(fd/9) ^C[sender] _exit_cleanup(code=20, file=rsync.c, line=549): entered rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [sender=3.0.8] [sender] _exit_cleanup(code=20, file=rsync.c, line=549): about to call exit(20) # rsync -nHaxi --delete /dev/ ./full/ ... recv_files(fd/8) recv_files(fd/9) recv_files phase=1 generate_files phase=2 send_files phase=2 send files finished total: matches=0 hash_hits=0 false_alarms=0 data=0 recv_files phase=2 generate_files phase=3 recv_files finished generate_files finished client_run waiting on 19840 sent 72365 bytes received 18680 bytes 9583.68 bytes/sec total size is 49778 speedup is 0.55 (DRY RUN) [sender] _exit_cleanup(code=0, file=main.c, line=1052): entered [sender] _exit_cleanup(code=0, file=main.c, line=1052): about to call exit(0)
(In reply to comment #7) > Not sure if the patch is ok, rsync stalls in some different cases > and needs ctrl-c. As you can see, all the patch does is add print statements, so the hang must be the result of a timing-dependent bug that was triggered by the addition of the debug output. > # rsync -nHaxi --delete /dev/ ./empty/ > ... > [sender] i=4302 /dev zs0 mode=020600 len=0 uid=0 gid=0 flags=0 > [sender] i=4303 /dev fd/ mode=040555 len=1024 uid=0 gid=0 flags=4 > send_file_list done > file list sent > > ^C[sender] _exit_cleanup(code=20, file=rsync.c, line=549): entered > rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) > [sender=3.0.8] > [sender] _exit_cleanup(code=20, file=rsync.c, line=549): about to call exit(20) > rsync: writefd_unbuffered failed to write 96 bytes to socket [Receiver]: Broken > pipe (32) > > DEBUG 9: make_file(./fd1.1440): major(tmp_rdev) = 9 > DEBUG 1: send_file_entry(fd1.1440): major(rdev) = 9 > DEBUG 3: send_file_entry(fd1.1440): XMIT_SAME_RDEV_MAJOR > DEBUG 9: make_file(./rfd1.1440): major(tmp_rdev) = 9 > DEBUG 1: send_file_entry(rfd1.1440): major(rdev) = 9 > DEBUG 6: recv_file_entry(fd1.1440): XMIT_SAME_RDEV_MAJOR, rdev_major was 3 Well that's interesting, the receiver's rdev_major value got out of sync with the sender's. recv_file_entry::rdev_major is zero-initialized and the only assignment to it is instrumented with "DEBUG 5", so unless you redacted a previous device with major number 3 from the output, the value 3 could only appear via memory corruption. The fastest way to find the problem is probably to use a gdb watchpoint, like this: $ gdb --args rsync -nHaxi --delete /dev/ ./empty/ (gdb) set follow-fork-mode child (gdb) watch recv_file_entry::rdev_major (gdb) run ... (gdb) bt You could also try running rsync under valgrind if convenient.
> assignment to it is instrumented with "DEBUG 5" yes, seems there was such an assignment. i'll have that and at least gdb to post when at keyboard next week. thanks.
Any more info on this?
Way back then... I think I modified the patch a bit, ran against just a few files in a dir, and reconfirmed it. The workproduct is still here somewhere... a bet from memory says don't close this till someone can recheck. Perhaps me when that box gets rebuilt.
I found a bug in the handling of the hard-linked data where the sender sets the rdev_major value for hard-linked devices, but the receiver did not. So, hopefully this fixes the case where the major rdev number was getting set wrong. If there are other issues, either start one or more new bugs on GitHub or add more information here.