The Samba-Bugzilla – Bug 7454
assertion failed in finish_hard_link()
Last modified: 2011-06-15 19:26:28 UTC
I am rsyncing a local backup to a remote machine. The local machine is running Debian stable on amd64; the remote is running FreeBSD on i386. Both machines have rsync 3.0.7.
I consistently get the following error after a few minutes of rsyncing:
| Assertion failed: (node != NULL && node->data != NULL), function finish_hard_link, file hlink.c, line 536.
| rsync: connection unexpectedly closed (1544 bytes received so far) [sender]
| rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
Sometimes this happens after 4 minutes, sometimes after 25 minutes, and never reproducibly on the same file.
Please let me know if you need some further information to debug this.
I have never seen this bug triggered, so if you can distill things down to a set of sharable files that will demonstrate the bug, that would be great. Also, what are your command-line options?
Another way to potentially help is to upgrade both sides to 3.1.0dev (via the git repo or the latest nightly tar file) and use --debug=hlink3 for the transfer. Capture all the output, and if rsync fails, compress it, and email it to me.
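A minimal sketch of the capture step described above, assuming rsync 3.1.0dev or later on both sides. The wrapper name run_with_hlink_debug and the log path are hypothetical; only --debug=hlink3 comes from this report.

```shell
# Hypothetical wrapper: run rsync with hard-link debugging and keep the log.
# Assumes an rsync new enough to understand --debug (3.1.0dev or later).
run_with_hlink_debug() {
    log=$1; shift
    # Capture stdout and stderr together so the assertion message is kept.
    if rsync --debug=hlink3 "$@" > "$log" 2>&1; then
        echo "rsync succeeded; log left in $log"
    else
        status=$?
        # Compress the log only when the transfer failed, ready to attach.
        gzip -f "$log"
        echo "rsync failed with status $status; compressed log: $log.gz"
        return "$status"
    fi
}
```

It would be called with the same options and paths as the failing transfer, for example: run_with_hlink_debug hlink.log -rlHpt --delete src/ user@remotehost: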
Sorry for my long silence on this bug. It's rather tricky to track down, as I can only trigger it with a large tree (approx 1M files) and lots and lots of hardlinks.
I can still reliably reproduce it though, also with a recent git version of rsync, both when transferring linux-to-linux and linux-to-freebsd.
I'll let you know when I can reproduce it with a generated tree of files.
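A rough sketch of what such a generator might look like, assuming a POSIX shell. The function name make_hlink_tree, the daily.N layout, and the counts are illustrative only, and a tree built this way is not confirmed to trigger the assertion.

```shell
# Hypothetical generator: builds snapshot-style directories whose files are
# hard links to the files in the first snapshot, similar in shape to a set
# of rotated backups. Sizes are up to the caller; nothing here is known to
# reproduce the bug.
make_hlink_tree() {
    root=$1      # directory to create the tree in
    nfiles=$2    # number of files per snapshot
    nsnaps=$3    # number of snapshot directories

    mkdir -p "$root/daily.0" || return 1
    i=0
    while [ "$i" -lt "$nfiles" ]; do
        echo "file $i" > "$root/daily.0/f$i" || return 1
        i=$((i + 1))
    done

    # Every later snapshot hard-links each file from daily.0.
    s=1
    while [ "$s" -lt "$nsnaps" ]; do
        mkdir -p "$root/daily.$s" || return 1
        i=0
        while [ "$i" -lt "$nfiles" ]; do
            ln "$root/daily.0/f$i" "$root/daily.$s/f$i" || return 1
            i=$((i + 1))
        done
        s=$((s + 1))
    done
}
```

For example, make_hlink_tree /tmp/hlink-test 1000 10 creates 10 directories of 1000 files each, with every file having a link count of 10. The resulting daily.* directories could then be passed to rsync with -H and --delete.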
I have the same issue, only triggered with a large tree with lots of hard links. One side is Linux (QNAP NAS) and the other side is OS X both running 3.0.7.
(In reply to comment #4)
> I have the same issue, only triggered with a large tree with lots of hard
> links. One side is Linux (QNAP NAS) and the other side is OS X both running
What are the options that you are using? Are you using --delete without --owner (-o, implied by -a), perchance?
I am indeed. I've also noticed that the bug doesn't occur when using -a
I noticed I haven't reported the command line I'm using yet. Here it is:
rsync -rlHpztqP --delete --bwlimit=90 --exclude '/*/foo/' --exclude '/*/bar/' --exclude '.ssh/id*' --exclude '.gnupg/secring.gpg' daily.* weekly.* monthly.* user@remotehost:
Thank goodness! I had been totally stumped as to how this bug could be happening beyond data corruption occurring, and it would appear that both of you are being bitten by a corruption bug that can occur with an incremental-recursion transfer when --delete is used but --owner is not active. This bug was just recently identified and fixed, and will be released in 3.0.8.
I'm going to go out on a limb and say that this bug is also now fixed (in git). If someone tests the git code and disagrees, we can open this back up (but I'm doubtful that is the case).
I think it is indeed fixed in the latest git. I started a run this afternoon, and it is still running after 4 hours, whereas before it would abort within ten minutes.
I am curious how this issue was solved, so I checked the git log to find the diff.
Was this issue fixed by the following commit?
Author: Matt McCutchen <email@example.com>
Date: Sat Jan 29 19:25:53 2011 -0800
Avoid changing file_extra_cnt during deletion.
Thanks in advance.
Yes, 57edc48 was the 3.1.0dev commit on the master branch. The commits on the b3.0.x branch were more contorted, since I first fixed the bug one way, and then changed to use Matt's idiom. You can see that set of changes as a single diff if you do this:
git diff c825514 83b94ef