After months of flawless mirroring, the rsync daemon process stopped working. Some files are downloaded but removed at the end with "failed verification -- update discarded (will try again)".

The problems showed up when I was running a 3.0.2-3.0.7 connection, but are still present with 3.0.8 on both sides (which is used below). The files are a few hundred megabytes each; they do not change and show no disk read errors. Some files work, some don't... it might be something related to a pattern within the file or the filename.

Running the process with -vvv shows this:

got file_sum
recv_generator(2882ns_envisat_isabel/ASA_IMS_1PNDPA20060528_190038_000000162048_00085_22183_1026.N1,131)
recv_files(2882ns_envisat_isabel/ASA_IMS_1PNDPA20061015_190042_000000162052_00085_24187_1024.N1)
2882ns_envisat_isabel/ASA_IMS_1PNDPA20061015_190042_000000162052_00085_24187_1024.N1
got file_sum
recv_generator(2882ns_envisat_isabel/ASA_IMS_1PNDPA20061015_190042_000000162052_00085_24187_1024.N1,132)
[receiver] _exit_cleanup(code=2, file=rsync.c, line=652): about to call exit(2)
[generator] _exit_cleanup(code=12, file=io.c, line=601): about to call exit(12)
WARNING: 2882ns_envisat_isabel/ASA_IMS_1PNDPA20060528_190038_000000162048_00085_22183_1026.N1 failed verification -- update discarded (will try again).
WARNING: 2882ns_envisat_isabel/ASA_IMS_1PNDPA20061015_190042_000000162052_00085_24187_1024.N1 failed verification -- update discarded (will try again).
File-list index 133 not in 960 - 1144 (read_ndx_and_attrs) [receiver]
rsync error: protocol incompatibility (code 2) at rsync.c(652) [receiver=3.0.8]
rsync: connection unexpectedly closed (998 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [generator=3.0.8]

I hope this is sufficient information to find the cause.
This is probably a read error on the sending side. When rsync gets a read error from the OS, it ensures that the checksum won't match the data that was sent so that the file will be discarded by the receiver. Look for earlier errors in the transfer, or errors in the logs.
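To illustrate the mechanism described in this comment, here is a toy model in Python. It is not rsync's code: `FlakyReader`, `send_file` and `receiver_accepts` are hypothetical names, MD5 is only a stand-in for the whole-file checksum, and flipping the first checksum byte is an assumed way of spoiling it. The sketch only shows why a read error on the sender would surface on the receiver as "failed verification".

```python
import hashlib
import io


class FlakyReader:
    """Hypothetical file-like object whose reads fail past `ok_bytes`."""

    def __init__(self, data, ok_bytes):
        self._buf = io.BytesIO(data)
        self._ok = ok_bytes

    def read(self, n):
        if self._buf.tell() + n > self._ok:
            raise OSError("simulated read error")
        return self._buf.read(n)


def send_file(src, size, chunk=4):
    """Return (payload, checksum) as a toy sender would transmit them."""
    digest = hashlib.md5()
    payload = bytearray()
    read_error = False
    while len(payload) < size:
        want = min(chunk, size - len(payload))
        try:
            block = src.read(want)
        except OSError:
            read_error = True
            block = b"\0" * want  # pad so the announced length is still sent
        if not block:
            break  # unexpected end of file, stop instead of looping forever
        payload += block
        digest.update(block)
    checksum = bytearray(digest.digest())
    if read_error:
        checksum[0] ^= 0xFF  # assumption: spoil the checksum on purpose
    return bytes(payload), bytes(checksum)


def receiver_accepts(payload, checksum):
    """The toy receiver keeps the file only if the checksums agree."""
    return hashlib.md5(payload).digest() == checksum
```

With a healthy source the receiver accepts the payload; with `FlakyReader(data, 8)` on 12 bytes of data, the padded payload arrives at full length but is rejected.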
(In reply to comment #1)
> This is probably a read error on the sending side. When rsync gets a read
> error from the OS, it ensures that the checksum won't match the data that was
> sent so that the file will be discarded by the receiver. Look for earlier
> errors in the transfer, or errors in the logs.

Read errors would show in dmesg, but there are no read errors. I can also access these files directly without errors. Besides, it is very reproducible.

All files are hundreds of megs; it may have something to do with that. Or, with a newer rsync on an older glibc.

If you want, I can provide access to the rsyncd.
I have seen this happen when either of the systems has a bad RAM chip. Often this causes files beyond a certain size to checksum incorrectly. I would suggest running memtest86 on both systems (especially if one is not using ECC RAM) just to make sure that there isn't a flaky DIMM causing this issue.
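A quick check that can be run before scheduling a full memtest86 pass, sketched in Python under some assumptions: `file_digest` and `digests_are_stable` are hypothetical helper names, and MD5 is used only because it is readily available. The idea is to hash the same unchanged file several times on each machine. If the digests differ between passes, or between sender and receiver for a file that transferred "successfully", that points at hardware rather than rsync. Identical digests do not prove the RAM is good.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def digests_are_stable(path, rounds=3):
    """Hash `path` `rounds` times; True if every pass gives the same digest."""
    seen = {file_digest(path) for _ in range(rounds)}
    return len(seen) == 1
```

Repeated reads are likely to be served from the page cache, so this mostly exercises memory rather than the disk, which matches the suspicion here.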
(In reply to comment #3)
> I have seen this happen when either of the systems has a bad RAM chip. Often
> this causes files beyond a certain size to checksum incorrectly. I would
> suggest running memtest86 on both systems (especially if one is not using ECC
> RAM) just to make sure that there isn't a flaky DIMM causing this issue.

There is a chance that this is causing the problem, although after a reboot the same files showed the problems. Those were huge files, which increases the chance of hitting the memory flaw...

I will attempt a memory check (a bit hard: I have no physical access to the machine myself). It will take me a few days.

Thanks for the hint.