Bug 12417 - Sametimes currupted file after disconnect
Summary: Sametimes currupted file after disconnect
Status: RESOLVED WORKSFORME
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 3.1.2
Hardware: x64 All
: P1 critical (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-09 13:26 UTC by Anatoly Penkov
Modified: 2017-10-08 16:21 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Anatoly Penkov 2016-11-09 13:26:09 UTC
Hello.
Could you help with our bug.

On client host:
rsync  version 3.1.2  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, no xattrs, iconv, symtimes, prealloc

SSH:
OpenSSH_7.2p1, OpenSSL 1.0.2h  3 May 2016

OS Windows 2012 R2 Standard 64-bit, rsync & ssh from Cygwin.

Server side:
rsync  version 3.1.2  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 32-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, no iconv, symtimes, no prealloc, file-flags
SSH:
OpenSSH_7.2p2, OpenSSL 1.0.1s-freebsd  1 Mar 2016

From client start upload to server several files (big - more than 100MB and small – less than 1 mb).
Command for start: rsync.exe -rlDi -z -t --no-h --progress --stats --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r --delay-updates --delete-after --force --ignore-errors -e "'ssh.exe' -vv -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0 -F '.ssh/ssh_config' -i '/cygdrive/C/Windows/TEMP/DRSA4A3.tmp' " -f '- *.lock' "/cygdrive/E/LocalStorage/FILE_ PR_DL - 991486649-0/shares/release/pr_dl/" user@server.kaspersky-labs.com:"/data/pr_dl/"

Sometimes, connection is disconnected between server & client, in rsync log we see:
<f.st...... bases/kavkis2015mr1/KSM/corebases.cab
          13104   0%  984.38kB/s    0:00:43  
        4173930   9%  769.51kB/s    0:00:50  
        6444015  14%  365.23kB/s    0:01:40  
        6968303  16%  305.83kB/s    0:01:57  
        7525359  17%  277.18kB/s    0:02:07  
        9098223  21%  254.68kB/s    0:02:13  
       12784760  29%  306.41kB/s    0:01:38  
       14428359  33%  336.06kB/s    0:01:24  
       15024518  34%  306.77kB/s    0:01:31  
       16846020  39%  289.05kB/s    0:01:30  
       17442417  40%  294.01kB/s    0:01:26  
       19663971  45%  310.73kB/s    0:01:15  
       21906082  50%  324.16kB/s    0:01:04  
       23044222  53%  362.87kB/s    0:00:54  
       24256593  56%  330.23kB/s    0:00:56  
       25363707  59%  430.81kB/s    0:00:40  
       26988648  62%  405.92kB/s    0:00:39  
       28323875  65%  353.27kB/s    0:00:41  
       30903891  71%  402.48kB/s    0:00:29  
       31572288  73%  394.50kB/s    0:00:28  
       33110433  77%  313.75kB/s    0:00:31  
       33759195  78%  310.91kB/s    0:00:29  
       34833655  81%  306.10kB/s    0:00:26  
       38549536  89%  445.16kB/s    0:00:09  
       39368552  91%  465.20kB/s    0:00:07  
Connection reset by 193.45.6.4 port 22
rsync: [sender] write error: Connection reset by peer (104)
rsync error: error in socket IO (code 10) at io.c(820) [sender=3.1.2]

After this error on disk we can see next files:

/data/pr_dl/bases/kavkis2015mr1/KSM:
total 76888
drwxr-xr-x  2 replicator  replicator       512 Oct 18 09:53 .
drwxr-xr-x  6 replicator  replicator       512 Oct  3 11:48 ..
-rw-------  1 replicator  replicator  35127296 Oct 18 09:55 .corebases.cab.GjiXui <- temp file, on this file has been disconnection
-rw-r--r--  1 replicator  replicator  43010386 Oct 16 23:14 corebases.cab <- old file that was on the disk before rsync start 


On the next rsync start leads to that the remainder of the file is filled, file bases/kavkis2015mr1/KSM/corebases.cab uploaded again, temp file is deleted, rsync log: 
*deleting   bases/kavkis2015mr1/KSM/.corebases.cab.GjiXui

It is correct.

But sometimes we have wrong result, on the disk file is corrupted.

In first time we have disconnect:
<f.st...... bases/kavkis2015/PURE/corebases.cab
        1795048   1%  428.92kB/s    0:04:43  
        7939268   6%  453.70kB/s    0:04:14  
        9072736   7%  318.88kB/s    0:05:58  
        9645170   7%  178.25kB/s    0:10:37  
       11907505   9%  216.62kB/s    0:08:34  
       12923313  10%  249.04kB/s    0:07:23  
       15122888  12%  254.90kB/s    0:07:04  
       15742134  12%  227.61kB/s    0:07:52  
       16275562  13%  201.09kB/s    0:08:52  
       18449974  14%  259.51kB/s    0:06:44  
       19582332  15%  350.01kB/s    0:04:56  
       20104585  16%  352.61kB/s    0:04:52  
       20725326  16%  311.95kB/s    0:05:28  
       21274885  17%  284.67kB/s    0:05:58  
       21789141  17%  261.51kB/s    0:06:28  
       22392629  18%  274.57kB/s    0:06:07  
       22884149  18%  245.17kB/s    0:06:49  
       23441205  19%  192.51kB/s    0:08:38  
       24036929  19%  147.63kB/s    0:11:12  
       25292651  20%  127.34kB/s    0:12:49  
       25869537  20%  137.99kB/s    0:11:46  
       26525253  21%  109.00kB/s    0:14:48  
       27771161  22%   78.16kB/s    0:20:22  
       28362076  22%   69.28kB/s    0:22:50  
       28923203  23%   68.94kB/s    0:22:49  
       29463284  23%   64.13kB/s    0:24:23  
       30669029  24%  104.61kB/s    0:14:45  
       31178786  25%  113.24kB/s    0:13:33  
       31763282  25%  110.37kB/s    0:13:49  
       32863893  26%   50.84kB/s    0:29:39  
       33571017  27%   65.60kB/s    0:22:48  
       34092950  27%   66.84kB/s    0:22:15  
       34603216  28%  148.91kB/s    0:09:55  
       35127504  28%  131.79kB/s    0:11:09  
       35716997  28%  115.68kB/s    0:12:37  
       36311138  29%  112.46kB/s    0:12:53  
       36866342  29%  111.03kB/s    0:12:58  
       37407250  30%  109.94kB/s    0:13:01  
Connection reset by 62.76.24.150 port 22
rsync: [sender] write error: Connection reset by peer (104)
rsync error: error in socket IO (code 10) at io.c(820) [sender=3.1.2]

After disconnection we have temp file:
/data/pr_dl/bases/kavkis2015/PURE:
total 156648
drwxr-xr-x  2 replicator  replicator        512 Oct  7 09:42 .
drwxr-xr-x  6 replicator  replicator        512 Oct  3 11:48 ..
-rw-------  1 replicator  replicator   34340864 Oct  7 09:46 .corebases.cab.No20UW

After reconnect rsync finished upload
d..t...... bases/kavkis2015/PURE/
<f.st...... bases/kavkis2015/PURE/corebases.cab
          22208   0%   68.41kB/s    0:30:02  
        4696047   3%  470.65kB/s    0:04:12  
        5453306   4%  393.46kB/s    0:04:59  
…
      118898404  96%  410.53kB/s    0:00:10  
      120208011  97%  551.43kB/s    0:00:05  
      122169563  99%  403.60kB/s    0:00:02  
      123331061 100%  147.23kB/s    0:13:38 (xfr#41, to-chk=19018/19458)

But really on server we have corrupted file
/data/pr_dl/bases/kavkis2015/PURE:
total 36200
drwxr-xr-x  2 replicator  replicator       512 Oct  3 11:48 .
drwxr-xr-x  6 replicator  replicator       512 Oct  3 11:48 ..
-rw-r--r--  1 replicator  replicator  34381209 Oct  7 11:56 corebases.cab

Do you have solution for this problem? May be we need use addition parameters?
Comment 1 Wayne Davison 2016-11-15 21:27:12 UTC
Rsync 3.1.2 doesn't leave its tmp files around unless it is killed by an uncaught signal, such as a sigbus. You might want to enable core files and/or monitor the rsync process to see how it is dying.

As for the corruption, rsync uses a full-file checksum on the resulting file to see if it was constructed correctly, so I'd suspect that your system is having some kind of a hardware issue, possibly connected to the above abnormal terminations.