Bug 5695 - improve keep-alive code to handle long-running directory scans
Summary: improve keep-alive code to handle long-running directory scans
Status: ASSIGNED
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 3.1.0
Hardware: x86 Linux
: P3 normal (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-16 08:09 UTC by Gabriele Tozzi
Modified: 2010-10-27 00:29 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gabriele Tozzi 2008-08-16 08:09:11 UTC
Hello,

first of all, thank you for developing, maintaining and letting us use for free this great software.

I'm using rsync to sync two local directories. Some subdirectories in the source dir contains a lot of small files (maildirs) and rsync takes a very long time to read all of them and decide if they need to be synced or not. During this process, rsync looks like it's stuck (but it isn't, as strace shows) and --timeout is hitten and process is terminated giving a timeout error.

---------------

Example:

/mnt/dati2/dati is the source dir
/mnt/extra/Backups is the destination dir

I invoke rsync with:

# strace -f /usr/bin/rsync --timeout=600 -v -a -x --timeout=300 --numeric-ids --delete --delete-excluded /mnt/dati2/dati /mnt/extra/Backups > /tmp/rsync.out 2> /tmp/rsync.err

rsync.out contains:
sending incremental file list
dati/Documenti/Gabriele/Maildir/.Bozze/tmp/

rsync.err cointains 119Mb of data, I'm trying to strip most significant part (last 100 lines):
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954597.M629924P20705V0000000000000906I00053A8C_3331.transylvania,S=5064:2,S", {st_mode=S_IFREG|0644, st_size=5064, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196953847.M191213P20668V0000000000000906I0001C424_5433.transylvania,S=2406:2,S", {st_mode=S_IFREG|0644, st_size=2406, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955722.M25482P20750V0000000000000906I0006A649_3392.transylvania,S=2981:2,S", {st_mode=S_IFREG|0644, st_size=2981, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196953893.M65320P20671V0000000000000906I0001DB24_2553.transylvania,S=2262:2,S", {st_mode=S_IFREG|0644, st_size=2262, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954963.M426924P20740V0000000000000906I0005B082_1068.transylvania,S=3147:2,S", {st_mode=S_IFREG|0644, st_size=3147, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955769.M801345P20745V0000000000000906I0006C64F_3774.transylvania,S=2584:2,S", {st_mode=S_IFREG|0644, st_size=2584, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955469.M175771P20747V0000000000000906I00065223_2466.transylvania,S=8259:2,S", {st_mode=S_IFREG|0644, st_size=8259, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955858.M750831P20753V0000000000000906I0006F97A_4320.transylvania,S=8297:2,S", {st_mode=S_IFREG|0644, st_size=8297, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954574.M965820P20714V0000000000000906I0005292F_2422.transylvania,S=2494:2,S", {st_mode=S_IFREG|0644, st_size=2494, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954364.M641287P20705V0000000000000906I00012BDB_910.transylvania,S=5289:2,S", {st_mode=S_IFREG|0644, st_size=5289, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954919.M425335P20731V0000000000000906I0005A49E_1486.transylvania,S=5399:2,S", {st_mode=S_IFREG|0644, st_size=5399, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955838.M714021P20760V0000000000000906I0006ED69_2610.transylvania,S=5909:2,S", {st_mode=S_IFREG|0644, st_size=5909, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955558.M610334P20745V0000000000000906I00066F8E_2761.transylvania,S=2243:2,S", {st_mode=S_IFREG|0644, st_size=2243, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955769.M860881P20740V0000000000000906I0006C65E_5168.transylvania,S=4672:2,S", {st_mode=S_IFREG|0644, st_size=4672, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955440.M467314P20726V0000000000000906I0006499B_3879.transylvania,S=4836:2,S", {st_mode=S_IFREG|0644, st_size=4836, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954550.M692809P20697V0000000000000906I00051339_6429.transylvania,S=8518:2,S", {st_mode=S_IFREG|0644, st_size=8518, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955803.M578489P20731V0000000000000906I0006D695_5350.transylvania,S=4031:2,S", {st_mode=S_IFREG|0644, st_size=4031, ...}) = 0
[pid  3083] getdents64(0, /* 19 entries */, 2048) = 1976
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954096.M756473P20685V0000000000000906I00046C56_872.transylvania,S=3728:2,S", {st_mode=S_IFREG|0644, st_size=3728, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955738.M529606P20747V0000000000000906I0006B37B_3549.transylvania,S=26487:2,S", {st_mode=S_IFREG|0644, st_size=26487, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954763.M460840P20717V0000000000000906I000583B6_5062.transylvania,S=5224:2,S", {st_mode=S_IFREG|0644, st_size=5224, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954544.M508824P20717V0000000000000906I00051139_1327.transylvania,S=4454:2,S", {st_mode=S_IFREG|0644, st_size=4454, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954833.M295279P20717V0000000000000906I00058AF8_5515.transylvania,S=2667:2,S", {st_mode=S_IFREG|0644, st_size=2667, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954799.M638564P20726V0000000000000906I0001B85D_1606.transylvania,S=2622:2,S", {st_mode=S_IFREG|0644, st_size=2622, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955095.M398304P20750V0000000000000906I0005D73A_414.transylvania,S=4993:2,S", {st_mode=S_IFREG|0644, st_size=4993, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196956051.M797185P20775V0000000000000906I00075912_4096.transylvania,S=4544:2,S", {st_mode=S_IFREG|0644, st_size=4544, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954776.M833797P20736V0000000000000906I00058958_142.transylvania,S=8657:2,S", {st_mode=S_IFREG|0644, st_size=8657, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196953672.M688315P20639V0000000000000906I000157EC_1473.transylvania,S=16412:2,S", {st_mode=S_IFREG|0644, st_size=16412, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955217.M550481P20747V0000000000000906I0005FC5F_1440.transylvania,S=3213:2,S", {st_mode=S_IFREG|0644, st_size=3213, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954409.M947381P20705V0000000000000906I00019D11_1281.transylvania,S=2300:2,S", {st_mode=S_IFREG|0644, st_size=2300, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196955123.M402452P20721V0000000000000906I0005DED8_4614.transylvania,S=23271:2,S", {st_mode=S_IFREG|0644, st_size=23271, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954501.M349740P20705V0000000000000906I0004FA9C_2061.transylvania,S=3494:2,S", {st_mode=S_IFREG|0644, st_size=3494, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196953618.M462574P20411V0000000000000906I000135F1_2651.transylvania,S=5902:2,S", {st_mode=S_IFREG|0644, st_size=5902, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196953603.M741820P20411V0000000000000906I00012BB5_1051.transylvania,S=4076:2,S", {st_mode=S_IFREG|0644, st_size=4076, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954081.M114612P20682V0000000000000906I00045CD5_4011.transylvania,S=5310:2,S", {st_mode=S_IFREG|0644, st_size=5310, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196956071.M395246P20753V0000000000000906I00075EA1_5778.transylvania,S=2472:2,S", {st_mode=S_IFREG|0644, st_size=2472, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954760.M805379P20721V0000000000000906I0005828E_3062.transylvania,S=4084:2,S", {st_mode=S_IFREG|0644, st_size=4084, ...}) = 0
[pid  3083] getdents64(0, /* 19 entries */, 2048) = 1976
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196953650.M866107P20411V0000000000000906I00014C6A_6172.transylvania,S=2198:2,S", {st_mode=S_IFREG|0644, st_size=2198, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196956524.M753914P20775V0000000000000906I0007AD29_6380.transylvania,S=3413:2,S", {st_mode=S_IFREG|0644, st_size=3413, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954540.M610753P20710V0000000000000906I00050FF7_2489.transylvania,S=5602:2,S", {st_mode=S_IFREG|0644, st_size=5602, ...}) = 0
[pid  3083] lstat64("dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/1196954257.M71986P20697V0000000000000906I0004CA93_2163.transylvania,S=3982:2,S",  <unfinished ...>
[pid  3082] <... select resumed> )      = 0 (Timeout)
[pid  3082] time(NULL)                  = 1218891337
[pid  3082] write(2, "io timeout after 300 seconds -- e"..., 39io timeout after 300 seconds -- exiting) = 39
[pid  3082] write(2, "\n"..., 1
)        = 1
[pid  3082] rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
[pid  3082] rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
[pid  3082] waitpid(3083, 0xbfff4cac, WNOHANG) = 0
[pid  3082] getpid()                    = 3082
[pid  3082] kill(3083, SIGUSR1)         = 0
[pid  3082] write(2, "rsync error: timeout in data send"..., 79rsync error: timeout in data send/receive (code 30) at io.c(239) [sender=3.0.3]) = 79
[pid  3082] write(2, "\n"..., 1
)        = 1
[pid  3082] exit_group(30)              = ?
[pid  3083] <... lstat64 resumed> {st_mode=S_IFREG|0644, st_size=3982, ...}) = 0
[pid  3083] --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
[pid  3084] <... select resumed> )      = 0 (Timeout)
[pid  3083] rt_sigaction(SIGUSR1, {SIG_IGN},  <unfinished ...>
[pid  3084] time( <unfinished ...>
[pid  3083] <... rt_sigaction resumed> NULL, 8) = 0
[pid  3084] <... time resumed> NULL)    = 1218891337
[pid  3083] rt_sigaction(SIGUSR2, {SIG_IGN},  <unfinished ...>
[pid  3084] rt_sigaction(SIGUSR1, {SIG_IGN},  <unfinished ...>
[pid  3083] <... rt_sigaction resumed> NULL, 8) = 0
[pid  3084] <... rt_sigaction resumed> NULL, 8) = 0
[pid  3083] kill(3084, SIGUSR1 <unfinished ...>
[pid  3084] rt_sigaction(SIGUSR2, {SIG_IGN},  <unfinished ...>
[pid  3083] <... kill resumed> )        = 0
[pid  3084] <... rt_sigaction resumed> NULL, 8) = 0
[pid  3083] select(4, [3], [1], [1], {60, 0}) = 1 (out [1], left {60, 0})
[pid  3083] write(1, "J\0\0\nrsync error: received SIGUSR1"..., 78) = -1 EPIPE (Broken pipe)
[pid  3083] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid  3083] write(2, "rsync: writefd_unbuffered failed "..., 80rsync: writefd_unbuffered failed to write 78 bytes [generator]: Broken pipe (32)) = 80
[pid  3083] write(2, "\n"..., 1
)        = 1
[pid  3083] rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
[pid  3083] rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
[pid  3083] gettimeofday({1218891337, 769475}, NULL) = 0
[pid  3083] select(0, NULL, NULL, NULL, {0, 100000} <unfinished ...>
[pid  3084] --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
[pid  3084] select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0})
[pid  3084] write(4, "R\0\0\nrsync error: timeout in data "..., 86) = 86
[pid  3084] gettimeofday({1218891337, 770193}, NULL) = 0
[pid  3084] select(0, NULL, NULL, NULL, {0, 100000} <unfinished ...>
[pid  3083] <... select resumed> )      = 0 (Timeout)
[pid  3083] gettimeofday({1218891337, 867447}, NULL) = 0
[pid  3083] select(0, NULL, NULL, NULL, {0, 3000} <unfinished ...>
[pid  3084] <... select resumed> )      = 0 (Timeout)
[pid  3084] gettimeofday({1218891337, 867735}, NULL) = 0
[pid  3084] select(0, NULL, NULL, NULL, {0, 3000} <unfinished ...>
[pid  3083] <... select resumed> )      = 0 (Timeout)
[pid  3083] gettimeofday({1218891337, 879805}, NULL) = 0
[pid  3083] exit_group(19)              = ?
Process 3083 detached
<... select resumed> )                  = 0 (Timeout)
gettimeofday({1218891337, 890184}, NULL) = 0
exit_group(30)                          = ?
Process 3084 detached

please not that directory dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/ contains some thousands of files. A simple "ls" command on this folder, takes many minutes to complete.

Ask me more informations, if you need.

Thank you.

Gabriele Tozzi
Comment 1 Wayne Davison 2008-08-18 19:34:12 UTC
Please give 3.0.4pre2 a try.  It contains a fix that should make rsync send more regular keep-alive messages in incremental recursion mode.
Comment 2 Gabriele Tozzi 2008-08-18 20:26:47 UTC
Thank you for your reply.
3.0.4pre2 times out at an earlier point:

strace -f /tmp/rsync/rsync-3.0.4pre2/rsync -v -a -x --timeout=300 --numeric-ids --delete --delete-excluded /mnt/dati2/dati /mnt/extra/Backups > /tmp/rsync.out 2> /tmp/rsync.err

rsync.out:
sending incremental file list
[some files]

rsync.err (last lines):
[pid  9127] lstat64("dati/Backups/Hell/home/bodom/.kde/share/apps/kopete/logs/JabberProtocol/---accountname---", {st_mode=S_IFDIR|0700, st_size=2048, ...}) = 0
[pid  9127] lstat64("dati/Backups/Hell/home/bodom/.kde/share/apps/kopete/logs/MSNProtocol/---accountname---", {st_mode=S_IFDIR|0700, st_size=2048, ...}) = 0
[pid  9127] lstat64("dati/Backups/Hell/home/bodom/.kde/share/apps/kopete/logs/MSNProtocol/---accountname---", {st_mode=S_IFDIR|0700, st_size=2048, ...}) = 0
[pid  9127] lstat64("dati/Backups/Hell/home/bodom/.kde/share/apps/kopete/logs/MSNProtocol/---accountname---", {st_mode=S_IFDIR|0700, st_size=2048, ...}) = 0
[pid  9127] lstat64("dati/Backups/Hell/home/bodom/.kde/share/apps/kopete/logs/MSNProtocol/---accountname---", {st_mode=S_IFDIR|0700, st_size=67584, ...}) = 0
[pid  9127] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  9128] <... select resumed> )      = 1 (in [0], left {59, 990000})
[pid  9128] read(0, "\1\0\0\7"..., 4)   = 4
[pid  9128] time(NULL)                  = 1219108641
[pid  9128] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0})
[pid  9128] read(0, "\0"..., 1)         = 1
[pid  9128] time(NULL)                  = 1219108641
[pid  9128] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid  9126] <... select resumed> )      = 0 (Timeout)
[pid  9126] time(NULL)                  = 1219108701
[pid  9126] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  9128] <... select resumed> )      = 0 (Timeout)
[pid  9128] time(NULL)                  = 1219108701
[pid  9128] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid  9127] <... select resumed> )      = 0 (Timeout)
[pid  9127] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  9126] <... select resumed> )      = 0 (Timeout)
[pid  9126] time(NULL)                  = 1219108761
[pid  9126] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  9127] <... select resumed> )      = 0 (Timeout)
[pid  9127] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  9128] <... select resumed> )      = 0 (Timeout)
[pid  9128] time(NULL)                  = 1219108761
[pid  9128] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid  9126] <... select resumed> )      = 0 (Timeout)
[pid  9126] time(NULL)                  = 1219108821
[pid  9126] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  9127] <... select resumed> )      = 0 (Timeout)
[pid  9127] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  9128] <... select resumed> )      = 0 (Timeout)
[pid  9128] time(NULL)                  = 1219108821
[pid  9128] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid  9126] <... select resumed> )      = 0 (Timeout)
[pid  9126] time(NULL)                  = 1219108881
[pid  9126] select(6, [5], [], NULL, {60, 0} <unfinished ...>
[pid  9128] <... select resumed> )      = 0 (Timeout)
[pid  9128] time(NULL)                  = 1219108881
[pid  9127] <... select resumed> )      = 0 (Timeout)
[pid  9128] select(1, [0], [], NULL, {60, 0} <unfinished ...>
[pid  9127] select(4, [3], [], NULL, {60, 0} <unfinished ...>
[pid  9126] <... select resumed> )      = 0 (Timeout)
[pid  9126] time(NULL)                  = 1219108941
[pid  9126] write(2, "io timeout after 300 seconds -- e"..., 39io timeout after 300 seconds -- exiting) = 39
[pid  9126] write(2, "\n"..., 1
)        = 1
[pid  9126] rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
[pid  9126] rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
[pid  9126] waitpid(9127, 0xbfbba3ec, WNOHANG) = 0
[pid  9126] getpid()                    = 9126
[pid  9126] kill(9127, SIGUSR1)         = 0
[pid  9127] <... select resumed> )      = ? ERESTARTNOHAND (To be restarted)
[pid  9127] --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
[pid  9126] write(2, "rsync error: timeout in data send"..., 83 <unfinished ...>
[pid  9128] <... select resumed> )      = 0 (Timeout)
[pid  9128] time( <unfinished ...>
[pid  9126] <... write resumed> )       = 83
[pid  9127] rt_sigaction(SIGUSR1, {SIG_IGN},  <unfinished ...>
[pid  9126] write(2, "\n"..., 1
 <unfinished ...>
[pid  9127] <... rt_sigaction resumed> NULL, 8) = 0
[pid  9126] <... write resumed> )       = 1
[pid  9127] rt_sigaction(SIGUSR2, {SIG_IGN},  <unfinished ...>
[pid  9126] exit_group(30)              = ?
[pid  9127] <... rt_sigaction resumed> NULL, 8) = 0
[pid  9127] kill(9128, SIGUSR1)         = 0
[pid  9127] gettimeofday({1219108941, 713332}, NULL) = 0
[pid  9127] select(0, NULL, NULL, NULL, {0, 100000} <unfinished ...>
[pid  9128] <... time resumed> NULL)    = 1219108941
[pid  9128] --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
[pid  9128] rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
[pid  9128] rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
[pid  9128] select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0})
[pid  9128] write(4, "M\0\0\nrsync error: received SIGUSR1"..., 81) = 81
[pid  9128] gettimeofday({1219108941, 714385}, NULL) = 0
[pid  9128] select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
[pid  9128] gettimeofday({1219108941, 809137}, NULL) = 0
[pid  9128] select(0, NULL, NULL, NULL, {0, 6000} <unfinished ...>
[pid  9127] <... select resumed> )      = 0 (Timeout)
[pid  9127] gettimeofday({1219108941, 809390}, NULL) = 0
[pid  9127] select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
[pid  9127] gettimeofday({1219108941, 818916}, NULL) = 0
[pid  9127] exit_group(19)              = ?
Process 9127 detached
<... select resumed> )                  = 0 (Timeout)
gettimeofday({1219108941, 820423}, NULL) = 0
exit_group(19)                          = ?
Process 9128 detached
Comment 3 Wayne Davison 2008-09-06 09:49:30 UTC
> please note that directory dati/Documenti/Gabriele/Maildir/.Bugtraq.2006/cur/
> contains some thousands of files. A simple "ls" command on this folder, takes
> many minutes to complete.

Unfortunately, this is not something that rsync currently has keep-alive code inside (the directory scanning code).  The timeout value currently needs to be set high enough to handle the largest directory scan, and (if something like --copy-file were used) the largest local-file copy the generator might do.  The only thing you can currently do is to increase the timeout to the point where it doesn't fire during a successful transfer.

This would be something that would be good to improve for 3.1.0.
Comment 4 Bill W 2009-04-09 17:49:42 UTC
I echo the "thanks for all your hardword making rsync available to the masses".

I'd just like to add to this defect that I have a system where disk writes can be very slow when my client is very loaded.  In my system, rsync gets lowest priority to use disks on the client.  Some very large files in my directories can take a long, long time to create (longer than the 10 minute timeout I need to use on client).  I am not using --copy-files but I am getting a local copy because my very large files are often mostly unchanged (just a byte in a gigantic file) or only a timestamp change.   So the receiver is creating the temp file from an existing file which is usually 100% or 99.999999% the same.   I am stuck in the file creation loop in receiver.c/fileio.c for more than 10 minutes because write() is slow but it does make slow progress.

Just for fun, I simulated my problem on a fast Linux client by hacking in a msleep(20*1000) in the flush_write_file() loop on the client rsync so that each local file "copy update" takes much longer than the client timeout.  The server times out on the client in this test scenario too.

I would love it if someday you could make a keep alive, or equivalent, for file creation and not just for directory listing.  I'll do a workaround by making a huge timeout on the server /etc/rsyncd.conf so that it does not timeout on the client.

By the way, here is my rsync command line; nothing fancy here except the timeout.  Yes, I'm using a slow FAT drive (I have no choice here):
rsync --modify-window=3602 -ptO -L --delete-during -v -ii --progress --port=
873 -z -r --bwlimit=0 --timeout=600 "myserver::rtdata/mydir" "destdir"
Comment 5 Tim Taiwanese Liim 2010-10-27 00:29:35 UTC
These bugs seem to be related (having similar error messages:
connection unexpectedly closed, broken pipe, timeout).

bug7757
    with big file, rsync times out out when it should not; the 
    sender is still responsive

bug2783
    Random high loads during syncs (server side) / client stream errors
    rsync: connection unexpectedly closed (2761332 bytes received so far)
    [generator]
    rsync error: error in rsync protocol data stream (code 12) at io.c(365)

bug5478
    rsync: writefd_unbuffered failed to write 4092 bytes [sender]: 
                Broken pipe (32)

    rsync: writefd_unbuffered failed to write 4092 bytes [sender]: 
                Broken pipe (32)
    io timeout after 30 seconds -- exiting
    rsync error: timeout in data send/receive (code 30) at io.c(239) 
                [sender=3.0.2]

bug5695
    improve keep-alive code to handle long-running directory scans

    ./io.c:void maybe_send_keepalive(void)

bug6175
    write last transfer status when timeout or other error happens
    rsync: writefd_unbuffered failed to write 4 bytes [sender]: 
           Broken pipe (32)
    rsync: connection unexpectedly closed (99113 bytes received so 
           far) [sender]
    rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5]

bug7195
    timeout reached while sending checksums for very large files