I see some very slow initial connection times when running rsync over ssh. There are no network issues such as DNS or ping problems. Both computers are on the same switch, and other transfer tools such as scp work fine. I also ran a snoop while issuing the rsync command and found that it does not send out any packets for a very long time (> 30 secs). This does not happen all the time, just often enough to be noticed. As a sanity check, I switched back to an older version of rsync, 2.4.6, and the problem did not occur. Let me know if further details are needed.
It sounds to me like an intermittent network problem that is outside the control of rsync. How much testing did you do on the 2.4.6 version to know that the problem did not recur? It may be that the network was just behaving itself during that time. The other possibility is the startup time the sending side needs to scan all the files: if the host that is scanning the files is doing a lot of I/O, the first run of the command can be slow, after which the normal OS caching of the directory info would make the next run much faster. You can use the --progress option to see if rsync is scanning files. If it is neither of those possibilities, please provide more info. Is the ssh connection already open? What did rsync last do before the hang?
Created attachment 884: rsync 2.6.3 truss
Hi Wayne, Thanks for fielding this request. I haven't seen any indicators that the network is at fault (pings, scp, ssh all work fine). Here is a sample output of my rsync session:
    m1010sjc1:/tmp$ time /lc/depot/rsync-2.6.3-AO~0/bin/rsync -aWv --rsync-path="/lc/depot/rsync-2.6.3-AO~0/bin/rsync" --progress --stats --rsh=/lc/bin/ssh vertex_user_snst_21304.dmp.gz m2249sjc1.cust:/tmp
    dpeng@m2249sjc1.cust's password:
    building file list ...
    1 file to consider
    vertex_user_snst_21304.dmp.gz
    3432962 100% 5.16MB/s 0:00:00 (1, 100.0% of 1)
    Number of files: 1
    Number of files transferred: 1
    Total file size: 3432962 bytes
    Total transferred file size: 3432962 bytes
    Literal data: 3432962 bytes
    Matched data: 0 bytes
    File list size: 86
    Total bytes sent: 3433520
    Total bytes received: 40
    sent 3433520 bytes received 40 bytes 67991.29 bytes/sec
    total size is 3432962 speedup is 1.00
    real 0m49.377s
    user 0m13.490s
    sys 0m2.601s
I typed in my password as fast as I could after waiting around 30s. I am also going to attach a truss dump for your reference. Thanks, David
If the 30-second delay is happening before the password prompt, it is likely to be a delay during the ssh connection. The important thing to check is what is happening at the point the delay occurs -- is ssh trying to connect to the remote system? Is the remote system running slowly? If you have an extra window open on each system, you can check the process list to see what's running and check what the programs are doing.
Okay, I think I've got it. ssh-rand-helper, which is a child process of ssh, gathers randomness from the output of the various commands listed in "ssh_prng_cmds". One of these commands was taking a huge amount of time and caused the delay for ssh, and therefore for rsync. I went ahead and commented out the offending command (which was 'last') and rsync ran fine. Thanks for the help.
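For anyone who hits the same symptom, one way to find which command is the slow one is to run each of them once and time it. The following is only a rough sketch, not part of ssh or rsync: the function name time_commands and the timeout value are made up for illustration, and you would have to supply the command list yourself from your own ssh_prng_cmds file.

```python
import subprocess
import time


def time_commands(commands, timeout=60.0):
    """Run each command once; return (argv, seconds, timed_out) tuples, slowest first.

    `commands` is a list of argv lists. This is a hypothetical helper for
    illustration; it does not read or parse ssh_prng_cmds itself.
    """
    results = []
    for argv in commands:
        timed_out = False
        start = time.monotonic()
        try:
            # Discard all output; only the elapsed wall-clock time matters here.
            subprocess.run(
                argv,
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The command was killed after `timeout` seconds.
            timed_out = True
        except OSError:
            # The command is missing or not executable; it is recorded with
            # whatever (tiny) time the failed launch took.
            pass
        results.append((argv, time.monotonic() - start, timed_out))
    # Put the slowest command at the top of the list.
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling it with something like time_commands([["last"], ["uptime"]]) and printing the result should put a command that stalls the way 'last' did here at the top of the list. The two commands in that call are only examples.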