I am observing an rsync daemon process with high load that loops endlessly in the function check_prior() in hlink.c.
I stepped through this function with gdb, but there is no way out of the while loop.
rsync command: rsync --server -vlHtprze.iLs --timeout=600 --delete --partial --ignore-existing . <path>
gdb backtrace:
#0 0x080ad49c in check_prior (file=0x8297f7c, gnum=0, prev_ndx_p=0x80449a0, flist_p=0x80449a4) at hlink.c:268
#1 0x080ae0f3 in skip_hard_link (file=0x8297f7c, flist_p=0x81bf8b4) at hlink.c:549
#2 0x080c6a14 in handle_skipped_hlink (file=0x8297f7c, itemizing=1, code=FLOG, f_out=1) at generator.c:2015
#3 0x080c4b68 in recv_generator (fname=0x8045900 "<path to directory>", file=0x8297f7c, ndx=647, itemizing=1, code=FLOG, f_out=1) at generator.c:1400
#4 0x080c761f in generate_files (f_out=1, local_name=0x0) at generator.c:2262
#5 0x0809fa50 in do_recv (f_in=0, f_out=1, local_name=0x0) at main.c:832
#6 0x0809fdd0 in do_server_recv (f_in=0, f_out=1, argc=1, argv=0x81d06e4) at main.c:942
#7 0x0809feb2 in start_server (f_in=0, f_out=1, argc=2, argv=0x81d06e0) at main.c:972
Please tell me what further information you need.
After some more investigation with gdb, it seems that something is wrong in the function flist_for_ndx().
check_prior() calls flist_for_ndx() with ndx=759. The problem is that it returns cur_flist every time, since there is no way for it to select any flist other than cur_flist (because ndx=759 and cur_flist->ndx_start = 758).
next = 0x81dfb18, prev = 0x87033d0, files = 0x88a34e0, sorted = 0x88a34e0, file_pool = 0x81d1800, pool_boundary = 0x82bc474, used = 3, malloced = 32768, low = 0, high = 2, ndx_start = 758, flist_num = 50, parent_ndx = 49, in_progress = -2, to_redo = 0
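To make the arithmetic concrete, here is a minimal sketch of the kind of range lookup described above. It is a simplified model, not the actual rsync source: struct flist_model and model_flist_for_ndx() are hypothetical names, and the rule that a list covers the indexes ndx_start through ndx_start + used - 1 is an assumption based on the field values in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for one file list.
 * Only the fields needed for the range check are modeled. */
struct flist_model {
    struct flist_model *next, *prev;
    int ndx_start;  /* global index of the first entry in this list */
    int used;       /* number of entries in this list */
};

/* Assumed lookup rule (not copied from rsync): start at the current
 * list, walk backward while ndx lies before it, then walk forward
 * while ndx lies past its end. Returns NULL if no list covers ndx. */
static struct flist_model *model_flist_for_ndx(struct flist_model *cur, int ndx)
{
    struct flist_model *fl = cur;

    assert(cur != NULL);

    while (fl != NULL && ndx < fl->ndx_start)
        fl = fl->prev;
    while (fl != NULL && ndx >= fl->ndx_start + fl->used)
        fl = fl->next;

    return fl;
}
```

With ndx_start = 758 and used = 3 as in the dump, this model covers indexes 758 to 760, so a lookup for ndx=759 never leaves the current list and returns it on every call.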
I don't know if this would be the correct fix, but maybe we need something like this after the loops in flist_for_ndx():
if (ndx == flist->ndx_start - 1)
I have three broken rsync daemons, but I have no idea how to reproduce this behavior.
You should use the --owner option with -H or upgrade to a newer rsync.