Bug 3382 - hang in read() in exclude.test of testsuite
Status: CLOSED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.6
Hardware: Alpha OSF/1
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
 
Reported: 2006-01-06 14:23 UTC by Tim Mooney
Modified: 2006-03-12 02:56 UTC

Attachments
clearerr() before trying getc() again, otherwise may loop indefinitely (400 bytes, patch)
2006-01-06 14:46 UTC, Tim Mooney

Description Tim Mooney 2006-01-06 14:23:34 UTC
I just built rsync 2.6.6 on my Tru64 5.1B box, using the vendor C compiler.  The build went fine and most of the testsuite passes (I'm running as myself, so a couple of the tests skip since they require root privs), but the exclude.test hangs.

Using truss, I see that rsync is stuck in a tight read loop, calling read on fd 0 and getting back 0 bytes.

I attached to rsync with a debugger and stopped execution, and got this:

(ladebug) attach 198075 rsync
Reading symbolic information ...done
Attached to process id 198075  ....

Interrupt (for process)

Stopping process localhost:198075 (rsync).
stopped at [<opaque> __read(...) 0x3ff800d67d8] 

Information:  An <opaque> type was presented during execution of the previous command.  For complete type information on this symbol, recompilation of the program will be necessary.  Consult the compiler man pages for details on producing full symbol table information using the '-g' (and '-gall' for cxx) flags.

(ladebug) where
>0  0x3ff800d67d8 in __read(...) in /usr/shlib/libc.so
#1  0x3ff80177b20 in __read_nc(...) in /usr/shlib/libc.so
#2  0x3ff800da6d4 in __filbuf(...) in /usr/shlib/libc.so
#3  0x120023574 in parse_filter_file(listp=0x140000010, fname=Info: no allocation applies for symbol fname at the current PC
<no value>, mflags=1024, xflags=1) "exclude.c":999
#4  0x120023140 in parse_rule(listp=0x140000010, pattern=0x14000e3e7="", mflags=0, xflags=0) "exclude.c":938
#5  0x120031140 in parse_arguments(argc=0x11fffbfd8, argv=0x11fffbfd0, frommain=1) "options.c":749
#6  0x12002a878 in main(argc=9, argv=0x11fffc018) "main.c":1104
#7  0x1200187b8 in __start(...) in rsync

(ladebug) up
>1  0x3ff80177b20 in __read_nc(...) in /usr/shlib/libc.so
(ladebug) up
>2  0x3ff800da6d4 in __filbuf(...) in /usr/shlib/libc.so
(ladebug) up
>3  0x120023574 in parse_filter_file(listp=0x140000010, fname=Info: no allocation applies for symbol fname at the current PC
<no value>, mflags=1024, xflags=1) "exclude.c":999
    999                         if ((ch = getc(fp)) == EOF) {
(ladebug) list $curline - 10 : 20
    989                         exit_cleanup(RERR_FILEIO);
    990                 }
    991                 return;
    992         }
    993         dirbuf[dirbuf_len] = '\0';
    994 
    995         while (1) {
    996                 char *s = line;
    997                 int ch, overflow = 0;
    998                 while (1) {
>   999                         if ((ch = getc(fp)) == EOF) {
   1000                                 if (ferror(fp) && errno == EINTR)
   1001                                         continue;
   1002                                 break;
   1003                         }
   1004                         if (word_split && isspace(ch))
   1005                                 break;
   1006                         if (eol_nulls? !ch : (ch == '\n' || ch == '\r'))
   1007                                 break;
   1008                         if (s < eob)
(ladebug) n
stopped at [void parse_filter_file(struct filter_list_struct*, const char*, unsigned int, int):1000 0x1200235c4]        
   1000                                 if (ferror(fp) && errno == EINTR)
(ladebug) print fp
0x3ffc0080050
(ladebug) print errno
4
(ladebug) print line
"- /mid/for/foo/extra"



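I checked errno.h, and "4" is indeed EINTR.  I think the problem is that clearerr() should be called before the continue.  The stream's error indicator stays set until it is explicitly cleared, and nothing resets errno, so once getc() has failed with EINTR, every later EOF return (including a genuine end of file, which is what the 0-byte reads seen in truss suggest) still passes the ferror(fp) && errno == EINTR test and the loop never exits.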

I'm attaching a simple patch that fixes the problem.
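
The attached patch is not reproduced in this report. As a minimal sketch of the idea only, assuming a hypothetical helper named getc_retry_eintr() (not an rsync function, and not the attached patch), the retry could clear the stream's error indicator before calling getc() again:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * getc_retry_eintr() is a hypothetical helper for illustration only; it is
 * not part of rsync and is not the patch attached to this bug.
 *
 * Reads one character from fp, retrying when the read was interrupted by a
 * signal. Returns the character, or EOF at end of file or on any other error.
 */
static int getc_retry_eintr(FILE *fp)
{
    int ch;

    assert(fp != NULL);
    for (;;) {
        ch = getc(fp);
        if (ch != EOF)
            return ch;
        if (ferror(fp) && errno == EINTR) {
            /*
             * The error indicator is sticky. Without clearerr(), a later
             * EOF would still see ferror() true and a stale errno of
             * EINTR, and this loop would spin forever.
             */
            clearerr(fp);
            continue;
        }
        return EOF;  /* genuine end of file, or a non-EINTR error */
    }
}
```

A caller such as the inner loop shown in the listing above would then test the helper's return value against EOF and break, with no retry logic of its own.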
Comment 1 Tim Mooney 2006-01-06 14:46:41 UTC
Created attachment 1652
clearerr() before trying getc() again, otherwise may loop indefinitely
Comment 2 Wayne Davison 2006-01-06 15:20:32 UTC
Thanks for the fix!  I've just checked this into CVS for the upcoming 2.6.7.