isspace should not be called on arguments of type char, unless the character value is known not to be negative. https://git.samba.org/?p=samba.git;a=blob;f=lib/util/tini.c;h=6cd301a0b1b3cb2a2de69e53d673d517b2a67d60;hb=HEAD#l163 isspace (isdigit, isalpha, ...) etc. are only defined for the special value EOF and for values in the range of unsigned char. In practice that means -1 and 0-255. Anything else is undefined behaviour.
Created attachment 10988 [details] Patch Does this fix it?
Certainly, even if it is a bit overkill.
Do you have a simpler approach? We have quite some code that expects "char" instead of "unsigned char" in the callbacks.
It's a really messy corner of the C and C++ standards. You can do it a bit simpler like this: +static bool c_isspace(char c) +{ + return isspace((unsigned char)c); +} This is different from your version in how it handles non-ascii. The above respects the locale setting, yours ignores anything outside ascii range. One can certainly argue over which one is right. It basically comes down to the question of whether it is correct to treat the same file differently for two users with different locale settings.
What does the standard say about casting of a negative number in "char" representation, say -100, to unsigned char? Is that defined?
That's well-defined per C99's section 6.3.1.3 item #2. [#2] Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (i.e., two's complement.) http://www.open-std.org/jtc1/sc22/wg14/www/docs/n869/n869.txt.gz On a two's complement machine -- anything not in the "ancient" section of a museum -- char-to-unsigned-char conversion doesn't even involve run-time code. The bit patterns for (signed char)-100 and (unsigned char)-100 are identical.
(In reply to M Welinder from comment #6) Thanks -- but what does this mean for isspace? :-) However, as you confirmed that my initial patch does solve the problem of undefined behaviour, I'll propose it for inclusion.