isspace should not be called on arguments of type char, unless the
character value is known not to be negative.
isspace (isdigit, isalpha, ...) etc. are only defined for the special
value EOF and for values in the range of unsigned char. In practice
that means -1 and 0-255. Anything else is undefined behaviour.
Created attachment 10988 [details]
Does this fix it?
Certainly, even if it is a bit overkill.
Do you have a simpler approach? We have quite some code that expects "char" instead of "unsigned char" in the callbacks.
It's a really messy corner of the C and C++ standards. You can do
it a bit simpler like this:
+static bool c_isspace(char c)
+ return isspace((unsigned char)c);
This is different from your version in how it handles non-ascii. The above
respects the locale setting, yours ignores anything outside ascii range.
One can certainly argue over which one is right. It basically comes down
to the question of whether it is correct to treat the same file differently
for two users with different locale settings.
What does the standard say about casting of a negative number in "char" representation, say -100, to unsigned char? Is that defined?
That's well-defined per C99's section 22.214.171.124 item #2.
[#2] Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.
(i.e., two's complement.)
On a two's complement machine -- anything not in the "ancient" section of
a museum -- char-to-unsigned-char conversion doesn't even involve run-time
code. The bit patterns for (signed char)-100 and (unsigned char)-100 are
(In reply to M Welinder from comment #6)
Thanks -- but what does this mean for isspace? :-)
However, as you confirmed that my initial patch does solve the problem of undefined behaviour, I'll propose it for inclusion.