Yes, the argument to `toupper` needs to be converted to `unsigned char` to avoid the risk of undefined behavior.
The types `char`, `signed char`, and `unsigned char` are three distinct types. Plain `char` has the same range and representation as either `signed char` or `unsigned char`; which of the two is implementation-defined. (Plain `char` is very commonly signed, able to represent values in the range -128..+127.)
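These claims can be checked at compile time. This is a minimal sketch using only standard headers. The constant `plain_char_is_signed` is an illustrative name, not part of any library.

```cpp
#include <climits>      // CHAR_MIN, SCHAR_MIN
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::is_same

// char, signed char and unsigned char are three distinct types,
// even though char shares its range and representation with one of the others.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Which one plain char matches is implementation-defined (illustrative name).
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// CHAR_MIN is negative exactly when plain char is signed, and zero otherwise.
static_assert(plain_char_is_signed == (CHAR_MIN < 0), "consistent with <climits>");
static_assert(plain_char_is_signed ? CHAR_MIN == SCHAR_MIN : CHAR_MIN == 0,
              "range matches signed char or unsigned char");
```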
The `toupper` function takes an `int` argument and returns an `int` result. Quoting the C standard, section 7.4 paragraph 1:
> In all cases the argument is an `int`, the value of which shall be representable as an `unsigned char` or shall equal the value of the macro `EOF`. If the argument has any other value, the behavior is undefined.
(C++ incorporates most of the C standard library, and defers its definition to the C standard.)
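The rule quoted above can be written as a predicate. This is only a sketch; `is_valid_ctype_arg` is a hypothetical name, not a standard function.

```cpp
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical helper: true if v is a value the <cctype>/<ctype.h> functions
// are required to accept, i.e. EOF or a value representable as unsigned char.
constexpr bool is_valid_ctype_arg(int v) {
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

// Any unsigned char value, after conversion to int, satisfies the rule...
static_assert(is_valid_ctype_arg(static_cast<unsigned char>(-2)),
              "UCHAR_MAX - 1 is a valid argument");
// ...but a value one past UCHAR_MAX does not.
static_assert(!is_valid_ctype_arg(UCHAR_MAX + 1), "out of range");
```

A negative `char` such as -2 fails this check unless it happens to equal `EOF`.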
The `[]` indexing operator on `std::string` returns a reference to a `char`. If plain `char` is a signed type, and if the value of `name[0]` happens to be negative, then the expression `toupper(name[0])` has undefined behavior.
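A minimal sketch of the safe form of that expression, converting through `unsigned char` first. `capitalize_first` is a hypothetical helper name; the `int` result of `std::toupper` is converted back to `char` for storage.

```cpp
#include <cassert>
#include <cctype>   // std::toupper
#include <string>

// Hypothetical helper: upper-cases the first character of name.
// The cast to unsigned char guarantees that the int passed to std::toupper
// is in 0..UCHAR_MAX, even if plain char is signed and name[0] is negative.
std::string capitalize_first(std::string name) {
    if (!name.empty()) {
        name[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(name[0])));
    }
    return name;
}

// Example usage (in the default "C" locale).
void capitalize_first_example() {
    assert(capitalize_first("niels") == "Niels");
    assert(capitalize_first("") == "");
}
```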
The language guarantees that, even if plain `char` is signed, all members of the basic character set have non-negative values, so given the initialization

```cpp
string name = "Niels Stroustrup";
```

the program doesn't risk undefined behavior. But yes, in general a `char` value passed to `toupper` (or to any of the functions declared in `<cctype>` / `<ctype.h>`) needs to be converted to `unsigned char`, so that the implicit conversion to `int` won't yield a negative value and cause undefined behavior.
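One common way to apply that conversion consistently is to wrap it once. This is a sketch; `to_upper_char` and `to_upper_in_place` are hypothetical names, not library functions.

```cpp
#include <algorithm>  // std::transform
#include <cassert>
#include <cctype>     // std::toupper
#include <string>

// Hypothetical wrapper: convert to unsigned char before calling std::toupper,
// so the implicit conversion to int never yields a negative value.
char to_upper_char(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Hypothetical helper: upper-case a whole string in place. Declaring the
// lambda parameter as unsigned char performs the conversion for each char.
void to_upper_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}

// Example usage (in the default "C" locale).
void to_upper_example() {
    std::string name = "Niels Stroustrup";
    to_upper_in_place(name);
    assert(name == "NIELS STROUSTRUP");
    assert(to_upper_char('q') == 'Q');
}
```

The lambda is preferable to passing `std::toupper` itself to `std::transform`, which would hand each `char` to the `int` parameter unconverted.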
The `<ctype.h>` functions are commonly implemented using a lookup table indexed by the argument value, so code like this:

```cpp
// assume plain char is signed
char c = -2;
c = toupper(c); // undefined behavior
```

may index outside the bounds of that table.
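To make the lookup-table point concrete, here is a deliberately simplified, hypothetical table-driven `toupper`. It is not how any particular C library is written; it assumes `EOF == -1` and an ASCII character set, and only maps `'a'..'z'`.

```cpp
#include <cassert>
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical, simplified table (assumes EOF == -1 and ASCII):
// one entry for EOF plus one per unsigned char value, indexed by c + 1.
static int upper_table[UCHAR_MAX + 2];

void init_upper_table() {
    upper_table[0] = EOF;  // EOF maps to itself
    for (int c = 0; c <= UCHAR_MAX; ++c) {
        upper_table[c + 1] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
    }
}

int table_toupper(int c) {
    // No range check: table_toupper(-2) would read upper_table[-1],
    // which is outside the array and therefore undefined behavior.
    return upper_table[c + 1];
}

// Example usage with valid arguments only.
void table_toupper_example() {
    init_upper_table();
    assert(table_toupper('q') == 'Q');
    assert(table_toupper(EOF) == EOF);
}
```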
Note that converting to `unsigned` (rather than `unsigned char`):

```cpp
char c = -2;
c = toupper((unsigned)c); // undefined behavior
```

doesn't avoid the problem. If `int` is 32 bits, converting the `char` value `-2` to `unsigned` yields `4294967294`. That value is then implicitly converted to `int` (the parameter type); the result is implementation-defined before C++20, but in practice it is `-2` again.
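The conversions described above can be checked at compile time. This sketch assumes a 32-bit `int` and 8-bit bytes, and uses `signed char` so it doesn't depend on whether plain `char` is signed; the variable names are illustrative.

```cpp
#include <climits>  // UINT_MAX, UCHAR_MAX

// signed char, so the example doesn't depend on the signedness of plain char.
constexpr signed char minus_two = -2;

// Conversion to unsigned is modular: -2 becomes UINT_MAX - 1 (4294967294).
constexpr unsigned as_unsigned = static_cast<unsigned>(minus_two);
static_assert(as_unsigned == UINT_MAX - 1, "wraps modulo UINT_MAX + 1");

// Converting that back to int, as happens when it is passed to toupper's
// int parameter: implementation-defined before C++20, -2 in practice.
constexpr int back_to_int = static_cast<int>(as_unsigned);

// Conversion to unsigned char is also modular but lands in 0..UCHAR_MAX:
// -2 becomes UCHAR_MAX - 1 (254 with 8-bit bytes), a valid toupper argument.
constexpr unsigned char as_uchar = static_cast<unsigned char>(minus_two);
static_assert(as_uchar == UCHAR_MAX - 1, "wraps modulo UCHAR_MAX + 1");
```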
`toupper` can be implemented so that it behaves sensibly for negative values (accepting all values from `CHAR_MIN` to `UCHAR_MAX`), but it's not required to do so. The functions in `<ctype.h>` are, however, required to accept an argument with the value `EOF`, which is always negative (typically `-1`), so an implementation must handle at least that one negative value.
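A hypothetical sketch of the more forgiving approach: extend the table downward so that every value from `SCHAR_MIN` to `UCHAR_MAX` stays in bounds. This is an illustration under stated assumptions (ASCII-only mapping, `EOF == -1`), not a description of any specific C library.

```cpp
#include <cassert>
#include <climits>  // SCHAR_MIN, UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical lenient table covering SCHAR_MIN..UCHAR_MAX, which includes
// CHAR_MIN and (assuming EOF == -1) EOF. ASCII-only mapping, for illustration.
constexpr int kOffset = -SCHAR_MIN;  // 128 with 8-bit bytes
static int lenient_table[kOffset + UCHAR_MAX + 1];

void init_lenient_table() {
    for (int c = SCHAR_MIN; c <= UCHAR_MAX; ++c) {
        lenient_table[c + kOffset] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
    }
}

int lenient_toupper(int c) {
    // Negative values, including EOF, map to themselves and stay in bounds.
    return lenient_table[c + kOffset];
}

// Example usage.
void lenient_toupper_example() {
    init_lenient_table();
    assert(lenient_toupper(-2) == -2);  // no out-of-bounds read
    assert(lenient_toupper('x') == 'X');
}
```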
The C++ standard makes adjustments to some C standard library functions. For example, `strchr` and several other functions are replaced by overloaded versions that enforce `const` correctness. There are no such adjustments for the functions declared in `<cctype>`.
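The difference can be observed with `decltype`. This sketch assumes a conforming standard library that provides both `strchr` overloads; some C/C++ library combinations have historically not done so.

```cpp
#include <cctype>       // std::toupper
#include <cstring>      // std::strchr
#include <type_traits>  // std::is_same
#include <utility>      // std::declval

// C++ provides two strchr overloads, so the constness of the result
// follows the constness of the argument.
static_assert(std::is_same<decltype(std::strchr(std::declval<const char*>(), 'x')),
                           const char*>::value, "const in, const out");
static_assert(std::is_same<decltype(std::strchr(std::declval<char*>(), 'x')),
                           char*>::value, "non-const in, non-const out");

// <cctype> gets no such treatment: std::toupper still takes and returns int,
// so a char argument still undergoes the implicit conversion to int.
static_assert(std::is_same<decltype(std::toupper('x')), int>::value,
              "int toupper(int)");
```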