A variety of characters cannot legally be encoded in XML 1.0, e.g. U+0007 ('bell') and U+001B ('escape'). Most of the interesting ones are non-whitespace 'control' characters.
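For reference, here is a minimal sketch of the check in question, assuming the `Char` production from the XML 1.0 specification (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`). The helper names `is_xml10_char` and `forbidden_chars` are made up for illustration and are not part of any library.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes the other C0 controls
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def forbidden_chars(text: str) -> list[str]:
    """List the code points in text that XML 1.0 does not allow."""
    return [f"U+{ord(c):04X}" for c in text if not is_xml10_char(c)]


if __name__ == "__main__":
    print(forbidden_chars("a\x07b\x1bc\u200c"))
```

Note that U+200C passes this check while U+0007 and U+001B do not, which is the asymmetry the question is about.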
It's clear from (e.g.) this question and others that the XML spec itself is the source of the restriction -- but can anyone explain why the XML spec forbids these characters?
It seems like it could have been required that they be encoded in escapes, e.g. as &#x0007; and &#x001B; respectively, but perhaps there's a practical reason that the characters were forbidden rather than required to be escaped?
Answerers have suggested that there is some motivation towards avoiding transmission control characters, but Unicode includes many other control-like characters (consider U+200C, "zero width non-joiner") that XML 1.0 does allow. I recognize there may be no good reason for this behavior, but I would still like to understand it better.
It's particularly frustrating because when those character values appear in other data formats, I end up "double-escaping" them in new XML documents that need to carry that data.
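To make the "double-escaping" concrete, here is a rough sketch of the kind of workaround meant here. The `\uXXXX` convention and the function names `app_escape`, `app_unescape`, `to_xml_text`, and `from_xml_text` are assumptions chosen for illustration, not a standard; only `escape` and `unescape` from `xml.sax.saxutils` are real library calls.

```python
import re
from xml.sax.saxutils import escape, unescape

# Anything outside the XML 1.0 Char production (all such code points fit in 4 hex digits).
_FORBIDDEN = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
_TOKEN = re.compile(r"\\(\\|u[0-9A-Fa-f]{4})")


def app_escape(text: str) -> str:
    """First layer: application-level escape (hypothetical \\uXXXX scheme)."""
    text = text.replace("\\", "\\\\")  # protect literal backslashes
    return _FORBIDDEN.sub(lambda m: "\\u%04X" % ord(m.group()), text)


def app_unescape(text: str) -> str:
    """Reverse app_escape."""
    def decode(m: re.Match) -> str:
        tok = m.group(1)
        return "\\" if tok == "\\" else chr(int(tok[1:], 16))
    return _TOKEN.sub(decode, text)


def to_xml_text(text: str) -> str:
    """Second layer: ordinary XML escaping of &, < and >."""
    return escape(app_escape(text))


def from_xml_text(xml_text: str) -> str:
    """Undo both layers in reverse order."""
    return app_unescape(unescape(xml_text))
```

Every consumer of the document has to know about the first layer, which is exactly the cost that a spec-level escape for these characters would have avoided.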