Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

c# - Encoding space character in XML name

I am given an XML file which contains names like below:

<Benchↂ0020Codeↂ0020>something</Benchↂ0020Codeↂ0020>

The ↂ symbol is represented with three bytes: 0xE2, 0x86, 0x82.

It looks like ↂ0020 is supposed to be treated as a space character. But when I read the XML using System.Xml.XmlReader, the characters ↂ0020 are not converted to a space.

Is there a way to have them converted (besides replacing, of course)? Or did I just get broken XML?


1 Answer


Space characters are not permitted in XML names

There are 86 code points whose names contain the word SPACE. Ignoring those that match only because of MONOSPACE, and any others that have a visible glyph, leaves the following:

  • #x0020 SPACE
  • #x00A0 NO-BREAK SPACE
  • [#x2002-#x200A] EN SPACE through HAIR SPACE
  • #x205F MEDIUM MATHEMATICAL SPACE
  • #x3000 IDEOGRAPHIC SPACE

None of these space-related code points (those with no visible glyph) is permitted in XML names by the W3C XML BNF for names:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
                  [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
                  [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
                  [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                  [#x10000-#xEFFFF]
NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
                  [#x203F-#x2040]
Name          ::= NameStartChar (NameChar)*
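
The productions above can be checked mechanically. The following is a minimal sketch in Python that transcribes the quoted ranges by hand; the helper names (is_name_start_char, is_name_char, is_xml_name) and the SPACE_CODE_POINTS list are illustrative and not part of any XML library.

```python
# Ranges transcribed from the NameStartChar production quoted above.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the code point of ch lies in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(text):
    """Name ::= NameStartChar (NameChar)*"""
    return (
        bool(text)
        and is_name_start_char(text[0])
        and all(is_name_char(ch) for ch in text[1:])
    )


# The space-related code points listed earlier in this answer.
SPACE_CODE_POINTS = [0x20, 0xA0, *range(0x2002, 0x200B), 0x205F, 0x3000]

if __name__ == "__main__":
    for cp in SPACE_CODE_POINTS:
        print(f"U+{cp:04X} allowed in a name: {is_name_char(chr(cp))}")
    print(is_xml_name("Bench Code"))
    print(is_xml_name("Bench\u21820020Code\u21820020"))
```

Under these ranges every listed space code point is rejected, while the name from the question (written here with the escape \u2182 for the character encoded as 0xE2, 0x86, 0x82) is accepted.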

Alternatives to spaces in XML names

  • CamelCase
  • underscore_char
  • hyphen-char
  • period.char

Colon should not be used as a word separator in XML names to avoid confusion with its use in XML Namespaces.


ↂ is permitted in XML names

The character ↂ (UTF-8 bytes 0xE2, 0x86, 0x82, which encode #x2182) has nothing to do with spaces; it is ROMAN NUMERAL TEN THOUSAND. ↂ is explicitly permitted: #x2182 falls in the [#x2070-#x218F] range of NameStartChar.
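
The byte-to-code-point claim can be reproduced with a few lines of Python. This is only a sketch using the standard unicodedata module; the variable names are illustrative.

```python
import unicodedata

# The three bytes quoted in the question, interpreted as UTF-8.
raw = bytes([0xE2, 0x86, 0x82])
ch = raw.decode("utf-8")

code_point = ord(ch)
char_name = unicodedata.name(ch)

# Membership in the [#x2070-#x218F] range of NameStartChar.
in_name_start_range = 0x2070 <= code_point <= 0x218F

# Whether Python's Unicode tables treat the character as whitespace.
is_whitespace = ch.isspace()

if __name__ == "__main__":
    print(f"U+{code_point:04X} {char_name}")
    print("in [#x2070-#x218F]:", in_name_start_range)
    print("whitespace:", is_whitespace)
```

The decoded character is a single code point, U+2182, that sits inside the permitted range and is not classified as whitespace.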

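The characters 0020 that follow ↂ are just digits. Together with the rest of the characters in Benchↂ0020Codeↂ0020, they form an allowed (albeit unconventional) XML name. They do not constitute spaces in the XML name, because spaces are not allowed in XML names. A conforming parser such as System.Xml.XmlReader therefore reports the name verbatim; treating ↂ0020 as a space is a convention of whoever produced the file, and it has to be applied by explicit replacement in the application.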
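
If the producer of the file did intend U+2182 followed by four hexadecimal digits as an escape for a code point (an assumption based only on the question, not on any XML standard), the replacement could be sketched as follows. The function name decode_name and the pattern are hypothetical; the result is a display label, not a valid XML name.

```python
import re

# Assumed ad-hoc escape: U+2182 followed by exactly four hex digits.
# This convention is a guess from the question, not part of XML.
ESCAPE_PATTERN = re.compile("\u2182([0-9A-Fa-f]{4})")


def decode_name(name):
    """Replace each assumed escape with the character it denotes."""
    return ESCAPE_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), name)


if __name__ == "__main__":
    element_name = "Bench\u21820020Code\u21820020"
    print(repr(decode_name(element_name)))
```

The same substitution can be applied to whatever element name an XML reader returns, after parsing, so the document itself stays well-formed.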

