Each byte starts with a few bits that tell you whether it's a single-byte code point, the start of a multi-byte code point, or a continuation of a multi-byte code point. Like this:
0xxx xxxx A single-byte US-ASCII code (one of the first 128 characters, 0 through 127)
The multi-byte code points each start with a few bits that essentially say "hey, you need to also read the next byte (or two, or three) to figure out what I am." They are:
110x xxxx One more byte follows
1110 xxxx Two more bytes follow
1111 0xxx Three more bytes follow
Finally, the bytes that follow those start codes all look like this:
10xx xxxx A continuation of one of the multi-byte characters
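To make the bit patterns above concrete, here is a minimal Python sketch that classifies a byte by its leading bits. The function name classify_byte and the labels it returns are made up for illustration; they are not part of any library.

```python
def classify_byte(b: int) -> str:
    """Classify one byte (0-255) by its leading bits, per the table above."""
    if (b & 0b1000_0000) == 0b0000_0000:
        return "single"        # 0xxx xxxx
    if (b & 0b1100_0000) == 0b1000_0000:
        return "continuation"  # 10xx xxxx
    if (b & 0b1110_0000) == 0b1100_0000:
        return "lead-2"        # 110x xxxx, one more byte follows
    if (b & 0b1111_0000) == 0b1110_0000:
        return "lead-3"        # 1110 xxxx, two more bytes follow
    if (b & 0b1111_1000) == 0b1111_0000:
        return "lead-4"        # 1111 0xxx, three more bytes follow
    return "invalid"           # 1111 1xxx matches none of the patterns


if __name__ == "__main__":
    # Print each byte of a short string in binary next to its classification.
    for b in "a\u00e9\u20ac".encode("utf-8"):
        print(f"{b:08b}  {classify_byte(b)}")
```

This only looks at the leading bits. A real decoder also rejects things like overlong encodings, which this sketch does not attempt.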
Since you can tell what kind of byte you're looking at from its first few bits, a mangled byte doesn't cost you the whole sequence: you lose only the damaged character, and you can pick up again at the next byte that isn't a continuation.
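A rough illustration of that recovery, in Python: drop the start byte of a three-byte character, then skip the orphaned continuation bytes to find where the next character begins. The helper next_start is a hypothetical name used only for this sketch.

```python
def next_start(data: bytes, i: int) -> int:
    """Return the first index >= i that is not a continuation byte (10xx xxxx)."""
    while i < len(data) and (data[i] & 0b1100_0000) == 0b1000_0000:
        i += 1
    return i


data = "a\u20acb".encode("utf-8")   # bytes: 61 E2 82 AC 62
damaged = data[:1] + data[2:]       # start byte E2 lost: 61 82 AC 62

i = next_start(damaged, 1)          # skip the two orphaned continuation bytes
recovered = damaged[i:].decode("utf-8")

# Python's built-in decoder can do the same kind of recovery, substituting
# U+FFFD for the bytes it cannot interpret.
lenient = damaged.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(i, recovered)
    print(lenient)
```

Only the damaged character is lost; the "a" before it and the "b" after it still decode normally.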
Fight with dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss will gaze back…