You should use the W3C method, which is roughly:
if you know the encoding, use that
if there is a BOM, use it to determine the encoding
try to decode as UTF-8. UTF-8 has strict byte-sequence rules (that is part of its design: you can always find the first byte of a character). So if the file is not UTF-8, decoding will very probably fail: in ANSI (cp-1252) text it is uncommon for an accented letter to be followed by a symbol, and very unlikely that every high byte is part of such a sequence. With Latin-1 the trailing bytes would be C1 control characters instead of symbols, and it is just as unlikely that C1 control characters appear only, and always, right after accented letters.
if decoding fails (you can test only the first 4096 bytes, or the first 10 bytes above 127), fall back to the default 8-bit encoding of the OS (probably cp-1252 on Windows).
This method should work very well. It is biased toward UTF-8, but the world moved in that direction long ago. Determining which specific codepage was used is much more difficult.
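A minimal sketch of this heuristic in Python (the function name `guess_encoding`, the 4096-byte sample size, and the truncation tolerance are my own choices, not part of any standard):

```python
import codecs
import locale

# BOMs checked longest-first so a UTF-32LE BOM is not mistaken for UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(path, known_encoding=None, sample_size=4096):
    """Return a best-guess encoding name for the file at `path`."""
    # 1. If the encoding is known from a side channel, trust it.
    if known_encoding:
        return known_encoding

    with open(path, "rb") as f:
        sample = f.read(sample_size)

    # 2. A BOM, if present, decides the encoding.
    for bom, name in _BOMS:
        if sample.startswith(bom):
            return name

    # 3. Try a strict UTF-8 decode; non-UTF-8 data almost always
    #    violates the continuation-byte rules.
    try:
        sample.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError as err:
        # A multi-byte character cut off at the sample boundary is
        # not a real failure, so ignore errors in the last 3 bytes.
        if err.start >= len(sample) - 3:
            return "utf-8"

    # 4. Fall back to the platform's default 8-bit encoding
    #    (typically cp-1252 on Western-locale Windows systems).
    return locale.getpreferredencoding(False)
```

On most modern files this returns "utf-8"; on an old Windows export without a BOM it falls through to the system codepage.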
You may add a step before the last one: if there are many 00 bytes, the file may be in a UTF-16 or UTF-32 form. Unicode requires that you know which form is used (e.g. from a side channel); otherwise the file should have a BOM. But you can guess the form (UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE) from the positions of the 00 bytes in the file: newlines and some other ASCII characters belong to the common script, so they appear in text of almost any script, and they produce 00 bytes at predictable offsets within each code unit.
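A rough sketch of that extra step, again in Python; the thresholds (0.2 and 0.6) and the function name are arbitrary assumptions of mine:

```python
def guess_utf16_32(sample: bytes):
    """Guess a UTF-16/UTF-32 form from where the 00 bytes fall, or None."""
    if not sample:
        return None
    zero_ratio = sample.count(0) / len(sample)
    if zero_ratio < 0.2:
        # Few 00 bytes: probably an 8-bit encoding or UTF-8, not our case.
        return None

    # Count 00 bytes at each offset modulo 4. ASCII characters of the
    # common script (newlines, spaces, digits, punctuation) put their
    # 00 bytes in fixed positions of each code unit.
    counts = [0, 0, 0, 0]
    for i, b in enumerate(sample):
        if b == 0:
            counts[i % 4] += 1

    if zero_ratio > 0.6:
        # UTF-32: three of every four bytes of an ASCII character are 00.
        # Big-endian puts the 00 at offset 0 of the code unit, little-endian at offset 3.
        return "utf-32-be" if counts[0] >= counts[3] else "utf-32-le"

    # UTF-16: the 00 (high byte) of an ASCII character is at the even
    # offset for big-endian, at the odd offset for little-endian.
    even = counts[0] + counts[2]
    odd = counts[1] + counts[3]
    return "utf-16-be" if even >= odd else "utf-16-le"
```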