What you have is EXTRATERRESTRIAL ALIEN (U+1F47D) and BROKEN HEART (U+1F494), neither of which is in the Basic Multilingual Plane. They cannot even be represented in Java as a single char: "👽💔".length() == 4. They are definitely not null characters, and you will see squares if your font does not support them.
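To make the char-count point concrete, here is a minimal sketch using only the standard library. The class name SupplementaryDemo is made up for illustration; the string is built from the two code points named above so that the source file's encoding does not matter.

```java
final class SupplementaryDemo {
    // U+1F47D EXTRATERRESTRIAL ALIEN followed by U+1F494 BROKEN HEART,
    // built from code points so the source file encoding is irrelevant.
    static final String ALIEN_AND_HEART =
            new String(Character.toChars(0x1F47D)) + new String(Character.toChars(0x1F494));

    public static void main(String[] args) {
        String s = ALIEN_AND_HEART;
        // Each supplementary character is stored as a surrogate pair (two chars).
        System.out.println(s.length());                       // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));  // 2 actual characters
        System.out.println(Character.isHighSurrogate(s.charAt(0)));               // true
        System.out.println(Character.isSupplementaryCodePoint(s.codePointAt(0))); // true
    }
}
```

So length() counts UTF-16 code units, not characters; use codePointCount when you need the number of characters.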
MySQL's utf8 supports only the Basic Multilingual Plane, so you need to use utf8mb4 instead. From the MySQL documentation:
For a supplementary character, utf8 cannot store the character at all,
while utf8mb4 requires four bytes to store it. Since utf8 cannot store
the character at all, you do not have any supplementary characters in
utf8 columns and you need not worry about converting characters or
losing data when upgrading utf8 data from older versions of MySQL.
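If you want to see on the Java side which strings would not fit into a 3-byte utf8 column, a rough check could look like the sketch below. The class and method names are hypothetical, and it assumes well-formed strings (no unpaired surrogates).

```java
import java.nio.charset.StandardCharsets;

final class MysqlUtf8Check {
    private MysqlUtf8Check() {}

    /**
     * Returns true if every code point lies in the Basic Multilingual Plane,
     * i.e. each character needs at most 3 bytes in UTF-8.
     */
    static boolean fitsInThreeByteUtf8(String text) {
        return text.codePoints().noneMatch(Character::isSupplementaryCodePoint);
    }

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                                  // BMP character
        String alien = new String(Character.toChars(0x1F47D));   // supplementary character
        System.out.println(utf8Length(euro) + " " + fitsInThreeByteUtf8(euro));
        System.out.println(utf8Length(alien) + " " + fitsInThreeByteUtf8(alien));
    }
}
```

A string for which fitsInThreeByteUtf8 returns false contains at least one character that needs utf8mb4.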
So to support these characters, your MySQL needs to be 5.5 or later and you need to use utf8mb4 everywhere: the connection encoding needs to be utf8mb4, the character set needs to be utf8mb4, and the collation needs to be a utf8mb4 collation. For Java it's still just "utf-8", but MySQL needs the distinction.
I don't know which driver you are using, but a driver-agnostic way to set the connection charset is to send this query right after making the connection:
SET NAMES 'utf8mb4'
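Through plain JDBC, sending that query could look roughly like the sketch below. Utf8mb4Session and useUtf8mb4 are hypothetical names, and the Connection is assumed to have just been opened by whatever driver you use. One caveat: the Connector/J documentation advises against issuing SET NAMES through that driver, because the driver does not detect the change, so with Connector/J the server-side setting is the safer route.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class Utf8mb4Session {
    private Utf8mb4Session() {}

    /**
     * Switches the session's client, connection and result character sets
     * to utf8mb4. Call this once, right after the connection is opened.
     */
    static void useUtf8mb4(Connection connection) throws SQLException {
        // try-with-resources closes the statement even if execute() throws
        try (Statement statement = connection.createStatement()) {
            statement.execute("SET NAMES 'utf8mb4'");
        }
    }
}
```

If you use a connection pool, this has to run for every physical connection, not just the first one you obtain.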
See also this FAQ entry for Connector/J:
14.14: How can I use 4-byte UTF8, utf8mb4 with Connector/J?
To use 4-byte UTF8 with Connector/J configure the MySQL server with
character_set_server=utf8mb4. Connector/J will then use that setting
as long as characterEncoding has not been set in the connection
string. This is equivalent to autodetection of the character set.
Adjust your columns and database as well:
var1 varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL
Again, your MySQL version needs to be relatively up-to-date for utf8mb4 support.