I have a String encoded in UTF-8. For example:
Thats a nice joke 😆😆😆 😛
I have to extract all the emojis present in the sentence. The sentence could contain any emoji.
When this sentence is viewed in a terminal using the command less text.txt,
it is displayed as:
Thats a nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>
These are the corresponding Unicode code points for the emojis. All the codes for emojis can be found at emojitracker.
To find all the occurrences, I used the regular expression pattern (<U\+\w+?>),
but it didn't work for the UTF-8 encoded string.
Following is my code:
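import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Thats a nice joke 😆😆😆 😛";
// Backslashes are doubled so the Java string literal compiles.
Pattern pattern = Pattern.compile("(<U\\+\\w+?>)");
Matcher matcher = pattern.matcher(s);
List<String> matchList = new ArrayList<String>();
// Collect every match of the pattern.
while (matcher.find()) {
    matchList.add(matcher.group());
}
// Print the matches, one per line.
for (int i = 0; i < matchList.size(); i++) {
    System.out.println(matchList.get(i));
}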
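A likely reason the pattern finds nothing is that the text <U+1F606> is only how less displays a character it cannot render. The string itself holds the emoji code points, not those literal characters. The following is a minimal sketch that illustrates this. The class name CodePointDump and the helper nonAsciiCodePoints are made up for illustration, and the emojis are written as surrogate-pair escapes so the source file's encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Hypothetical helper: formats every non-ASCII code point as U+XXXX.
    static List<String> nonAsciiCodePoints(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints()                       // iterate by code point, not by char
         .filter(cp -> cp > 0x7F)            // keep only non-ASCII code points
         .forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // \uD83D\uDE06 is U+1F606 and \uD83D\uDE1B is U+1F61B as surrogate pairs.
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(nonAsciiCodePoints(s)); // [U+1F606, U+1F606, U+1F606, U+1F61B]
        System.out.println(s.contains("<U+"));     // false
    }
}
```

Since the literal text "<U+" never occurs in the string, a pattern that looks for it has nothing to match.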
This pdf says the range for Miscellaneous Symbols and Pictographs is 1F300–1F5FF, so I want to capture any character lying within this range.
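One way to capture characters in such a range is a character class built from code point escapes, which java.util.regex supports with the \x{h...h} syntax (Java 7 and later). The sketch below is an illustration under assumptions, not a complete emoji matcher. The class name EmojiRangeExtractor and the method extract are made up. Note that U+1F606 and U+1F61B from the example belong to the Emoticons block (1F600–1F64F), which lies outside 1F300–1F5FF, so the sketch includes both ranges. Emoji in other blocks would need further ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmojiRangeExtractor {
    // Assumed ranges: Miscellaneous Symbols and Pictographs (1F300-1F5FF)
    // plus Emoticons (1F600-1F64F). This does not cover every emoji.
    private static final Pattern EMOJI =
        Pattern.compile("[\\x{1F300}-\\x{1F5FF}\\x{1F600}-\\x{1F64F}]");

    // Hypothetical helper: returns each matched character as its own string.
    static List<String> extract(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOJI.matcher(s);
        while (m.find()) {
            found.add(m.group()); // a full surrogate pair per match
        }
        return found;
    }

    public static void main(String[] args) {
        // \uD83D\uDE06 is U+1F606 and \uD83D\uDE1B is U+1F61B as surrogate pairs.
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        for (String e : extract(s)) {
            System.out.println(String.format("U+%04X", e.codePointAt(0)));
        }
    }
}
```

The regex engine matches supplementary characters by code point, so each emoji comes back as one match even though it occupies two char values in the String.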