Another Look at Java Character Encoding

Thoughts prompted by emoji

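Take the emoji 😂 as an example: its Unicode code point is U+1F602.

A Java char is only 2 bytes (16 bits), so a single char cannot hold it.

So how does a String represent characters such as emoji, whose code points lie above U+FFFF and therefore do not fit in 16 bits?


The answer is UTF-16.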

UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value. Supplementary characters are encoded in two code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). This may seem similar in concept to multi-byte encodings, but there is an important difference: The values U+D800 to U+DFFF are reserved for use in UTF-16; no characters are assigned to them as code points. This means, software can tell for each individual code unit in a string whether it represents a one-unit character or whether it is the first or second unit of a two-unit character. This is a significant improvement over some traditional multi-byte character encodings, where the byte value 0x41 could mean the letter "A" or be the second byte of a two-byte character. 

The UTF-16 encoding of 😂 is \uD83D\uDE02.

In a String it is stored as two chars (a high surrogate followed by a low surrogate).
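
To make the mapping from U+1F602 to the pair D83D DE02 concrete, here is a minimal sketch of the surrogate arithmetic. The class name SurrogateDemo and the helper toSurrogates are made up for illustration; Character.toChars is the standard library call that does the same job, and the sketch assumes the input is a supplementary code point (above U+FFFF).

```java
public class SurrogateDemo {
    // Hypothetical helper: splits a supplementary code point (above U+FFFF)
    // into its UTF-16 high and low surrogates.
    public static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;                // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int codePoint = 0x1F602; // the "face with tears of joy" emoji
        char[] manual = toSurrogates(codePoint);
        char[] library = Character.toChars(codePoint);   // standard library equivalent
        System.out.printf("manual:  0x%04X 0x%04X%n", (int) manual[0], (int) manual[1]);
        System.out.printf("library: 0x%04X 0x%04X%n", (int) library[0], (int) library[1]);
    }
}
```

Both lines print 0xD83D 0xDE02, matching the encoding given above.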

You can define:

String emostring = "😂";

emostring.length();  // returns 2 (UTF-16 code units)

emostring.codePointCount(0, emostring.length());  // returns 1 (Unicode code points)
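
The difference between code units and code points matters as soon as you index into or iterate over such a string. The following is a small sketch, not a definitive recipe: CodePointDemo and codePointLength are names invented here, and the emoji is written with escapes so the file compiles regardless of source encoding.

```java
public class CodePointDemo {
    // Hypothetical helper: counts Unicode code points instead of UTF-16 code units.
    public static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE02b"; // 'a', U+1F602, 'b'

        System.out.println(text.length());         // number of chars (code units)
        System.out.println(codePointLength(text)); // number of code points

        // charAt sees only one half of the pair; codePointAt combines both halves.
        System.out.println(Character.isHighSurrogate(text.charAt(1)));
        System.out.println(Integer.toHexString(text.codePointAt(1)));

        // Iterating by code point never splits a surrogate pair.
        text.codePoints().forEach(cp ->
            System.out.printf("U+%04X uses %d char(s)%n", cp, Character.charCount(cp)));
    }
}
```

Here text.length() is 4 while the code point count is 3, because the emoji occupies two chars.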


http://www.oracle.com/us/technologies/java/supplementary-142654.html
