Abstract


Incompatible Encoding Standards

Characters encoded with one standard may be displayed incorrectly when they are decoded with another standard.

We should always use the same character encoding standard; UTF-8 is recommended. If we see characters that are not displayed correctly, we shouldn't save the file, because saving will overwrite those wrongly displayed characters with placeholders and the original bytes will be lost.
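To make the mismatch concrete, here is a minimal sketch that decodes UTF-8 bytes as if they were ISO-8859-1 (Latin-1). The helper name `decodeAsLatin1` and the sample character are assumptions chosen for illustration; Latin-1 is used only because it maps each byte directly to one code point, which keeps the example within the standard library.

```go
package main

import "fmt"

// decodeAsLatin1 is a hypothetical helper: it interprets every byte as one
// ISO-8859-1 character, where byte values 0x00-0xFF map to U+0000-U+00FF.
func decodeAsLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	original := "\u00e9" // "é", stored in UTF-8 as the two bytes 0xC3 0xA9

	// Reading those two bytes with the wrong standard yields two characters.
	fmt.Println(decodeAsLatin1([]byte(original))) // prints "Ã©"
}
```

The bytes on disk are unchanged at this point; the damage only becomes permanent if the wrongly decoded text is saved back.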

UTF-8


  • UTF-8 uses variable-length encoding, which saves a great deal of space when storing different characters. For example, we use 1 byte to store a and 4 bytes to store 😊, while characters in between take 2 or 3 bytes. With a fixed-width encoding that covers the same characters, we would need 4 bytes for every character. The number of bytes used is determined by the character's code point.
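The byte counts can be checked with Go's standard `unicode/utf8` package. This is a small sketch; the helper name `byteCounts` and the sample characters are assumptions picked to cover each length.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteCounts is a hypothetical helper that returns, for each character in s,
// how many bytes its UTF-8 encoding occupies.
func byteCounts(s string) []int {
	counts := []int{}
	for _, r := range s {
		counts = append(counts, utf8.RuneLen(r))
	}
	return counts
}

func main() {
	for _, r := range "a\u00e9\u20ac\U0001F60A" { // a, é, €, 😊
		fmt.Printf("%c (U+%04X): %d byte(s) in UTF-8, 4 bytes at a fixed 32-bit width\n",
			r, r, utf8.RuneLen(r))
	}
	fmt.Println(byteCounts("a\u20ac\U0001F60A")) // prints [1 3 4]
}
```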

UTF-8 in Go

  • A string in Go is a read-only sequence of bytes, and string literals are encoded with UTF-8. This explains why indexing is off for multi-byte characters and why len(myString) returns the number of bytes instead of the number of characters.
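The following sketch illustrates the behavior on a sample string of my own choosing (not the one from the original example). The helper `byteAndRuneCount` is hypothetical; `utf8.RuneCountInString` is the standard library call for counting characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper returning the byte length and
// the number of code points of s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "a\U0001F60Ab" // "a😊b": 1 + 4 + 1 bytes

	bytes, runes := byteAndRuneCount(s)
	fmt.Println(bytes, runes) // prints 6 3

	// Indexing returns a single byte, here the first byte of 😊 (0xF0).
	fmt.Println(s[1]) // prints 240

	// range decodes UTF-8 and reports the byte offset of each character.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r) // prints 0:a 1:😊 5:b
	}
	fmt.Println()
}
```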

Abstract away this weird behavior

We can convert the string to a slice of runes ([]rune) to get an intuitive, character-by-character interface to the string in Go. But a rune is an alias for int32, so every character takes 4 bytes regardless of how short its UTF-8 encoding is, and this approach comes at some cost in space.
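A minimal sketch of the conversion, again on an assumed sample string. The helper `runeAt` is hypothetical and does no bounds checking beyond what Go's slice indexing already provides.

```go
package main

import "fmt"

// runeAt is a hypothetical helper that returns the i-th character of s by
// first converting the whole string to a []rune.
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "a\U0001F60Ab" // "a😊b"
	runes := []rune(s)

	fmt.Println(len(runes))       // prints 3
	fmt.Println(string(runes[1])) // prints 😊

	// Space trade-off: 6 bytes as a UTF-8 string, 12 bytes as []rune.
	fmt.Println(len(s), len(runes)*4) // prints 6 12
}
```

The conversion copies the string, so for large inputs that only need to be scanned once, iterating with `range` over the string avoids the extra allocation.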

References