…
…
…
Unicode is a character encoding standard that assigns a unique number, called a code point, to every character and symbol in the world's languages.
Because no other encoding standard covers all languages, Unicode is the only one that lets you retrieve or combine data in any mix of languages. XML, Java, JavaScript, LDAP, and other web-based technologies all require Unicode.
The two most prevalent Unicode implementations for computer systems are UTF-8, a variable-length encoding in which a one- to four-byte code represents each written symbol, and UTF-16, a variable-length encoding in which a two- or four-byte code represents each written symbol.
We now know that Unicode is an international standard that maps every known character to a unique number. But how do we move these unique numbers around the internet? They are transmitted as bytes, and an encoding defines how each number is turned into bytes.
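The step from character to number to bytes can be sketched in Python using only built-ins. The helper names `code_point_label` and `to_wire` are made up for illustration, and UTF-8 is just one possible choice of encoding here.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional 'U+XXXX' label for a single character."""
    return f"U+{ord(ch):04X}"


def to_wire(text: str) -> bytes:
    """Turn text into the bytes that would actually be transmitted (UTF-8 assumed)."""
    return text.encode("utf-8")


for ch in ["A", "€"]:
    # ord() gives the Unicode number; encode() gives the bytes for that number.
    print(ch, code_point_label(ch), to_wire(ch))

# The receiver reverses the process with the same encoding.
assert to_wire("€").decode("utf-8") == "€"
```

The number assigned to a character never changes; only the byte representation depends on which encoding both sides agree on.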
UTF-8: Every code point is encoded using one, two, three, or four bytes in UTF-8. It is backward compatible with ASCII: every ASCII character, which includes all unaccented English letters, uses only one byte, which is exceptionally efficient. Non-ASCII characters simply need more bytes. It is the most widely used encoding on the web, and Python 3 uses it as the default for source files and for str.encode(). The default encoding in Python 2 is ASCII (unfortunately).
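A quick way to see the one-to-four-byte range is to encode a few characters and count the bytes. This is a small sketch; `utf8_length` is a hypothetical helper name, and the sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for the given character."""
    return len(ch.encode("utf-8"))


# One sample character for each possible UTF-8 length.
for ch in ["A", "é", "€", "😀"]:
    print(f"{ch!r} takes {utf8_length(ch)} byte(s) in UTF-8")

# ASCII compatibility: pure ASCII text yields identical bytes in both encodings.
assert "hello".encode("utf-8") == "hello".encode("ascii")
```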
UTF-16: UTF-16 has a variable length of 2 or 4 bytes. Because most characters in Asian scripts such as Chinese, Japanese, and Korean fit in two bytes each, where UTF-8 needs three, this encoding suits such text well. It is inefficient for English, since every English character requires two bytes.
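The trade-off can be checked by comparing byte counts directly. The sketch below uses the `utf-16-le` codec so that Python does not prepend a byte order mark, which would add two bytes to every result; `utf16_length` is a hypothetical helper name.

```python
def utf16_length(ch: str) -> int:
    """Number of bytes UTF-16 needs for the character (no byte order mark)."""
    return len(ch.encode("utf-16-le"))


# An English letter costs two bytes in UTF-16 but only one in UTF-8.
print("A:", utf16_length("A"), "bytes in UTF-16,", len("A".encode("utf-8")), "in UTF-8")

# A common Chinese character costs two bytes in UTF-16 but three in UTF-8.
print("中:", utf16_length("中"), "bytes in UTF-16,", len("中".encode("utf-8")), "in UTF-8")

# Characters outside the two-byte range need four bytes (a surrogate pair).
print("😀:", utf16_length("😀"), "bytes in UTF-16")
```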
UTF-32: UTF-32 is a fixed-width encoding. Every character is encoded in exactly 4 bytes, so it needs a lot of memory. It is not used very often.
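To compare the three encodings side by side, one option is to encode the same string with each and count the bytes. This is an illustrative sketch: `encoded_sizes` is a hypothetical helper, the sample string is arbitrary, and the little-endian codecs are used so that no byte order mark inflates the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text under UTF-8, UTF-16, and UTF-32."""
    codecs = {"UTF-8": "utf-8", "UTF-16": "utf-16-le", "UTF-32": "utf-32-le"}
    return {name: len(text.encode(codec)) for name, codec in codecs.items()}


sample = "Hi 中😀"  # three ASCII characters, one Chinese character, one emoji
for name, size in encoded_sizes(sample).items():
    print(f"{name}: {size} bytes for {len(sample)} code points")
```

For UTF-32 the byte count is always four times the number of code points, which is why mostly-ASCII text takes roughly four times as much space as it does in UTF-8.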
…