Difference Between UTF-8 and UTF-16
UTF stands for Unicode Transformation Format. It is a family of standards for encoding the Unicode character set into its equivalent binary value. UTF was developed so that users have a standardized means of encoding characters in the minimum amount of space. UTF-8 and UTF-16 are only two of the established encoding standards. They differ in how many bytes they use to encode each character. Both are variable-width encodings that can use up to four bytes per character, but the minimum differs: UTF-8 uses as little as one byte (8 bits) while UTF-16 always uses at least two bytes (16 bits). This has a huge impact on the size of the encoded file. When using only ASCII characters, a UTF-16 encoded file is roughly twice as big as the same file encoded with UTF-8.
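As a rough illustration, here is a small Python sketch comparing the encoded size of the same ASCII-only text under both encodings (the sample text is made up; the exact numbers depend on the text, but the two-to-one ratio holds for pure ASCII):

```python
# Compare the encoded size of the same ASCII-only text in UTF-8 and UTF-16.
text = "Hello, world!" * 1000  # ASCII-only sample text, 13,000 characters

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

print(len(utf8_bytes))   # 13000 bytes: one byte per ASCII character
print(len(utf16_bytes))  # 26000 bytes: two bytes per ASCII character
```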
The main advantage of UTF-8 is that it is backwards compatible with ASCII. The ASCII character set is fixed width and only uses one byte per character. When a file that uses only ASCII characters is encoded with UTF-8, the resulting file is identical to one encoded with ASCII. This is not possible with UTF-16, where each of those characters would be two bytes long. Legacy software that is not Unicode aware would be unable to open a UTF-16 file correctly even if it contained only ASCII characters.
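A minimal Python check, assuming nothing beyond the standard library codecs, shows that the UTF-8 output for ASCII text is byte-for-byte the same as the ASCII encoding, while the UTF-16 output is not:

```python
text = "plain ASCII text"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

print(ascii_bytes == utf8_bytes)   # True: UTF-8 is a superset of ASCII
print(ascii_bytes == utf16_bytes)  # False: every character gains an extra zero byte
print(utf16_bytes[:8])             # b'p\x00l\x00a\x00i\x00'
```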
UTF-8 is a byte-oriented format and therefore has no problems with byte-oriented networks or files. UTF-16, on the other hand, is not byte oriented and needs an agreed byte order (usually signalled with a byte order mark) in order to work with byte-oriented networks. UTF-8 is also better at recovering from errors that corrupt portions of a file or stream, because it can resynchronise and continue decoding from the next uncorrupted character boundary. UTF-16 can cope when bytes are merely corrupted, but the problem arises when bytes are lost: a single missing byte shifts the alignment of all the following two-byte units, and everything after it comes out garbled.
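A small Python sketch (using only the standard codecs, with made-up sample text) illustrates both points, the byte-order issue and what happens to each encoding when a byte goes missing:

```python
# Byte order: UTF-16 needs an agreed byte order (or a byte order mark),
# while UTF-8 has a single, fixed byte sequence for every character.
text = "AB"
print(text.encode("utf-16-le"))  # b'A\x00B\x00'   little-endian
print(text.encode("utf-16-be"))  # b'\x00A\x00B'   big-endian
print(text.encode("utf-16"))     # BOM prepended, e.g. b'\xff\xfeA\x00B\x00'
print(text.encode("utf-8"))      # b'AB'           no byte order involved

# Error recovery: drop one byte near the start of each stream.
sample = "héllo wörld"
utf8_stream = sample.encode("utf-8")
utf16_stream = sample.encode("utf-16-le")
utf8_damaged = utf8_stream[:1] + utf8_stream[2:]   # one byte lost
utf16_damaged = utf16_stream[1:]                   # one byte lost

# UTF-8 loses only the damaged character and resynchronises afterwards.
print(utf8_damaged.decode("utf-8", errors="replace"))      # h�llo wörld
# UTF-16 pairs up the remaining bytes incorrectly, so every following
# character decodes to the wrong code point.
print(utf16_damaged.decode("utf-16-le", errors="replace"))
```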
Summary:
1. UTF-8 and UTF-16 are both used for encoding characters
2. UTF-8 uses a minimum of one byte to encode a character, while UTF-16 uses a minimum of two
3. A UTF-8 encoded file tends to be smaller than a UTF-16 encoded file
4. UTF-8 is compatible with ASCII while UTF-16 is incompatible with ASCII
5. UTF-8 is byte oriented while UTF-16 is not
6. UTF-8 is better in recovering from errors compared to UTF-16
Wow, this is really informative! I have a question. Is there any quality difference between 8 and 16? Like if it is a sound file or an image.
As sound files or images are binary data, they contain their own “encoding”, i.e. an algorithm which stores the specific data.
UTF-x is for encoding characters of alphabets. So if you open a binary data file in your text editor, you see a different flavor of gibberish depending on whether you use 8 or 16 😉
If one wants to see what’s in such files, there is a slim chance of seeing something useful with a hex editor.
Very good article, thank you very much. I was looking for such an article for a while.
Something I don’t understand… UTF-8 is 8 bits, meaning it can only display 255 different characters. What about all the Asian, Russian, Hebrew etc. symbols? 255 is not enough to hold them. How is this done? Do I have to use UTF-16 for these languages?
UTF-8 is a minimum of 8 bits per character, not a fixed 8 bits; characters outside the ASCII range are encoded with two, three or four bytes, so it can handle non-ASCII scripts as well.
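For example, a quick look in Python (standard codecs only, sample characters chosen for illustration) at how many bytes UTF-8 spends on characters from different scripts:

```python
# UTF-8 is variable width: one byte for ASCII, more for everything else.
for ch in ["A", "é", "€", "中", "😀"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded)

# A 1 b'A'                  ASCII, one byte
# é 2 b'\xc3\xa9'           two bytes
# € 3 b'\xe2\x82\xac'       three bytes
# 中 3 b'\xe4\xb8\xad'      three bytes
# 😀 4 b'\xf0\x9f\x98\x80'  four bytes
```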
Good article.
My long-time confusion is solved now 😛
Hmm. So why would anybody bother to use UTF-16?