Charset: Difference between revisions

From Braindump

Revision as of 06:09, 31 May 2024

Unicode isn't hard if you know its history and where it comes from.

https://mcilloni.ovh/2023/07/23/unicode-is-hard/

Baudot encoding: 5-bit

International Encoding

IA5 (International Alphabet No. 5)

ASCII: 7-bit, split into control characters and graphic (printable) characters

Figure: 7-bit ASCII


IBM code pages, Windows code pages, Latin-1: 8-bit, upper codepoints used for

Unicode: 2-byte UCS-2 or 4-byte UCS-4
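
To make the difference between the 8-bit code pages and the fixed-width Unicode forms concrete, here is a minimal sketch using Python's built-in codecs. The choice of "é" and of the specific code pages (cp437, cp1252) is only an illustration, and utf-16-be / utf-32-be stand in for UCS-2 / UCS-4, which is accurate for this character because it lies in the Basic Multilingual Plane.

```python
# Sketch: one character, several encodings (codec names are Python's own).
ch = "é"  # U+00E9

latin1 = ch.encode("latin-1")    # ISO 8859-1: one byte, e9
cp1252 = ch.encode("cp1252")     # Windows code page 1252: one byte, e9
cp437 = ch.encode("cp437")       # original IBM PC code page: one byte, 82
ucs2 = ch.encode("utf-16-be")    # two bytes, 00 e9 (same as UCS-2 here)
ucs4 = ch.encode("utf-32-be")    # four bytes, 00 00 00 e9

print(latin1.hex(), cp1252.hex(), cp437.hex(), ucs2.hex(), ucs4.hex())

# The same upper-range byte means different things in different code pages.
print(b"\xe9".decode("latin-1"), b"\xe9".decode("cp437"))
```

The last line shows why 8-bit text is ambiguous without knowing the code page: byte E9 decodes to a different character under each one.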

UTF-8: variable-length encoding built from 8-bit units

0bbbbbbb => us-ascii

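110bbbbb, 1110bbbb, 11110bbb => lead byte of a 2-, 3-, or 4-byte character

10bbbbbb => continuation byte, part of a multi-byte character started by a lead byte (the original design allowed up to 6 bytes per character; RFC 3629 limits it to 4)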
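
The role of a UTF-8 byte can be read off its high bits. The following is a minimal sketch, not a validator: `classify` is a hypothetical helper name, and it does not reject overlong forms or lead bytes that can never occur in valid UTF-8.

```python
def classify(byte: int) -> str:
    """Return the role of a single UTF-8 byte based on its leading bits."""
    if byte & 0b1000_0000 == 0b0000_0000:
        return "ascii"          # 0bbbbbbb
    if byte & 0b1100_0000 == 0b1000_0000:
        return "continuation"   # 10bbbbbb
    if byte & 0b1110_0000 == 0b1100_0000:
        return "lead-2"         # 110bbbbb
    if byte & 0b1111_0000 == 0b1110_0000:
        return "lead-3"         # 1110bbbb
    if byte & 0b1111_1000 == 0b1111_0000:
        return "lead-4"         # 11110bbb
    return "invalid"            # 11111bbb is not used by current UTF-8


# "A" is one byte; the euro sign U+20AC takes three.
for b in "A€".encode("utf-8"):
    print(f"{b:08b} {classify(b)}")
```

Because continuation bytes always start with 10, a decoder dropped into the middle of a stream can skip them and resynchronize at the next ASCII or lead byte.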

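Byte order: the BOM is U+FEFF, written as bytes FE FF (big-endian) or FF FE (little-endian) in UTF-16; U+FFFE itself is a noncharacter, so reading it means the byte order is swapped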
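
A BOM at the start of a byte stream can be checked against the constants in Python's `codecs` module. This is a rough sketch: `sniff_bom` is a hypothetical helper, and a missing BOM says nothing about the actual encoding.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading byte order mark, if there is one."""
    # UTF-32 is checked before UTF-16 because the UTF-32-LE BOM
    # (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom("\ufeffhi".encode("utf-16-le")))
print(sniff_bom(b"plain ascii"))
```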

Java internal: originally UCS-2; UTF-16 (with surrogate pairs) since Java 5
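
A Java `char` is a 16-bit unit, so a character outside the Basic Multilingual Plane cannot fit in one and is stored as two units (a surrogate pair). The sketch below illustrates this in Python rather than Java; `utf16_units` is a hypothetical helper that lists the 16-bit units a string would occupy.

```python
def utf16_units(s):
    """Return the 16-bit code units of s when encoded as UTF-16."""
    data = s.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+0041 fits in one unit; U+1F600 needs a surrogate pair.
print([hex(u) for u in utf16_units("A")])
print([hex(u) for u in utf16_units("\U0001F600")])
```

This is why a string length counted in 16-bit units can differ from the number of characters it contains.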