Charset
Revision as of 06:09, 31 May 2024
Unicode isn't hard if you know the history and where it comes from
https://mcilloni.ovh/2023/07/23/unicode-is-hard/
Baudot Encoding 5-bit
International Encoding
IA5
ASCII: 7-bit, control characters (0x00-0x1F, 0x7F) and graphic characters (0x20-0x7E)
IBM code pages, Windows code pages, Latin-1: 8-bit, upper code points (0x80-0xFF) used for language-specific characters
Unicode: 2-byte UCS-2 or 4-byte UCS-4
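To illustrate why the 8-bit code pages are mutually incompatible in the upper range, here is a minimal Python sketch that decodes the same two bytes under three of them. The helper name `decode_all` and the sample bytes are illustrative assumptions; the codec names are the ones Python's standard library ships.

```python
# The same byte means different things under different 8-bit code pages.
# RAW and decode_all are hypothetical names chosen for this example.
RAW = bytes([0x41, 0x80])  # 'A' (ASCII range) followed by one upper-range byte


def decode_all(raw, codec_names=("cp437", "latin-1", "cp1252")):
    """Return {codec name: decoded text} for the given bytes."""
    return {name: raw.decode(name) for name in codec_names}


if __name__ == "__main__":
    for name, text in decode_all(RAW).items():
        # repr() keeps the invisible Latin-1 control character readable
        print(f"{name:8} {text!r}")
```

The lower byte decodes to `A` everywhere, while byte 0x80 becomes a different character in each code page, which is the problem Unicode was meant to remove.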
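UTF-8: 8-bit code units, 1 to 4 bytes per character
0bbbbbbb => single byte, same as US-ASCII
110bbbbb, 1110bbbb, 11110bbb => lead byte of a 2-, 3-, or 4-byte sequence
10bbbbbb => continuation byte, part of the same character as the preceding lead byte (the original design allowed up to 6 bytes; RFC 3629 limits it to 4)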
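The UTF-8 bit layout can be sketched as a small hand-written encoder. This assumes the RFC 3629 form: lead bytes 0bbbbbbb, 110bbbbb, 1110bbbb, or 11110bbb, each followed by zero to three 10bbbbbb continuation bytes. The function name `utf8_encode` is hypothetical; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 0bbbbbbb: 7 payload bits, identical to US-ASCII
        return bytes([cp])
    if cp < 0x800:
        # 110bbbbb 10bbbbbb: 11 payload bits
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110bbbb 10bbbbbb 10bbbbbb: 16 payload bits
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb: 21 payload bits
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

Because every continuation byte starts with the bits 10, a decoder that lands in the middle of a character can skip forward to the next byte that does not, which is what makes UTF-8 self-synchronizing.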
Byte order: BOM is U+FEFF, serialized as FE FF (big-endian) or FF FE (little-endian)
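As a rough sketch of how a reader can use the byte order mark (code point U+FEFF, which appears as FE FF in big-endian and FF FE in little-endian UTF-16), the following Python function checks the start of a byte string against the BOM constants from the standard `codecs` module. The function name `sniff_bom` is an assumption made for this example.

```python
import codecs

# Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the longer marks are tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
    encoding, skip = sniff_bom(sample)
    print(encoding, sample[skip:].decode(encoding))
```

This is a heuristic only: a missing BOM says nothing about the encoding, and UTF-16-LE text whose first character is U+0000 would be misread as UTF-32-LE.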
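Java internal: originally UCS-2, UTF-16 since Java 5 (char is a 16-bit code unit; characters above U+FFFF use surrogate pairs)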
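UCS-2 can only represent code points up to U+FFFF. UTF-16, which Java strings have used since Java 5, keeps the same 16-bit units and covers the rest with surrogate pairs. The arithmetic can be sketched as follows, in Python rather than Java to keep it short; the function names `to_surrogates` and `from_surrogates` are hypothetical.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000                 # 20 bits remain after removing the offset
    high = 0xD800 | (v >> 10)        # top 10 bits go into D800..DBFF
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 bits go into DC00..DFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
```

In Java terms, a string holding one such character has `length()` 2, because `length()` counts 16-bit code units rather than characters.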