Charset

Revision as of 06:09, 31 May 2024

Unicode isn't hard once you know its history and where it comes from.

Baudot encoding: 5-bit telegraph code

International encoding: ITA2 (International Telegraph Alphabet No. 2), the standardized 5-bit code derived from Baudot

IA5 (International Alphabet No. 5, ITU-T T.50): 7-bit, the international counterpart of ASCII

ASCII: 7-bit (128 code points), split into a control block (0x00 to 0x1F, plus 0x7F DEL) and a graphics block (0x20 to 0x7E)
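The split between the control block and the graphics block can be checked directly. A minimal sketch, assuming the usual boundaries (0x00 to 0x1F and 0x7F are control codes, 0x20 to 0x7E printable); `ascii_class` is a hypothetical helper, and it counts space (0x20) as printable even though C's `isgraph()` excludes it.

```python
def ascii_class(code: int) -> str:
    """Classify a 7-bit ASCII code as 'control' or 'printable'."""
    if not 0 <= code <= 0x7F:
        raise ValueError(f"{code:#x} is outside 7-bit ASCII")
    # Control block: 0x00-0x1F plus DEL (0x7F).
    if code < 0x20 or code == 0x7F:
        return "control"
    # Printable block: 0x20 (space) through 0x7E ('~').
    return "printable"


# Count both blocks across the whole 128-entry table.
counts = {"control": 0, "printable": 0}
for code in range(0x80):
    counts[ascii_class(code)] += 1
print(counts)  # {'control': 33, 'printable': 95}
```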

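8-bit code pages (IBM code pages such as CP437, Windows code pages such as CP1252, ISO 8859-1 Latin-1): the lower 128 code points are ASCII, and the upper 128 (0x80 to 0xFF) hold extra characters that differ from one code page to the next. Latin-1's 256 values map one-to-one onto Unicode's first 256 code points. Unicode itself started as fixed-width 2-byte UCS-2 or 4-byte UCS-4.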
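To see how the 8-bit code pages agree on ASCII but diverge in their upper half, and what the fixed-width Unicode forms look like, Python's standard codecs can encode one character several ways. A sketch under one assumption: Python has no codec named UCS-2 or UCS-4, so big-endian UTF-16 and UTF-32 stand in for them here, which yields identical bytes for any character in the Basic Multilingual Plane (U+0000 to U+FFFF).

```python
text = "é"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# 8-bit code pages: one byte per character, but the upper half differs.
print(text.encode("latin-1").hex())    # e9 (ISO 8859-1)
print(text.encode("cp1252").hex())     # e9 (Windows Latin-1)
print(text.encode("cp437").hex())      # 82 (original IBM PC code page)

# Fixed-width Unicode forms (stand-ins for UCS-2 and UCS-4, see above).
print(text.encode("utf-16-be").hex())  # 00e9
print(text.encode("utf-32-be").hex())  # 000000e9

# The euro sign: CP1252 placed it at 0x80, ISO 8859-1 has no euro sign at all.
print("€".encode("cp1252").hex())      # 80
try:
    "€".encode("latin-1")
except UnicodeEncodeError:
    print("€ has no ISO 8859-1 byte")
```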

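UTF-8: variable length, built from 8-bit bytes

0bbbbbbb => US-ASCII, one byte (U+0000 to U+007F)

110bbbbb, 1110bbbb, 11110bbb => lead byte of a 2-, 3- or 4-byte character

10bbbbbb => continuation byte: every byte after the lead byte has this form (the original design allowed characters of up to 6 bytes; RFC 3629 limits UTF-8 to 4 bytes, ending at U+10FFFF)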
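The bit patterns above can be turned into a small encoder. A minimal sketch of the modern (RFC 3629) rules, where `utf8_encode` is a hypothetical helper name: the run of 1-bits in the lead byte gives the sequence length, and each following byte carries six payload bits behind its `10` prefix. It rejects surrogates and anything above U+10FFFF, and checks itself against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit patterns (RFC 3629)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"{cp:#x} is not a Unicode scalar value")
    if cp < 0x80:
        # 0bbbbbbb: plain US-ASCII, one byte
        return bytes([cp])
    if cp < 0x800:
        # 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


# Cross-check against Python's built-in UTF-8 codec.
for ch in ["A", "é", "€", "😀"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", encoded.hex(" "))
```

A decoder runs the same logic in reverse: it reads the lead byte to learn how many continuation bytes to expect, and treats a `10bbbbbb` byte in lead position as an error.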

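Byte order mark (BOM): the character U+FEFF at the start of the text. In UTF-16 it appears as FE FF (big-endian) or FF FE (little-endian); decoding it as U+FFFE means the byte order is swapped. UTF-8 has no byte order, but some files still start with the BOM as EF BB BF.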
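A reader can use the BOM to guess the encoding of a byte stream. A minimal sketch using Python's `codecs` module, which defines the standard BOM byte strings; `sniff_bom` is a hypothetical helper. The UTF-32 marks are tested first because the little-endian UTF-32 BOM (FF FE 00 00) begins with the little-endian UTF-16 BOM (FF FE).

```python
import codecs

# Known byte order marks; UTF-32 first so FF FE 00 00 is not read as UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


data = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
encoding, skip = sniff_bom(data)
print(encoding, data[skip:].decode(encoding))  # utf-16-be hi
```

This is a heuristic: a UTF-16 LE file whose first character is U+0000 also starts with FF FE 00 00, and text without a BOM needs some other way to determine its encoding.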

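Java: `char` was designed as a 16-bit UCS-2 unit; since Java 5, `String` and `char` are UTF-16, so a character outside the Basic Multilingual Plane takes two `char`s (a surrogate pair).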
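A 16-bit `char` holds any BMP character directly, but a supplementary character has to be split into a surrogate pair. A sketch of that arithmetic in Python; `to_utf16_units` is a hypothetical helper that mirrors what Java's `Character.toChars` returns for a code point.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Split a code point into 16-bit UTF-16 code units (the values a Java char holds)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"{cp:#x} is not a Unicode scalar value")
    if cp < 0x10000:
        # Basic Multilingual Plane: one unit, identical to UCS-2.
        return [cp]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value, then put
    # the high 10 bits in a high surrogate and the low 10 bits in a low surrogate.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


units = to_utf16_units(0x1F600)  # 😀
print([hex(u) for u in units])   # ['0xd83d', '0xde00']
# The same bytes come out of Python's big-endian UTF-16 codec.
assert b"".join(u.to_bytes(2, "big") for u in units) == "😀".encode("utf-16-be")
```

This is why Java's `String.length()` counts `char` units: for "😀" it returns 2, while `codePointCount(0, 2)` returns 1.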