Charset
Revision as of 06:42, 31 May 2024
Unicode isn't hard if you know the history and where it comes from
https://mcilloni.ovh/2023/07/23/unicode-is-hard/
Teleprinter / Telegraph
Baudot Encoding 5-bit, ITA-1
https://cryptii.com/pipes/baudot
International Encoding
ITU-T T50 IA5 String
https://www.itu.int/rec/T-REC-T.50
ASCII 7-bit C0 set of Control Characters / G0 set of Graphic Characters
Codepages
IBM CP
Windows
ISO-8859-1 Latin-1
https://www.charset.org/charsets/iso-8859-1
Unicode
code points beyond the 8-bit codepages are covered by Unicode: 2-byte UCS-2 (Basic Multilingual Plane only, U+0000 to U+FFFF) or 4-byte UCS-4
EURO SIGN: U+20AC
https://www.fileformat.info/info/unicode/char/20ac/index.htm
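UTF-8
0bbbbbbb => us-ascii (1 byte, U+0000 to U+007F)
110bbbbb => lead byte of a 2-byte character
1110bbbb => lead byte of a 3-byte character
11110bbb => lead byte of a 4-byte character
10bbbbbb => continuation byte (second or later byte of a multi-byte character)
a character is at most 4 bytes since RFC 3629 (the original design allowed up to 6)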
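The UTF-8 bit layout can be sketched by hand as follows. This is a minimal illustration that assumes the RFC 3629 limits (at most 4 bytes, no surrogates); the function name `utf8_encode` is made up for this example, and real code would simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point to UTF-8 by hand (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 0bbbbbbb: plain us-ascii, one byte
        return bytes([cp])
    if cp < 0x800:
        # 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# EURO SIGN U+20AC needs three bytes
print(utf8_encode(0x20AC).hex(" "))  # e2 82 ac
```

Every byte after the first starts with the bits 10, which is what lets a decoder resynchronise in the middle of a stream.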
Byte order mark (BOM): U+FEFF, stored as FE FF in big-endian UTF-16 and as FF FE in little-endian UTF-16
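A rough sketch of BOM sniffing, using the BOM constants from Python's standard `codecs` module. The helper name `sniff_bom` and the lookup table are assumptions made for this example. The UTF-32 marks are checked first because the little-endian UTF-32 BOM (FF FE 00 00) begins with the little-endian UTF-16 BOM (FF FE).

```python
import codecs
from typing import Optional

# Longest BOMs first, so that UTF-32-LE is not mistaken for UTF-16-LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# EURO SIGN U+20AC in both UTF-16 byte orders, each prefixed with its BOM
print(sniff_bom(b"\xfe\xff\x20\xac"))  # utf-16-be
print(sniff_bom(b"\xff\xfe\xac\x20"))  # utf-16-le
```

A BOM is only a hint: UTF-16-LE text that starts with a BOM followed by U+0000 is indistinguishable from UTF-32-LE by this check alone.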
HTML Escaping
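As a small illustration of HTML escaping, the sketch below uses Python's standard `html` module and the `xmlcharrefreplace` error handler. The sample strings are made up for this example.

```python
import html

# Escape markup-significant characters (&, <, >, and quotes)
escaped = html.escape('<a href="x">\u20ac & co</a>')
print(escaped)  # &lt;a href=&quot;x&quot;&gt;€ &amp; co&lt;/a&gt;

# Replace non-ASCII characters with numeric character references
numeric = "\u20ac".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(numeric)  # &#8364;

# Named, decimal and hexadecimal references all decode to U+20AC
decoded = html.unescape("&euro; &#8364; &#x20AC;")
print(decoded)  # € € €
```

Numeric references let a page carry characters such as the EURO SIGN even when the document itself is served in a charset that cannot represent them.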
Java internal: UTF-16 (UCS-2 before Java 5)
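A Java `char` is a 16-bit code unit, so a code point above U+FFFF does not fit in one and is stored as a UTF-16 surrogate pair. The arithmetic is sketched here in Python to stay in one language; the helper name `to_utf16_units` is an assumption made for this example.

```python
from typing import Tuple


def to_utf16_units(cp: int) -> Tuple[int, ...]:
    """Split a code point into one or two 16-bit UTF-16 code units."""
    if cp < 0x10000:
        # Basic Multilingual Plane: one unit, identical to UCS-2
        return (cp,)
    v = cp - 0x10000                 # 20 bits remain
    high = 0xD800 | (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return (high, low)


print([hex(u) for u in to_utf16_units(0x20AC)])   # ['0x20ac']
print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

This is why a string's length in 16-bit units can exceed its number of characters when it contains code points outside the BMP.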