Charset
Latest revision as of 08:38, 1 September 2024
Unicode isn't hard if you know the history and where it comes from
https://mcilloni.ovh/2023/07/23/unicode-is-hard/
Teleprinter / Telegraph
Baudot encoding: 5-bit, ITA-1
https://cryptii.com/pipes/baudot
ASCII: 7-bit, C0 set of control characters / G0 set of graphic characters
International encoding
ITU-T T.50 IA5 (International Alphabet No. 5) string
https://www.itu.int/rec/T-REC-T.50
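The split of the 7-bit range into C0 and G0 can be sketched in a few lines of Python. The function name `classify_ascii` is made up for this note, and treating SPACE (0x20) and DEL (0x7F) as sitting outside both sets is an assumption that follows the usual 94-character reading of G0.

```python
def classify_ascii(code: int) -> str:
    """Classify a 7-bit code value by the set it belongs to."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit value")
    if code <= 0x1F:
        return "C0"      # control characters, e.g. LF (0x0A)
    if code == 0x20:
        return "SPACE"   # assumption: counted separately from G0
    if code == 0x7F:
        return "DEL"     # assumption: counted separately from G0
    return "G0"          # graphic characters 0x21-0x7E


for value in (0x0A, 0x20, 0x41, 0x7F):
    print(f"0x{value:02X} -> {classify_ascii(value)}")
```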
Codepages
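8-bit; the lower half 00000000-01111111 (0x00-0x7F) is compatible with ASCII
IBM / Windows code pages: Windows-1252 (assigns printable characters to most of the C1 range 0x80-0x9F)
ISO-8859-1 Latin-1 (leaves the C1 range without printable characters; the upper half contains regionally significant characters, here Western European)
ISO-8859-2 Latin-2: the upper half is mostly for Central and Eastern European languages, largely Slavic ones written in Latin script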
https://www.charset.org/charsets/iso-8859-1
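A small Python sketch of what the code pages share and where they differ. The helper `decode_byte` is a name made up for this note; the codec names are the ones Python ships. Results are printed as code points so the output does not depend on the terminal encoding.

```python
CODE_PAGES = ["cp1252", "iso-8859-1", "iso-8859-2"]


def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte under the given 8-bit encoding."""
    return bytes([value]).decode(encoding)


def show(value: int, encoding: str) -> None:
    ch = decode_byte(value, encoding)
    print(f"0x{value:02X} in {encoding}: U+{ord(ch):04X}")


# Lower half: the same as ASCII in all three code pages.
for enc in CODE_PAGES:
    show(0x41, enc)

# Upper half: the same byte is a different character per region.
show(0xA3, "iso-8859-1")   # POUND SIGN
show(0xA3, "iso-8859-2")   # LATIN CAPITAL LETTER L WITH STROKE

# 0x80-0x9F: printable in Windows-1252, a C1 control in Python's Latin-1 codec.
show(0x80, "cp1252")       # EURO SIGN
show(0x80, "iso-8859-1")   # control character U+0080
```

This mismatch in the 0x80-0x9F range is a common reason why text saved as Windows-1252 but labelled ISO-8859-1 shows broken quotes or euro signs.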
Unicode
Code points beyond the 8-bit range need a wider encoding: 2-byte UCS-2 (covers only U+0000-U+FFFF) or 4-byte UCS-4
EURO SIGN: U+20AC
https://www.fileformat.info/info/unicode/char/20ac/index.htm
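To make the fixed-width forms concrete, here is a hedged Python sketch for the euro sign. Python has no codec named UCS-2 or UCS-4, so this assumes `utf-16-be` as a stand-in for UCS-2 (identical for characters up to U+FFFF) and `utf-32-be` as a stand-in for UCS-4. The function name `fixed_width_hex` is made up for this note.

```python
def fixed_width_hex(ch: str) -> dict:
    """Return the code point and the big-endian 2-byte and 4-byte forms as hex."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expects one character from U+0000-U+FFFF")
    return {
        "code_point": f"U+{ord(ch):04X}",
        "2-byte": ch.encode("utf-16-be").hex(),   # same bytes as UCS-2 here
        "4-byte": ch.encode("utf-32-be").hex(),   # same bytes as UCS-4
    }


euro = "\u20ac"  # EURO SIGN
for name, value in fixed_width_hex(euro).items():
    print(f"{name}: {value}")

# The euro sign has no slot in ISO-8859-1, so encoding it there fails.
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")
```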
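UTF-8 uses 8-bit code units, 1 to 4 bytes per character
0bbbbbbb => us-ascii (single byte)
110bbbbb, 1110bbbb, 11110bbb => lead byte of a 2-, 3- or 4-byte character
10bbbbbb => continuation byte, part of the character started by the preceding lead byte (the original design allowed up to 6 bytes; RFC 3629 limits UTF-8 to 4)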
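The UTF-8 bit patterns can be checked with a hand-written encoder. This is a minimal sketch following the 1 to 4 byte layout of RFC 3629, not a replacement for `str.encode`; the function name `utf8_encode` is made up for this note.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:            # 0bbbbbbb
        return bytes([code_point])
    if code_point < 0x800:           # 110bbbbb 10bbbbbb
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 11110bbb + 3 continuation bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# EURO SIGN U+20AC needs three bytes.
print(" ".join(f"{b:08b}" for b in utf8_encode(0x20AC)))
# 11100010 10000010 10101100
```

Every byte after the first starts with the bits 10, which is how a decoder can find the start of the next character after landing in the middle of one.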
Byte order mark (BOM): U+FEFF, stored as FE FF in big-endian UTF-16 and as FF FE in little-endian UTF-16
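A possible way to read the byte order from the start of a file, sketched in Python with the BOM constants from the standard `codecs` module. The function name `sniff_bom` is made up for this note, and a missing BOM tells nothing about the encoding, so `None` only means "no mark found".

```python
import codecs
from typing import Optional

# UTF-32 marks are checked first because the UTF-32 LE mark (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding indicated by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xff\xfe\xac\x20"))   # utf-16-le
print(sniff_bom(b"\xfe\xff\x20\xac"))   # utf-16-be
print(sniff_bom(b"plain ascii"))        # None
```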
HTML escaping: characters can be written as character references, e.g. &euro; or &#x20AC; for the euro sign
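HTML escaping can be tried out with Python's standard `html` module. The sketch below escapes the markup characters and then turns everything outside ASCII into numeric character references; the function name `to_ascii_html` is made up for this note.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape markup characters, then write non-ASCII as numeric references."""
    escaped = html.escape(text)   # handles & < > and quotes
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("5 \u20ac < 6 \u20ac"))
# 5 &#8364; &lt; 6 &#8364;

# Named, hexadecimal and decimal references all decode to the same character.
decoded = html.unescape("&euro; &#x20AC; &#8364;")
print([f"U+{ord(ch):04X}" for ch in decoded.split()])
```

The output is pure ASCII, so it survives any of the 8-bit code pages unchanged.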
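Java internal: UTF-16 (originally UCS-2; a char is one 16-bit code unit, characters above U+FFFF take a surrogate pair)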
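What 16-bit code units mean for characters outside U+0000-U+FFFF can be shown in Python (used here only for illustration, this is not Java code). The function name `utf16_code_units` is made up for this note. Java's `String.length()` counts these 16-bit units, not characters.

```python
import struct
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the 16-bit code units of text in UTF-16."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


for ch in ("\u20ac", "\U0001F600"):
    units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
    print(f"U+{ord(ch):04X} -> {units}")
# U+20AC -> 20AC
# U+1F600 -> D83D DE00
```

A pure 2-byte UCS-2 cannot represent the second character at all; UTF-16 spends two code units (a surrogate pair) on it.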