Charset: Difference between revisions

Revision as of 06:31, 31 May 2024

Unicode isn't hard if you know its history and where it comes from.

https://mcilloni.ovh/2023/07/23/unicode-is-hard/


Baudot Encoding 5-bit, ITA-1

https://cryptii.com/pipes/baudot

International Encoding

ITU-T T.50 IA5 (International Alphabet No. 5) string

https://www.itu.int/rec/T-REC-T.50

ASCII 7-bit C0 set of Control Characters / G0 set of Graphic Characters
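
The split between the C0 and G0 sets can be sketched as a small classifier. This is a minimal illustration, not part of any standard API: the function name `classify_ascii` and the returned labels are made up here, and SPACE (0x20) and DEL (0x7F) are treated as standing outside the 94-character G0 set.

```python
def classify_ascii(code: int) -> str:
    """Return which 7-bit set a code belongs to (labels are illustrative)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    if code <= 0x1F:
        return "C0"     # control characters 0x00-0x1F
    if code == 0x20:
        return "SPACE"  # sits between C0 and G0
    if code == 0x7F:
        return "DEL"    # DELETE, a control character outside C0
    return "G0"         # graphic characters 0x21-0x7E


if __name__ == "__main__":
    for sample in (0x0A, 0x20, ord("A"), 0x7F):
        print(f"0x{sample:02X} -> {classify_ascii(sample)}")
```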

7-bit ASCII

Codepages

IBM CP

Windows

ISO-8859-1 Latin-1

https://www.charset.org/charsets/iso-8859-1
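
Why codepages matter: the same byte above 0x7F means something different in each one. The sketch below decodes one byte with three codecs that ship with Python (`cp437` for the original IBM PC codepage, `cp1252` for Windows, `latin-1` for ISO-8859-1). The helper name `decode_with_codepages` is an assumption for illustration.

```python
def decode_with_codepages(raw: bytes) -> dict:
    """Decode the same bytes under three 8-bit codepages."""
    codepages = ("cp437", "cp1252", "latin-1")
    return {name: raw.decode(name) for name in codepages}


if __name__ == "__main__":
    # 0xE9 is a Greek capital theta in CP437 but e-acute in the other two.
    for name, text in decode_with_codepages(b"\xe9").items():
        print(f"{name}: U+{ord(text):04X}")
```

Note that `cp1252` leaves a few bytes undefined (0x81, for example), so decoding arbitrary input with it can raise `UnicodeDecodeError`.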

Unicode

Code points beyond the 8-bit range are encoded as fixed-width 2-byte UCS-2 (Basic Multilingual Plane only) or 4-byte UCS-4
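
A quick way to see the difference between 2-byte and 4-byte units is to count encoded bytes. Python has no UCS-2 codec, so this sketch uses `utf-16-be` as a stand-in (identical to UCS-2 for code points up to U+FFFF) and `utf-32-be` for UCS-4. The helper name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in 16-bit and 32-bit encodings (no BOM)."""
    return {
        "utf-16-be": len(text.encode("utf-16-be")),
        "utf-32-be": len(text.encode("utf-32-be")),
    }


if __name__ == "__main__":
    print(encoded_sizes("A"))           # fits in one 16-bit unit
    print(encoded_sizes("\U0001F600"))  # beyond U+FFFF: needs two 16-bit units
```

A character beyond U+FFFF takes 4 bytes in both encodings, which is exactly the case plain UCS-2 cannot represent.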

UTF-8 (8-bit code units)

0bbbbbbb => us-ascii

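110bbbbb, 1110bbbb, 11110bbb => lead byte of a 2-, 3-, or 4-byte character

10bbbbbb => continuation byte, part of the character started by the preceding lead byte (the original design allowed up to 6 bytes; RFC 3629 limits a character to 4)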
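
The bit patterns can be sketched as a hand-written encoder plus a byte classifier. This is an illustration only, assuming the modern 4-byte limit of RFC 3629. The names `utf8_encode` and `byte_kind` are made up here, the encoder does not reject surrogates (U+D800 to U+DFFF), and the classifier looks only at the leading bits, so it does not flag bytes such as 0xC0 that a strict decoder would refuse.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by filling the payload bits of each byte."""
    if cp < 0x80:                      # 0bbbbbbb
        return bytes([cp])
    if cp < 0x800:                     # 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # 11110bbb + three continuation bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


def byte_kind(b: int) -> str:
    """Classify a byte by its leading bits."""
    if b < 0x80:
        return "ascii"
    if b < 0xC0:
        return "continuation"
    if b < 0xE0:
        return "lead-2"
    if b < 0xF0:
        return "lead-3"
    if b < 0xF8:
        return "lead-4"
    return "invalid"


if __name__ == "__main__":
    encoded = utf8_encode(0x20AC)  # EURO SIGN
    print(encoded.hex(" "), [byte_kind(b) for b in encoded])
```

Because continuation bytes always start with `10`, a decoder that lands in the middle of a character can skip forward to the next lead or ASCII byte and resynchronize.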

Byte order mark (BOM): U+FEFF, stored as FE FF in big-endian and FF FE in little-endian UTF-16
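
BOM sniffing can be sketched with the constants from Python's standard `codecs` module. The function name `detect_bom` and the returned labels are assumptions for illustration. The UTF-32 marks are checked first because the little-endian UTF-32 BOM (FF FE 00 00) begins with the little-endian UTF-16 BOM (FF FE).

```python
import codecs
from typing import Optional

# Order matters: longer marks first, so UTF-32-LE is not mistaken for UTF-16-LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if absent."""
    for mark, name in _BOMS:
        if data.startswith(mark):
            return name
    return None


if __name__ == "__main__":
    print(detect_bom("\ufeffhi".encode("utf-16-le")))
    print(detect_bom(b"plain ascii"))
```

A missing BOM proves nothing about the encoding, so `None` here only means that no mark was found.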

Java internal: UTF-16 (UCS-2 before Java 5); a char is one 16-bit code unit
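
A code point above U+FFFF does not fit in a single 16-bit unit, so UTF-16 splits it into a surrogate pair. The arithmetic can be sketched as follows; the function name `to_surrogates` is hypothetical, and the example is written in Python rather than Java purely to show the calculation.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000             # 20 bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
```

This is why the length of a string counted in 16-bit units can exceed its number of characters.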