ISO/IEC 8859

ISO/IEC 8859 is a collection of fifteen 8-bit character encodings. An 8-bit character encoding assigns each character a unique number between 0 and 255. The first ISO/IEC 8859 encodings were designed in the mid-1980s by the European Computer Manufacturers Association (ECMA)[1] and endorsed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

The ISO/IEC 8859 collection consists of numbered parts, ISO/IEC 8859-1 through ISO/IEC 8859-16. Each part serves languages that need a different set of letters; for example, part 6 covers most of the characters of the Arabic language (see Table 1 for an overview). Part 12 (ISO/IEC 8859-12) was intended for Latin/Devanagari but was abandoned before completion, which is why sixteen part numbers yield only fifteen encodings.
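As a rough illustration of how the parts differ, the sketch below decodes one and the same byte value under several parts. It assumes Python's built-in codecs (`iso8859_1`, `iso8859_5`, and so on) as stand-ins for the standard's parts; the helper name `decode_in_parts` is made up for this example.

```python
# One byte value, interpreted under several ISO/IEC 8859 parts.
BYTE = 0xD0  # decimal 208

# Part number -> Python codec name (the codec names are Python's, not the standard's).
PARTS = {1: "iso8859_1", 5: "iso8859_5", 7: "iso8859_7", 9: "iso8859_9"}

def decode_in_parts(byte_value, parts=PARTS):
    """Return {part number: character} for a single byte value."""
    raw = bytes([byte_value])
    return {part: raw.decode(codec) for part, codec in parts.items()}

chars = decode_in_parts(BYTE)
for part, ch in chars.items():
    print(f"part {part}: byte {BYTE} -> {ch!r} (U+{ord(ch):04X})")
```

Each character occupies exactly one byte in its own part, but the byte alone does not say which character is meant; the part in use has to be known from elsewhere.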

Often the ASCII codes (0 through 127) are regarded as part of ISO/IEC 8859. The first 32 ASCII codes are control characters; they form control character set 0, referred to as C0. The codes from 128 through 159 (hexadecimal 0x80 to 0x9F) constitute control set C1 of ISO 8859. The Windows Latin character set (Windows code page 1252) uses many of the C1 positions for printable characters, so in the range 128 through 159 the Windows encoding is completely different from Latin-1 (ISO/IEC 8859-1). From character 160 (non-breaking space) through 255 (ÿ), however, Windows code page 1252 is identical to Latin-1.[2] The extended ASCII set used by DOS, on the other hand, is completely different between 128 and 255, but coincides with ASCII for the characters below 128.
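These relationships can be checked with a short sketch. It assumes Python's `latin-1`, `cp1252`, and `cp437` codecs as representatives of ISO/IEC 8859-1, Windows code page 1252, and the DOS extended ASCII set (code page 437 is one of several DOS code pages, chosen here only for illustration).

```python
import unicodedata

# Positions 160-255: code page 1252 and Latin-1 agree.
upper = bytes(range(160, 256))
same_upper = upper.decode("cp1252") == upper.decode("latin-1")

# Position 128 lies in the C1 range: a control code in Latin-1,
# a printable character in code page 1252, yet another one in DOS.
c1_byte = b"\x80"
as_latin1 = c1_byte.decode("latin-1")
as_cp1252 = c1_byte.decode("cp1252")
as_dos = c1_byte.decode("cp437")

# Below 128 the DOS code page coincides with ASCII.
lower = bytes(range(128))
dos_matches_ascii = lower.decode("cp437") == lower.decode("ascii")

print(same_upper, dos_matches_ascii)
print(unicodedata.category(as_latin1), repr(as_cp1252), repr(as_dos))
```

A few positions in the range 128 through 159 are unassigned in Python's `cp1252` codec, so decoding the whole C1 range at once with that codec raises an error; the sketch therefore decodes a single byte.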

ISO and IEC are also responsible for ISO 10646 (UCS, Universal Character Set), a much more ambitious and elaborate character encoding than ISO/IEC 8859. UCS is kept synchronized with the Unicode standard of the Unicode Consortium. The 256 codes of Latin-1 (ISO/IEC 8859-1) have been adopted as the first 256 code points of ISO 10646 and Unicode.[3]
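The correspondence between Latin-1 and the start of Unicode can be verified directly. The following is a minimal sketch using Python's `latin-1` codec; the variable names are illustrative only.

```python
# Decode every possible byte value as Latin-1.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each byte value n should map to the Unicode code point with the same number n.
identity = all(ord(ch) == n for n, ch in enumerate(text))
print(identity)

# In UTF-8 the same code points above 127 need two bytes instead of one.
y_latin1 = "\u00ff".encode("latin-1")
y_utf8 = "\u00ff".encode("utf-8")
print(len(y_latin1), len(y_utf8))
```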

On the World Wide Web, usage of Unicode UTF-8 has been increasing nearly exponentially.[4] As of 2011, ISO/IEC 8859-1 is still important, but its use on the Web is declining.[5]

Table 1. Parts of ISO/IEC 8859
Part 1, Latin-1 (Western European): Covers most Western European languages, as well as Albanian, Indonesian, Afrikaans, and Swahili. The missing euro sign and capital Ÿ are in the revised version ISO/IEC 8859-15 (positions 164 and 190). The IANA character set ISO-8859-1 is the default encoding for documents received via HTTP when the document's media type is "text" (as in "text/html").
Part 2, Latin-2 (Central European): Supports those Central and Eastern European languages that use the Latin alphabet.
Part 3, Latin-3 (South European): Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish and by Unicode for Esperanto.
Part 4, Latin-4 (North European): Estonian, Latvian, Lithuanian, Greenlandic, and Sami.
Part 5, Latin/Cyrillic: Covers mostly Slavic languages that use a Cyrillic alphabet.
Part 6, Latin/Arabic: Covers the most common Arabic language characters. Does not support other languages using the Arabic script.
Part 7, Latin/Greek: Covers the modern Greek language. Can also be used for Ancient Greek written without accents.
Part 8, Latin/Hebrew: Covers the modern Hebrew alphabet as used in Israel.
Part 9, Latin-5 (Turkish): Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones.
Part 10, Latin-6 (Nordic): A rearrangement of Latin-4. Considered more useful for Nordic languages. Baltic languages prefer Latin-4.
Part 11, Latin/Thai: Contains characters needed for the Thai language. Virtually identical to TIS 620.
Part 13, Latin-7 (Baltic Rim): Adds some characters for Baltic languages that were missing from Latin-4 and Latin-6.
Part 14, Latin-8 (Celtic): Covers Celtic languages such as Gaelic and Breton.
Part 15, Latin-9: A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign and the letters Š, š, Ž, ž, Œ, œ, and Ÿ.
Part 16, Latin-10 (South-Eastern European): Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovene.
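
The relationship between part 1 and its revision, part 15, can be sketched as follows. The example assumes Python's `iso8859_1` and `iso8859_15` codecs; the helper `try_encode` is a hypothetical name introduced only for this illustration.

```python
LATIN1, LATIN9 = "iso8859_1", "iso8859_15"

def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec lacks a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

euro_latin1 = try_encode("\u20ac", LATIN1)  # euro sign, absent from part 1
euro_latin9 = try_encode("\u20ac", LATIN9)  # present in part 15
y_latin9 = try_encode("\u0178", LATIN9)     # capital Y with diaeresis

# Byte positions at which the two parts assign different characters.
changed = [
    b for b in range(256)
    if bytes([b]).decode(LATIN1) != bytes([b]).decode(LATIN9)
]

print(euro_latin1, euro_latin9[0], y_latin9[0])
print(changed)
```

The list `changed` holds eight positions, matching the euro sign plus the seven letters named for part 15.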

Table 2 lists all the characters in the different parts. The columns are organized such that it is relatively easy to switch between character sets. For example, the German umlauts ä, ö, and ü, the letter ë, and the scharfes S (ß) are found at exactly the same positions in Latin-1, Latin-2, Latin-3, Latin-4, Latin-5 (column 9), and Latin-6 (column 10). Thus one can write German/Polish with Latin-2 or German/Turkish with Latin-5.
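
The shared positions of the German letters can be checked with a short sketch, again assuming Python's codec names for the parts. The mixed Polish/German sample string is invented for this example.

```python
GERMAN = "äöüÄÖÜß"

# Latin-1 through Latin-6 correspond to parts 1, 2, 3, 4, 9, and 10.
CODECS = ["iso8859_1", "iso8859_2", "iso8859_3",
          "iso8859_4", "iso8859_9", "iso8859_10"]

# Encode the German letters in every part and compare the resulting bytes.
encodings = {codec: GERMAN.encode(codec) for codec in CODECS}
all_equal = len(set(encodings.values())) == 1
print(all_equal, encodings["iso8859_1"].hex(" "))

# A text mixing Polish and German letters fits into Latin-2 ...
mixed = "Łódź grüßt"
latin2_bytes = mixed.encode("iso8859_2")

# ... but not into Latin-1, which lacks the Polish letters.
try:
    mixed.encode("iso8859_1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(len(latin2_bytes), fits_latin1)
```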

The HTML version of Table 2 is prepared in Unicode UTF-8. Two examples: the Latin-3 character H with stroke (column 3, row 161, U+0126) is given by &#x126; → Ħ.[6] The Thai digit 8 (column 11, row 248, U+0E58) is given by &#xE58; → ๘.[7]
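
The step from a table position to a Unicode code point and an HTML numeric character reference can be sketched as below. The sketch assumes Python's `iso8859_3` and `iso8859_11` codecs and the standard `html` module; the function name `to_reference` is hypothetical, and the hexadecimal reference form is one of several that HTML accepts.

```python
import html

def to_reference(byte_value, codec):
    """Decode one byte in the given part; return (character, hex reference)."""
    ch = bytes([byte_value]).decode(codec)
    return ch, f"&#x{ord(ch):X};"

h_stroke, h_ref = to_reference(161, "iso8859_3")   # Latin-3, row 161
thai_8, thai_ref = to_reference(248, "iso8859_11")  # Latin/Thai, row 248

print(h_ref, thai_ref)

# Unescaping the reference gives back the same character.
round_trip = (html.unescape(h_ref) == h_stroke
              and html.unescape(thai_ref) == thai_8)
print(round_trip)
```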

Table 2. Comparison of the various parts of ISO/IEC 8859
Dec Hex Binary 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
160 A0 1010 0000 Non-breaking space (NBSP)
161 A1 1010 0001 ¡ Ą Ħ Ą Ё     ¡ Ą ¡ Ą
162 A2 1010 0010 ¢ ˘ ĸ Ђ   ¢ ¢ Ē ¢ ¢ ą
163 A3 1010 0011 £ Ł £ Ŗ Ѓ   £ Ģ £ Ł
164 A4 1010 0100 ¤ Є ¤ ¤ Ī ¤ Ċ
165 A5 1010 0101 ¥ Ľ   Ĩ Ѕ   ¥ Ĩ ċ ¥
166 A6 1010 0110 ¦ Ś Ĥ Ļ І   ¦ Ķ ¦ Š
167 A7 1010 0111 § Ї   § §
168 A8 1010 1000 ¨ Ј   ¨ Ļ Ø š
169 A9 1010 1001 © Š İ Š Љ   © Đ ©
170 AA 1010 1010 ª Ş Ē Њ   ͺ × ª Š Ŗ ª Ș
171 AB 1010 1011 «  Ť Ğ Ģ Ћ   «  Ŧ «  «
172 AC 1010 1100 ¬ Ź Ĵ Ŧ Ќ ، ¬ Ž ¬ ¬ Ź
173 AD 1010 1101 soft hyphen (SHY) SHY
174 AE 1010 1110 ® Ž   Ž Ў     ® Ū ® ź
175 AF 1010 1111 ¯ Ż ¯ Џ   ¯ Ŋ Æ Ÿ ¯ Ż
176 B0 1011 0000 ° А   ° ° °
177 B1 1011 0001 ± ą ħ ą Б   ± ą ± ±
178 B2 1011 0010 ² ˛ ² ˛ В   ² ē ² Ġ ² Č
179 B3 1011 0011 ³ ł ³ ŗ Г   ³ ģ ³ ġ ³ ł
180 B4 1011 0100 ´ Д   ΄ ´ ī Ž
181 B5 1011 0101 µ ľ µ ĩ Е   ΅ µ ĩ µ µ
182 B6 1011 0110 ś ĥ ļ Ж   Ά ķ
183 B7 1011 0111 · ˇ · ˇ З   · · ·
184 B8 1011 1000 ¸ И   Έ ¸ ļ ø ž
185 B9 1011 1001 ¹ š ı š Й   Ή ¹ đ ¹ ¹ č
186 BA 1011 1010 º ş ē К   Ί ÷ º š ŗ º ș
187 BB 1011 1011  » ť ğ ģ Л ؛  » ŧ  »  »
188 BC 1011 1100 ¼ ź ĵ ŧ М   Ό ¼ ž ¼ Œ
189 BD 1011 1101 ½ ˝ ½ Ŋ Н   ½ ½ œ
190 BE 1011 1110 ¾ ž   ž О   Ύ ¾ ū ¾ Ÿ
191 BF 1011 1111 ¿ ż ŋ П ؟ Ώ   ¿ ŋ æ ¿ ż
192 C0 1100 0000 À Ŕ À Ā Р   ΐ   À Ā Ą À
193 C1 1100 0001 Á С ء Α   Á Į Á
194 C2 1100 0010 Â Т آ Β   Â Ā Â
195 C3 1100 0011 Ã Ă   Ã У أ Γ   Ã Ć Ã Ă
196 C4 1100 0100 Ä Ф ؤ Δ   Ä Ä
197 C5 1100 0101 Å Ĺ Ċ Å Х إ Ε   Å Å Ć
198 C6 1100 0110 Æ Ć Ĉ Æ Ц ئ Ζ   Æ Ę Æ
199 C7 1100 0111 Ç Į Ч ا Η   Ç Į Ē Ç
200 C8 1100 1000 È Č È Č Ш ب Θ   È Č Č È
201 C9 1100 1001 É Щ ة Ι   É É
202 CA 1100 1010 Ê Ę Ê Ę Ъ ت Κ   Ê Ę Ź Ê
203 CB 1100 1011 Ë Ы ث Λ   Ë Ė Ë
204 CC 1100 1100 Ì Ě Ì Ė Ь ج Μ   Ì Ė Ģ Ì
205 CD 1100 1101 Í Э ح Ν   Í Ķ Í
206 CE 1100 1110 Î Ю خ Ξ   Î Ī Î
207 CF 1100 1111 Ï Ď Ï Ī Я د Ο   Ï Ļ Ï
208 D0 1101 0000 Ð Đ   Đ а ذ Π   Ğ Ð Š Ŵ Ð
209 D1 1101 0001 Ñ Ń Ñ Ņ б ر Ρ   Ñ Ņ Ń Ñ Ń
210 D2 1101 0010 Ò Ň Ò Ō в ز     Ò Ō Ņ Ò
211 D3 1101 0011 Ó Ķ г س Σ   Ó Ó
212 D4 1101 0100 Ô д ش Τ   Ô Ō Ô
213 D5 1101 0101 Õ Ő Ġ Õ е ص Υ   Õ Ő
214 D6 1101 0110 Ö ж ض Φ   Ö Ö
215 D7 1101 0111 × з ط Χ   × Ũ × × Ś
216 D8 1101 1000 Ø Ř Ĝ Ø и ظ Ψ   Ø Ų Ø Ű
217 D9 1101 1001 Ù Ů Ù Ų й ع Ω   Ù Ų Ł Ù
218 DA 1101 1010 Ú к غ Ϊ   Ú Ś Ú
219 DB 1101 1011 Û Ű Û л   Ϋ   Û   Ū Û
220 DC 1101 1100 Ü м   ά   Ü   Ü
221 DD 1101 1101 Ý Ŭ Ũ н   έ   İ Ý   Ż Ý Ę
222 DE 1101 1110 Þ Ţ Ŝ Ū о   ή   Ş Þ   Ž Ŷ Þ Ț
223 DF 1101 1111 ß п   ί ß ฿ ß
224 E0 1110 0000 à ŕ à ā р ـ ΰ א à ā ą à
225 E1 1110 0001 á с ف α ב á į á
226 E2 1110 0010 â т ق β ג â ā â
227 E3 1110 0011 ã ă   ã у ك γ ד ã ć ã ă
228 E4 1110 0100 ä ф ل δ ה ä ä
229 E5 1110 0101 å ĺ ċ å х م ε ו å å ć
230 E6 1110 0110 æ ć ĉ æ ц ن ζ ז æ ę æ
231 E7 1110 0111 ç į ч ه η ח ç į ē ç
232 E8 1110 1000 è č è č ш و θ ט è č č è
233 E9 1110 1001 é щ ى ι י é é
234 EA 1110 1010 ê ę ê ę ъ ي κ ך ê ę ź ê
235 EB 1110 1011 ë ы ً λ כ ë ė ë
236 EC 1110 1100 ì ě ì ė ь ٌ μ ל ì ė ģ ì
237 ED 1110 1101 í э ٍ ν ם í ķ í
238 EE 1110 1110 î ю َ ξ מ î ī î
239 EF 1110 1111 ï ď ï ī я ُ ο ן ï ļ ï
240 F0 1111 0000 ð đ   đ ِ π נ ğ ð š ŵ ð đ
241 F1 1111 0001 ñ ń ñ ņ ё ّ ρ ס ñ ņ ń ñ ń
242 F2 1111 0010 ò ň ò ō ђ ْ ς ע ò ō ņ ò
243 F3 1111 0011 ó ķ ѓ   σ ף ó ó
244 F4 1111 0100 ô є   τ פ ô ō ô
245 F5 1111 0101 õ ő ġ õ ѕ   υ ץ õ ő
246 F6 1111 0110 ö і   φ צ ö ö
247 F7 1111 0111 ÷ ї   χ ק ÷ ũ ÷ ÷ ś
248 F8 1111 1000 ø ř ĝ ø ј   ψ ר ø ų ø ű
249 F9 1111 1001 ù ů ù ų љ   ω ש ù ų ł ù
250 FA 1111 1010 ú њ   ϊ ת ú ś ú
251 FB 1111 1011 û ű û ћ   ϋ   û ū û
252 FC 1111 1100 ü ќ   ό   ü   ü
253 FD 1111 1101 ý ŭ ũ §   ύ LRM ı ý   ż ý ę
254 FE 1111 1110 þ ţ ŝ ū ў   ώ RLM ş þ   ž ŷ þ ț
255 FF 1111 1111 ÿ ˙ џ       ÿ ĸ   ÿ

References

  1. March 1985
  2. Although the Windows Western character set is often called "ANSI character set" (code page 1252), it has not been approved by the American National Standards Institute.
  3. Code chart U0000.pdf Latin (ASCII) and Code chart U0080.pdf Latin-1 Supplement
  4. Google blog 1/28/2010
  5. Trends July 2011
  6. Code chart U0100.pdf Latin Extended-A
  7. Code chart U0E00.pdf Thai


Some content on this page previously appeared on Wikipedia.

