ISO/IEC 8859
ISO/IEC 8859 is a collection of fifteen 8-bit character encodings. An 8-bit character encoding assigns each character a unique number between 0 and 255, so every character fits in a single byte. The first ISO/IEC 8859 encodings were designed in the mid-1980s by the European Computer Manufacturers Association (ECMA)[1] and endorsed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
The ISO/IEC 8859 collection consists of numbered parts, ISO/IEC 8859-1 through ISO/IEC 8859-16. Each part serves a group of languages that need a different set of letters; for example, part 6 covers most of the characters of the Arabic language. See Table 1 for an overview. Part 12 (ISO/IEC 8859-12) was intended for Latin/Devanagari but was abandoned and never published.
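As a quick check of the numbering, CPython's standard library ships one codec per published part, named `iso8859_1` through `iso8859_16`, with no `iso8859_12`. The sketch below assumes those codec names; the helpers `has_codec` and `decode_byte` are hypothetical names introduced for illustration. It also shows that a single byte, 0xA1, stands for a different character in most parts.

```python
import codecs
from typing import Optional


def has_codec(name: str) -> bool:
    """Return True if this Python installation knows an encoding called `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Candidate parts 1-16; part 12 (Latin/Devanagari) was abandoned, so CPython
# has no "iso8859_12" codec and the comprehension drops it.
PARTS = [n for n in range(1, 17) if has_codec(f"iso8859_{n}")]


def decode_byte(value: int, part: int) -> Optional[str]:
    """Decode one byte with ISO/IEC 8859-<part>; None if the part leaves it unassigned."""
    try:
        return bytes([value]).decode(f"iso8859_{part}")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(len(PARTS), "parts:", PARTS)
    for part in PARTS:
        # The same byte value stands for a different character in most parts.
        print(f"8859-{part}: 0xA1 -> {decode_byte(0xA1, part)!r}")
```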
The ASCII codes (0 through 127) are often regarded as part of ISO/IEC 8859. The first 32 ASCII codes are control characters; they form control set 0, referred to as C0. Codes 128 through 159 (hexadecimal 0x80 to 0x9F) are reserved for control set C1. The Windows Latin character set (Windows code page 1252) uses 27 of these 32 C1 positions for printable characters, so from 128 through 159 the Windows encoding is completely different from Latin-1 (ISO/IEC 8859-1). From character 160 (non-breaking space) through 255 (ÿ), however, code page 1252 is identical to Latin-1.[2] The extended ASCII set used by DOS, on the other hand, is completely different from Latin-1 between 128 and 255, but it too coincides with ASCII below 128.
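The relationship between Latin-1, Windows code page 1252, and the DOS character set can be checked byte by byte. This sketch assumes CPython's `latin-1`, `cp1252`, and `cp437` codecs; using code page 437 as the representative DOS set is an assumption, since DOS also used other code pages. Bytes that CPython's `cp1252` leaves undefined are reported as None, and the helper names are hypothetical.

```python
from typing import List, Optional


def decode_or_none(value: int, encoding: str) -> Optional[str]:
    """Decode one byte; return None if the encoding leaves that byte undefined."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def positions_that_differ(enc_a: str, enc_b: str, lo: int, hi: int) -> List[int]:
    """Byte values in range(lo, hi) that decode to different characters."""
    return [b for b in range(lo, hi)
            if decode_or_none(b, enc_a) != decode_or_none(b, enc_b)]


# 0x80-0x9F: Latin-1 yields C1 control characters, cp1252 printable ones.
c1_diff = positions_that_differ("latin-1", "cp1252", 0x80, 0xA0)
# 0xA0-0xFF: no differences, from the no-break space through ÿ.
upper_diff = positions_that_differ("latin-1", "cp1252", 0xA0, 0x100)
# C1 positions to which cp1252 assigns a printable character.
cp1252_assigned = [b for b in range(0x80, 0xA0)
                   if decode_or_none(b, "cp1252") is not None]

if __name__ == "__main__":
    print(len(c1_diff), len(upper_diff), len(cp1252_assigned))
    print(repr(decode_or_none(0x80, "latin-1")), decode_or_none(0x80, "cp1252"))
    # DOS code page 437 reuses 0x80-0xFF for other letters and box-drawing glyphs.
    print(decode_or_none(0x81, "cp437"), decode_or_none(0xC4, "cp437"))
```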
ISO and IEC are also responsible for ISO/IEC 10646, the Universal Character Set (UCS), a far more ambitious and elaborate character encoding than ISO/IEC 8859. UCS is kept synchronized with the Unicode standard of the Unicode Consortium. Latin-1 (ISO/IEC 8859-1) was adopted as the first 256 code points (U+0000 to U+00FF) of both ISO/IEC 10646 and Unicode.[3]
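Because Latin-1 became the first 256 code points, decoding a Latin-1 byte simply yields the code point with the same number. A minimal check in Python, assuming CPython's `latin-1` codec; the function name `latin1_is_identity` is hypothetical. The loop also shows that this numeric identity does not carry over to UTF-8, where code points 128 through 255 take two bytes.

```python
def latin1_is_identity() -> bool:
    """True if every byte b (0-255) decodes under Latin-1 to code point b."""
    text = bytes(range(256)).decode("latin-1")
    return all(ord(ch) == b for b, ch in enumerate(text))


if __name__ == "__main__":
    print(latin1_is_identity())
    for ch in ("A", "é", "ÿ"):
        # Same code point, but one byte in Latin-1 versus one or two in UTF-8.
        print(f"U+{ord(ch):04X} {ch}: latin-1 {ch.encode('latin-1').hex()}, "
              f"utf-8 {ch.encode('utf-8').hex()}")
```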
On the World Wide Web, usage of Unicode in its UTF-8 encoding has been growing nearly exponentially.[4] As of 2011, ISO/IEC 8859-1 was still important on the Web, but its use was declining.[5]
Part | Name | Description |
---|---|---|
Part 1 | Latin-1 Western European | Covers most Western European languages, as well as Albanian, Indonesian, Afrikaans, and Swahili. The missing euro sign and capital Ÿ are in the revised version, ISO/IEC 8859-15 (positions 164 and 190). The IANA character set ISO-8859-1 is the default encoding for documents received via HTTP when the document's media type is "text" (as in "text/html"), as specified for HTTP/1.1 in RFC 2616. |
Part 2 | Latin-2 Central European | Supports the Central and Eastern European languages that use the Latin alphabet. |
Part 3 | Latin-3 South European | Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish and by Unicode for Esperanto. |
Part 4 | Latin-4 North European | Estonian, Latvian, Lithuanian, Greenlandic, and Sami. |
Part 5 | Latin/Cyrillic | Covers mostly Slavic languages that use a Cyrillic alphabet. |
Part 6 | Latin/Arabic | Covers the most common Arabic language characters. Does not support other languages written in the Arabic script. |
Part 7 | Latin/Greek | Covers the modern Greek language. Can also be used for Ancient Greek written without accents. |
Part 8 | Latin/Hebrew | Covers the modern Hebrew alphabet as used in Israel. |
Part 9 | Latin-5 Turkish | Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones. |
Part 10 | Latin-6 Nordic | A rearrangement of Latin-4, considered more useful for the Nordic languages. Baltic languages prefer Latin-4. |
Part 11 | Latin/Thai | Contains the characters needed for the Thai language. Virtually identical to TIS 620. |
Part 13 | Latin-7 Baltic Rim | Adds some characters for Baltic languages that were missing from Latin-4 and Latin-6. |
Part 14 | Latin-8 Celtic | Covers Celtic languages such as Gaelic and Breton. |
Part 15 | Latin-9 | A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ. |
Part 16 | Latin-10 South-Eastern European | Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian, and Slovene. |
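The entries for parts 9 and 15 describe small, well-defined edits to Latin-1, so their differences can be listed mechanically. A sketch assuming CPython's `latin-1`, `iso8859_9`, and `iso8859_15` codecs; `changed_positions` is a hypothetical helper. For Latin-9 it should report eight positions, including the euro sign at 164 and Ÿ at 190; for Latin-5 it should report the six Icelandic letters replaced by Turkish ones.

```python
from typing import List, Optional, Tuple


def decode_or_none(value: int, encoding: str) -> Optional[str]:
    """Decode one byte; return None if the encoding leaves that byte undefined."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def changed_positions(base: str, revised: str) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """(byte, base character, revised character) for every upper-half byte that differs."""
    changes = []
    for b in range(0xA0, 0x100):
        old, new = decode_or_none(b, base), decode_or_none(b, revised)
        if old != new:
            changes.append((b, old, new))
    return changes


if __name__ == "__main__":
    for label, enc in (("Latin-9", "iso8859_15"), ("Latin-5", "iso8859_9")):
        print(label)
        for b, old, new in changed_positions("latin-1", enc):
            print(f"  {b} (0x{b:02X}): {old} -> {new}")
```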
Table 2 lists the characters at positions 160 through 255 in each part. Many characters occupy the same position in several parts, which makes it relatively easy to switch between character sets. For example, the German umlauts ä, ö, and ü and the sharp s (ß) are found at exactly the same positions in Latin-1, Latin-2, Latin-3, Latin-4, Latin-5 (column 9), and Latin-6 (column 10). Thus one can write German together with Polish in Latin-2, or German together with Turkish in Latin-5.
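The claim about shared positions can be tested by encoding the German special letters under each of the Latin parts named above and comparing the bytes. A sketch assuming CPython's codec names (`iso8859_9` for Latin-5, `iso8859_10` for Latin-6); the sample strings and the helper `can_encode` are illustrative.

```python
LATIN_PARTS = {
    "Latin-1": "latin-1",
    "Latin-2": "iso8859_2",
    "Latin-3": "iso8859_3",
    "Latin-4": "iso8859_4",
    "Latin-5": "iso8859_9",
    "Latin-6": "iso8859_10",
}
GERMAN = "ÄÖÜäöüß"


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    reference = GERMAN.encode("latin-1")
    for name, enc in LATIN_PARTS.items():
        # Identical bytes mean the letters sit at the same code positions.
        print(name, GERMAN.encode(enc) == reference, GERMAN.encode(enc).hex(" "))
    # Mixed German/Polish text fits Latin-2 but not Latin-1.
    print(can_encode("Grüße aus Łódź", "iso8859_2"), can_encode("Grüße aus Łódź", "latin-1"))
    # Mixed German/Turkish text fits Latin-5.
    print(can_encode("Grüße aus İstanbul", "iso8859_9"))
```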
The HTML version of Table 2 is encoded in Unicode UTF-8, and any character can also be entered as an HTML numeric character reference. Two examples: the Latin-3 character H with stroke (column 3, row 161, U+0126) is written `&#x0126;` → Ħ.[6] The Thai digit 8 (column 11, row 248, U+0E58) is written `&#x0E58;` → ๘.[7]
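Both examples can be reproduced by decoding the byte with the relevant part and then writing its code point as a hexadecimal numeric character reference. A sketch assuming CPython's `iso8859_3` and `iso8859_11` codecs and the standard `html.unescape` function; `numeric_reference` is a hypothetical helper, and the decimal form (`&#294;` for Ħ) works equally well.

```python
import html


def numeric_reference(ch: str) -> str:
    """Hexadecimal HTML numeric character reference for a single character."""
    return f"&#x{ord(ch):04X};"


h_stroke = bytes([161]).decode("iso8859_3")     # Latin-3, row 161
thai_eight = bytes([248]).decode("iso8859_11")  # Latin/Thai, row 248

if __name__ == "__main__":
    for ch in (h_stroke, thai_eight):
        ref = numeric_reference(ch)
        # unescape() turns the reference back into the character itself.
        print(f"U+{ord(ch):04X}", ref, "->", html.unescape(ref))
```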
Dec | Hex | Binary | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 13 | 14 | 15 | 16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
160 | A0 | 1010 0000 | Non-breaking space (NBSP) | |||||||||||||||
161 | A1 | 1010 0001 | ¡ | Ą | Ħ | Ą | Ё | ‘ | ¡ | Ą | ก | ” | Ḃ | ¡ | Ą | |||
162 | A2 | 1010 0010 | ¢ | ˘ | ĸ | Ђ | ’ | ¢ | ¢ | Ē | ข | ¢ | ḃ | ¢ | ą | |||
163 | A3 | 1010 0011 | £ | Ł | £ | Ŗ | Ѓ | £ | Ģ | ฃ | £ | Ł | ||||||
164 | A4 | 1010 0100 | ¤ | Є | ¤ | € | ¤ | Ī | ค | ¤ | Ċ | € | ||||||
165 | A5 | 1010 0101 | ¥ | Ľ | Ĩ | Ѕ | ₯ | ¥ | Ĩ | ฅ | „ | ċ | ¥ | „ | ||||
166 | A6 | 1010 0110 | ¦ | Ś | Ĥ | Ļ | І | ¦ | Ķ | ฆ | ¦ | Ḋ | Š | |||||
167 | A7 | 1010 0111 | § | Ї | § | ง | § | |||||||||||
168 | A8 | 1010 1000 | ¨ | Ј | ¨ | Ļ | จ | Ø | Ẁ | š | ||||||||
169 | A9 | 1010 1001 | © | Š | İ | Š | Љ | © | Đ | ฉ | © | |||||||
170 | AA | 1010 1010 | ª | Ş | Ē | Њ | ͺ | × | ª | Š | ช | Ŗ | Ẃ | ª | Ș | |||
171 | AB | 1010 1011 | « | Ť | Ğ | Ģ | Ћ | « | Ŧ | ซ | « | ḋ | « | |||||
172 | AC | 1010 1100 | ¬ | Ź | Ĵ | Ŧ | Ќ | ، | ¬ | Ž | ฌ | ¬ | Ỳ | ¬ | Ź | |||
173 | AD | 1010 1101 | soft hyphen (SHY) | ญ | SHY | |||||||||||||
174 | AE | 1010 1110 | ® | Ž | Ž | Ў | ® | Ū | ฎ | ® | ź | |||||||
175 | AF | 1010 1111 | ¯ | Ż | ¯ | Џ | ― | ¯ | Ŋ | ฏ | Æ | Ÿ | ¯ | Ż | ||||
176 | B0 | 1011 0000 | ° | А | ° | ฐ | ° | Ḟ | ° | |||||||||
177 | B1 | 1011 0001 | ± | ą | ħ | ą | Б | ± | ą | ฑ | ± | ḟ | ± | |||||
178 | B2 | 1011 0010 | ² | ˛ | ² | ˛ | В | ² | ē | ฒ | ² | Ġ | ² | Č | ||||
179 | B3 | 1011 0011 | ³ | ł | ³ | ŗ | Г | ³ | ģ | ณ | ³ | ġ | ³ | ł | ||||
180 | B4 | 1011 0100 | ´ | Д | ΄ | ´ | ī | ด | “ | Ṁ | Ž | |||||||
181 | B5 | 1011 0101 | µ | ľ | µ | ĩ | Е | ΅ | µ | ĩ | ต | µ | ṁ | µ | ” | |||
182 | B6 | 1011 0110 | ¶ | ś | ĥ | ļ | Ж | Ά | ¶ | ķ | ถ | ¶ | ||||||
183 | B7 | 1011 0111 | · | ˇ | · | ˇ | З | · | ท | · | Ṗ | · | ||||||
184 | B8 | 1011 1000 | ¸ | И | Έ | ¸ | ļ | ธ | ø | ẁ | ž | |||||||
185 | B9 | 1011 1001 | ¹ | š | ı | š | Й | Ή | ¹ | đ | น | ¹ | ṗ | ¹ | č | |||
186 | BA | 1011 1010 | º | ş | ē | К | Ί | ÷ | º | š | บ | ŗ | ẃ | º | ș | |||
187 | BB | 1011 1011 | » | ť | ğ | ģ | Л | ؛ | » | ŧ | ป | » | Ṡ | » | ||||
188 | BC | 1011 1100 | ¼ | ź | ĵ | ŧ | М | Ό | ¼ | ž | ผ | ¼ | ỳ | Œ | ||||
189 | BD | 1011 1101 | ½ | ˝ | ½ | Ŋ | Н | ½ | ― | ฝ | ½ | Ẅ | œ | |||||
190 | BE | 1011 1110 | ¾ | ž | ž | О | Ύ | ¾ | ū | พ | ¾ | ẅ | Ÿ | |||||
191 | BF | 1011 1111 | ¿ | ż | ŋ | П | ؟ | Ώ | ¿ | ŋ | ฟ | æ | ṡ | ¿ | ż | |||
192 | C0 | 1100 0000 | À | Ŕ | À | Ā | Р | ΐ | À | Ā | ภ | Ą | À | |||||
193 | C1 | 1100 0001 | Á | С | ء | Α | Á | ม | Į | Á | ||||||||
194 | C2 | 1100 0010 | Â | Т | آ | Β | Â | ย | Ā | Â | ||||||||
195 | C3 | 1100 0011 | Ã | Ă | Ã | У | أ | Γ | Ã | ร | Ć | Ã | Ă | |||||
196 | C4 | 1100 0100 | Ä | Ф | ؤ | Δ | Ä | ฤ | Ä | |||||||||
197 | C5 | 1100 0101 | Å | Ĺ | Ċ | Å | Х | إ | Ε | Å | ล | Å | Ć | |||||
198 | C6 | 1100 0110 | Æ | Ć | Ĉ | Æ | Ц | ئ | Ζ | Æ | ฦ | Ę | Æ | |||||
199 | C7 | 1100 0111 | Ç | Į | Ч | ا | Η | Ç | Į | ว | Ē | Ç | ||||||
200 | C8 | 1100 1000 | È | Č | È | Č | Ш | ب | Θ | È | Č | ศ | Č | È | ||||
201 | C9 | 1100 1001 | É | Щ | ة | Ι | É | ษ | É | |||||||||
202 | CA | 1100 1010 | Ê | Ę | Ê | Ę | Ъ | ت | Κ | Ê | Ę | ส | Ź | Ê | ||||
203 | CB | 1100 1011 | Ë | Ы | ث | Λ | Ë | ห | Ė | Ë | ||||||||
204 | CC | 1100 1100 | Ì | Ě | Ì | Ė | Ь | ج | Μ | Ì | Ė | ฬ | Ģ | Ì | ||||
205 | CD | 1100 1101 | Í | Э | ح | Ν | Í | อ | Ķ | Í | ||||||||
206 | CE | 1100 1110 | Î | Ю | خ | Ξ | Î | ฮ | Ī | Î | ||||||||
207 | CF | 1100 1111 | Ï | Ď | Ï | Ī | Я | د | Ο | Ï | ฯ | Ļ | Ï | |||||
208 | D0 | 1101 0000 | Ð | Đ | Đ | а | ذ | Π | Ğ | Ð | ะ | Š | Ŵ | Ð | ||||
209 | D1 | 1101 0001 | Ñ | Ń | Ñ | Ņ | б | ر | Ρ | Ñ | Ņ | ั | Ń | Ñ | Ń | |||
210 | D2 | 1101 0010 | Ò | Ň | Ò | Ō | в | ز | Ò | Ō | า | Ņ | Ò | |||||
211 | D3 | 1101 0011 | Ó | Ķ | г | س | Σ | Ó | ำ | Ó | ||||||||
212 | D4 | 1101 0100 | Ô | д | ش | Τ | Ô | ิ | Ō | Ô | ||||||||
213 | D5 | 1101 0101 | Õ | Ő | Ġ | Õ | е | ص | Υ | Õ | ี | Ő | ||||||
214 | D6 | 1101 0110 | Ö | ж | ض | Φ | Ö | ึ | Ö | |||||||||
215 | D7 | 1101 0111 | × | з | ط | Χ | × | Ũ | ื | × | Ṫ | × | Ś | |||||
216 | D8 | 1101 1000 | Ø | Ř | Ĝ | Ø | и | ظ | Ψ | Ø | ุ | Ų | Ø | Ű | ||||
217 | D9 | 1101 1001 | Ù | Ů | Ù | Ų | й | ع | Ω | Ù | Ų | ู | Ł | Ù | ||||
218 | DA | 1101 1010 | Ú | к | غ | Ϊ | Ú | ฺ | Ś | Ú | ||||||||
219 | DB | 1101 1011 | Û | Ű | Û | л | Ϋ | Û | Ū | Û | ||||||||
220 | DC | 1101 1100 | Ü | м | ά | Ü | Ü | |||||||||||
221 | DD | 1101 1101 | Ý | Ŭ | Ũ | н | έ | İ | Ý | Ż | Ý | Ę | ||||||
222 | DE | 1101 1110 | Þ | Ţ | Ŝ | Ū | о | ή | Ş | Þ | Ž | Ŷ | Þ | Ț | ||||
223 | DF | 1101 1111 | ß | п | ί | ‗ | ß | ฿ | ß | |||||||||
224 | E0 | 1110 0000 | à | ŕ | à | ā | р | ـ | ΰ | א | à | ā | เ | ą | à | |||
225 | E1 | 1110 0001 | á | с | ف | α | ב | á | แ | į | á | |||||||
226 | E2 | 1110 0010 | â | т | ق | β | ג | â | โ | ā | â | |||||||
227 | E3 | 1110 0011 | ã | ă | ã | у | ك | γ | ד | ã | ใ | ć | ã | ă | ||||
228 | E4 | 1110 0100 | ä | ф | ل | δ | ה | ä | ไ | ä | ||||||||
229 | E5 | 1110 0101 | å | ĺ | ċ | å | х | م | ε | ו | å | ๅ | å | ć | ||||
230 | E6 | 1110 0110 | æ | ć | ĉ | æ | ц | ن | ζ | ז | æ | ๆ | ę | æ | ||||
231 | E7 | 1110 0111 | ç | į | ч | ه | η | ח | ç | į | ็ | ē | ç | |||||
232 | E8 | 1110 1000 | è | č | è | č | ш | و | θ | ט | è | č | ่ | č | è | |||
233 | E9 | 1110 1001 | é | щ | ى | ι | י | é | ้ | é | ||||||||
234 | EA | 1110 1010 | ê | ę | ê | ę | ъ | ي | κ | ך | ê | ę | ๊ | ź | ê | |||
235 | EB | 1110 1011 | ë | ы | ً | λ | כ | ë | ๋ | ė | ë | |||||||
236 | EC | 1110 1100 | ì | ě | ì | ė | ь | ٌ | μ | ל | ì | ė | ์ | ģ | ì | |||
237 | ED | 1110 1101 | í | э | ٍ | ν | ם | í | ํ | ķ | í | |||||||
238 | EE | 1110 1110 | î | ю | َ | ξ | מ | î | ๎ | ī | î | |||||||
239 | EF | 1110 1111 | ï | ď | ï | ī | я | ُ | ο | ן | ï | ๏ | ļ | ï | ||||
240 | F0 | 1111 0000 | ð | đ | đ | № | ِ | π | נ | ğ | ð | ๐ | š | ŵ | ð | đ | ||
241 | F1 | 1111 0001 | ñ | ń | ñ | ņ | ё | ّ | ρ | ס | ñ | ņ | ๑ | ń | ñ | ń | ||
242 | F2 | 1111 0010 | ò | ň | ò | ō | ђ | ْ | ς | ע | ò | ō | ๒ | ņ | ò | |||
243 | F3 | 1111 0011 | ó | ķ | ѓ | σ | ף | ó | ๓ | ó | ||||||||
244 | F4 | 1111 0100 | ô | є | τ | פ | ô | ๔ | ō | ô | ||||||||
245 | F5 | 1111 0101 | õ | ő | ġ | õ | ѕ | υ | ץ | õ | ๕ | ő | ||||||
246 | F6 | 1111 0110 | ö | і | φ | צ | ö | ๖ | ö | |||||||||
247 | F7 | 1111 0111 | ÷ | ї | χ | ק | ÷ | ũ | ๗ | ÷ | ṫ | ÷ | ś | |||||
248 | F8 | 1111 1000 | ø | ř | ĝ | ø | ј | ψ | ר | ø | ๘ | ų | ø | ű | ||||
249 | F9 | 1111 1001 | ù | ů | ù | ų | љ | ω | ש | ù | ų | ๙ | ł | ù | ||||
250 | FA | 1111 1010 | ú | њ | ϊ | ת | ú | ๚ | ś | ú | ||||||||
251 | FB | 1111 1011 | û | ű | û | ћ | ϋ | û | ๛ | ū | û | |||||||
252 | FC | 1111 1100 | ü | ќ | ό | ü | ü | |||||||||||
253 | FD | 1111 1101 | ý | ŭ | ũ | § | ύ | LRM | ı | ý | ż | ý | ę | |||||
254 | FE | 1111 1110 | þ | ţ | ŝ | ū | ў | ώ | RLM | ş | þ | ž | ŷ | þ | ț | |||
255 | FF | 1111 1111 | ÿ | ˙ | џ | ÿ | ĸ | ’ | ÿ |
- Row 160 gives the non-breaking space (HTML: `&nbsp;`), and row 173 gives, in every column except 11 (Thai), the soft hyphen (HTML: `&shy;`), which is displayed only at a line break. Other empty fields are unassigned.
- LRM stands for left-to-right mark (U+200E) and RLM stands for right-to-left mark (U+200F).
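Any row of Table 2 can be regenerated by decoding one byte under every part. The sketch below assumes CPython's `iso8859_N` codecs, whose tables may differ in detail from a particular edition of the standard; `table_row` is a hypothetical helper that returns None for an unassigned position. It reproduces the notes above: a no-break space at 160 in every part, a soft hyphen at 173 everywhere except Thai, and LRM/RLM at 253 and 254 in part 8.

```python
from typing import Dict, Optional

# Published parts; ISO/IEC 8859-12 was abandoned.
PARTS = [n for n in range(1, 17) if n != 12]


def table_row(value: int) -> Dict[int, Optional[str]]:
    """One row of Table 2: part number -> character, or None if unassigned."""
    row: Dict[int, Optional[str]] = {}
    for part in PARTS:
        try:
            row[part] = bytes([value]).decode(f"iso8859_{part}")
        except UnicodeDecodeError:
            row[part] = None
    return row


if __name__ == "__main__":
    print(all(ch == "\u00a0" for ch in table_row(160).values()))
    # Parts whose byte 173 is something other than the soft hyphen.
    print({p: ch for p, ch in table_row(173).items() if ch != "\u00ad"})
    print([f"U+{ord(ch):04X}" for ch in (table_row(253)[8], table_row(254)[8])])
```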
References
- [1] March 1985.
- [2] Although the Windows Western character set is often called the "ANSI character set" (code page 1252), it has not been approved by the American National Standards Institute.
- [3] Unicode code charts U0000.pdf (Latin, ASCII) and U0080.pdf (Latin-1 Supplement).
- [4] Google blog, January 28, 2010.
- [5] Trends, July 2011.
- [6] Unicode code chart U0100.pdf (Latin Extended-A).
- [7] Unicode code chart U0E00.pdf (Thai).
Some content on this page previously appeared on Wikipedia.