
Motorola g20 - Character Set Management

Product Features
16 98-08901C68-O
2.8.4 UTF-8 Character Set Management
UTF-8 is a compact, efficient Unicode encoding. It distributes the bit pattern of a Unicode code point across one, two, three,
or four bytes, which makes it a variable-length, multi-byte encoding.
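The bit distribution can be sketched as a small encoder. This is an illustrative sketch, assuming the current UTF-8 definition (RFC 3629: code points up to U+10FFFF, surrogates excluded); the function name `utf8_encode` is hypothetical and is checked against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as 1 to 4 UTF-8 bytes (RFC 3629 rules)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:                      # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare against Python's own UTF-8 codec for 1-, 2-, 3- and 4-byte cases.
for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```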
UTF-8 encodes each ASCII character in a single byte, so text in languages that use Latin-based scripts needs only about 1.1
bytes per character on average.
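The average depends on how many non-ASCII letters a text contains. A minimal way to measure it for a given string is sketched below; the helper name and the French sample sentence are illustrative assumptions, not data from this manual.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character (code point) in text."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample: mostly ASCII letters plus two accented ones (2 bytes each).
sample = "Le caf\u00e9 est tr\u00e8s bon."
print(f"{utf8_bytes_per_char(sample):.2f} bytes/char")
```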
UTF-8 is useful for legacy systems that need Unicode support, because developers do not have to drastically modify their
text-processing code. Code that assumes single-byte code units typically does not fail completely when given UTF-8 text
instead of ASCII or Latin-1.
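One reason byte-oriented code keeps working: bytes 00h to 7Fh never occur inside a multi-byte UTF-8 sequence, so splitting raw bytes on an ASCII delimiter cannot cut a character in half. The sketch below assumes a comma-separated byte string; `split_fields` is a hypothetical helper.

```python
def split_fields(line: bytes, sep: bytes = b",") -> list[str]:
    """Split raw bytes on an ASCII separator, then decode each field as UTF-8.

    Safe for UTF-8 input: every byte of a multi-byte character is >= 80h,
    so an ASCII separator byte can never match part of a character.
    """
    return [field.decode("utf-8") for field in line.split(sep)]


raw = "Z\u00fcrich,Gen\u00e8ve,Bern".encode("utf-8")
print(split_fields(raw))
```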
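Unlike some legacy encodings, UTF-8 is easy to parse. Lead bytes and trail (continuation) bytes are easily distinguished:
every trail byte has the bit pattern 10xxxxxx, and no lead byte or single-byte character does. As a result, moving forward or
backward through a text string is easier in UTF-8 than in many other multi-byte encodings.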
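Because trail bytes are self-identifying, a cursor can step to the previous or next character boundary by skipping bytes of the form 10xxxxxx. A minimal sketch, assuming well-formed UTF-8 input; the function names are illustrative.

```python
def is_continuation(b: int) -> bool:
    """Trail (continuation) bytes have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


def prev_char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the character that ends just before pos."""
    if pos <= 0:
        raise IndexError("already at start of buffer")
    i = pos - 1
    while i > 0 and is_continuation(data[i]):
        i -= 1  # skip trail bytes until a lead byte or ASCII byte
    return i


def next_char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the character after the one starting at pos."""
    i = pos + 1
    while i < len(data) and is_continuation(data[i]):
        i += 1  # skip the trail bytes of the current character
    return i


data = "a\u20acb".encode("utf-8")  # bytes: 61 e2 82 ac 62
```

No table of lengths or backward scan from the start of the string is needed, which is what makes backward movement cheap compared with encodings whose trail bytes overlap with lead bytes.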
The codes in the first half of the first row of Character Set Table CS2 (UTF-8 <-> ASCII), that is, the characters that are
also ASCII, are represented in this transformation format by their ASCII codes, which are single octets in the range 00h to
7Fh. All other UCS codes are transformed into sequences of two to six octets, each in the range 80h to FFh. (Six octets are
needed only for the full 31-bit UCS-4 range of the original ISO 10646 definition; current UTF-8, limited to code points up to
U+10FFFF, uses at most four.) Text containing only characters in Character Set Table CS3 (UTF-8 <-> UCS-2) is transformed into
the same octet sequence irrespective of whether it was coded in UCS-2 or UCS-4.
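Both properties can be checked with Python's codecs. This sketch assumes that UTF-16BE stands in for UCS-2 (equivalent for text that stays inside the Basic Multilingual Plane) and UTF-32BE for UCS-4; the helper name is hypothetical.

```python
def follows_utf8_octet_rules(text: str) -> bool:
    """ASCII -> one octet equal to its code (00h-7Fh); others -> octets 80h-FFh only."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            if encoded != bytes([ord(ch)]):
                return False
        elif not all(b >= 0x80 for b in encoded):
            return False
    return True


text = "Gr\u00fc\u00dfe, \u0395\u03bb\u03bb\u03ac\u03b4\u03b1"  # BMP-only sample
ucs2 = text.encode("utf-16-be")  # stands in for UCS-2 (BMP-only text)
ucs4 = text.encode("utf-32-be")  # stands in for UCS-4

# Same characters, different source forms: identical UTF-8 octets.
utf8_from_ucs2 = ucs2.decode("utf-16-be").encode("utf-8")
utf8_from_ucs4 = ucs4.decode("utf-32-be").encode("utf-8")
```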
2.8.5 8859 Character Set Management
ISO 8859 is an 8-bit character set, a major improvement over plain 7-bit US-ASCII.
Positions 0 to 127 are always identical to US-ASCII, and positions 128 to 159 hold rarely used control characters.
Positions 160 to 255 hold language-specific characters. ISO 8859 comprises a full series of 10 standardized, multilingual,
single-byte (8-bit) coded graphic character sets for writing alphabetic languages:
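The position layout described above applies to every part of ISO 8859 and can be expressed as a simple classifier. A sketch only; the category labels are illustrative, and treating 0-31 and 127 as ASCII controls follows US-ASCII itself.

```python
def classify_8859_position(pos: int) -> str:
    """Classify a code position (0-255) in any part of ISO 8859."""
    if not 0 <= pos <= 0xFF:
        raise ValueError("ISO 8859 is a single-byte (8-bit) code")
    if pos < 0x20 or pos == 0x7F:
        return "ASCII control"        # identical to US-ASCII
    if pos < 0x80:
        return "ASCII graphic"        # identical to US-ASCII
    if pos < 0xA0:
        return "control"              # positions 128-159
    return "language-specific"        # positions 160-255, differ per part
```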
• Latin 1 (West European)
• Latin 2 (East European)
• Latin 3 (South European)
• Latin 4 (North European)
• Cyrillic
• Arabic
• Greek
• Hebrew
• Latin 5 (Turkish)
• Latin 6 (Nordic)
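Because positions 160 to 255 differ between parts, the same byte decodes to a different character depending on which part is assumed. The sketch uses Python's codec names (`iso8859_1`, `iso8859_5`, `iso8859_7`), which are Python's naming and not a g20 setting; only three parts are shown.

```python
# Python codec names for a few ISO 8859 parts (Python naming, not g20 settings).
PARTS = {
    "Latin 1 (West European)": "iso8859_1",  # ISO 8859-1
    "Cyrillic": "iso8859_5",                 # ISO 8859-5
    "Greek": "iso8859_7",                    # ISO 8859-7
}


def decode_in_each_part(byte: int) -> dict[str, str]:
    """Decode one byte under several ISO 8859 parts."""
    raw = bytes([byte])
    return {name: raw.decode(codec) for name, codec in PARTS.items()}


for name, ch in decode_in_each_part(0xE9).items():
    print(f"{name}: {ch} (U+{ord(ch):04X})")
```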
The g20 supports Latin 1 (ISO 8859-1).
Latin 1 covers most West European languages, such as French (fr), Spanish (es), Catalan (ca), Basque (eu), Portuguese (pt),
Italian (it), Albanian (sq), Rhaeto-Romanic (rm), Dutch (nl), German (de), Danish (da), Swedish (sv), Norwegian (no),
Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga), Scottish Gaelic (gd) and English (en). It also covers Afrikaans (af)
and Swahili (sw), which extends its reach to much of Africa.
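Before sending text to a device that handles only Latin 1, it can help to check that every character is representable. A minimal sketch using Python's `latin-1` codec; how the character set is selected on the g20 is not shown here, and `fits_latin1` is a hypothetical helper.

```python
def fits_latin1(text: str) -> bool:
    """True if every character of text exists in ISO 8859-1 (Latin 1)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("Stra\u00dfe"))        # German, covered by Latin 1
print(fits_latin1("\u0141\u00f3d\u017a"))  # Polish, needs Latin 2
```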
Latin 1 was also adopted as the first 256 code positions (U+0000 to U+00FF) of ISO/IEC 10646.
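Since each Latin 1 byte value equals its ISO 10646 (Unicode) code point, converting Latin 1 to UTF-8 needs no lookup table: each byte is taken as a code point and then UTF-8 encoded. A sketch with illustrative helper names.

```python
def latin1_to_code_points(data: bytes) -> list[int]:
    """Decode Latin 1 bytes and return the resulting Unicode code points."""
    return [ord(ch) for ch in data.decode("latin-1")]


def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin 1 to UTF-8 by treating each byte value as a code point."""
    return "".join(chr(b) for b in data).encode("utf-8")


print(latin1_to_utf8(b"caf\xe9"))
```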
