Motorola g20 - Character Set Management

Product Features

16 98-08901C68-O

2.8.4 UTF-8 Character Set Management

UTF-8 provides a compact, efficient encoding of Unicode. It is a multi-byte encoding that distributes the bit pattern of a Unicode code value across one, two, three, or four bytes.
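The following is a minimal sketch of how the bits of a code value can be spread over one to four bytes. The function name utf8_encode is hypothetical and used only for illustration; it is not part of any g20 interface.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x80:
        # Up to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # Up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

For example, utf8_encode(0x41) yields the single byte 41h, while utf8_encode(0x20AC) (the euro sign) yields the three bytes E2h 82h ACh.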

UTF-8 encodes each ASCII character in a single byte, so text in languages that use Latin-based scripts typically needs only about 1.1 bytes per character on average.
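The average depends on the text itself. As a rough illustration, the hypothetical helper below measures the ratio for a given string; the sample strings are assumptions chosen for the example, not measurements from the g20.

```python
def average_bytes_per_char(text: str) -> float:
    """Return UTF-8 bytes divided by characters for a non-empty string."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)


# Pure ASCII text needs exactly one byte per character.
ascii_ratio = average_bytes_per_char("Hello")

# Accented Latin letters need two bytes each, which raises the average.
accented_ratio = average_bytes_per_char("Cr\u00e8me br\u00fbl\u00e9e")
```

Text with only occasional accented letters stays close to 1.0, which is where an average figure such as 1.1 comes from.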

UTF-8 is useful for legacy systems that want Unicode support because developers do not have to drastically modify text processing code. Code that assumes single-byte code units typically does not fail completely when provided UTF-8 text instead of ASCII or even Latin-1.

Unlike some legacy encodings, UTF-8 is easy to parse. So-called lead and trail bytes are easily distinguished. Moving forwards or backwards in a text string is easier in UTF-8 than in many other multi-byte encodings.
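The distinction between lead and trail bytes can be sketched as follows. Trail bytes always have the bit pattern 10xxxxxx, so a character boundary can be found by skipping them. The helper names are hypothetical, and the sketch assumes well-formed UTF-8 input.

```python
def is_trail_byte(value: int) -> bool:
    """True for a UTF-8 trail (continuation) byte: 10xxxxxx."""
    return (value & 0xC0) == 0x80


def next_char_start(data: bytes, index: int) -> int:
    """Return the index of the character that follows the one at `index`."""
    i = index + 1
    while i < len(data) and is_trail_byte(data[i]):
        i += 1
    return i


def previous_char_start(data: bytes, index: int) -> int:
    """Return the start index of the character that ends just before `index`."""
    if index <= 0:
        raise ValueError("no character precedes index 0")
    i = index - 1
    while i > 0 and is_trail_byte(data[i]):
        i -= 1
    return i


# 'a', euro sign (3 bytes), 'b': bytes 61 E2 82 AC 62
sample = b"a\xe2\x82\xacb"
```

With this sample, stepping backwards from index 4 skips the two trail bytes and lands on the lead byte at index 1.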

The codes in the first half of the first row in Character Set Table CS2 (UTF-8 <-> ASCII), that is, the characters that are also ASCII, are replaced in this transformation format by their ASCII codes, which are octets in the range 00h to 7Fh. All other UCS codes are transformed to multi-octet sequences in which every octet lies in the range 80h to FFh. The original UTF-8 definition allowed sequences of two to six octets; codes representable in UCS-2 need at most three, and the current definition (RFC 3629) allows at most four. Text containing only characters in Character Set Table CS3 (UTF-8 <-> UCS-2) is transformed to the same octet sequence, irrespective of whether it was coded with UCS-2.
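The transformation from UCS-2 codes to UTF-8 octets can be sketched as below. This is an illustration only: the function name is hypothetical, and big-endian 16-bit input without surrogate codes is an assumption, since the byte order used by the g20 is not stated here.

```python
import struct


def ucs2_to_utf8(ucs2_be: bytes) -> bytes:
    """Convert big-endian UCS-2 codes to UTF-8 octets (illustrative helper)."""
    if len(ucs2_be) % 2:
        raise ValueError("UCS-2 input must contain an even number of octets")
    out = bytearray()
    for (code,) in struct.iter_unpack(">H", ucs2_be):
        if code < 0x80:
            # ASCII codes are kept as single octets, 00h to 7Fh.
            out.append(code)
        elif code < 0x800:
            # Two octets, both in the range 80h to FFh.
            out += bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
        else:
            # Three octets, all in the range 80h to FFh.
            out += bytes([0xE0 | (code >> 12),
                          0x80 | ((code >> 6) & 0x3F),
                          0x80 | (code & 0x3F)])
    return bytes(out)
```

Under these assumptions, the UCS-2 codes 0048h 0069h ("Hi") become the two octets 48h 69h, identical to their ASCII form.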

2.8.5 8859 Character Set Management

ISO 8859 is a family of 8-bit character sets, a major improvement over plain 7-bit US-ASCII.

Positions 0 to 127 are always identical to US-ASCII, and positions 128 to 159 hold some less-used control characters. Positions 160 to 255 hold language-specific characters. ISO 8859 includes the following 10 standardized multilingual single-byte coded (8-bit) graphic character sets for writing in alphabetic languages:

• Latin 1 (West European)

• Latin 2 (East European)

• Latin 3 (South European)

• Latin 4 (North European)

• Cyrillic

• Arabic

• Greek

• Hebrew

• Latin 5 (Turkish)

• Latin 6 (Nordic)
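The position ranges that all of these character sets share can be sketched as a simple classification. The function name and the returned labels are hypothetical and chosen only for illustration.

```python
def classify_iso8859_position(value: int) -> str:
    """Classify an 8-bit code position according to the ISO 8859 layout."""
    if not 0 <= value <= 255:
        raise ValueError("ISO 8859 positions are in the range 0 to 255")
    if value <= 127:
        return "US-ASCII"
    if value <= 159:
        return "control"
    return "language-specific"
```

For example, position 65 ("A") falls in the US-ASCII range, while position 233 is language-specific (in Latin 1 it is "é").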

The g20 supports Latin 1 (ISO 8859-1).

Latin 1 covers most West European languages, such as French (fr), Spanish (es), Catalan (ca), Basque (eu), Portuguese (pt), Italian (it), Albanian (sq), Rhaeto-Romanic (rm), Dutch (nl), German (de), Danish (da), Swedish (sv), Norwegian (no), Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga), Scottish Gaelic (gd) and English (en). Afrikaans (af) and Swahili (sw) are also covered, which extends coverage to much of Africa.

Latin 1 has also been adopted as the first page of ISO 10646 (code positions 0000h to 00FFh).
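Because each Latin 1 byte value equals the corresponding ISO 10646 code value, converting Latin 1 text to UTF-8 needs no lookup table. The sketch below illustrates this; the function name is hypothetical and not part of any g20 interface.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin 1 (ISO 8859-1) bytes to UTF-8 (illustrative helper)."""
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            # ASCII range: the byte is unchanged.
            out.append(byte)
        else:
            # The byte value is the code value; it always fits in two octets.
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)


# "café" in Latin 1: the last byte E9h is also the code value of "é".
latin1_sample = b"caf\xe9"
utf8_sample = latin1_to_utf8(latin1_sample)
```

Here the Latin 1 byte E9h becomes the two UTF-8 octets C3h A9h, and the ASCII bytes are copied as they are.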
