Language-specific knowledge is broken off into loadable modules for each language. These modules are used to implement a low-level API that allows its users to properly handle all sorts of scripts without having language-specific knowledge.
The Bigger Picture
Pango provides facilities that are not necessarily easy to use. A toolkit builds on top of these facilities to provide:
Other text handling facilities, such as a printing architecture, also build on top of Pango and provide simplified string-based APIs for their users.
Internationalized text, as handled by Pango, makes large demands on the font system. Many features that, for western text, are of interest only for high-quality typography, such as ligatures and the selection of alternate glyphs for a character, are vital for rendering non-western languages. Also, a much larger range of glyphs is needed. Finally, when rendering multi-lingual text, one must be able to encode multiple languages simultaneously, so encodings that are limited to the character set of a single language are not useful.
The process of choosing a font is considerably more complex in a multi-lingual environment which supports multiple font rendering systems than it is in a more homogeneous environment. Several different font selection activities can be identified:
There are three types of font-related objects in Pango. A FontDescription is an abstract description of a font. A FontList is a list of FontDescriptions. Finally, a Font is the realization of a FontDescription as a particular font on the system.
The glyphs within a font are indexed by 32-bit integers which are opaque to the application. (In the case we mention above where we merge various fonts with different encodings into a single font, a portion of the encoding space would be used to index the fonts within the fontset, and a portion would be used to index glyphs within each font.)
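As a rough illustration of the parenthetical above, a 32-bit glyph index could be split into a field that selects a font within the fontset and a field that selects a glyph within that font. The 8/24-bit split and every name below are assumptions made for this sketch; the design does not specify a layout, and applications are meant to treat the index as opaque.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (not specified by the design): the top 8 bits select
 * a font within the fontset, the low 24 bits select a glyph in that font. */
#define SUBFONT_BITS 8u
#define GLYPH_BITS   24u
#define GLYPH_MASK   ((UINT32_C(1) << GLYPH_BITS) - 1u)

typedef uint32_t GlyphIndex;

/* Combine a font number and a per-font glyph number into one index. */
GlyphIndex glyph_index_pack(uint32_t subfont, uint32_t glyph)
{
    assert(subfont < (UINT32_C(1) << SUBFONT_BITS));
    assert(glyph <= GLYPH_MASK);
    return (subfont << GLYPH_BITS) | glyph;
}

/* Recover the font number from a packed index. */
uint32_t glyph_index_subfont(GlyphIndex index)
{
    return index >> GLYPH_BITS;
}

/* Recover the per-font glyph number from a packed index. */
uint32_t glyph_index_glyph(GlyphIndex index)
{
    return index & GLYPH_MASK;
}
```

Only the font implementation would call the unpacking functions; callers would pass the packed value around unchanged.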
A number of aliases are provided on a system-wide level that are used to get standard fonts with certain general characteristics. Programs should use these aliases for fonts that are specified internally. These aliases include "fixed", "sans", and "serif".
The process of turning a character string into a glyph string requires knowledge of the glyph encoding of the font. For this reason, the Font object includes a method to retrieve an appropriate Shaper object as a function of Unicode character and language tag. The Shaper object is responsible for taking a string of characters and turning them into a string of glyphs. A particular implementation of a Shaper object is most commonly shared between a group of related scripts for a particular font system. For instance, there would be a single shaper implementation for Indic scripts rendered with OpenType, and a single shaper for western (Latin, Greek, and Cyrillic) scripts rendered using X fonts.
For more complicated scripts, such as the Indic scripts and Arabic, there may be a considerable amount of code used in the shaping process that is independent of the font system. In these cases, a two-level system can be used: the Shaper uses an AbstractShaper to convert characters to abstract glyphs, and the Shaper is then responsible for converting those abstract glyphs into glyphs for the specific font.
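The two-level arrangement might be sketched as follows. All types and function names here are hypothetical and are not Pango's API. The "toy" functions stand in for real script logic: one forms an "fi" ligature as a font-independent abstract glyph, the other maps abstract glyphs into an imaginary font's index space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t AbstractGlyph;  /* font-independent glyph identity */
typedef uint32_t FontGlyph;      /* opaque index into one specific font */

/* Level 1: script logic that is independent of the font system. */
typedef size_t (*AbstractShapeFunc)(const uint32_t *chars, size_t n_chars,
                                    AbstractGlyph *out, size_t out_cap);
/* Level 2: mapping into the glyph space of one font system. */
typedef FontGlyph (*GlyphMapFunc)(AbstractGlyph glyph);

typedef struct {
    AbstractShapeFunc abstract_shape;
    GlyphMapFunc map_glyph;
} Shaper;

/* Run both levels; returns the number of glyphs written to out. */
size_t shaper_shape(const Shaper *shaper, const uint32_t *chars,
                    size_t n_chars, FontGlyph *out, size_t out_cap)
{
    assert(shaper != NULL && chars != NULL && out != NULL);
    AbstractGlyph tmp[64];  /* fixed scratch buffer keeps the sketch short */
    size_t cap = out_cap < 64 ? out_cap : 64;
    size_t n = shaper->abstract_shape(chars, n_chars, tmp, cap);
    for (size_t i = 0; i < n; i++)
        out[i] = shaper->map_glyph(tmp[i]);
    return n;
}

/* Toy level 1: merge 'f' followed by 'i' into one ligature glyph. */
size_t toy_abstract_shape(const uint32_t *chars, size_t n_chars,
                          AbstractGlyph *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < n_chars && n < out_cap; i++) {
        if (chars[i] == 'f' && i + 1 < n_chars && chars[i + 1] == 'i') {
            out[n++] = 0xFB01;  /* U+FB01 LATIN SMALL LIGATURE FI */
            i++;                /* the 'i' was consumed by the ligature */
        } else {
            out[n++] = chars[i];
        }
    }
    return n;
}

/* Toy level 2: pretend this font stores its glyphs at an offset of 1000. */
FontGlyph toy_map_glyph(AbstractGlyph glyph)
{
    return glyph + 1000;
}
```

Swapping in a different `map_glyph` would retarget the same script logic to another font system, which is the reuse the two-level design is after.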
The Layout and Rendering Pipeline
Layout and rendering in Pango involves several steps:
A caller of Pango could apply all of the steps in the above algorithm explicitly by themselves. However, that process is rather involved and would lead to code duplication in various places. Therefore, a higher-level facility is provided in Pango for callers that do not need a highly detailed level of control. This is the PangoLayout object. A PangoLayout object represents a paragraph of text and is created by passing in an attributed text string. It handles all of the above steps internally except for rendering.
Text in Pango is represented, in most cases, as UTF-8 encoded strings. This representation has a number of advantages as opposed to a fixed-width representation such as UCS-2:
There are some disadvantages as well:
Where individual characters are represented, they are represented as 32-bit-wide characters. This again provides forwards compatibility with the full range of ISO 10646, and should incur minimal cost for local variables and parameter passing. This agrees with the type of wchar_t in the GNU libc library.
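To make the relationship between the two representations concrete, here is a minimal sketch of decoding one UTF-8 sequence into a 32-bit character. The function name is hypothetical, and the decoder assumes the modern Unicode limit of U+10FFFF rather than the wider range ISO 10646 originally allowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s, with len bytes available.
 * Stores the character in *out and returns the number of bytes consumed,
 * or 0 if the input is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    uint32_t c = s[0];
    size_t need;   /* total bytes in this sequence */
    uint32_t min;  /* smallest value allowed for this length */

    if (c < 0x80) { *out = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { need = 2; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { need = 3; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { need = 4; c &= 0x07; min = 0x10000; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    if (len < need)
        return 0;
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;  /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *out = c;
    return need;
}
```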
Offsets into a UTF-8 string are represented as byte offsets, not character offsets. This is more convenient for processing, and although there is the problem of having invalid offsets into the data, note that given a string of Unicode text with combining characters, character positions may already be invalid, and break-iteration is needed to determine valid positions.
Conversion between character sets will be handled via iconv. Many newer systems provide a decent implementation of iconv (notably GNU libc-2.1), and these systems can be used as "reference platforms"; for other platforms, it shouldn't be hard to write a simple table-driven iconv implementation that can handle the small amounts of data in a typical GUI reasonably efficiently. (Various implementations of this are available, e.g., Tom Tromey's libunicode and Bruno Haible's libiconv.)
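The trade-off can be seen in a small sketch of byte-offset handling. These helper names are hypothetical, and the functions assume the input is already valid UTF-8. Checking whether a byte offset falls on a character boundary is a constant-time test, while converting a character index to a byte offset requires a scan.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if byte offset off is the start of a character (or the end of
 * the string), 0 if it points into the middle of a multi-byte sequence. */
int utf8_offset_is_boundary(const char *s, size_t len, size_t off)
{
    assert(s != NULL);
    if (off > len)
        return 0;
    if (off == len)
        return 1;
    /* Continuation bytes have the bit pattern 10xxxxxx. */
    return ((unsigned char)s[off] & 0xC0) != 0x80;
}

/* Return the byte offset of the character after the one starting at off. */
size_t utf8_next_offset(const char *s, size_t len, size_t off)
{
    assert(s != NULL && off <= len);
    if (off < len)
        off++;
    while (off < len && ((unsigned char)s[off] & 0xC0) == 0x80)
        off++;
    return off;
}

/* Convert a character index to a byte offset by scanning from the start.
 * Indexes past the end of the string clamp to len. */
size_t utf8_char_to_byte_offset(const char *s, size_t len, size_t char_index)
{
    size_t off = 0;
    while (char_index > 0 && off < len) {
        off = utf8_next_offset(s, len, off);
        char_index--;
    }
    return off;
}
```

Note that a valid byte offset in this sense is still only a character boundary; as the paragraph above says, combining characters mean that break-iteration is needed to find positions that are valid for editing.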
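As an illustration of the iconv approach, the following sketch converts a buffer from a named character set to UTF-8 using the standard `iconv_open`, `iconv`, and `iconv_close` calls. The wrapper function and its name are assumptions for this example, and it treats any conversion error, including an output buffer that is too small, as a plain failure.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of in from from_charset to UTF-8, writing at most
 * out_cap bytes to out. Returns the number of bytes written, or
 * (size_t)-1 on failure. */
size_t convert_to_utf8(const char *from_charset, const char *in,
                       size_t in_len, char *out, size_t out_cap)
{
    assert(from_charset != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;  /* conversion not supported on this system */

    /* iconv() takes char ** for the input but does not modify the bytes. */
    char *in_ptr = (char *)in;
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;

    /* UTF-8 output has no shift state, so no final flush call is made. */
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - out_left;
}
```

For example, the four ISO-8859-1 bytes of "café" would come back as five UTF-8 bytes, since "é" needs two bytes in UTF-8. On systems where iconv lives in a separate library, the program may need to be linked with `-liconv`.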
There are roughly three separate groups of tasks that Pango involves. There are tasks that are independent of language and font system, such as:
Then there are tasks that are dependent only on the language, such as:
Finally, there are tasks that depend both on language and the font system being used, such as:
The first set of tasks is performed within the core of Pango; the other tasks are performed within dynamically loaded modules.
API Design Principles
Pango is intended to be a cross-platform, cross-toolkit, low-level library. Convenience of the API is not the primary goal, although it is a goal. Some of the principles that are used in designing the details of the API are:
Last modified 22-Jun-2000 Owen Taylor <email@example.com>