Input Resources

Pango itself does not deal with input at all. The Pango library is only concerned with layout and rendering. However, larger systems that include Pango need to handle input as well as output, so some information is collected here about handling input, especially under X, Linux, and other free software systems.

Keyboard Maps

While some languages, such as Chinese or Japanese, require complicated input methods (see below), input for many languages is handled fine with a straightforward keyboard map.

For languages written with the Roman alphabet, the keyboard layout typically consists of a US keyboard layout with the addition of a few keys to add accents. The keyboard layout may also add an AltGR key, which acts as a modifier to produce different key symbols for a given key, or a Compose key, which allows entering additional characters via sequences of keys on the keyboard. (For example, with an AltGR key, the German sharp-s would be produced as 'AltGR + S', while using the Compose key, it would be produced with the sequence 'Compose, s, s'.)
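
The Compose mechanism can be pictured as a table lookup on short key sequences. The following Python sketch is only an illustration of that idea: the table, the function name feed_keys, and the restriction to two-key sequences are assumptions made for the example, not a description of how X implements Compose.

```python
# Hypothetical, tiny compose table; real tables are far larger.
COMPOSE_TABLE = {
    ("s", "s"): "\u00df",   # German sharp s
    ('"', "a"): "\u00e4",   # a with diaeresis
}

def feed_keys(keys):
    """Turn a list of key names into text, resolving Compose sequences.

    Assumes every compose sequence is exactly two keys long, and that
    ordinary keys are single characters.
    """
    out = []
    pending = None  # None when not composing, else the keys collected so far
    for key in keys:
        if key == "Compose":
            pending = []
        elif pending is not None:
            pending.append(key)
            if len(pending) == 2:
                # Unknown sequences are silently dropped in this sketch.
                out.append(COMPOSE_TABLE.get(tuple(pending), ""))
                pending = None
        else:
            out.append(key)
    return "".join(out)
```

Feeding the keys G, r, o, Compose, s, s, e to feed_keys yields the word "Große".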

For languages written with a different alphabet, such as the Cyrillic languages, Greek, Hebrew and Arabic, the setup is somewhat different. For these languages, there will typically be at least two keyboard layouts, one for the user's script and one for Roman, along with the ability to toggle the keyboard between them.

The modern way of handling keyboard maps under X is the XKB extension. XKB is very configurable and powerful compared to the older ways of configuring keyboards (xmodmap); however, it is also very complex and not well documented. The main documentation that I am aware of consists of the XKB library and protocol reference manuals, which come with the X distribution.

A guide to configuring XKB, written in Russian, is also available.

XKB handles multiple keyboard layouts through the concept of groups. A keyboard map can have from 1 to 4 groups defined, each of which represents a different layout. Groups may either be switched temporarily by use of a modifier key (typically AltGR), or they may be switched persistently.
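
The difference between temporary and persistent group switching can be modelled in a few lines. This Python sketch is a simplified illustration, not XKB's actual state machine: the class name GroupState, the 0-based numbering, and the wrap-around behaviour are assumptions made for the example.

```python
class GroupState:
    """Tracks which keyboard group is in effect (groups are 0-based here)."""

    def __init__(self, num_groups):
        if not 1 <= num_groups <= 4:
            raise ValueError("a keyboard map has between 1 and 4 groups")
        self.num_groups = num_groups
        self.locked = 0     # persistent selection
        self.held = False   # True while the temporary-switch modifier is down

    def toggle(self):
        """Persistent switch: advance to the next group, wrapping around."""
        self.locked = (self.locked + 1) % self.num_groups

    def effective(self):
        """Return the group used for the next key press."""
        shift = 1 if self.held else 0
        return (self.locked + shift) % self.num_groups
```

With two groups, holding the modifier selects the second group only while it is held, whereas toggle() changes the group until it is called again.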

XKB keyboard maps can be loaded in one of two ways: when the X server is started, or later using the xkbcomp utility. In the first case, the particular map and options that are loaded are determined by your server configuration file (for instance, /etc/X11/XF86Config).

As an example of the second method, using xkbcomp, suppose that you have the keyboard file /tmp/russian.xkb. You can load that into your display with the command:

    xkbcomp -R/usr/X11R6/lib/X11/xkb/ /tmp/russian.xkb :0
  

This tells xkbcomp to compile the file /tmp/russian.xkb (you need an absolute path name), and store the results on the X display :0. The -R option specifies the directory in which to look for include files referenced in the .xkb file.
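
If the map is loaded from a script, it can help to assemble the command line in one place and to check the absolute-path requirement before calling xkbcomp. The following Python sketch only builds the argument list shown above; the function name build_xkbcomp_command is hypothetical.

```python
import os.path

def build_xkbcomp_command(keymap_path, display, include_dir):
    """Return the argument list for the xkbcomp invocation shown above.

    Raises ValueError for a relative keymap path, since an absolute
    path name is needed.
    """
    if not os.path.isabs(keymap_path):
        raise ValueError("keymap path must be absolute: %r" % keymap_path)
    # -R is written together with its directory, as in the command above.
    return ["xkbcomp", "-R" + include_dir, keymap_path, display]
```

The returned list can be passed to subprocess.run(cmd, check=True) to actually load the map on a machine where xkbcomp is installed.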

The gswitchit panel applet for GNOME is a useful utility for monitoring the current group using XKB. There is also a similar fookb utility which works as a WindowMaker dock applet, or for straight X, though I have not tried it out myself.

XKB keyboard map files for various languages

  • Hebrew (Created by Uri David Akavia)
  • Arabic (Created by Ali Abdin, still a work in progress)

Robert Brady has made available some XKB symbols files for Bengali, Burmese, Gujarati, Hindi, Persian, Punjabi, and Tamil. These are, again, a work in progress, and he would be grateful for feedback on them.

An annotated XKB keyboard map file

As an example of an xkb file, we'll consider a simple xkb file for Russian. The keymap includes various sections: xkb_keycodes, xkb_types, and so forth. For most of these, we simply include the standard settings from the X distribution.

xkb_keymap "myrussian" {
    xkb_keycodes        { include "xfree86"             };
    xkb_types           { include "default"             };
    xkb_compatibility   { include "default"             };
    xkb_geometry        { include "pc(pc102)"           };
  

The one section in which we do interesting things is the xkb_symbols section, which defines the mapping between the keys on the keyboard and the key symbols produced. In this section, we first include some standard settings, so we don't have to define every key.

 xkb_symbols         { 

    include "en_US(pc105)+group(toggle)"
  

This includes two standard symbol sets: the keys for the standard 105-key PC keyboard, and a setting so that the right Alt key produces the ISO_Next_Group key symbol and thus causes the next keyboard group to be switched in.

We then name the first two keyboard groups as "US/ASCII" and "Russian".

    name[Group1]= "US/ASCII";
    name[Group2]= "Russian";
  

Finally, we redefine the keys that differ between the US layout and the Russian layout. For instance, we have:

    key	<AC01> {	[		a,		 A	],
			[     Cyrillic_ef,     Cyrillic_EF	]	};
  

This says that the key AC01 should produce 'a' unshifted and 'A' shifted in the first group (US/ASCII), and 'Cyrillic_ef' unshifted and 'Cyrillic_EF' shifted in the second group (Russian).
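
The key definition above can be read as a small two-dimensional table indexed by group and by shift state. The Python sketch below mirrors that table for illustration only; the names KEYMAP and lookup and the 0-based group index are assumptions of the example, not part of XKB.

```python
# Symbols for <AC01>, copied from the key definition above: one row per
# group, each row holding [unshifted, shifted].
KEYMAP = {
    "AC01": [
        ["a", "A"],                       # Group1: US/ASCII
        ["Cyrillic_ef", "Cyrillic_EF"],   # Group2: Russian
    ],
}

def lookup(keymap, key, group, shifted):
    """Return the key symbol for a key in the given group (0-based)."""
    row = keymap[key][group]
    return row[1 if shifted else 0]
```

For example, lookup(KEYMAP, "AC01", 1, False) selects the unshifted symbol of the second group, Cyrillic_ef.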

Input Methods

While many languages are handled OK by selecting a keyboard layout, other languages, such as Chinese and Japanese, require a more complicated input process. A typical process is that the user enters input phonetically, and then is prompted to choose between various possible written forms for that phonetic form. This conversion may involve dictionary lookups and other complicated processing, so it is typically done in a separate process: an input method server.
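
The phonetic-entry-then-choice process can be sketched as a dictionary lookup followed by a selection. This Python example is a toy illustration under stated assumptions: the dictionary contents and the function names candidates and convert are made up for the example, and a real input method server consults far larger dictionaries and usually ranks candidates by context.

```python
# Hypothetical, tiny conversion dictionary keyed by romanized reading.
DICTIONARY = {
    "hashi": ["橋", "箸", "端"],   # bridge, chopsticks, edge
    "kami": ["紙", "神", "髪"],    # paper, god, hair
}

def candidates(reading):
    """Return possible written forms; fall back to the reading itself."""
    return DICTIONARY.get(reading, [reading])

def convert(reading, choice):
    """Return the candidate the user picked (0-based index)."""
    return candidates(reading)[choice]
```

Here candidates("hashi") returns three written forms for the user to choose from, and convert("hashi", 1) returns the second of them.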

The most common protocol for communicating between applications and input method servers on X is the XIM protocol. The Xlib library contains support for the XIM protocol, as part of the X Input Method extension. XIM and the X Input Method extension are rather overcomplicated. They are also not very well adapted to the needs of a system such as Pango, since they are based around the idea of a single, locale-specific encoding, and around rendering directly through Xlib. However, because of the availability of input methods using XIM, supporting it is important for a system that wants to gain widespread acceptance.

I've listed a few of the available input methods below (with an emphasis on free software). Most of the documentation on these systems is, not surprisingly, in the target languages.

  • XCIN is an XIM server for Chinese.

  • Wnn (pronounced oon) is one of the more widely used Japanese input systems. For many years, it was available as a basically unmaintained free version (Wnn4) and a commercial version (Wnn6). The FreeWnn project is now actively developing a free software version of Wnn.

  • Wnn, mentioned above, does not actually contain the code to interact with applications and the user. The actual input method sits between Wnn and the application. A commonly used Japanese XIM input method server is a program called kinput2, which can work both with Wnn and with other backends, such as Canna.



Last modified 18-Nov-2000
Owen Taylor <otaylor@redhat.com>