charmap (5)

character symbols to define character encodings

DESCRIPTION

    A character set description (charmap) defines a character set of available characters and their encodings. All supported character sets should have the portable character set as a proper subset. The portable character set is defined in the file /usr/lib/nls/charmap/POSIX for reference purposes.

SYNTAX

    The charmap file starts with a header that may consist of the following keywords:

    <codeset>

      is followed by the name of the codeset.

    <mb_cur_max>

      is followed by the maximum number of bytes in a multibyte character. Multibyte characters are currently not supported. The default value is 1.

    <mb_cur_min>

      is followed by the minimum number of bytes in a character. This value must be less than or equal to mb_cur_max. If it is not specified, it defaults to mb_cur_max.

    <escape_char>

      is followed by a character to be used as the escape character for the rest of the file. The escape character marks characters that should be interpreted in a special way. It defaults to the backslash ( \ ).

    <comment_char>

      is followed by a character to be used as the comment character for the rest of the file. It defaults to the number sign ( # ).
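    The header defaults described above can be collected in one place. The following is a minimal sketch, not the implementation used by any particular localedef. It assumes a hypothetical struct charmap_header in which a zero value means the keyword was absent. The function resolve_header() is likewise hypothetical. It applies the documented defaults and checks that mb_cur_min does not exceed mb_cur_max.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory form of the charmap header (not a real API).
   A zero value means the keyword did not appear in the file. */
struct charmap_header {
    int  mb_cur_max;
    int  mb_cur_min;
    char escape_char;
    char comment_char;
};

/* Fill in the documented defaults and check the documented constraint.
   Returns false if mb_cur_min is out of range. */
bool resolve_header(struct charmap_header *h)
{
    assert(h != NULL);
    if (h->mb_cur_max == 0)
        h->mb_cur_max = 1;                 /* default: single-byte */
    if (h->mb_cur_min == 0)
        h->mb_cur_min = h->mb_cur_max;     /* default: equal to mb_cur_max */
    if (h->escape_char == '\0')
        h->escape_char = '\\';             /* default escape: backslash */
    if (h->comment_char == '\0')
        h->comment_char = '#';             /* default comment: number sign */
    return h->mb_cur_min >= 1 && h->mb_cur_min <= h->mb_cur_max;
}
```

    A header that sets only <mb_cur_max> to 2 ends up with mb_cur_min equal to 2. A header that sets mb_cur_min above mb_cur_max is rejected.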

    The charmap definition itself starts with the keyword CHARMAP in column 1.

    The lines that follow may take one of the two following forms to define character encodings:

    <symbolic-name> <encoding> <comments>

      This form defines exactly one character and its encoding.

    <symbolic-name>...<symbolic-name> <encoding> <comments>

      This form defines several characters at once. It is only useful for multibyte characters, which are currently not implemented.

    The last line in a charmap definition file must contain END CHARMAP.
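    The line structure described above can be recognized with a small line classifier. This is a hedged sketch; the enum and classify_line() are hypothetical names. It assumes that symbolic names contain no whitespace. It only handles the single-character form and does not handle the range form. For a definition line it copies the symbolic name and encoding fields and treats the rest of the line as a comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical line categories for the charmap body. */
enum line_kind { LINE_BLANK, LINE_COMMENT, LINE_BEGIN, LINE_END, LINE_DEF, LINE_BAD };

/* Classify one line. For "<symbolic-name> <encoding> <comments>", copy the
   first two fields into name[64] and enc[32]; trailing text is ignored.
   Assumes names contain no whitespace; range definitions are not handled. */
enum line_kind classify_line(const char *line, char comment_char,
                             char name[64], char enc[32])
{
    assert(line != NULL);
    if (strncmp(line, "END CHARMAP", 11) == 0)
        return LINE_END;
    /* CHARMAP must start in column 1 and stand alone as a keyword. */
    if (strncmp(line, "CHARMAP", 7) == 0 &&
        (line[7] == '\0' || line[7] == '\n' || line[7] == ' ' || line[7] == '\t'))
        return LINE_BEGIN;
    const char *p = line + strspn(line, " \t");   /* skip leading blanks */
    if (*p == '\0' || *p == '\n')
        return LINE_BLANK;
    if (*p == comment_char)
        return LINE_COMMENT;
    if (sscanf(p, "%63s %31s", name, enc) == 2 && name[0] == '<')
        return LINE_DEF;
    return LINE_BAD;
}
```

    For example, the line <A> \d65 LATIN CAPITAL LETTER A is classified as a definition. The name field is "<A>" and the encoding field is "\d65".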

SYMBOLIC NAMES

    A symbolic name for a character contains only characters of the portable character set. The name itself is enclosed between angle brackets. Characters following the <escape_char> are interpreted literally; for example, the sequence '<\\\>>' represents the symbolic name '\>' enclosed in angle brackets.
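    The escaping rule can be illustrated with a minimal decoder. This is a sketch; decode_symbolic_name() is a hypothetical helper, not part of any library. It strips the angle brackets and copies each character that follows the escape character literally. As a result, an escaped '>' does not end the name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decode "<...>" into out, honoring escape_char.
   Returns false on a malformed name or if out is too small. */
bool decode_symbolic_name(const char *s, char escape_char,
                          char *out, size_t out_size)
{
    size_t n = 0;

    assert(s != NULL && out != NULL && out_size > 0);
    if (*s++ != '<')
        return false;
    for (;;) {
        char c = *s++;
        if (c == '\0')
            return false;              /* no closing '>' */
        if (c == escape_char) {
            c = *s++;                  /* next character is taken literally */
            if (c == '\0')
                return false;          /* escape at end of input */
        } else if (c == '>') {
            break;                     /* unescaped '>' ends the name */
        }
        if (n + 1 >= out_size)
            return false;              /* keep room for the terminator */
        out[n++] = c;
    }
    out[n] = '\0';
    return true;
}
```

    With a backslash as the escape character, the input <\\\>> decodes to the two-character name \>, matching the example above.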

CHARACTER ENCODING

    The encoding may be given in any of the following three forms:

    <escape_char>d<number>

      with a decimal number

    <escape_char>x<number>

      with a hexadecimal number

    <escape_char><number>

      with an octal number.
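    The three encoding forms differ only in their radix, so one routine can decode all of them. This is a hedged sketch; decode_encoding() is a hypothetical name. It decodes a single byte, because multibyte characters are described above as unsupported. It rejects digits that are invalid for the chosen base and values above 255.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decode one single-byte encoding token such as
   "\d65" (decimal), "\x41" (hexadecimal) or "\101" (octal). */
bool decode_encoding(const char *tok, char escape_char, unsigned *value)
{
    int base = 8;                      /* bare <escape_char><number> is octal */
    unsigned long v = 0;
    int digits = 0;

    assert(tok != NULL && value != NULL);
    if (*tok++ != escape_char)
        return false;
    if (*tok == 'd') {
        base = 10;
        tok++;
    } else if (*tok == 'x') {
        base = 16;
        tok++;
    }
    for (; *tok != '\0'; tok++, digits++) {
        int d;
        if (*tok >= '0' && *tok <= '9')
            d = *tok - '0';
        else if (base == 16 && *tok >= 'a' && *tok <= 'f')
            d = *tok - 'a' + 10;
        else if (base == 16 && *tok >= 'A' && *tok <= 'F')
            d = *tok - 'A' + 10;
        else
            return false;              /* not a digit of any accepted base */
        if (d >= base)
            return false;              /* e.g. '8' in an octal number */
        v = v * base + d;
        if (v > 255)
            return false;              /* does not fit in one byte */
    }
    if (digits == 0)
        return false;                  /* escape and prefix with no number */
    *value = (unsigned)v;
    return true;
}
```

    All three spellings \d65, \x41 and \101 decode to the same value, 65, which is the letter A in the portable character set.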