gettext
gawk uses GNU gettext to provide its internationalization features. The facilities in GNU gettext focus on messages: strings printed by a program, either directly or via formatting with printf or sprintf().[85]
When using GNU gettext, each application has its own text domain. This is a unique name, such as ‘kpilot’ or ‘gawk’, that identifies the application. A complete application may have multiple components: programs written in C or C++, as well as scripts written in sh or awk. All of the components use the same text domain.
To make the discussion concrete, assume we’re writing an application named guide. Internationalization consists of the following steps, in this order:
1. The programmer reviews the source for all of guide’s components and marks each string that is a candidate for translation. For example, "`-F': option required" is a good candidate for translation. A table with strings of option names is not (e.g., gawk’s --profile option should remain the same, no matter what the local language).
2. The programmer indicates the application’s text domain ("guide") to the gettext library, by calling the textdomain() function.
3. The marked messages are extracted from the source code. For each language with a translator, they are collected in a portable object file (.po) and translations are created and shipped with the application. For example, there might be a fr.po for a French translation.
4. Each language’s .po file is converted into a binary message object (.gmo) file.
5. When guide is built and installed, the binary translation files are installed in a standard place.
6. It is possible to tell gettext to use .gmo files in a different directory than the standard one by using the bindtextdomain() function.
7. At runtime, guide looks up each string via a call to gettext(). The returned string is the translated string if available, or the original string if not.
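The programmer-visible parts of these steps (setting the text domain, optionally redirecting the catalog directory, and looking up strings) can be sketched in C roughly as follows. This is only an illustration: the helper names guide_i18n_init() and guide_greet() are invented here, and the sketch assumes a system whose C library provides <libintl.h>. The calls to setlocale(), textdomain(), bindtextdomain(), and gettext() are the standard ones.

```c
#include <libintl.h>  /* textdomain(), bindtextdomain(), gettext() */
#include <locale.h>   /* setlocale() */
#include <stdio.h>
#include <string.h>

#define _(str) gettext(str)

/* Hypothetical startup helper (not part of gettext).
   catalog_dir may be NULL to keep the standard catalog directory.
   Returns 0 on success, -1 on failure. */
static int guide_i18n_init(const char *catalog_dir)
{
    const char *domain;

    /* Adopt the user's locale from the environment. */
    setlocale(LC_ALL, "");

    /* Optionally look for the "guide" catalogs somewhere else. */
    if (catalog_dir != NULL && bindtextdomain("guide", catalog_dir) == NULL)
        return -1;

    /* Make "guide" the text domain used by later gettext() calls. */
    domain = textdomain("guide");
    return (domain != NULL && strcmp(domain, "guide") == 0) ? 0 : -1;
}

/* Prints the translated message if a catalog supplies one,
   or the original string otherwise. */
static void guide_greet(void)
{
    printf("%s", _("Don't Panic!\n"));
}
```

A program would call guide_i18n_init(NULL) once at startup and then use ‘_’ around each message, as guide_greet() does. When no catalog for "guide" is installed, the lookup simply hands back the original string.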
In C (or C++), the string marking and dynamic translation lookup are accomplished by wrapping each string in a call to gettext():
printf("%s", gettext("Don't Panic!\n"));
The tools that extract messages from source code pull out all strings enclosed in calls to gettext().
The GNU gettext developers, recognizing that typing ‘gettext(…)’ over and over again is both painful and ugly to look at, use the macro ‘_’ (an underscore) to make things easier:
/* In the standard header file: */
#define _(str) gettext(str)

/* In the program text: */
printf("%s", _("Don't Panic!\n"));
This reduces the typing overhead to just three extra characters per string and is considerably easier to read as well.
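One further convenience of the macro is that it gives a single place to switch translation off. The sketch below assumes a build-time symbol named ENABLE_NLS, a name conventionally defined by the Autoconf gettext support but treated here purely as an assumption. The helpers panic_message() and print_panic() are hypothetical. When the symbol is absent, ‘_’ passes its argument through unchanged, so the program text needs no other edits.

```c
#include <stdio.h>

#ifdef ENABLE_NLS                 /* assumed to be set by the build system */
#include <libintl.h>
#define _(str) gettext(str)       /* translate at runtime */
#else
#define _(str) (str)              /* no gettext: use the string as written */
#endif

/* Hypothetical helper: returns the message, translated if possible. */
static const char *panic_message(void)
{
    return _("Don't Panic!\n");
}

/* Hypothetical helper: writes the message to standard output. */
static void print_panic(void)
{
    fputs(panic_message(), stdout);
}
```

Either way, every string stays marked with ‘_’, so the message extraction tools see the same source text.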
There are locale categories for different types of locale-related information. The defined locale categories that gettext knows about are:
LC_MESSAGES
Text messages. This is the default category for gettext operations, but it is possible to supply a different one explicitly, if necessary. (It is almost never necessary to supply a different category.)
LC_COLLATE
Text-collation information (i.e., how different characters and/or groups of characters sort in a given language).
LC_CTYPE
Character-type information (alphabetic, digit, upper- or lowercase, and so on) as well as character encoding. This information is accessed via the POSIX character classes in regular expressions, such as /[[:alnum:]]/ (see Bracket Expressions).
LC_MONETARY
Monetary information, such as the currency symbol, and whether the symbol goes before or after a number.
LC_NUMERIC
Numeric information, such as which characters to use for the decimal point and the thousands separator.[86]
LC_TIME
Time- and date-related information, such as 12- or 24-hour clock, month printed before or after the day in a date, local month abbreviations, and so on.
LC_ALL
All of the above. (Not too useful in the context of gettext.)
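To see how a category is named in a C program, consider the following sketch. It uses dcgettext(), the gettext library call that takes an explicit category, and localeconv(), the standard C call that reports LC_NUMERIC and LC_MONETARY details. The helper names lookup_in_category() and decimal_point_for() are invented for this illustration, and the sketch assumes a system that provides <libintl.h> and defines LC_MESSAGES in <locale.h>.

```c
#include <libintl.h>  /* dcgettext() */
#include <locale.h>   /* setlocale(), localeconv(), LC_* constants */

/* Hypothetical helper: look up msgid in the current text domain,
   naming the locale category explicitly. Passing LC_MESSAGES gives
   the same result as a plain gettext() call. */
static const char *lookup_in_category(const char *msgid, int category)
{
    return dcgettext(NULL, msgid, category);
}

/* Hypothetical helper: return the decimal-point string used by the
   named locale, or NULL if that locale is not available.
   Note that this changes the program's LC_NUMERIC setting, and that
   the returned string may be overwritten by later locale calls. */
static const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

For example, decimal_point_for("C") yields a period. Other locale names depend on what is installed on the system, which is why the helper can return NULL.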
[85] For some operating systems, the gawk port doesn’t support GNU gettext. Therefore, these features are not available if you are using one of those operating systems. Sorry.
[86] Americans use a comma to separate every three digits and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 versus 1.234,56.