WARNING: THIS CLASS IS A WORK IN PROGRESS. SOME FEATURES ARE NOT
YET IMPLEMENTED AND SOME FEATURES MAY APPEAR OR DISAPPEAR.
A UNICODE_STRING is a resizable string made of Unicode values.
From unicode.org: "Unicode provides a unique number for every
character,
no matter what the platform,
no matter what the program,
no matter what the language."
WARNING: a grapheme may be described by several codes.
A grapheme may be defined as a "user character". The Angstrom sign is
one grapheme but may also be written as two codes
(LATIN CAPITAL LETTER A + COMBINING RING ABOVE).
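The Angstrom example can be checked with a short Python sketch. It is only an illustration using Python's standard unicodedata module and is not part of UNICODE_STRING; the variable names are hypothetical.

```python
import unicodedata

# One grapheme, two possible spellings.
angstrom_sign = "\u212B"   # ANGSTROM SIGN, a single code
decomposed = "A\u030A"     # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

# Canonical decomposition (NFD) turns the single code into the pair.
nfd = unicodedata.normalize("NFD", angstrom_sign)

# Canonical composition (NFC) turns the pair into a single code,
# here LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
nfc = unicodedata.normalize("NFC", decomposed)

print(len(angstrom_sign), len(decomposed))  # number of codes in each spelling
print(nfd == decomposed)
```

Both spellings display as the same user character, so comparing strings code by code may report a difference that a reader would not see.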
Unicode strings may be accessed in two ways:
- low-level (code by code)
- high-level (grapheme by grapheme)
Unless otherwise specified, the unit used by all features is the Unicode code (low-level access).
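The difference between the two access levels can be sketched in Python. The helper below is a hypothetical approximation: it attaches each combining mark to the preceding code, which is much simpler than the full Unicode grapheme cluster rules and is not how UNICODE_STRING is known to work.

```python
import unicodedata

def graphemes(text):
    """Group codes into approximate graphemes (base code + combining marks)."""
    clusters = []
    for code in text:
        # A nonzero combining class means the code modifies the previous one.
        if clusters and unicodedata.combining(code) != 0:
            clusters[-1] += code
        else:
            clusters.append(code)
    return clusters

text = "A\u030Angstrom"            # decomposed A-with-ring, then "ngstrom"
low_level_count = len(text)         # counted code by code
high_level_count = len(graphemes(text))  # counted grapheme by grapheme
print(low_level_count, high_level_count)
```

An index in the low-level view therefore does not always designate the same user character as the same index in the high-level view.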
Return the position to use in the low_surrogate* arrays for the
character at index in the string. (The answer is also correct
if the corresponding character is not a surrogate.)
WARNING: this is only the storage area. Each Unicode value is
stored using 2 bytes (CHARACTER). The encoding used is UTF-16NE.
Low surrogates are stored separately to allow direct access.
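One way to picture this layout is the Python sketch below. It assumes that the main storage keeps exactly one 16-bit unit per character (the high surrogate for characters above U+FFFF) and that the low surrogates live in side arrays together with their indexes. The function and array names are hypothetical, and the binary search is an assumption about how a position in the low_surrogate* arrays could be found, not the class's confirmed algorithm.

```python
import bisect

def split_storage(text):
    """Split text into 16-bit units plus separately stored low surrogates."""
    units = []                  # one 16-bit value per character
    low_surrogates = []         # low surrogate values, in string order
    low_surrogate_indexes = []  # index in units of each surrogate character
    for index, char in enumerate(text):
        code = ord(char)
        if code >= 0x10000:
            offset = code - 0x10000
            units.append(0xD800 + (offset >> 10))            # high surrogate
            low_surrogates.append(0xDC00 + (offset & 0x3FF))  # low surrogate
            low_surrogate_indexes.append(index)
        else:
            units.append(code)
    return units, low_surrogates, low_surrogate_indexes

def code_at(units, low_surrogates, low_surrogate_indexes, index):
    """Return the Unicode value of the character at index (0-based here)."""
    unit = units[index]
    if 0xD800 <= unit <= 0xDBFF:
        # Position of this character in the side arrays.
        position = bisect.bisect_left(low_surrogate_indexes, index)
        low = low_surrogates[position]
        return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
    return unit

units, lows, indexes = split_storage("a\U0001D11Eb")
print([hex(u) for u in units], [hex(v) for v in lows], indexes)
```

Because every character occupies exactly one slot in the main storage, indexing by character stays direct even when the string contains surrogate pairs.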
Discard all characters (is_empty is True after that call).
The internal capacity is not changed
by this call. See also clear_count_and_capacity to choose the most appropriate one.
Note: internal storage memory is neither released nor shrunk.
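The distinction between discarding the characters and releasing the storage can be sketched as follows. This toy Python class is hypothetical: the name clear_count for the feature described above and the behaviour shown for clear_count_and_capacity (dropping the storage as well) are assumptions inferred from the names, not taken from the class itself.

```python
class ToyBuffer:
    """Toy resizable buffer that tracks count and capacity separately."""

    def __init__(self, capacity):
        self.storage = [0] * capacity  # allocated slots
        self.count = 0                 # slots actually in use

    @property
    def capacity(self):
        return len(self.storage)

    def is_empty(self):
        return self.count == 0

    def append(self, unit):
        if self.count == self.capacity:
            # Grow the storage when it is full.
            self.storage.extend([0] * max(1, self.capacity))
        self.storage[self.count] = unit
        self.count += 1

    def clear_count(self):
        # Discard all characters; the storage is kept for reuse.
        self.count = 0

    def clear_count_and_capacity(self):
        # Assumed behaviour: discard the characters and the storage.
        self.storage = []
        self.count = 0

buffer = ToyBuffer(4)
for unit in (0x61, 0x62, 0x63):
    buffer.append(unit)
buffer.clear_count()
print(buffer.is_empty(), buffer.capacity)
```

Keeping the capacity suits a string that will be refilled soon, while releasing it suits a string that is expected to stay empty.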