#include "unicode/utf.h"
Defines | |
#define | U16_IS_SINGLE(c) !U_IS_SURROGATE(c) |
Does this code unit alone encode a code point (BMP, not a surrogate)? | |
#define | U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800) |
Is this code unit a lead surrogate (U+d800..U+dbff)? | |
#define | U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00) |
Is this code unit a trail surrogate (U+dc00..U+dfff)? | |
#define | U16_IS_SURROGATE(c) U_IS_SURROGATE(c) |
Is this code unit a surrogate (U+d800..U+dfff)? | |
#define | U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0) |
Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate? | |
#define | U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000) |
Helper constant for U16_GET_SUPPLEMENTARY. | |
#define | U16_GET_SUPPLEMENTARY(lead, trail) (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET) |
Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates. | |
#define | U16_LEAD(supplementary) (UChar)(((supplementary)>>10)+0xd7c0) |
Get the lead surrogate (0xd800..0xdbff) for a supplementary code point. | |
#define | U16_TRAIL(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00) |
Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point. | |
#define | U16_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2) |
How many 16-bit code units are used to encode this Unicode code point? (1 or 2) The result is not defined if c is not a Unicode code point (U+0000..U+10ffff). | |
#define | U16_MAX_LENGTH 2 |
The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff). | |
#define | U16_GET_UNSAFE(s, i, c) |
Get a code point from a string at a random-access offset, without changing the offset. | |
#define | U16_GET(s, start, i, length, c) |
Get a code point from a string at a random-access offset, without changing the offset. | |
#define | U16_NEXT_UNSAFE(s, i, c) |
Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary. | |
#define | U16_NEXT(s, i, length, c) |
Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary. | |
#define | U16_APPEND_UNSAFE(s, i, c) |
Append a code point to a string, overwriting 1 or 2 code units. | |
#define | U16_APPEND(s, i, capacity, c, isError) |
Append a code point to a string, overwriting 1 or 2 code units. | |
#define | U16_FWD_1_UNSAFE(s, i) |
Advance the string offset from one code point boundary to the next. | |
#define | U16_FWD_1(s, i, length) |
Advance the string offset from one code point boundary to the next. | |
#define | U16_FWD_N_UNSAFE(s, i, n) |
Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points. | |
#define | U16_FWD_N(s, i, length, n) |
Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points. | |
#define | U16_SET_CP_START_UNSAFE(s, i) |
Adjust a random-access offset to a code point boundary at the start of a code point. | |
#define | U16_SET_CP_START(s, start, i) |
Adjust a random-access offset to a code point boundary at the start of a code point. | |
#define | U16_PREV_UNSAFE(s, i, c) |
Move the string offset from one code point boundary to the previous one and get the code point between them. | |
#define | U16_PREV(s, start, i, c) |
Move the string offset from one code point boundary to the previous one and get the code point between them. | |
#define | U16_BACK_1_UNSAFE(s, i) |
Move the string offset from one code point boundary to the previous one. | |
#define | U16_BACK_1(s, start, i) |
Move the string offset from one code point boundary to the previous one. | |
#define | U16_BACK_N_UNSAFE(s, i, n) |
Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points. | |
#define | U16_BACK_N(s, start, i, n) |
Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points. | |
#define | U16_SET_CP_LIMIT_UNSAFE(s, i) |
Adjust a random-access offset to a code point boundary after a code point. | |
#define | U16_SET_CP_LIMIT(s, start, i, length) |
Adjust a random-access offset to a code point boundary after a code point. |
This file defines macros to deal with 16-bit Unicode (UTF-16) code units and strings. utf16.h is included by utf.h after unicode/umachine.h and some common definitions.
For more information see utf.h and the ICU User Guide Strings chapter (http://icu-project.org/userguide/strings.html).
Usage: Follow the ICU coding guidelines for if() statements when using these macros. Use compound statements (curly braces {}) for if-else-while... bodies, and terminate every macro statement with a semicolon.
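The macros on this page expand to brace-delimited blocks rather than expressions (see the Value text of each definition). An unbraced `if(x) U16_FWD_1(s, i, length); else ...` would therefore end the if statement at the semicolon and leave a dangling else, which is why the braces matter. The following is a minimal sketch of the recommended shape. The typedef and the macro bodies are copied into the snippet only so that it compiles without ICU, and the function name step_offset is hypothetical; with ICU installed, include unicode/utf16.h instead.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar; /* local stand-in for ICU's UChar */

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_FWD_1(s, i, length) { \
    if(U16_IS_LEAD((s)[(i)++]) && (i)<(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
#define U16_BACK_1(s, start, i) { \
    if(U16_IS_TRAIL((s)[--(i)]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}

/* Hypothetical helper: move i by one code point, forward or backward,
 * inside s[0..length). Both branches use braces and a trailing semicolon. */
int32_t step_offset(const UChar *s, int32_t length, int32_t i, int forward) {
    if (forward) {
        assert(i < length);   /* documented precondition of U16_FWD_1 */
        U16_FWD_1(s, i, length);
    } else {
        assert(0 < i);        /* documented precondition of U16_BACK_1 */
        U16_BACK_1(s, 0, i);
    }
    return i;
}
```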
Definition in file utf16.h.
#define U16_APPEND(s, i, capacity, c, isError)
Value:
{ \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else /* c>0x10ffff or not enough space */ { \
        (isError)=TRUE; \
    } \
}
The offset points to the current end of the string contents and is advanced (post-increment). "Safe" macro, checks for a valid code point. If a surrogate pair is written, checks for sufficient space in the string. If the code point is not valid or a trail surrogate does not fit, then isError is set to TRUE.
s | UChar * string buffer (written to, so it cannot be const) | |
i | string offset, must be i<capacity | |
capacity | size of the string buffer | |
c | code point to append | |
isError | output UBool set to TRUE if an error occurs, otherwise not modified |
Definition at line 302 of file utf16.h.
Referenced by UnicodeString::append(), and UnicodeString::replace().
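Because isError is only ever set to TRUE and is otherwise left unmodified, the caller has to initialize it before the first call. The sketch below encodes an array of code points into a fixed-size buffer under that rule. The typedefs, the TRUE/FALSE definitions, and the macro body are local copies so that the snippet compiles without ICU, and encode_utf16 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* local stand-in for ICU's UChar */
typedef int32_t UChar32;  /* local stand-in for ICU's UChar32 */
typedef int8_t UBool;     /* local stand-in for ICU's UBool */
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Local copy of the definition documented above. */
#define U16_APPEND(s, i, capacity, c, isError) { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else /* c>0x10ffff or not enough space */ { \
        (isError)=TRUE; \
    } \
}

/* Hypothetical helper: encode n code points into dest[0..capacity).
 * Returns the number of code units written, or -1 on any error. */
int32_t encode_utf16(const UChar32 *cps, int32_t n, UChar *dest, int32_t capacity) {
    int32_t i = 0;
    UBool isError = FALSE; /* must start as FALSE; the macro never clears it */
    assert(n >= 0 && capacity >= 0);
    for (int32_t k = 0; k < n; ++k) {
        if (i >= capacity) {
            return -1;     /* the macro requires i<capacity on entry */
        }
        U16_APPEND(dest, i, capacity, cps[k], isError);
        if (isError) {
            return -1;     /* c>0x10ffff, or a surrogate pair did not fit */
        }
    }
    return i;
}
```

As the Value text shows, the validity check is the range test c<=0x10ffff, so a caller that also wants to reject surrogate code points would need to test for them separately.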
#define U16_APPEND_UNSAFE(s, i, c)
Value:
{ \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } \
}
The offset points to the current end of the string contents and is advanced (post-increment). "Unsafe" macro, assumes a valid code point and sufficient space in the string. Otherwise, the result is undefined.
s | UChar * string buffer (written to, so it cannot be const) | |
i | string offset | |
c | code point to append |
#define U16_BACK_1(s, start, i)
Value:
{ \
    if(U16_IS_TRAIL((s)[--(i)]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}
(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.
s | const UChar * string | |
start | starting string offset (usually 0) | |
i | string offset, must be start<i |
#define U16_BACK_1_UNSAFE(s, i)
Value:
{ \
    if(U16_IS_TRAIL((s)[--(i)])) { \
        --(i); \
    } \
}
(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.
s | const UChar * string | |
i | string offset |
#define U16_BACK_N(s, start, i, n)
Value:
{ \
    int32_t __N=(n); \
    while(__N>0 && (i)>(start)) { \
        U16_BACK_1(s, start, i); \
        --__N; \
    } \
}
(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.
s | const UChar * string | |
start | start of string | |
i | string offset, must be start<i | |
n | number of code points to skip |
#define U16_BACK_N_UNSAFE(s, i, n)
Value:
{ \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_BACK_1_UNSAFE(s, i); \
        --__N; \
    } \
}
(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.
s | const UChar * string | |
i | string offset | |
n | number of code points to skip |
#define U16_FWD_1(s, i, length)
Value:
{ \
    if(U16_IS_LEAD((s)[(i)++]) && (i)<(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.
s | const UChar * string | |
i | string offset, must be i<length | |
length | string length |
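A common use of U16_FWD_1 is counting code points without decoding them. The sketch below assumes a string given as pointer plus length; the typedef and macro bodies are local copies so that it compiles without ICU, and count_code_points is a hypothetical helper name. An unpaired surrogate counts as one code point, matching the "safe" behavior described above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar; /* local stand-in for ICU's UChar */

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_FWD_1(s, i, length) { \
    if(U16_IS_LEAD((s)[(i)++]) && (i)<(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}

/* Hypothetical helper: number of code points in s[0..length). */
int32_t count_code_points(const UChar *s, int32_t length) {
    int32_t i = 0, count = 0;
    assert(length >= 0);
    while (i < length) {      /* the loop condition guarantees i<length */
        U16_FWD_1(s, i, length);
        ++count;
    }
    return count;
}
```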
#define U16_FWD_1_UNSAFE(s, i)
Value:
{ \
    if(U16_IS_LEAD((s)[(i)++])) { \
        ++(i); \
    } \
}
(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.
s | const UChar * string | |
i | string offset |
#define U16_FWD_N(s, i, length, n)
Value:
{ \
    int32_t __N=(n); \
    while(__N>0 && (i)<(length)) { \
        U16_FWD_1(s, i, length); \
        --__N; \
    } \
}
(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.
s | const UChar * string | |
i | string offset, must be i<length | |
length | string length | |
n | number of code points to skip |
#define U16_FWD_N_UNSAFE(s, i, n)
Value:
{ \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_FWD_1_UNSAFE(s, i); \
        --__N; \
    } \
}
(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.
s | const UChar * string | |
i | string offset | |
n | number of code points to skip |
#define U16_GET(s, start, i, length, c)
Value:
{ \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1<(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } \
        } else { \
            if((i)-1>=(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } \
        } \
    } \
}
"Safe" macro, handles unpaired surrogates and checks for string boundaries.
The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. If the offset points to a single, unpaired surrogate, then that itself will be returned as the code point. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.
s | const UChar * string | |
start | starting string offset (usually 0) | |
i | string offset, must be start<=i<length | |
length | string length | |
c | output UChar32 variable |
Definition at line 188 of file utf16.h.
Referenced by UnicodeString::char32At().
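The random-access behavior can be illustrated with a small wrapper that returns the code point covering a given offset, whether that offset lands on the lead or on the trail unit. This is a sketch only: the typedefs and macro bodies are local copies so that it compiles without ICU, the U_IS_SURROGATE definition is an assumption about what utf.h provides, and code_point_at is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* local stand-in for ICU's UChar */
typedef int32_t UChar32;  /* local stand-in for ICU's UChar32 */

/* Assumed to match the utf.h definition, which is not shown on this page. */
#define U_IS_SURROGATE(c) (((c)&0xfffff800)==0xd800)

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_IS_SURROGATE(c) U_IS_SURROGATE(c)
#define U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0)
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_GET(s, start, i, length, c) { \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1<(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } \
        } else { \
            if((i)-1>=(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } \
        } \
    } \
}

/* Hypothetical helper: code point that covers offset i in s[0..length). */
UChar32 code_point_at(const UChar *s, int32_t length, int32_t i) {
    UChar32 c;
    assert(0 <= i && i < length); /* documented precondition: start<=i<length */
    U16_GET(s, 0, i, length, c);
    return c;                     /* i itself is left unchanged by U16_GET */
}
```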
#define U16_GET_SUPPLEMENTARY(lead, trail) (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates. The result is undefined if the input values are not lead and trail surrogates.
lead | lead surrogate (U+d800..U+dbff) | |
trail | trail surrogate (U+dc00..U+dfff) |
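The arithmetic is the usual UTF-16 decoding, c = ((lead-0xd800)<<10) + (trail-0xdc00) + 0x10000, with the three constant terms folded into U16_SURROGATE_OFFSET so that only one subtraction remains at run time. The sketch below checks this by splitting a code point with U16_LEAD and U16_TRAIL and recombining it. The typedefs and macro definitions are local copies so that it compiles without ICU, and split_and_recombine is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* local stand-in for ICU's UChar */
typedef int32_t UChar32;  /* local stand-in for ICU's UChar32 */

/* Local copies of the definitions documented on this page. */
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_LEAD(supplementary) (UChar)(((supplementary)>>10)+0xd7c0)
#define U16_TRAIL(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00)

/* Hypothetical helper: split c into a surrogate pair, then recombine it. */
UChar32 split_and_recombine(UChar32 c) {
    assert(0x10000 <= c && c <= 0x10ffff); /* supplementary code points only */
    UChar lead = U16_LEAD(c);
    UChar trail = U16_TRAIL(c);
    return U16_GET_SUPPLEMENTARY(lead, trail);
}
```

For U+1F600, for example, the lead is (0x1f600>>10)+0xd7c0 = 0xd83d and the trail is (0x1f600&0x3ff)|0xdc00 = 0xde00.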
#define U16_GET_UNSAFE(s, i, c)
Value:
{ \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)+1]); \
        } else { \
            (c)=U16_GET_SUPPLEMENTARY((s)[(i)-1], (c)); \
        } \
    } \
}
"Unsafe" macro, assumes well-formed UTF-16.
The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. The result is undefined if the offset points to a single, unpaired surrogate. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.
s | const UChar * string | |
i | string offset | |
c | output UChar32 variable |
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_SINGLE(c) !U_IS_SURROGATE(c)
#define U16_IS_SURROGATE(c) U_IS_SURROGATE(c)
#define U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0)
Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?
c | 16-bit code unit |
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_LEAD(supplementary) (UChar)(((supplementary)>>10)+0xd7c0)
#define U16_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2)
#define U16_MAX_LENGTH 2
The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).
Definition at line 138 of file utf16.h.
Referenced by UnicodeString::append(), and UnicodeString::replace().
#define U16_NEXT(s, i, length, c)
Value:
{ \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        uint16_t __c2; \
        if((i)<(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } \
    } \
}
(Post-incrementing forward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.
The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate or to a single, unpaired lead surrogate, then that itself will be returned as the code point.
s | const UChar * string | |
i | string offset, must be i<length | |
length | string length | |
c | output UChar32 variable |
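Forward iteration with U16_NEXT typically takes the form of a while loop over the offset. The sketch below decodes a UTF-16 string into an array of code points; unpaired surrogates come through as themselves, as described above. The typedefs and macro bodies are local copies so that the snippet compiles without ICU, and decode_utf16 is a hypothetical helper name. The output array is assumed to have room for at least length entries.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* local stand-in for ICU's UChar */
typedef int32_t UChar32;  /* local stand-in for ICU's UChar32 */

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_NEXT(s, i, length, c) { \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        uint16_t __c2; \
        if((i)<(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } \
    } \
}

/* Hypothetical helper: decode s[0..length) into out[], which must hold at
 * least length entries. Returns the number of code points written. */
int32_t decode_utf16(const UChar *s, int32_t length, UChar32 *out) {
    int32_t i = 0, n = 0;
    assert(length >= 0);
    while (i < length) {          /* guarantees the precondition i<length */
        UChar32 c;
        U16_NEXT(s, i, length, c);
        out[n++] = c;
    }
    return n;
}
```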
#define U16_NEXT_UNSAFE(s, i, c)
Value:
{ \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)++]); \
    } \
}
(Post-incrementing forward iteration.) "Unsafe" macro, assumes well-formed UTF-16.
The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate, then that itself will be returned as the code point. The result is undefined if the offset points to a single, unpaired lead surrogate.
s | const UChar * string | |
i | string offset | |
c | output UChar32 variable |
#define U16_PREV(s, start, i, c)
Value:
{ \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        uint16_t __c2; \
        if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } \
    } \
}
(Pre-decrementing backward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.
The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate or behind a single, unpaired trail surrogate, then that itself will be returned as the code point.
s | const UChar * string | |
start | starting string offset (usually 0) | |
i | string offset, must be start<i | |
c | output UChar32 variable |
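Backward iteration mirrors the forward case: start with the offset equal to the string length and call U16_PREV while the offset is greater than the start. The sketch below collects the code points of a string in reverse order. The typedefs and macro bodies are local copies so that it compiles without ICU, and decode_utf16_reversed is a hypothetical helper name. The output array is assumed to have room for at least length entries.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar;   /* local stand-in for ICU's UChar */
typedef int32_t UChar32;  /* local stand-in for ICU's UChar32 */

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_GET_SUPPLEMENTARY(lead, trail) \
    (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
#define U16_PREV(s, start, i, c) { \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        uint16_t __c2; \
        if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } \
    } \
}

/* Hypothetical helper: write the code points of s[0..length) to out[] in
 * reverse order. Returns the number of code points written. */
int32_t decode_utf16_reversed(const UChar *s, int32_t length, UChar32 *out) {
    int32_t i = length, n = 0;    /* the input offset may equal the length */
    assert(length >= 0);
    while (i > 0) {               /* guarantees the precondition start<i */
        UChar32 c;
        U16_PREV(s, 0, i, c);
        out[n++] = c;
    }
    return n;
}
```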
#define U16_PREV_UNSAFE(s, i, c)
Value:
{ \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((s)[--(i)], (c)); \
    } \
}
(Pre-decrementing backward iteration.) "Unsafe" macro, assumes well-formed UTF-16.
The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate, then that itself will be returned as the code point. The result is undefined if the offset is behind a single, unpaired trail surrogate.
s | const UChar * string | |
i | string offset | |
c | output UChar32 variable |
#define U16_SET_CP_LIMIT(s, start, i, length)
Value:
{ \
    if((start)<(i) && (i)<(length) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.
s | const UChar * string | |
start | starting string offset (usually 0) | |
i | string offset, start<=i<=length | |
length | string length |
Definition at line 599 of file utf16.h.
Referenced by UnicodeString::getChar32Limit().
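One way to use U16_SET_CP_LIMIT is to round an arbitrary end offset up so that a range never stops between the two halves of a surrogate pair. The sketch below wraps the macro in a function; the typedef and macro bodies are local copies so that it compiles without ICU, and code_point_limit is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar; /* local stand-in for ICU's UChar */

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SET_CP_LIMIT(s, start, i, length) { \
    if((start)<(i) && (i)<(length) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}

/* Hypothetical helper: smallest code point boundary >= i in s[0..length). */
int32_t code_point_limit(const UChar *s, int32_t length, int32_t i) {
    assert(0 <= i && i <= length); /* documented precondition: start<=i<=length */
    U16_SET_CP_LIMIT(s, 0, i, length);
    return i;
}
```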
#define U16_SET_CP_LIMIT_UNSAFE(s, i)
Value:
{ \
    if(U16_IS_LEAD((s)[(i)-1])) { \
        ++(i); \
    } \
}
If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.
s | const UChar * string | |
i | string offset |
#define U16_SET_CP_START(s, start, i)
Value:
{ \
    if(U16_IS_TRAIL((s)[i]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}
If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Safe" macro, handles unpaired surrogates and checks for string boundaries.
s | const UChar * string | |
start | starting string offset (usually 0) | |
i | string offset, must be start<=i |
Definition at line 420 of file utf16.h.
Referenced by UnicodeString::getChar32Start().
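A typical application of U16_SET_CP_START is truncating a string to a maximum number of code units without cutting a surrogate pair in half. The sketch below is one possible way to do that, not ICU's own implementation. The typedef and macro bodies are local copies so that it compiles without ICU, and truncate_at_boundary is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar; /* local stand-in for ICU's UChar */

/* Local copies of the definitions documented on this page. */
#define U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define U16_SET_CP_START(s, start, i) { \
    if(U16_IS_TRAIL((s)[i]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}

/* Hypothetical helper: largest length <= maxUnits at which s[0..length)
 * can be cut without splitting a surrogate pair. */
int32_t truncate_at_boundary(const UChar *s, int32_t length, int32_t maxUnits) {
    int32_t i = maxUnits;
    assert(length >= 0 && maxUnits >= 0);
    if (length <= maxUnits) {
        return length;            /* nothing to cut */
    }
    /* Here i<length, so reading s[i] inside the macro stays in bounds. */
    U16_SET_CP_START(s, 0, i);
    return i;
}
```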
#define U16_SET_CP_START_UNSAFE(s, i)
Value:
{ \
    if(U16_IS_TRAIL((s)[i])) { \
        --(i); \
    } \
}
If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Unsafe" macro, assumes well-formed UTF-16.
s | const UChar * string | |
i | string offset |
#define U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define U16_TRAIL(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00)