com.ibm.icu.text

Class UTF16

public final class UTF16 extends Object

Standalone utility class providing UTF16 character conversions and indexing conversions.

Code that uses strings alone rarely needs modification. By design, UTF-16 does not allow overlap, so searching for strings is a safe operation. Similarly, concatenation is always safe. Substringing is safe if the start and end are both on UTF-32 boundaries. In normal code, the values for start and end are on those boundaries, since they arose from operations like searching. If not, the nearest UTF-32 boundaries can be determined using bounds().

Examples:

The following examples illustrate use of some of these methods.

 // iteration forwards: Original
 for (int i = 0; i < s.length(); ++i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }

 // iteration forwards: Changes for UTF-32
 int ch;
 for (int i = 0; i < s.length(); i+=UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s,i);
     doSomethingWith(ch);
 }

 // iteration backwards: Original
 for (int i = s.length() -1; i >= 0; --i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }

 // iteration backwards: Changes for UTF-32
 int ch;
 for (int i = s.length() - 1; i >= 0; i-=UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s,i);
     doSomethingWith(ch);
 }
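
For comparison, the same two walks can be written with only the JDK's own code point methods (available since JDK 1.5). This is a minimal sketch under that assumption and is not part of UTF16; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, JDK-only: collects the code points of a string in both directions.
final class CodePointWalk {
    // Visits code points from first to last, stepping by 1 or 2 chars.
    static List<Integer> forwards(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    // Visits code points from last to first; i is always the offset just past the next code point.
    static List<Integer> backwards(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.add(cp);
            i -= Character.charCount(cp);
        }
        return out;
    }
}
```

Note that the backward walk here starts at s.length() and looks before the index, so the loop condition is i > 0 rather than i >= 0.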
 
Notes:

Author: Mark Davis, with help from Markus Scherer

UNKNOWN: ICU 2.1

Nested Class Summary
static class UTF16.StringComparator

UTF16 string comparator class.

Field Summary
static int CODEPOINT_MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard.
static int CODEPOINT_MIN_VALUE
The lowest Unicode code point value.
static int LEAD_SURROGATE_BOUNDARY
Value returned in bounds().
static int LEAD_SURROGATE_MAX_VALUE
Lead surrogate maximum value
static int LEAD_SURROGATE_MIN_VALUE
Lead surrogate minimum value
static int SINGLE_CHAR_BOUNDARY
Value returned in bounds().
static int SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points
static int SURROGATE_MAX_VALUE
Maximum surrogate value
static int SURROGATE_MIN_VALUE
Surrogate minimum value
static int TRAIL_SURROGATE_BOUNDARY
Value returned in bounds().
static int TRAIL_SURROGATE_MAX_VALUE
Trail surrogate maximum value
static int TRAIL_SURROGATE_MIN_VALUE
Trail surrogate minimum value
Method Summary
static StringBuffer append(StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer.
static int append(char[] target, int limit, int char32)
Adds a codepoint at the limit position of the argument char array.
static StringBuffer appendCodePoint(StringBuffer target, int cp)
Cover JDK 1.5 APIs.
static int bounds(String source, int offset16)
Returns the type of the boundaries around the char at offset16.
static int bounds(StringBuffer source, int offset16)
Returns the type of the boundaries around the char at offset16.
static int bounds(char[] source, int start, int limit, int offset16)
Returns the type of the boundaries around the char at offset16.
static int charAt(String source, int offset16)
Extract a single UTF-32 value from a string.
static int charAt(StringBuffer source, int offset16)
Extract a single UTF-32 value from a string.
static int charAt(char[] source, int start, int limit, int offset16)
Extract a single UTF-32 value from a substring.
static int charAt(Replaceable source, int offset16)
Extract a single UTF-32 value from a string.
static int countCodePoint(String source)
Number of codepoints in a UTF16 String
static int countCodePoint(StringBuffer source)
Number of codepoints in a UTF16 String buffer
static int countCodePoint(char[] source, int start, int limit)
Number of codepoints in a UTF16 char array substring
static StringBuffer delete(StringBuffer target, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).
static int delete(char[] target, int limit, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).
static int findCodePointOffset(String source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given UTF-16 offset.
static int findCodePointOffset(StringBuffer source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset.
static int findCodePointOffset(char[] source, int start, int limit, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset.
static int findOffsetFromCodePoint(String source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
static int findOffsetFromCodePoint(StringBuffer source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
static int findOffsetFromCodePoint(char[] source, int start, int limit, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
static int getCharCount(int char32)
Determines how many chars this char32 requires.
static char getLeadSurrogate(int char32)
Returns the lead surrogate.
static char getTrailSurrogate(int char32)
Returns the trail surrogate.
static boolean hasMoreCodePointsThan(String source, int number)
Check if the string contains more Unicode code points than a certain number.
static boolean hasMoreCodePointsThan(char[] source, int start, int limit, int number)
Check if the sub-range of char array, from argument start to limit, contains more Unicode code points than a certain number.
static boolean hasMoreCodePointsThan(StringBuffer source, int number)
Check if the string buffer contains more Unicode code points than a certain number.
static int indexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint.
static int indexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str.
static int indexOf(String source, int char32, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint.
static int indexOf(String source, String str, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str.
static StringBuffer insert(StringBuffer target, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16.
static int insert(char[] target, int limit, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16.
static boolean isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate.
static boolean isSurrogate(char char16)
Determines whether the code value is a surrogate.
static boolean isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate.
static int lastIndexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint.
static int lastIndexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str.
static int lastIndexOf(String source, int char32, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equal to fromIndex.

This method is implemented based on codepoints, hence a single surrogate character will not match a supplementary character.

source is searched backwards, starting with the character at the specified index.

Examples:
UTF16.lastIndexOf("abc", 'c', 2) returns 2
UTF16.lastIndexOf("abc", 'c', 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1

Note: this method is provided as support for JDK 1.3, which does not fully support supplementary characters.
static int lastIndexOf(String source, String str, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equal to fromIndex.

This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" is treated as one entity.

static int moveCodePointOffset(String source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints
static int moveCodePointOffset(StringBuffer source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints
static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints within a subarray.
static String newString(int[] codePoints, int offset, int count)
Cover JDK 1.5 API.
static String replace(String source, int oldChar32, int newChar32)
Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldChar32 in source with newChar32.
static String replace(String source, String oldStr, String newStr)
Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldStr in source with newStr.
static StringBuffer reverse(StringBuffer source)
Reverses a UTF16 format Unicode string and replaces source's content with it.
static void setCharAt(StringBuffer target, int offset16, int char32)
Set a code point into a UTF16 position.
static int setCharAt(char[] target, int limit, int offset16, int char32)
Set a code point into a UTF16 position in a char array.
static String valueOf(int char32)
Convenience method corresponding to String.valueOf(char).
static String valueOf(String source, int offset16)
Convenience method corresponding to String.valueOf(codepoint at offset16).
static String valueOf(StringBuffer source, int offset16)
Convenience method corresponding to StringBuffer.valueOf(codepoint at offset16).
static String valueOf(char[] source, int start, int limit, int offset16)
Convenience method.

Field Detail

CODEPOINT_MAX_VALUE

public static final int CODEPOINT_MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard.

UNKNOWN: ICU 2.1

CODEPOINT_MIN_VALUE

public static final int CODEPOINT_MIN_VALUE
The lowest Unicode code point value.

UNKNOWN: ICU 2.1

LEAD_SURROGATE_BOUNDARY

public static final int LEAD_SURROGATE_BOUNDARY
Value returned in bounds(). The value is chosen so that it encodes the extent of the character: [offset16 - (value >> 2), offset16 + (value & 3)].

UNKNOWN: ICU 2.1

LEAD_SURROGATE_MAX_VALUE

public static final int LEAD_SURROGATE_MAX_VALUE
Lead surrogate maximum value

UNKNOWN: ICU 2.1

LEAD_SURROGATE_MIN_VALUE

public static final int LEAD_SURROGATE_MIN_VALUE
Lead surrogate minimum value

UNKNOWN: ICU 2.1

SINGLE_CHAR_BOUNDARY

public static final int SINGLE_CHAR_BOUNDARY
Value returned in bounds(). The value is chosen so that it encodes the extent of the character: [offset16 - (value >> 2), offset16 + (value & 3)].

UNKNOWN: ICU 2.1

SUPPLEMENTARY_MIN_VALUE

public static final int SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points

UNKNOWN: ICU 2.1

SURROGATE_MAX_VALUE

public static final int SURROGATE_MAX_VALUE
Maximum surrogate value

UNKNOWN: ICU 2.1

SURROGATE_MIN_VALUE

public static final int SURROGATE_MIN_VALUE
Surrogate minimum value

UNKNOWN: ICU 2.1

TRAIL_SURROGATE_BOUNDARY

public static final int TRAIL_SURROGATE_BOUNDARY
Value returned in bounds(). The value is chosen so that it encodes the extent of the character: [offset16 - (value >> 2), offset16 + (value & 3)].

UNKNOWN: ICU 2.1

TRAIL_SURROGATE_MAX_VALUE

public static final int TRAIL_SURROGATE_MAX_VALUE
Trail surrogate maximum value

UNKNOWN: ICU 2.1

TRAIL_SURROGATE_MIN_VALUE

public static final int TRAIL_SURROGATE_MIN_VALUE
Trail surrogate minimum value

UNKNOWN: ICU 2.1

Method Detail

append

public static StringBuffer append(StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer. If a validity check is required, use isLegal() on char32 before calling.

Parameters: target - the buffer to append to; char32 - the value to append

Returns: the updated StringBuffer

Throws: IllegalArgumentException thrown when char32 does not lie within the range of the Unicode codepoints

UNKNOWN: ICU 2.1

append

public static int append(char[] target, int limit, int char32)
Adds a codepoint at the limit position of the argument char array.

Parameters: target - char array to which the new code point is appended; limit - UTF16 offset at which the codepoint will be appended; char32 - code point to be appended

Returns: offset after char32 in the array.

Throws: IllegalArgumentException thrown if there is not enough space for the append, or when char32 does not lie within the range of the Unicode codepoints.

UNKNOWN: ICU 2.1

appendCodePoint

public static StringBuffer appendCodePoint(StringBuffer target, int cp)
Cover JDK 1.5 APIs. Append the code point to the buffer and return the buffer as a convenience.

Parameters: target - the buffer to append to; cp - the code point to append

Returns: the updated StringBuffer

Throws: IllegalArgumentException if cp is not a valid code point

UNKNOWN: ICU 3.0 This API might change or be removed in a future release.

bounds

public static int bounds(String source, int offset16)
Returns the type of the boundaries around the char at offset16. Used for random access.

Parameters: source - text to analyse; offset16 - UTF-16 offset

Returns: one of SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.

For bit-twiddlers, the return values are chosen so that the boundaries can be obtained by: [offset16 - (value >> 2), offset16 + (value & 3)].

Throws: IndexOutOfBoundsException if offset16 is out of bounds.

UNKNOWN: ICU 2.1

bounds

public static int bounds(StringBuffer source, int offset16)
Returns the type of the boundaries around the char at offset16. Used for random access.

Parameters: source - string buffer to analyse; offset16 - UTF16 offset

Returns: one of SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.

For bit-twiddlers, the return values are chosen so that the boundaries can be obtained by: [offset16 - (value >> 2), offset16 + (value & 3)].

Throws: IndexOutOfBoundsException if offset16 is out of bounds.

UNKNOWN: ICU 2.1

bounds

public static int bounds(char[] source, int start, int limit, int offset16)
Returns the type of the boundaries around the char at offset16. Used for random access. Note that the boundaries are determined with respect to the subarray, hence the char array {0xD800, 0xDC00} has the result SINGLE_CHAR_BOUNDARY for start = offset16 = 0 and limit = 1.

Parameters: source - char array to analyse; start - offset of the start of the substring in the source array; limit - offset of the end of the substring in the source array; offset16 - UTF16 offset relative to start

Returns: one of SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.

For bit-twiddlers, the boundary values are chosen so that the boundaries can be obtained by: [offset16 - (boundvalue >> 2), offset16 + (boundvalue & 3)].

Throws: IndexOutOfBoundsException if offset16 is not within the range of start and limit.

UNKNOWN: ICU 2.1
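
The boundary classification and the bit formula can be illustrated with a small JDK-only sketch. It is not the library's implementation: the class name is hypothetical, and the constant values 1, 2 and 5 are assumptions chosen so that the formula [offset16 - (value >> 2), offset16 + (value & 3)] works out; check them against the actual fields before relying on them.

```java
// Hypothetical sketch of the char[] variant of bounds(); not the ICU implementation.
final class BoundsSketch {
    // Assumed values, picked so that the bit formula in the text yields the right range.
    static final int SINGLE_CHAR_BOUNDARY = 1;
    static final int LEAD_SURROGATE_BOUNDARY = 2;
    static final int TRAIL_SURROGATE_BOUNDARY = 5;

    // offset16 is relative to start; only chars in [start, limit) are taken into account.
    static int bounds(char[] source, int start, int limit, int offset16) {
        int i = start + offset16;
        if (offset16 < 0 || i >= limit) {
            throw new IndexOutOfBoundsException(String.valueOf(offset16));
        }
        char c = source[i];
        if (Character.isHighSurrogate(c) && i + 1 < limit
                && Character.isLowSurrogate(source[i + 1])) {
            return LEAD_SURROGATE_BOUNDARY;
        }
        if (Character.isLowSurrogate(c) && i > start
                && Character.isHighSurrogate(source[i - 1])) {
            return TRAIL_SURROGATE_BOUNDARY;
        }
        return SINGLE_CHAR_BOUNDARY;
    }

    // Decodes a boundary value into {start of code point, offset just past it}.
    static int[] range(int offset16, int value) {
        return new int[] { offset16 - (value >> 2), offset16 + (value & 3) };
    }
}
```

With these assumed values, a trail surrogate at offset 2 decodes to the range [1, 3], and a pair cut off by limit is reported as a single char, as described above.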

charAt

public static int charAt(String source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character (the lone surrogate) will be returned.

Parameters: source - array of UTF-16 chars; offset16 - UTF-16 offset to the start of the character

Returns: the UTF-32 value of the code point that contains the char at offset16. The boundaries of that codepoint are the same as in bounds().

Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.

UNKNOWN: ICU 2.1
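
The behaviour described for charAt (either half of a pair yields the supplementary code point, a lone surrogate yields itself) can be sketched with JDK calls only. This is an illustrative approximation under that reading of the text, with a hypothetical class name, not the library's code.

```java
// Hypothetical sketch of the charAt(String, int) semantics described above.
final class CharAtSketch {
    static int charAt(String source, int offset16) {
        char c = source.charAt(offset16); // throws IndexOutOfBoundsException if out of bounds
        if (Character.isHighSurrogate(c)) {
            // Lead surrogate: complete the pair with the following char, if there is one.
            if (offset16 + 1 < source.length()) {
                char next = source.charAt(offset16 + 1);
                if (Character.isLowSurrogate(next)) {
                    return Character.toCodePoint(c, next);
                }
            }
        } else if (Character.isLowSurrogate(c)) {
            // Trail surrogate: complete the pair with the preceding char, if there is one.
            if (offset16 > 0) {
                char prev = source.charAt(offset16 - 1);
                if (Character.isHighSurrogate(prev)) {
                    return Character.toCodePoint(prev, c);
                }
            }
        }
        return c; // BMP char or unpaired surrogate
    }
}
```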

charAt

public static int charAt(StringBuffer source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character (the lone surrogate) will be returned.

Parameters: source - UTF-16 chars string buffer; offset16 - UTF-16 offset to the start of the character

Returns: the UTF-32 value of the code point that contains the char at offset16. The boundaries of that codepoint are the same as in bounds().

Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.

UNKNOWN: ICU 2.1

charAt

public static int charAt(char[] source, int start, int limit, int offset16)
Extract a single UTF-32 value from a substring. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character (the lone surrogate) will be returned.

Parameters: source - array of UTF-16 chars; start - offset of the start of the substring in the source array; limit - offset of the end of the substring in the source array; offset16 - UTF-16 offset relative to start

Returns: the UTF-32 value of the code point that contains the char at offset16. The boundaries of that codepoint are the same as in bounds().

Throws: IndexOutOfBoundsException thrown if offset16 is not within the range of start and limit.

UNKNOWN: ICU 2.1

charAt

public static int charAt(Replaceable source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as for random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character (the lone surrogate) will be returned.

Parameters: source - UTF-16 chars replaceable text; offset16 - UTF-16 offset to the start of the character

Returns: the UTF-32 value of the code point that contains the char at offset16. The boundaries of that codepoint are the same as in bounds().

Throws: IndexOutOfBoundsException thrown if offset16 is out of bounds.

UNKNOWN: ICU 2.1

countCodePoint

public static int countCodePoint(String source)
Number of codepoints in a UTF16 String

Parameters: source UTF16 string

Returns: number of codepoints in the string

UNKNOWN: ICU 2.1

countCodePoint

public static int countCodePoint(StringBuffer source)
Number of codepoints in a UTF16 String buffer

Parameters: source UTF16 string buffer

Returns: number of codepoints in the string buffer

UNKNOWN: ICU 2.1

countCodePoint

public static int countCodePoint(char[] source, int start, int limit)
Number of codepoints in a UTF16 char array substring

Parameters: source - UTF16 char array; start - offset of the start of the substring; limit - offset of the end of the substring

Returns: number of codepoints in the substring

Throws: IndexOutOfBoundsException if start and limit are not valid.

UNKNOWN: ICU 2.1

delete

public static StringBuffer delete(StringBuffer target, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).

Parameters: target - string buffer to remove the codepoint from; offset16 - offset at which the codepoint will be removed

Returns: a reference to target

Throws: IndexOutOfBoundsException thrown if offset16 is invalid.

UNKNOWN: ICU 2.1

delete

public static int delete(char[] target, int limit, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).

Parameters: target - char array to remove the codepoint from; limit - end index of the char array, limit <= target.length; offset16 - offset at which the codepoint will be removed

Returns: the new limit

Throws: IndexOutOfBoundsException thrown if offset16 is invalid.

UNKNOWN: ICU 2.1

findCodePointOffset

public static int findCodePointOffset(String source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given UTF-16 offset. Used for random access. See the class description for notes on roundtripping.
Note: If the UTF-16 offset is into the middle of a surrogate pair, then the UTF-32 offset of the lead of the pair is returned.

To find the UTF-32 length of a string, use:

     len32 = countCodePoint(source);
   

Parameters: source - text to analyse; offset16 - UTF-16 offset, less than the length of the source text

Returns: UTF-32 offset

Throws: IndexOutOfBoundsException if offset16 is out of bounds.

UNKNOWN: ICU 2.1
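
The conversion from a UTF-16 offset to a UTF-32 offset, including the note about offsets that fall in the middle of a surrogate pair, can be sketched with String.codePointCount from the JDK. The class name is hypothetical, and the bounds check follows the parameter description above (offset16 less than the length); the library itself may accept a wider range.

```java
// Hypothetical sketch of findCodePointOffset(String, int) using only the JDK.
final class CodePointOffsetSketch {
    static int findCodePointOffset(String source, int offset16) {
        if (offset16 < 0 || offset16 >= source.length()) {
            throw new IndexOutOfBoundsException(String.valueOf(offset16));
        }
        // In the middle of a pair: step back to the lead surrogate first.
        if (offset16 > 0
                && Character.isLowSurrogate(source.charAt(offset16))
                && Character.isHighSurrogate(source.charAt(offset16 - 1))) {
            offset16--;
        }
        // The UTF-32 offset is the number of code points that precede the position.
        return source.codePointCount(0, offset16);
    }
}
```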

findCodePointOffset

public static int findCodePointOffset(StringBuffer source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset. Used for random access. See the class description for notes on roundtripping.
Note: If the UTF-16 offset is into the middle of a surrogate pair, then the UTF-32 offset of the lead of the pair is returned.

To find the UTF-32 length of a string, use:

     len32 = countCodePoint(source);
   

Parameters: source - text to analyse; offset16 - UTF-16 offset, less than the length of the source text

Returns: UTF-32 offset

Throws: IndexOutOfBoundsException if offset16 is out of bounds.

UNKNOWN: ICU 2.1

findCodePointOffset

public static int findCodePointOffset(char[] source, int start, int limit, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset. Used for random access. See the class description for notes on roundtripping.
Note: If the UTF-16 offset is into the middle of a surrogate pair, then the UTF-32 offset of the lead of the pair is returned.

To find the UTF-32 length of a substring, use:

     len32 = countCodePoint(source, start, limit);
   

Parameters: source - text to analyse; start - offset of the start of the substring; limit - offset of the end of the substring; offset16 - UTF-16 offset relative to start

Returns: UTF-32 offset relative to start

Throws: IndexOutOfBoundsException if offset16 is not within the range of start and limit.

UNKNOWN: ICU 2.1

findOffsetFromCodePoint

public static int findOffsetFromCodePoint(String source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access. See the class description for notes on roundtripping.

Parameters: source - the UTF-16 string; offset32 - UTF-32 offset

Returns: UTF-16 offset

Throws: IndexOutOfBoundsException if offset32 is out of bounds.

UNKNOWN: ICU 2.1
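
The reverse conversion can be approximated with String.offsetByCodePoints from the JDK, which walks offset32 code points from the start of the string. This is a sketch with a hypothetical class name, not the library's implementation.

```java
// Hypothetical sketch of findOffsetFromCodePoint(String, int) using only the JDK.
final class Utf16OffsetSketch {
    static int findOffsetFromCodePoint(String source, int offset32) {
        // offsetByCodePoints throws IndexOutOfBoundsException when offset32 is negative
        // or larger than the number of code points in source.
        return source.offsetByCodePoints(0, offset32);
    }
}
```

For offsets that lie on code point boundaries, converting to a UTF-32 offset with codePointCount and back with this method returns the original UTF-16 offset.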

findOffsetFromCodePoint

public static int findOffsetFromCodePoint(StringBuffer source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access. See the class description for notes on roundtripping.

Parameters: source - the UTF-16 string buffer; offset32 - UTF-32 offset

Returns: UTF-16 offset

Throws: IndexOutOfBoundsException if offset32 is out of bounds.

UNKNOWN: ICU 2.1

findOffsetFromCodePoint

public static int findOffsetFromCodePoint(char[] source, int start, int limit, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access. See the class description for notes on roundtripping.

Parameters: source - the UTF-16 char array whose substring is to be analysed; start - offset of the start of the substring to be analysed; limit - offset of the end of the substring to be analysed; offset32 - UTF-32 offset relative to start

Returns: UTF-16 offset relative to start

Throws: IndexOutOfBoundsException if offset32 is out of bounds.

UNKNOWN: ICU 2.1

getCharCount

public static int getCharCount(int char32)
Determines how many chars this char32 requires. If a validity check is required, use isLegal() on char32 before calling.

Parameters: char32 the input codepoint.

Returns: 2 if is in supplementary space, otherwise 1.

UNKNOWN: ICU 2.1

getLeadSurrogate

public static char getLeadSurrogate(int char32)
Returns the lead surrogate. If a validity check is required, use isLegal() on char32 before calling.

Parameters: char32 the input character.

Returns: the lead surrogate if getCharCount(char32) is 2;
and 0 otherwise (note: 0 is not a valid lead surrogate).

UNKNOWN: ICU 2.1

getTrailSurrogate

public static char getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use isLegal() on char32 before calling.

Parameters: char32 the input character.

Returns: the trail surrogate if getCharCount(char32) is 2;
otherwise the character itself

UNKNOWN: ICU 2.1
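
The arithmetic behind getCharCount, getLeadSurrogate and getTrailSurrogate follows from the UTF-16 encoding form. The sketch below uses the standard formulas and the return conventions stated above (0 and the character itself for non-supplementary input); the class name is hypothetical and, like the documented methods, it performs no validity check on char32.

```java
// Hypothetical sketch of the surrogate arithmetic; assumes char32 is a valid code point.
final class SurrogateSketch {
    static int getCharCount(int char32) {
        return char32 < 0x10000 ? 1 : 2;
    }

    // Top 10 bits of (char32 - 0x10000), offset into the lead surrogate range.
    static char getLeadSurrogate(int char32) {
        if (char32 < 0x10000) {
            return 0; // not a valid lead surrogate, signals "no surrogate needed"
        }
        return (char) (0xD800 + ((char32 - 0x10000) >> 10));
    }

    // Low 10 bits of char32, offset into the trail surrogate range.
    static char getTrailSurrogate(int char32) {
        if (char32 < 0x10000) {
            return (char) char32;
        }
        return (char) (0xDC00 + (char32 & 0x3FF));
    }
}
```

For example, U+1F600 splits into the lead surrogate 0xD83D and the trail surrogate 0xDE00.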

hasMoreCodePointsThan

public static boolean hasMoreCodePointsThan(String source, int number)
Check if the string contains more Unicode code points than a certain number. This is more efficient than counting all code points in the entire string and comparing that number with a threshold. This function may not need to scan the string at all if the length is within a certain range, and never needs to count more than 'number + 1' code points. Logically equivalent to (countCodePoint(source) > number). A Unicode code point may occupy either one or two code units.

Parameters: source - the input string; number - the threshold that the number of code points in the string is compared against

Returns: boolean value for whether the string contains more Unicode code points than 'number'.

UNKNOWN: ICU 2.4
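
The two shortcuts mentioned above follow from the fact that a code point occupies one or two code units: a string of length n holds at least ceil(n / 2) and at most n code points. A JDK-only sketch of that reasoning, with a hypothetical class name and no claim to match the library's code, could look like this:

```java
// Hypothetical sketch of hasMoreCodePointsThan(String, int) using only the JDK.
final class MoreCodePointsSketch {
    static boolean hasMoreCodePointsThan(String source, int number) {
        if (number < 0) {
            return true; // any count, even 0, exceeds a negative threshold
        }
        int length = source.length();
        // At most two chars per code point: there are at least ceil(length / 2) code points.
        if ((length + 1) / 2 > number) {
            return true;
        }
        // At least one char per code point: there are at most length code points.
        if (length <= number) {
            return false;
        }
        // Otherwise count, stopping as soon as number + 1 code points have been seen.
        int count = 0;
        for (int i = 0; i < length; i += Character.charCount(source.codePointAt(i))) {
            if (++count > number) {
                return true;
            }
        }
        return false;
    }
}
```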

hasMoreCodePointsThan

public static boolean hasMoreCodePointsThan(char[] source, int start, int limit, int number)
Check if the sub-range of char array, from argument start to limit, contains more Unicode code points than a certain number. This is more efficient than counting all code points in the entire char array range and comparing that number with a threshold. This function may not need to scan the char array at all if the distance between start and limit is within a certain range, and never needs to count more than 'number + 1' code points. Logically equivalent to (countCodePoint(source, start, limit) > number). A Unicode code point may occupy either one or two code units.

Parameters: source - array of UTF-16 chars; start - offset of the start of the substring in the source array; limit - offset of the end of the substring in the source array; number - the threshold that the number of code points in the sub-range is compared against

Returns: boolean value for whether the sub-range contains more Unicode code points than 'number'.

Throws: IndexOutOfBoundsException thrown when limit < start

UNKNOWN: ICU 2.4

hasMoreCodePointsThan

public static boolean hasMoreCodePointsThan(StringBuffer source, int number)
Check if the string buffer contains more Unicode code points than a certain number. This is more efficient than counting all code points in the entire string buffer and comparing that number with a threshold. This function may not need to scan the string buffer at all if the length is within a certain range, and never needs to count more than 'number + 1' code points. Logically equivalent to (countCodePoint(source) > number). A Unicode code point may occupy either one or two code units.

Parameters: source - the input string buffer; number - the threshold that the number of code points in the string buffer is compared against

Returns: boolean value for whether the string buffer contains more Unicode code points than 'number'.

UNKNOWN: ICU 2.4

indexOf

public static int indexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint. I.e., the smallest index i such that UTF16.charAt(source, i) == char32 is true.

If no such character occurs in this string, then -1 is returned.

Examples:
UTF16.indexOf("abc", 'a') returns 0
UTF16.indexOf("abc\ud800\udc00", 0x10000) returns 3
UTF16.indexOf("abc\ud800\udc00", 0xd800) returns -1

Note: this method is provided as support for JDK 1.3, which does not fully support supplementary characters.

Parameters: source - UTF16 format Unicode string that will be searched; char32 - codepoint to search for

Returns: the index of the first occurrence of the codepoint in the argument Unicode string, or -1 if the codepoint does not occur.

UNKNOWN: ICU 2.6
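
The difference between a code point search and a plain char search shows up when looking for a lone surrogate value. The sketch below walks the string by code points using JDK calls only; the class name is hypothetical and the code is an illustration of the documented behaviour, not the library's implementation.

```java
// Hypothetical sketch of a code-point-based indexOf(String, int).
final class CodePointIndexOf {
    static int indexOf(String source, int char32) {
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i); // a full pair is read as one code point
            if (cp == char32) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

By contrast, the JDK's own "abc\ud800\udc00".indexOf(0xd800) matches the lead surrogate char inside the pair and returns 3, whereas the sketch returns -1 for that search.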

indexOf

public static int indexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str. This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" is treated as one entity. Hence, if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. The same holds, vice versa, when str ends with a lead surrogate character. See the examples below.

If no such string str occurs in this source, then -1 is returned.

Examples:
UTF16.indexOf("abc", "ab") returns 0
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00") returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800") returns -1

Note: this method is provided as support for JDK 1.3, which does not fully support supplementary characters.

Parameters: source - UTF16 format Unicode string that will be searched; str - UTF16 format Unicode string to search for

Returns: the index of the first occurrence of str in the argument Unicode string, or -1 if str does not occur.

UNKNOWN: ICU 2.6
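
One way to picture the matching rule is a plain substring search that rejects any hit whose edges cut through a surrogate pair of source. The following JDK-only sketch implements that reading; the class and helper names are hypothetical, and the library may implement the rule differently.

```java
// Hypothetical sketch of a code-point-aware indexOf(String, String).
final class StringIndexOfSketch {
    static int indexOf(String source, String str) {
        int from = 0;
        while (true) {
            int i = source.indexOf(str, from);
            if (i < 0) {
                return -1;
            }
            if (!cutsPair(source, i) && !cutsPair(source, i + str.length())) {
                return i; // both edges of the match lie on code point boundaries
            }
            from = i + 1; // rejected: keep looking further right
        }
    }

    // True if position pos lies between the lead and trail surrogate of a pair in source.
    static boolean cutsPair(String source, int pos) {
        return pos > 0 && pos < source.length()
                && Character.isHighSurrogate(source.charAt(pos - 1))
                && Character.isLowSurrogate(source.charAt(pos));
    }
}
```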

indexOf

public static int indexOf(String source, int char32, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint. I.e., the smallest index i such that:
(UTF16.charAt(source, i) == char32 && i >= fromIndex) is true.

If no such character occurs in this string, then -1 is returned.

Examples:
UTF16.indexOf("abc", 'a', 1) returns -1
UTF16.indexOf("abc\ud800\udc00", 0x10000, 1) returns 3
UTF16.indexOf("abc\ud800\udc00", 0xd800, 1) returns -1

Note: this method is provided as support for JDK 1.3, which does not fully support supplementary characters.

Parameters: source - UTF16 format Unicode string that will be searched; char32 - codepoint to search for; fromIndex - the index to start the search from

Returns: the index of the first occurrence of the codepoint in the argument Unicode string at or after fromIndex, or -1 if the codepoint does not occur.

UNKNOWN: ICU 2.6

indexOf

public static int indexOf(String source, String str, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str. This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" is treated as one entity. Hence, if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. The same holds, vice versa, when str ends with a lead surrogate character. See the examples below.

If no such string str occurs in this source, then -1 is returned.

Examples:
UTF16.indexOf("abc", "ab", 0) returns 0
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00", 0) returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00", 2) returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800", 0) returns -1

Note: this method is provided as support for JDK 1.3, which does not fully support supplementary characters.

Parameters: source - UTF16 format Unicode string that will be searched; str - UTF16 format Unicode string to search for; fromIndex - the index to start the search from

Returns: the index of the first occurrence of str in the argument Unicode string at or after fromIndex, or -1 if str does not occur.

UNKNOWN: ICU 2.6

insert

public static StringBuffer insert(StringBuffer target, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16. If the offset16 is in the middle of a supplementary codepoint, char32 will be inserted after the supplementary codepoint. The length of target increases by one if codepoint is non-supplementary, 2 otherwise.

The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.

The offset argument must be greater than or equal to 0, and less than or equal to the length of target.

Parameters:
target - string buffer to insert into
offset16 - offset at which char32 will be inserted
char32 - codepoint to be inserted

Returns: a reference to target

Throws: IndexOutOfBoundsException thrown if offset16 is invalid.

UNKNOWN: ICU 2.1
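
As an illustration of the offset adjustment described for insert, here is a minimal JDK-only sketch. The class InsertSketch is hypothetical and is not ICU's code; it assumes "in the middle of a supplementary codepoint" means that offset16 points at the trail surrogate of a well-formed pair.

```java
final class InsertSketch {
    // Hypothetical stand-in for insert(StringBuffer, int, int).
    static StringBuffer insert(StringBuffer target, int offset16, int char32) {
        if (offset16 < 0 || offset16 > target.length()) {
            throw new IndexOutOfBoundsException("offset16: " + offset16);
        }
        // If offset16 falls between a lead and a trail surrogate, move past the pair.
        if (offset16 > 0 && offset16 < target.length()
                && Character.isHighSurrogate(target.charAt(offset16 - 1))
                && Character.isLowSurrogate(target.charAt(offset16))) {
            offset16++;
        }
        // Character.toChars yields one char for a BMP codepoint, two for a supplementary one.
        return target.insert(offset16, Character.toChars(char32));
    }
}
```

Inserting 'x' at offset 2 of "a\ud800\udc00b" therefore produces "a\ud800\udc00xb" rather than splitting the pair.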

insert

public static int insert(char[] target, int limit, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16. If the offset16 is in the middle of a supplementary codepoint, char32 will be inserted after the supplementary codepoint. Limit increases by one if codepoint is non-supplementary, 2 otherwise.

The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.

The offset argument must be greater than or equal to 0, and less than or equal to the limit.

Parameters:
target - char array to insert into
limit - end index of the char array, limit <= target.length
offset16 - offset at which char32 will be inserted
char32 - codepoint to be inserted

Returns: new limit size

Throws: IndexOutOfBoundsException thrown if offset16 is invalid.

UNKNOWN: ICU 2.1

isLeadSurrogate

public static boolean isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate.

Parameters: char16 the input character.

Returns: true iff the input character is a lead surrogate

UNKNOWN: ICU 2.1

isSurrogate

public static boolean isSurrogate(char char16)
Determines whether the code value is a surrogate.

Parameters: char16 the input character.

Returns: true iff the input character is a surrogate.

UNKNOWN: ICU 2.1

isTrailSurrogate

public static boolean isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate.

Parameters: char16 the input character.

Returns: true iff the input character is a trail surrogate.

UNKNOWN: ICU 2.1
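
The three surrogate predicates above amount to range checks on the UTF-16 code unit. The sketch below shows those checks with the ranges fixed by the Unicode Standard (lead: U+D800..U+DBFF, trail: U+DC00..U+DFFF). The class SurrogateSketch is hypothetical and only mirrors the documented behavior.

```java
final class SurrogateSketch {
    // Lead (high) surrogates occupy U+D800..U+DBFF.
    static boolean isLeadSurrogate(char char16) {
        return char16 >= 0xD800 && char16 <= 0xDBFF;
    }

    // Trail (low) surrogates occupy U+DC00..U+DFFF.
    static boolean isTrailSurrogate(char char16) {
        return char16 >= 0xDC00 && char16 <= 0xDFFF;
    }

    // A surrogate is any code unit in U+D800..U+DFFF.
    static boolean isSurrogate(char char16) {
        return char16 >= 0xD800 && char16 <= 0xDFFF;
    }
}
```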

lastIndexOf

public static int lastIndexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint. I.e., the index returned is the largest value i such that: UTF16.charAt(source, i) == char32 is true.

Examples:
UTF16.lastIndexOf("abc", 'a') returns 0
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1

source is searched backwards starting at the last character.

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.

Parameters:
source - UTF16 format Unicode string that will be searched
char32 - codepoint to search for

Returns: the index of the last occurrence of the codepoint in source, or -1 if the codepoint does not occur.

UNKNOWN: ICU 2.6

lastIndexOf

public static int lastIndexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str. This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" pair is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. The same applies, vice versa, when str ends with a lead surrogate. See the examples below.

Examples:
UTF16.lastIndexOf("abc", "a") returns 0
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00") returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800") returns -1

source is searched backwards starting at the last character.

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.

Parameters:
source - UTF16 format Unicode string that will be searched
str - UTF16 format Unicode string to search for

Returns: the index of the last occurrence of str in source, or -1 if str does not occur.

UNKNOWN: ICU 2.6

lastIndexOf

public static int lastIndexOf(String source, int char32, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equal to fromIndex.

This method is implemented based on codepoints, hence a single surrogate character will not match a supplementary character.

source is searched backwards starting at the specified index.

Examples:
UTF16.lastIndexOf("abc", 'c', 2) returns 2
UTF16.lastIndexOf("abc", 'c', 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800, 4) returns -1

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.

Parameters:
source - UTF16 format Unicode string that will be searched
char32 - codepoint to search for
fromIndex - the index to start the search from. There is no restriction on the value of fromIndex. If it is greater than or equal to the length of source, it has the same effect as if it were equal to one less than the length of source: the entire string may be searched. If it is negative, it has the same effect as if it were -1: -1 is returned.

Returns: the index of the last occurrence of the codepoint in source, or -1 if the codepoint does not occur.

UNKNOWN: ICU 2.6
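
A backwards codepoint search with the fromIndex clamping described above might look like the following JDK-only sketch. The class LastIndexOfSketch is hypothetical and is not ICU's implementation; it assumes each well-formed surrogate pair is compared as a single codepoint, so a lone surrogate value never matches half of a pair.

```java
final class LastIndexOfSketch {
    // Hypothetical stand-in for lastIndexOf(String, int, int).
    static int lastIndexOf(String source, int char32, int fromIndex) {
        if (fromIndex < 0) {
            return -1;
        }
        // Clamp to the last index, then back up to the start of a codepoint.
        int i = alignToStart(source, Math.min(fromIndex, source.length() - 1));
        while (i >= 0) {
            if (source.codePointAt(i) == char32) {
                return i;
            }
            i = alignToStart(source, i - 1);
        }
        return -1;
    }

    // If i points at the trail surrogate of a pair, return the index of its lead surrogate.
    static int alignToStart(String s, int i) {
        if (i > 0 && Character.isLowSurrogate(s.charAt(i))
                && Character.isHighSurrogate(s.charAt(i - 1))) {
            return i - 1;
        }
        return i;
    }
}
```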

lastIndexOf

public static int lastIndexOf(String source, String str, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equal to fromIndex.

This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" pair is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. The same applies, vice versa, when str ends with a lead surrogate.

See the examples below.

Examples:
UTF16.lastIndexOf("abc", "c", 2) returns 2
UTF16.lastIndexOf("abc", "c", 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00", 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00", 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800", 4) returns -1

source is searched backwards starting at fromIndex.

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.

Parameters:
source - UTF16 format Unicode string that will be searched
str - UTF16 format Unicode string to search for
fromIndex - the index to start the search from. There is no restriction on the value of fromIndex. If it is greater than or equal to the length of source, it has the same effect as if it were equal to one less than the length of source: the entire string may be searched. If it is negative, it has the same effect as if it were -1: -1 is returned.

Returns: the index of the last occurrence of str in source, or -1 if str does not occur.

UNKNOWN: ICU 2.6

moveCodePointOffset

public static int moveCodePointOffset(String source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints

Parameters:
source - string
offset16 - UTF16 position to shift
shift32 - number of codepoints to shift

Returns: new shifted offset16

Throws: IndexOutOfBoundsException if the new offset16 is out of bounds.

UNKNOWN: ICU 2.1
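
For comparison, the JDK offers an analogous operation, String.offsetByCodePoints. The sketch below simply wraps it. The class MoveOffsetSketch is hypothetical, and the JDK's handling of edge cases (for example, an offset16 in the middle of a surrogate pair) is not guaranteed to match ICU's.

```java
final class MoveOffsetSketch {
    // Hypothetical stand-in: shifts a UTF-16 offset by a number of codepoints.
    // A negative shift32 moves backwards. Throws IndexOutOfBoundsException
    // if the shift would leave the string.
    static int moveCodePointOffset(String source, int offset16, int shift32) {
        return source.offsetByCodePoints(offset16, shift32);
    }
}
```

In "a\ud800\udc00b", shifting offset 0 forward by two codepoints lands on offset 3, because the supplementary codepoint occupies two chars.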

moveCodePointOffset

public static int moveCodePointOffset(StringBuffer source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints

Parameters:
source - string buffer
offset16 - UTF16 position to shift
shift32 - number of codepoints to shift

Returns: new shifted offset16

Throws: IndexOutOfBoundsException if the new offset16 is out of bounds.

UNKNOWN: ICU 2.1

moveCodePointOffset

public static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints within a subarray.

Parameters:
source - char array
start - start position of the subarray to operate on
limit - limit position of the subarray to operate on
offset16 - UTF16 position to shift, relative to start
shift32 - number of codepoints to shift

Returns: new shifted offset16 relative to start

Throws: IndexOutOfBoundsException if the new offset16 is out of bounds with respect to the subarray or the subarray bounds are out of range.

UNKNOWN: ICU 2.1

newString

public static String newString(int[] codePoints, int offset, int count)
Cover JDK 1.5 API. Create a String from an array of codePoints.

Parameters:
codePoints - the code point array
offset - the start of the text in the code point array
count - the number of code points

Returns: a String representing count code points starting at offset

Throws:
IllegalArgumentException - if an invalid code point is encountered
IndexOutOfBoundsException - if the offset or count are out of bounds.

UNKNOWN: ICU 3.0 This API might change or be removed in a future release.
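
Since newString is described as covering a JDK 1.5 API, the corresponding JDK constructor, new String(int[], int, int), can serve as an illustration. The class NewStringSketch below is hypothetical and only delegates to that constructor.

```java
final class NewStringSketch {
    // Hypothetical stand-in: builds a String from count code points starting at offset.
    static String newString(int[] codePoints, int offset, int count) {
        return new String(codePoints, offset, count);
    }
}
```

For the array {'a', 0x10000, 'b'} with offset 1 and count 2, the result is "\ud800\udc00b", which has three chars but two codepoints.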

replace

public static String replace(String source, int oldChar32, int newChar32)
Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldChar32 in source with newChar32. If the character oldChar32 does not occur in the UTF16 format Unicode string source, then source will be returned. Otherwise, a new String object is created that represents a codepoint sequence identical to the codepoint sequence represented by source, except that every occurrence of oldChar32 is replaced by an occurrence of newChar32.

Examples:
UTF16.replace("mesquite in your cellar", 'e', 'o');
returns "mosquito in your collar"
UTF16.replace("JonL", 'q', 'x');
returns "JonL" (no change)
UTF16.replace("Supplementary character \ud800\udc00", 0x10000, '!');
returns "Supplementary character !"
UTF16.replace("Supplementary character \ud800\udc00", 0xd800, '!');
returns "Supplementary character \ud800\udc00"

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.

Parameters:
source - UTF16 format Unicode string which the codepoint replacements will be based on.
oldChar32 - non-zero old codepoint to be replaced.
newChar32 - the new codepoint to replace oldChar32

Returns: new String derived from source by replacing every occurrence of oldChar32 with newChar32. When no oldChar32 is found in source, then source will be returned.

UNKNOWN: ICU 2.6
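
The codepoint-wise replacement described above can be sketched with the JDK's code point methods. The class ReplaceSketch is hypothetical and is not ICU's implementation; it assumes source is walked one codepoint at a time, so a lone surrogate value never matches half of a well-formed pair.

```java
final class ReplaceSketch {
    // Hypothetical stand-in for replace(String, int, int).
    static String replace(String source, int oldChar32, int newChar32) {
        StringBuilder result = new StringBuilder(source.length());
        boolean changed = false;
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);   // reads a whole pair when one is present
            i += Character.charCount(cp);     // advance by 1 or 2 chars
            if (cp == oldChar32) {
                result.appendCodePoint(newChar32);
                changed = true;
            } else {
                result.appendCodePoint(cp);
            }
        }
        // Return the original object when nothing was replaced.
        return changed ? result.toString() : source;
    }
}
```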

replace

public static String replace(String source, String oldStr, String newStr)
Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldStr in source with newStr. If the string oldStr does not occur in the UTF16 format Unicode string source, then source will be returned. Otherwise, a new String object is created that represents a codepoint sequence identical to the codepoint sequence represented by source, except that every occurrence of oldStr is replaced by an occurrence of newStr.

Examples:
UTF16.replace("mesquite in your cellar", "e", "o");
returns "mosquito in your collar"
UTF16.replace("mesquite in your cellar", "mesquite", "cat");
returns "cat in your cellar"
UTF16.replace("JonL", "q", "x");
returns "JonL" (no change)
UTF16.replace("Supplementary character \ud800\udc00", "\ud800\udc00", "!");
returns "Supplementary character !"
UTF16.replace("Supplementary character \ud800\udc00", "\ud800", "!");
returns "Supplementary character \ud800\udc00"

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.

Parameters:
source - UTF16 format Unicode string which the replacements will be based on.
oldStr - non-zero-length string to be replaced.
newStr - the new string to replace oldStr

Returns: new String derived from source by replacing every occurrence of oldStr with newStr. When no oldStr is found in source, then source will be returned.

UNKNOWN: ICU 2.6

reverse

public static StringBuffer reverse(StringBuffer source)
Reverses a UTF16 format Unicode string and replaces source's content with it. This method will reverse surrogate characters correctly, instead of blindly reversing every character.

Examples:
UTF16.reverse(new StringBuffer("Supplementary characters \ud800\udc00\ud801\udc01"))
returns "\ud801\udc01\ud800\udc00 sretcarahc yratnemelppuS".

Parameters: source the source StringBuffer that contains UTF16 format Unicode string to be reversed

Returns: a modified source with reversed UTF16 format Unicode string.

UNKNOWN: ICU 2.6
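
A surrogate-aware reversal can be sketched by walking the buffer backwards one codepoint at a time. The class ReverseSketch below is hypothetical and JDK-only; it is not ICU's implementation. It builds the reversed text in a temporary StringBuilder and then copies it back into source.

```java
final class ReverseSketch {
    // Hypothetical stand-in for reverse(StringBuffer): reverses by codepoint, not by char.
    static StringBuffer reverse(StringBuffer source) {
        StringBuilder reversed = new StringBuilder(source.length());
        for (int i = source.length(); i > 0; ) {
            // codePointBefore reads a whole surrogate pair when one ends at i.
            int cp = Character.codePointBefore(source, i);
            reversed.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        source.setLength(0);
        return source.append(reversed);
    }
}
```

Each pair keeps its lead-then-trail order in the output, which a plain char-by-char reversal would not preserve.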

setCharAt

public static void setCharAt(StringBuffer target, int offset16, int char32)
Sets a code point at a UTF16 position. Adjusts target accordingly if a non-supplementary codepoint is replaced with a supplementary one, or vice versa.

Parameters:
target - string buffer
offset16 - UTF16 position to insert into
char32 - code point

UNKNOWN: ICU 2.1
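
The length adjustment mentioned for setCharAt can be illustrated with a JDK-only sketch. The class SetCharAtSketch is hypothetical and is not ICU's code; it assumes that when offset16 points at either half of a well-formed surrogate pair, the whole pair is replaced.

```java
final class SetCharAtSketch {
    // Hypothetical stand-in for setCharAt(StringBuffer, int, int).
    static void setCharAt(StringBuffer target, int offset16, int char32) {
        int start = offset16;
        // charAt throws IndexOutOfBoundsException for an invalid offset16.
        if (start > 0 && Character.isLowSurrogate(target.charAt(start))
                && Character.isHighSurrogate(target.charAt(start - 1))) {
            start--; // offset16 was on the trail surrogate; back up to the lead
        }
        // Number of chars occupied by the codepoint being replaced (1 or 2).
        int oldLength = Character.charCount(target.codePointAt(start));
        target.replace(start, start + oldLength, new String(Character.toChars(char32)));
    }
}
```

Replacing 'b' in "abc" with U+10000 grows the buffer to four chars; replacing that supplementary codepoint with 'x' shrinks it back to three.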

setCharAt

public static int setCharAt(char[] target, int limit, int offset16, int char32)
Sets a code point at a UTF16 position in a char array. Adjusts target accordingly if a non-supplementary codepoint is replaced with a supplementary one, or vice versa.

Parameters:
target - char array
limit - number of valid chars in target, different from target.length. limit counts the number of chars in target that represent a string, not the size of the array target.
offset16 - UTF16 position to insert into
char32 - code point

Returns: new number of chars in target that represents a string

Throws: IndexOutOfBoundsException if offset16 is out of range

UNKNOWN: ICU 2.1

valueOf

public static String valueOf(int char32)
Convenience method corresponding to String.valueOf(char). Returns a one or two char string containing the UTF-32 value in UTF16 format. If a validity check is required, use isLegal() on char32 before calling.

Parameters: char32 the input character.

Returns: string value of char32 in UTF16 format

Throws: IllegalArgumentException thrown if char32 is an invalid codepoint.

UNKNOWN: ICU 2.1

valueOf

public static String valueOf(String source, int offset16)
Convenience method corresponding to String.valueOf(codepoint at offset16). Returns a one or two char string containing the UTF-32 value in UTF16 format. If offset16 indexes a surrogate character, the whole supplementary codepoint will be returned. If a validity check is required, use isLegal() on the codepoint at offset16 before calling. The result returned will be a newly created String obtained by calling source.substring(..) with the appropriate indexes.

Parameters:
source - the input string.
offset16 - the UTF16 index to the codepoint in source

Returns: string value of char32 in UTF16 format

UNKNOWN: ICU 2.1
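
The "whole supplementary codepoint" behavior of valueOf(String, int) can be sketched with JDK calls only. The class ValueOfSketch is hypothetical and is not ICU's implementation; it assumes that an offset16 on either half of a well-formed pair yields the two-char substring covering the pair.

```java
final class ValueOfSketch {
    // Hypothetical stand-in for valueOf(String, int).
    static String valueOf(String source, int offset16) {
        int start = offset16;
        // If offset16 is on a trail surrogate that follows a lead surrogate, back up to the lead.
        if (start > 0 && Character.isLowSurrogate(source.charAt(start))
                && Character.isHighSurrogate(source.charAt(start - 1))) {
            start--;
        }
        int cp = source.codePointAt(start);
        // One char for a BMP codepoint (or an unpaired surrogate), two for a pair.
        return source.substring(start, start + Character.charCount(cp));
    }
}
```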

valueOf

public static String valueOf(StringBuffer source, int offset16)
Convenience method corresponding to StringBuffer.valueOf(codepoint at offset16). Returns a one or two char string containing the UTF-32 value in UTF16 format. If offset16 indexes a surrogate character, the whole supplementary codepoint will be returned. If a validity check is required, use isLegal() on the codepoint at offset16 before calling. The result returned will be a newly created String obtained by calling source.substring(..) with the appropriate indexes.

Parameters:
source - the input string buffer.
offset16 - the UTF16 index to the codepoint in source

Returns: string value of char32 in UTF16 format

UNKNOWN: ICU 2.1

valueOf

public static String valueOf(char[] source, int start, int limit, int offset16)
Convenience method. Returns a one or two char string containing the UTF-32 value in UTF16 format. If offset16 indexes a surrogate character, the whole supplementary codepoint will be returned, except when either the leading or trailing surrogate character lies out of the specified subarray. In the latter case, only the surrogate character within bounds will be returned. If a validity check is required, use isLegal() on the codepoint at offset16 before calling. The result returned will be a newly created String containing the relevant characters.

Parameters:
source - the input char array.
start - start index of the subarray
limit - end index of the subarray
offset16 - the UTF16 index to the codepoint in source, relative to start

Returns: string value of char32 in UTF16 format

UNKNOWN: ICU 2.1