com.ibm.icu.text
Class Collator

java.lang.Object
  extended by com.ibm.icu.text.Collator
All Implemented Interfaces:
Cloneable, Comparator
Direct Known Subclasses:
RuleBasedCollator

public abstract class Collator
extends Object
implements Comparator, Cloneable

Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

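Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons: PRIMARY, SECONDARY, TERTIARY, QUATERNARY and IDENTICAL. Each level is described under Field Detail.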

Unlike the JDK, ICU4J's Collator deals only with 2 decomposition modes, the canonical decomposition mode and one that does not use any decomposition. The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION is not supported here. If the canonical decomposition mode is set, the Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.

For more information about the collation service, see the User Guide.

Examples of use

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }

 The following example shows how to compare two strings using the
 Collator for the default locale.

 // Compare two strings in the default locale
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("\u00e0\u0325 is not equal to a\u0325\u0300 without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: \u00e0\u0325 should be equal to a\u0325\u0300 with decomposition");
     }
     else {
         System.out.println("\u00e0\u0325 is equal to a\u0325\u0300 with decomposition");
     }
 }
 else {
     System.out.println("Error: \u00e0\u0325 should not be equal to a\u0325\u0300 without decomposition");
 }
 

Author:
Syn Wee Quek
See Also:
RuleBasedCollator, CollationKey
Status:
Stable ICU 2.8.

Nested Class Summary
static class Collator.CollatorFactory
          A factory used with registerFactory to register multiple collators and provide display names for them.
 
Field Summary
static int CANONICAL_DECOMPOSITION
          Decomposition mode value.
static int FULL_DECOMPOSITION
          This is for backwards compatibility with Java APIs only.
static int IDENTICAL
           Smallest Collator strength value.
static int NO_DECOMPOSITION
          Decomposition mode value.
static int PRIMARY
          Strongest collator strength value.
static int QUATERNARY
          Fourth level collator strength value.
static int SECONDARY
          Second level collator strength value.
static int TERTIARY
          Third level collator strength value.
 
Constructor Summary
protected Collator()
          Empty default constructor to make javadocs happy
 
Method Summary
 Object clone()
          Clone the collator.
 int compare(Object source, Object target)
           Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
abstract  int compare(String source, String target)
           Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
 boolean equals(String source, String target)
          Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.
static Locale[] getAvailableLocales()
          Get the set of locales, as Locale objects, for which collators are installed.
static ULocale[] getAvailableULocales()
          Get the set of locales, as ULocale objects, for which collators are installed.
abstract  CollationKey getCollationKey(String source)
           Transforms the String into a CollationKey suitable for efficient repeated comparison.
 int getDecomposition()
           Get the decomposition mode of this Collator.
static String getDisplayName(Locale objectLocale)
          Get the name of the collator for the objectLocale, localized for the current locale.
static String getDisplayName(Locale objectLocale, Locale displayLocale)
          Get the name of the collator for the objectLocale, localized for the displayLocale.
static String getDisplayName(ULocale objectLocale)
          Get the name of the collator for the objectLocale, localized for the current locale.
static String getDisplayName(ULocale objectLocale, ULocale displayLocale)
          Get the name of the collator for the objectLocale, localized for the displayLocale.
static ULocale getFunctionalEquivalent(String keyword, ULocale locID)
          Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
static ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)
          Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
static Collator getInstance()
          Gets the Collator for the current default locale.
static Collator getInstance(Locale locale)
          Gets the Collator for the desired locale.
static Collator getInstance(ULocale locale)
          Gets the Collator for the desired locale.
static String[] getKeywords()
          Return an array of all possible keywords that are relevant to collation.
static String[] getKeywordValues(String keyword)
          Given a keyword, return an array of all values for that keyword that are currently in use.
 ULocale getLocale(ULocale.Type type)
          Return the locale that was used to create this object, or null.
abstract  RawCollationKey getRawCollationKey(String source, RawCollationKey key)
          Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key.
 int getStrength()
          Returns this Collator's strength property.
 UnicodeSet getTailoredSet()
          Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
abstract  VersionInfo getUCAVersion()
          Get the UCA version of this collator object.
abstract  int getVariableTop()
          Gets the variable top value of a Collator.
abstract  VersionInfo getVersion()
          Get the version of this collator object.
static Object registerFactory(Collator.CollatorFactory factory)
          Register a collator factory.
static Object registerInstance(Collator collator, ULocale locale)
          Register a collator as the default collator for the provided locale.
 void setDecomposition(int decomposition)
          Set the decomposition mode of this Collator.
 void setStrength(int newStrength)
          Sets this Collator's strength property.
abstract  void setVariableTop(int varTop)
          Sets the variable top to a collation element value supplied.
abstract  int setVariableTop(String varTop)
           Variable top is a two byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.
static boolean unregister(Object registryKey)
          Unregister a collator previously registered using registerInstance.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.util.Comparator
equals
 

Field Detail

PRIMARY

public static final int PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Stable ICU 2.8.

SECONDARY

public static final int SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Stable ICU 2.8.

TERTIARY

public static final int TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Stable ICU 2.8.

QUATERNARY

public static final int QUATERNARY
Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Stable ICU 2.8.

IDENTICAL

public static final int IDENTICAL

Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared, just in case there is no difference. See class documentation for more explanation.

Note that this value is different from the JDK's.

See Also:
Constant Field Values
Status:
Stable ICU 2.8.

FULL_DECOMPOSITION

public static final int FULL_DECOMPOSITION
This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.

See Also:
Constant Field Values
Status:
Stable ICU 3.4.

NO_DECOMPOSITION

public static final int NO_DECOMPOSITION

Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note this value is different from the JDK's.

See Also:
CANONICAL_DECOMPOSITION, getDecomposition(), setDecomposition(int), Constant Field Values
Status:
Stable ICU 2.8.

CANONICAL_DECOMPOSITION

public static final int CANONICAL_DECOMPOSITION

Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.

See Also:
NO_DECOMPOSITION, getDecomposition(), setDecomposition(int), Constant Field Values
Status:
Stable ICU 2.8.
Constructor Detail

Collator

protected Collator()
Empty default constructor to make javadocs happy

Status:
Stable ICU 2.4.
Method Detail

setStrength

public void setStrength(int newStrength)

Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.

The default strength for the Collator is TERTIARY, unless specified otherwise by the locale used to create the Collator.

See the Collator class description for an example of use.
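
As a rough illustration of how the strength setting changes what counts as a difference, the sketch below compares the same pairs of strings at several strengths. It is a sketch under stated assumptions rather than part of the API: the class name StrengthDemo and the helper sameAt are made up for illustration, and it imports the JDK's java.text.Collator as a stand-in so that it runs without ICU4J on the classpath. getInstance(Locale), setStrength(int) and compare(String, String) have the same names on com.ibm.icu.text.Collator, so swapping the import should be enough, although the two implementations are not guaranteed to order every string identically.

```java
import java.text.Collator; // assumption: JDK stand-in; with ICU4J use com.ibm.icu.text.Collator
import java.util.Locale;

class StrengthDemo {
    /** Returns true if the two strings compare as equal at the given strength. */
    static boolean sameAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Case is described as a tertiary difference, so it should only
        // matter once the strength is raised to TERTIARY.
        System.out.println(sameAt(Collator.PRIMARY, "abc", "ABC"));
        System.out.println(sameAt(Collator.SECONDARY, "abc", "ABC"));
        System.out.println(sameAt(Collator.TERTIARY, "abc", "ABC"));

        // Accents are described as a secondary difference, so they should
        // matter from SECONDARY upwards. \u00e4 is a with diaeresis.
        System.out.println(sameAt(Collator.PRIMARY, "abc", "\u00e4bc"));
        System.out.println(sameAt(Collator.SECONDARY, "abc", "\u00e4bc"));
    }
}
```

A fresh Collator is obtained on every call here for simplicity; code that compares many strings would normally configure one instance and reuse it.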

Parameters:
newStrength - the new strength value.
Throws:
IllegalArgumentException - if the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
See Also:
getStrength(), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
Status:
Stable ICU 2.8.

setDecomposition

public void setDecomposition(int decomposition)

Set the decomposition mode of this Collator. Setting this decomposition property with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.

Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.

The default decomposition mode for the Collator is NO_DECOMPOSITION, unless specified otherwise by the locale used to create the Collator.

See getDecomposition for a description of decomposition mode.
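
When NO_DECOMPOSITION is in effect, one way to put text "in the appropriate form" beforehand is to normalize it to NFD yourself. The sketch below shows that step only. It assumes the JDK's java.text.Normalizer is an acceptable normalizer for the purpose (ICU4J ships its own), and the class name NfdDemo and helper toNfd are made up for illustration.

```java
import java.text.Normalizer; // assumption: JDK normalizer used for the sketch

class NfdDemo {
    /** Normalizes text to NFD, the form that canonical decomposition corresponds to. */
    static String toNfd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e0\u0325"; // a with grave, then combining ring below
        String decomposed = "a\u0325\u0300"; // a, combining ring below, combining grave

        // The raw strings are different sequences of code points...
        System.out.println(precomposed.equals(decomposed));
        // ...but they are canonically equivalent, so their NFD forms match.
        System.out.println(toNfd(precomposed).equals(toNfd(decomposed)));
    }
}
```

Normalizing once when text is stored, rather than on every comparison, keeps the speed advantage that NO_DECOMPOSITION is meant to provide.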

Parameters:
decomposition - the new decomposition mode
Throws:
IllegalArgumentException - If the given value is not a valid decomposition mode.
See Also:
getDecomposition(), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION
Status:
Stable ICU 2.8.

getInstance

public static final Collator getInstance()
Gets the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().

Returns:
the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the default UCA collator will be returned.
See Also:
Locale.getDefault(), getInstance(Locale)
Status:
Stable ICU 2.8.

clone

public Object clone()
             throws CloneNotSupportedException
Clone the collator.

Overrides:
clone in class Object
Returns:
a clone of this collator.
Throws:
CloneNotSupportedException
Status:
Stable ICU 2.6.

getInstance

public static final Collator getInstance(ULocale locale)
Gets the Collator for the desired locale.

Parameters:
locale - the desired locale.
Returns:
Collator for the desired locale if it is created successfully. Otherwise, if there is no Collator associated with the desired locale, a default UCA collator will be returned.
See Also:
Locale, ResourceBundle, getInstance(Locale), getInstance()
Status:
Stable ICU 3.0.

getInstance

public static final Collator getInstance(Locale locale)
Gets the Collator for the desired locale.

Parameters:
locale - the desired locale.
Returns:
Collator for the desired locale if it is created successfully. Otherwise, if there is no Collator associated with the desired locale, a default UCA collator will be returned.
See Also:
Locale, ResourceBundle, getInstance(ULocale), getInstance()
Status:
Stable ICU 2.8.

registerInstance

public static final Object registerInstance(Collator collator,
                                            ULocale locale)
Register a collator as the default collator for the provided locale. The collator should not be modified after it is registered.

Parameters:
collator - the collator to register
locale - the locale for which this is the default collator
Returns:
an object that can be used to unregister the registered collator.
Status:
Stable ICU 3.2.

registerFactory

public static final Object registerFactory(Collator.CollatorFactory factory)
Register a collator factory.

Parameters:
factory - the factory to register
Returns:
an object that can be used to unregister the registered factory.
Status:
Stable ICU 2.6.

unregister

public static final boolean unregister(Object registryKey)
Unregister a collator previously registered using registerInstance.

Parameters:
registryKey - the object previously returned by registerInstance.
Returns:
true if the collator was successfully unregistered.
Status:
Stable ICU 2.6.

getAvailableLocales

public static Locale[] getAvailableLocales()
Get the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.

Returns:
the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.
Status:
Stable ICU 2.4.

getAvailableULocales

public static final ULocale[] getAvailableULocales()
Get the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.

Returns:
the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.
Status:
Stable ICU 3.0.

getKeywords

public static final String[] getKeywords()
Return an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".

Returns:
an array of valid collation keywords.
See Also:
getKeywordValues(java.lang.String)
Status:
Stable ICU 3.0.

getKeywordValues

public static final String[] getKeywordValues(String keyword)
Given a keyword, return an array of all values for that keyword that are currently in use.

Parameters:
keyword - one of the keywords returned by getKeywords.
See Also:
getKeywords()
Status:
Stable ICU 3.0.

getFunctionalEquivalent

public static final ULocale getFunctionalEquivalent(String keyword,
                                                    ULocale locID,
                                                    boolean[] isAvailable)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent, but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.
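
The caching use mentioned above might look roughly like the following. The sketch is hypothetical throughout: CollatorCache and its equivalentOf function are invented names, the function stands in for a call such as getFunctionalEquivalent("collation", locale), and java.util.Locale and java.text.Collator are used in place of ULocale and the ICU4J Collator so that the code runs on a plain JDK.

```java
import java.text.Collator; // assumption: JDK stand-in for com.ibm.icu.text.Collator
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

class CollatorCache {
    private final Map<Locale, Collator> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the functional-equivalent lookup.
    private final UnaryOperator<Locale> equivalentOf;

    CollatorCache(UnaryOperator<Locale> equivalentOf) {
        this.equivalentOf = equivalentOf;
    }

    /** Returns one shared collator per functionally equivalent locale. */
    Collator get(Locale requested) {
        Locale key = equivalentOf.apply(requested);
        return cache.computeIfAbsent(key, Collator::getInstance);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Toy equivalence for the demo only: locales with the same language
        // are treated as equivalent. A real lookup would ask the collation service.
        CollatorCache collators =
                new CollatorCache(l -> Locale.forLanguageTag(l.getLanguage()));
        Collator us = collators.get(Locale.US);
        Collator uk = collators.get(Locale.UK);
        System.out.println(us == uk);          // same cached instance
        System.out.println(collators.size());
    }
}
```

Because the cached instances are shared, callers should treat them as read-only and clone a collator before changing its strength or decomposition mode. Since the functional equivalent may change over time, such a cache is best kept in memory rather than persisted.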

Parameters:
keyword - a particular keyword as enumerated by getKeywords.
locID - The requested locale
isAvailable - If non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. If non-null, isAvailable must have length >= 1.
Returns:
the locale
Status:
Stable ICU 3.0.

getFunctionalEquivalent

public static final ULocale getFunctionalEquivalent(String keyword,
                                                    ULocale locID)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

Parameters:
keyword - a particular keyword as enumerated by getKeywords.
locID - The requested locale
Returns:
the locale
See Also:
getFunctionalEquivalent(String,ULocale,boolean[])
Status:
Stable ICU 3.0.

getDisplayName

public static String getDisplayName(Locale objectLocale,
                                    Locale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.

Parameters:
objectLocale - the locale of the collator
displayLocale - the locale for the collator's display name
Returns:
the display name
Status:
Stable ICU 2.6.

getDisplayName

public static String getDisplayName(ULocale objectLocale,
                                    ULocale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.

Parameters:
objectLocale - the locale of the collator
displayLocale - the locale for the collator's display name
Returns:
the display name
Status:
Stable ICU 3.2.

getDisplayName

public static String getDisplayName(Locale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.

Parameters:
objectLocale - the locale of the collator
Returns:
the display name
Status:
Stable ICU 2.6.

getDisplayName

public static String getDisplayName(ULocale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.

Parameters:
objectLocale - the locale of the collator
Returns:
the display name
Status:
Stable ICU 3.2.

getStrength

public int getStrength()

Returns this Collator's strength property. The strength property determines the minimum level of difference considered significant.

See the Collator class description for more details.

Returns:
this Collator's current strength property.
See Also:
setStrength(int), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
Status:
Stable ICU 2.8.

getDecomposition

public int getDecomposition()

Get the decomposition mode of this Collator. Decomposition mode determines how Unicode composed characters are handled.

See the Collator class description for more details.

Returns:
the decomposition mode
See Also:
setDecomposition(int), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION
Status:
Stable ICU 2.8.

compare

public int compare(Object source,
                   Object target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.
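
Because this overload satisfies the Comparator interface, a Collator can be handed to any API that sorts with a comparator. The sketch below sorts a list that way. It assumes the JDK's java.text.Collator as a stand-in so that it runs without ICU4J (the ICU4J class exposes the same getInstance(Locale) method and also implements Comparator), and the names CollatorSortDemo and sorted are made up for illustration.

```java
import java.text.Collator; // assumption: JDK stand-in; with ICU4J use com.ibm.icu.text.Collator
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {
    /** Returns a copy of the words sorted with the collator for the given locale. */
    static List<String> sorted(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator); // the Collator is used as a Comparator
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");

        // Plain String ordering compares code units, so "Banana" sorts first.
        List<String> byCodeUnit = new ArrayList<>(words);
        byCodeUnit.sort(null);
        System.out.println(byCodeUnit);

        // Collation compares base letters first, so "apple" sorts first.
        System.out.println(sorted(words, Locale.US));
    }
}
```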

Specified by:
compare in interface Comparator
Parameters:
source - the source String.
target - the target String.
Returns:
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
Throws:
NullPointerException - thrown if either argument is null.
IllegalArgumentException - thrown if either source or target is not of the class String.
See Also:
CollationKey, getCollationKey(java.lang.String)
Status:
Stable ICU 2.8.

equals

public boolean equals(String source,
                      String target)
Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.

Parameters:
source - the source string to be compared.
target - the target string to be compared.
Returns:
true if the strings are equal according to the collation rules, otherwise false.
Throws:
NullPointerException - thrown if either argument is null.
See Also:
compare(java.lang.Object, java.lang.Object)
Status:
Stable ICU 2.8.

getTailoredSet

public UnicodeSet getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.

Returns:
a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.
Status:
Stable ICU 2.4.

compare

public abstract int compare(String source,
                            String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

Parameters:
source - the source String.
target - the target String.
Returns:
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
Throws:
NullPointerException - thrown if either argument is null.
See Also:
CollationKey, getCollationKey(java.lang.String)
Status:
Stable ICU 2.8.

getCollationKey

public abstract CollationKey getCollationKey(String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.

See the CollationKey class documentation for more information.
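
To make "efficient repeated comparison" concrete, the sketch below computes one key per string up front and then sorts the keys, so the collation rules are applied once per string rather than once per comparison. It assumes the JDK's java.text.Collator and java.text.CollationKey as stand-ins so that it runs without ICU4J; getCollationKey(String) and getSourceString() carry the same names in the ICU4J classes. The names CollationKeySortDemo and sortWithKeys are made up for illustration.

```java
import java.text.CollationKey; // assumption: JDK stand-in for com.ibm.icu.text.CollationKey
import java.text.Collator;     // assumption: JDK stand-in for com.ibm.icu.text.Collator
import java.util.Arrays;
import java.util.Locale;

class CollationKeySortDemo {
    /** Sorts the words by precomputed collation keys and returns them in order. */
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Compute each key exactly once.
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }

        // CollationKey is Comparable, so the keys sort without the collator.
        Arrays.sort(keys);

        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple"};
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));
    }
}
```

Keys are only comparable with other keys produced by the same collator configuration, so they should be regenerated if the strength or decomposition mode changes.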

Parameters:
source - the string to be transformed into a CollationKey.
Returns:
the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.
See Also:
CollationKey, compare(String, String), getRawCollationKey(java.lang.String, com.ibm.icu.text.RawCollationKey)
Status:
Stable ICU 2.8.

getRawCollationKey

public abstract RawCollationKey getRawCollationKey(String source,
                                                   RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key. If the internal byte array of key is too small for the result, it will be grown to the exact required size.

Parameters:
source - the text String to be transformed into a RawCollationKey
key - the RawCollationKey that receives the result; may be null
Returns:
If key is null, a new instance of RawCollationKey will be created and returned, otherwise the user provided key will be returned.
See Also:
compare(String, String), getCollationKey(java.lang.String), RawCollationKey
Status:
Stable ICU 2.8.

setVariableTop

public abstract int setVariableTop(String varTop)

Variable top is a two byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.

Sets the variable top to a collation element value of a string supplied.

Parameters:
varTop - one or more (if contraction) characters to which the variable top should be set
Returns:
an int value containing the value of the variable top in the upper 16 bits. The lower 16 bits are undefined.
Throws:
IllegalArgumentException - thrown if the varTop argument is not a valid variable top element. A variable top element is invalid when it is a contraction that does not exist in the Collation order or when the PRIMARY strength collation element for the variable top has more than two bytes.
See Also:
getVariableTop(), RuleBasedCollator.setAlternateHandlingShifted(boolean)
Status:
Stable ICU 2.6.

getVariableTop

public abstract int getVariableTop()
Gets the variable top value of a Collator. Lower 16 bits are undefined and should be ignored.
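
Since only the upper 16 bits carry meaning, code that stores or compares variable top values may want to discard the lower half first. The helper below is a small sketch of that bit handling. The names VariableTopBits, upper16 and sameVariableTop are made up for illustration, and the hexadecimal inputs are arbitrary numbers, not real variable top values from any collator.

```java
class VariableTopBits {
    /** Extracts the defined upper 16 bits of a variable top value. */
    static int upper16(int varTop) {
        return varTop >>> 16; // unsigned shift, so the sign bit is not smeared
    }

    /** True if two values agree once the undefined lower 16 bits are ignored. */
    static boolean sameVariableTop(int a, int b) {
        return upper16(a) == upper16(b);
    }

    public static void main(String[] args) {
        // Arbitrary illustrative values that differ only in the lower 16 bits.
        int first = 0x12340000;
        int second = 0x1234FFFF;
        System.out.println(Integer.toHexString(upper16(first)));
        System.out.println(sameVariableTop(first, second));
        System.out.println(first == second);
    }
}
```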

Returns:
the variable top value of a Collator.
See Also:
setVariableTop(java.lang.String)
Status:
Stable ICU 2.6.

setVariableTop

public abstract void setVariableTop(int varTop)
Sets the variable top to a collation element value supplied. Variable top is set to the upper 16 bits. Lower 16 bits are ignored.

Parameters:
varTop - Collation element value, as returned by setVariableTop or getVariableTop
See Also:
getVariableTop(), setVariableTop(java.lang.String)
Status:
Stable ICU 2.6.

getVersion

public abstract VersionInfo getVersion()
Get the version of this collator object.

Returns:
the version object associated with this collator
Status:
Stable ICU 2.8.

getUCAVersion

public abstract VersionInfo getUCAVersion()
Get the UCA version of this collator object.

Returns:
the version object associated with this collator
Status:
Stable ICU 2.8.

getLocale

public final ULocale getLocale(ULocale.Type type)
Return the locale that was used to create this object, or null. This may differ from the locale requested at the time of this object's creation. For example, if an object is created for locale en_US_CALIFORNIA, the actual data may be drawn from en (the actual locale), and en_US may be the most specific locale that exists (the valid locale).

Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.

Parameters:
type - type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.
Returns:
the information specified by type, or null if this object was not constructed from locale data.
See Also:
ULocale, ULocale.VALID_LOCALE, ULocale.ACTUAL_LOCALE
Status:
Draft ICU 2.8 (retain).


Copyright (c) 2009 IBM Corporation and others.