Class SoraniNormalizer


  • public class SoraniNormalizer
    extends java.lang.Object
    Normalizes the Unicode representation of Sorani text.

    Normalization consists of:

    • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
    • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
    • Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
    • Alternate (joining) form of 'h' (06BE) is converted to 0647
    • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
    • Harakat, tatweel, and formatting characters such as directional controls are removed.
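    The rules above can be illustrated with a small, self-contained sketch. This is not the library's implementation: the class name SoraniNormalizationSketch is hypothetical, the code point values are taken from the list above and the standard Unicode names of the fields below, and the buffer is assumed to hold a single word, so "word-initial" and "word-final" are judged against the buffer boundaries.

```java
/** Illustrative sketch of the normalization rules; not the Lucene source. */
final class SoraniNormalizationSketch {
    static final char YEH = '\u064A', DOTLESS_YEH = '\u0649', FARSI_YEH = '\u06CC';
    static final char KAF = '\u0643', KEHEH = '\u06A9';
    static final char HEH = '\u0647', AE = '\u06D5', TEH_MARBUTA = '\u0629';
    static final char HEH_DOACHASHMEE = '\u06BE', ZWNJ = '\u200C';
    static final char REH = '\u0631', RREH = '\u0695', RREH_ABOVE = '\u0692';
    static final char TATWEEL = '\u0640';
    static final char FATHATAN = '\u064B', DAMMATAN = '\u064C', KASRATAN = '\u064D';
    static final char FATHA = '\u064E', DAMMA = '\u064F', KASRA = '\u0650';
    static final char SHADDA = '\u0651', SUKUN = '\u0652';

    /** Normalizes s[0..len) in place and returns the new length. */
    static int normalize(char[] s, int len) {
        int out = 0; // next write position; never ahead of the read position i
        for (int i = 0; i < len; i++) {
            char c = s[i];
            switch (c) {
                case YEH:
                case DOTLESS_YEH:
                    s[out++] = FARSI_YEH;
                    break;
                case KAF:
                    s[out++] = KEHEH;
                    break;
                case ZWNJ:
                    // HEH + ZWNJ spells the vowel 'e'; the ZWNJ itself is dropped.
                    if (out > 0 && s[out - 1] == HEH) {
                        s[out - 1] = AE;
                    }
                    break;
                case HEH:
                    // Only a HEH in the last input position counts as word-final here.
                    s[out++] = (i == len - 1) ? AE : HEH;
                    break;
                case TEH_MARBUTA:
                    s[out++] = AE;
                    break;
                case HEH_DOACHASHMEE:
                    s[out++] = HEH;
                    break;
                case REH:
                    // Word-initial REH is treated as 'rr'.
                    s[out++] = (out == 0) ? RREH : REH;
                    break;
                case RREH_ABOVE:
                    s[out++] = RREH;
                    break;
                case TATWEEL:
                case FATHATAN: case DAMMATAN: case KASRATAN:
                case FATHA: case DAMMA: case KASRA:
                case SHADDA: case SUKUN:
                    break; // harakat and tatweel are removed
                default:
                    // Remaining formatting characters (e.g. directional controls) are removed.
                    if (Character.getType(c) != Character.FORMAT) {
                        s[out++] = c;
                    }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        char[] buf = "\u0643\u0640\u064A".toCharArray(); // KAF, TATWEEL, YEH
        int n = normalize(buf, buf.length);
        System.out.println(new String(buf, 0, n));
    }
}
```

    The sketch compacts the buffer with a separate write index rather than shifting characters on each deletion; that is a design choice of this example only.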
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) static char AE  
      (package private) static char DAMMA  
      (package private) static char DAMMATAN  
      (package private) static char DOTLESS_YEH  
      (package private) static char FARSI_YEH  
      (package private) static char FATHA  
      (package private) static char FATHATAN  
      (package private) static char HEH  
      (package private) static char HEH_DOACHASHMEE  
      (package private) static char KAF  
      (package private) static char KASRA  
      (package private) static char KASRATAN  
      (package private) static char KEHEH  
      (package private) static char REH  
      (package private) static char RREH  
      (package private) static char RREH_ABOVE  
      (package private) static char SHADDA  
      (package private) static char SUKUN  
      (package private) static char TATWEEL  
      (package private) static char TEH_MARBUTA  
      (package private) static char YEH  
      (package private) static char ZWNJ  
    • Method Summary

      Modifier and Type Method Description
      int normalize(char[] s, int len)
      Normalizes an input buffer of Sorani text in place and returns the length of the text after normalization.
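      A method with the shape int normalize(char[] s, int len) is typically called on a mutable buffer, and the return value then says how much of the buffer is still valid. The sketch below shows that calling pattern. BufferNormalizer, apply, and stripTatweel are hypothetical names introduced for illustration, and stripTatweel is a toy stand-in that removes only tatweel. Assuming Lucene's analysis classes are on the classpath, a method reference such as new SoraniNormalizer()::normalize could be passed in its place.

```java
/** Illustrative calling pattern for an in-place char[] normalizer. */
final class NormalizeUsageSketch {
    /** Hypothetical functional shape matching int normalize(char[] s, int len). */
    interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Copies text into a buffer, normalizes it in place, and keeps only the valid prefix. */
    static String apply(BufferNormalizer normalizer, String text) {
        char[] buffer = text.toCharArray();
        int newLen = normalizer.normalize(buffer, buffer.length);
        return new String(buffer, 0, newLen);
    }

    /** Toy normalizer (assumption for this example): removes U+0640 TATWEEL only. */
    static int stripTatweel(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != '\u0640') {
                s[out++] = s[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(apply(NormalizeUsageSketch::stripTatweel, "\u0628\u0640\u0628"));
    }
}
```

      Characters beyond the returned length are leftovers from the original input and should be ignored.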
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait