com.ibm.icu.text
Class UnicodeCompressor

java.lang.Object
  extended by com.ibm.icu.text.UnicodeCompressor

public final class UnicodeCompressor
extends Object

A compression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

SCSU works by using dynamically positioned windows, each consisting of 128 consecutive Unicode characters. During compression, characters within a window are encoded in the compressed stream as the bytes 0x80 - 0xFF. SCSU provides transparency for the characters (bytes) between U+0000 - U+00FF. SCSU approximates the storage size of traditional character sets, for example 1 byte per character for ASCII or Latin-1 text, and 2 bytes per character for CJK ideographs.
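The window mapping can be illustrated with a small, self-contained sketch. It is not part of UnicodeCompressor: the class ScsuWindowSketch and its methods are hypothetical names chosen for illustration, and the sketch assumes the single-byte convention of the SCSU specification, in which a 128-character window maps onto the byte values 0x80 through 0xFF.

```java
final class ScsuWindowSketch {

    /** Number of consecutive characters covered by one window. */
    static final int WINDOW_SIZE = 128;

    /** Returns true if c lies inside the window that starts at windowOffset. */
    static boolean inWindow(int windowOffset, char c) {
        return c >= windowOffset && c < windowOffset + WINDOW_SIZE;
    }

    /** Maps a character inside the window to its byte value in 0x80..0xFF. */
    static int toWindowByte(int windowOffset, char c) {
        if (!inWindow(windowOffset, c)) {
            throw new IllegalArgumentException("character is outside the window");
        }
        return 0x80 + (c - windowOffset);
    }

    /** Inverse mapping: recovers the character from a byte value in 0x80..0xFF. */
    static char fromWindowByte(int windowOffset, int b) {
        if (b < 0x80 || b > 0xFF) {
            throw new IllegalArgumentException("byte value is not a window byte");
        }
        return (char) (windowOffset + (b - 0x80));
    }
}
```

For instance, with a window assumed to start at U+0370, toWindowByte(0x0370, c) for the character U+03B1 evaluates to 0xC1, because U+03B1 lies 0x41 positions into the window.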

USAGE

The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:

  String s = ... ; // get string from somewhere
  byte [] compressed = UnicodeCompressor.compress(s);
 

The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:

  // Compress an array "chars" of length "len" using a buffer of 512 bytes
  // to the OutputStream "out"

  UnicodeCompressor myCompressor         = new UnicodeCompressor();
  final int         BUFSIZE              = 512;
  byte []           byteBuffer           = new byte [ BUFSIZE ];
  int               bytesWritten         = 0;
  int []            unicharsRead         = new int [1];
  int               totalCharsCompressed = 0;
  int               totalBytesWritten    = 0;

  do {
    // do the compression
    bytesWritten = myCompressor.compress(chars, totalCharsCompressed, 
                                         len, unicharsRead,
                                         byteBuffer, 0, BUFSIZE);

    // do something with the current set of bytes
    out.write(byteBuffer, 0, bytesWritten);

    // update the no. of characters compressed
    totalCharsCompressed += unicharsRead[0];

    // update the no. of bytes written
    totalBytesWritten += bytesWritten;

  } while(totalCharsCompressed < len);

  myCompressor.reset(); // reuse compressor
 

Author:
Stephen F. Booth
See Also:
UnicodeDecompressor
Status:
Stable ICU 2.4.

Field Summary
static int ARMENIANINDEX
           
static int COMPRESSIONOFFSET
           
static int GREEKINDEX
           
static int HALFWIDTHKATAKANAINDEX
           
static int HIRAGANAINDEX
           
static int INVALIDCHAR
           
static int INVALIDWINDOW
           
static int IPAEXTENSIONINDEX
           
static int KATAKANAINDEX
           
static int LATININDEX
           
static int MAXINDEX
           
static int NUMSTATICWINDOWS
           
static int NUMWINDOWS
           
static int RESERVEDINDEX
           
static int SCHANGE0
           
static int SCHANGE1
           
static int SCHANGE2
           
static int SCHANGE3
           
static int SCHANGE4
           
static int SCHANGE5
           
static int SCHANGE6
           
static int SCHANGE7
           
static int SCHANGEU
           
static int SDEFINE0
           
static int SDEFINE1
           
static int SDEFINE2
           
static int SDEFINE3
           
static int SDEFINE4
           
static int SDEFINE5
           
static int SDEFINE6
           
static int SDEFINE7
           
static int SDEFINEX
           
static int SINGLEBYTEMODE
           
static int[] sOffsets
          Static compression window offsets
static int[] sOffsetTable
          For window offset mapping
static int SQUOTE0
           
static int SQUOTE1
           
static int SQUOTE2
           
static int SQUOTE3
           
static int SQUOTE4
           
static int SQUOTE5
           
static int SQUOTE6
           
static int SQUOTE7
           
static int SQUOTEU
           
static int SRESERVED
           
static int UCHANGE0
           
static int UCHANGE1
           
static int UCHANGE2
           
static int UCHANGE3
           
static int UCHANGE4
           
static int UCHANGE5
           
static int UCHANGE6
           
static int UCHANGE7
           
static int UDEFINE0
           
static int UDEFINE1
           
static int UDEFINE2
           
static int UDEFINE3
           
static int UDEFINE4
           
static int UDEFINE5
           
static int UDEFINE6
           
static int UDEFINE7
           
static int UDEFINEX
           
static int UNICODEMODE
           
static int UQUOTEU
           
static int URESERVED
           
 
Constructor Summary
UnicodeCompressor()
          Create a UnicodeCompressor.
 
Method Summary
static byte[] compress(char[] buffer, int start, int limit)
          Compress a Unicode character array into a byte array.
 int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
          Compress a Unicode character array into a byte array.
static byte[] compress(String buffer)
          Compress a string into a byte array.
 void reset()
          Reset the compressor to its initial state.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMPRESSIONOFFSET

public static final int COMPRESSIONOFFSET
See Also:
Constant Field Values

NUMWINDOWS

public static final int NUMWINDOWS
See Also:
Constant Field Values

NUMSTATICWINDOWS

public static final int NUMSTATICWINDOWS
See Also:
Constant Field Values

INVALIDWINDOW

public static final int INVALIDWINDOW
See Also:
Constant Field Values

INVALIDCHAR

public static final int INVALIDCHAR
See Also:
Constant Field Values

SINGLEBYTEMODE

public static final int SINGLEBYTEMODE
See Also:
Constant Field Values

UNICODEMODE

public static final int UNICODEMODE
See Also:
Constant Field Values

MAXINDEX

public static final int MAXINDEX
See Also:
Constant Field Values

RESERVEDINDEX

public static final int RESERVEDINDEX
See Also:
Constant Field Values

LATININDEX

public static final int LATININDEX
See Also:
Constant Field Values

IPAEXTENSIONINDEX

public static final int IPAEXTENSIONINDEX
See Also:
Constant Field Values

GREEKINDEX

public static final int GREEKINDEX
See Also:
Constant Field Values

ARMENIANINDEX

public static final int ARMENIANINDEX
See Also:
Constant Field Values

HIRAGANAINDEX

public static final int HIRAGANAINDEX
See Also:
Constant Field Values

KATAKANAINDEX

public static final int KATAKANAINDEX
See Also:
Constant Field Values

HALFWIDTHKATAKANAINDEX

public static final int HALFWIDTHKATAKANAINDEX
See Also:
Constant Field Values

SDEFINEX

public static final int SDEFINEX
See Also:
Constant Field Values

SRESERVED

public static final int SRESERVED
See Also:
Constant Field Values

SQUOTEU

public static final int SQUOTEU
See Also:
Constant Field Values

SCHANGEU

public static final int SCHANGEU
See Also:
Constant Field Values

SQUOTE0

public static final int SQUOTE0
See Also:
Constant Field Values

SQUOTE1

public static final int SQUOTE1
See Also:
Constant Field Values

SQUOTE2

public static final int SQUOTE2
See Also:
Constant Field Values

SQUOTE3

public static final int SQUOTE3
See Also:
Constant Field Values

SQUOTE4

public static final int SQUOTE4
See Also:
Constant Field Values

SQUOTE5

public static final int SQUOTE5
See Also:
Constant Field Values

SQUOTE6

public static final int SQUOTE6
See Also:
Constant Field Values

SQUOTE7

public static final int SQUOTE7
See Also:
Constant Field Values

SCHANGE0

public static final int SCHANGE0
See Also:
Constant Field Values

SCHANGE1

public static final int SCHANGE1
See Also:
Constant Field Values

SCHANGE2

public static final int SCHANGE2
See Also:
Constant Field Values

SCHANGE3

public static final int SCHANGE3
See Also:
Constant Field Values

SCHANGE4

public static final int SCHANGE4
See Also:
Constant Field Values

SCHANGE5

public static final int SCHANGE5
See Also:
Constant Field Values

SCHANGE6

public static final int SCHANGE6
See Also:
Constant Field Values

SCHANGE7

public static final int SCHANGE7
See Also:
Constant Field Values

SDEFINE0

public static final int SDEFINE0
See Also:
Constant Field Values

SDEFINE1

public static final int SDEFINE1
See Also:
Constant Field Values

SDEFINE2

public static final int SDEFINE2
See Also:
Constant Field Values

SDEFINE3

public static final int SDEFINE3
See Also:
Constant Field Values

SDEFINE4

public static final int SDEFINE4
See Also:
Constant Field Values

SDEFINE5

public static final int SDEFINE5
See Also:
Constant Field Values

SDEFINE6

public static final int SDEFINE6
See Also:
Constant Field Values

SDEFINE7

public static final int SDEFINE7
See Also:
Constant Field Values

UCHANGE0

public static final int UCHANGE0
See Also:
Constant Field Values

UCHANGE1

public static final int UCHANGE1
See Also:
Constant Field Values

UCHANGE2

public static final int UCHANGE2
See Also:
Constant Field Values

UCHANGE3

public static final int UCHANGE3
See Also:
Constant Field Values

UCHANGE4

public static final int UCHANGE4
See Also:
Constant Field Values

UCHANGE5

public static final int UCHANGE5
See Also:
Constant Field Values

UCHANGE6

public static final int UCHANGE6
See Also:
Constant Field Values

UCHANGE7

public static final int UCHANGE7
See Also:
Constant Field Values

UDEFINE0

public static final int UDEFINE0
See Also:
Constant Field Values

UDEFINE1

public static final int UDEFINE1
See Also:
Constant Field Values

UDEFINE2

public static final int UDEFINE2
See Also:
Constant Field Values

UDEFINE3

public static final int UDEFINE3
See Also:
Constant Field Values

UDEFINE4

public static final int UDEFINE4
See Also:
Constant Field Values

UDEFINE5

public static final int UDEFINE5
See Also:
Constant Field Values

UDEFINE6

public static final int UDEFINE6
See Also:
Constant Field Values

UDEFINE7

public static final int UDEFINE7
See Also:
Constant Field Values

UQUOTEU

public static final int UQUOTEU
See Also:
Constant Field Values

UDEFINEX

public static final int UDEFINEX
See Also:
Constant Field Values

URESERVED

public static final int URESERVED
See Also:
Constant Field Values

sOffsetTable

public static final int[] sOffsetTable
For window offset mapping


sOffsets

public static final int[] sOffsets
Static compression window offsets

Constructor Detail

UnicodeCompressor

public UnicodeCompressor()
Create a UnicodeCompressor. Sets all windows to their default values.

See Also:
reset()
Status:
Stable ICU 2.4.
Method Detail

compress

public static byte[] compress(String buffer)
Compress a string into a byte array.

Parameters:
buffer - The string to compress.
Returns:
A byte array containing the compressed characters.
See Also:
compress(char [], int, int)
Status:
Stable ICU 2.4.

compress

public static byte[] compress(char[] buffer,
                              int start,
                              int limit)
Compress a Unicode character array into a byte array.

Parameters:
buffer - The character buffer to compress.
start - The start of the character run to compress.
limit - The limit of the character run to compress.
Returns:
A byte array containing the compressed characters.
See Also:
compress(String)
Status:
Stable ICU 2.4.

compress

public int compress(char[] charBuffer,
                    int charBufferStart,
                    int charBufferLimit,
                    int[] charsRead,
                    byte[] byteBuffer,
                    int byteBufferStart,
                    int byteBufferLimit)
Compress a Unicode character array into a byte array. This function will only consume input that can be completely output.

Parameters:
charBuffer - The character buffer to compress.
charBufferStart - The start of the character run to compress.
charBufferLimit - The limit of the character run to compress.
charsRead - A one-element array. If not null, on return its first element holds the number of characters read from charBuffer.
byteBuffer - A buffer to receive the compressed data. This buffer must be at minimum four bytes in size.
byteBufferStart - The starting offset to which to write compressed data.
byteBufferLimit - The limiting offset for writing compressed data.
Returns:
The number of bytes written to byteBuffer.
Status:
Stable ICU 2.4.
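
The contract described above (only input that can be completely output is consumed, and charsRead[0] reports how far the call got) implies a simple driving loop. The following is a minimal sketch of that loop, not ICU code. So that it compiles without ICU4J on the classpath, it uses a hypothetical interface, ChunkCompressor, that mirrors the signature documented above, and a toy stand-in that writes two bytes per character and is NOT an SCSU encoder. The no-progress check is a defensive assumption of the sketch, not documented behavior.

```java
import java.io.ByteArrayOutputStream;

final class ChunkedCompressSketch {

    /** Hypothetical stand-in mirroring the documented instance compress(...) signature. */
    interface ChunkCompressor {
        int compress(char[] charBuffer, int charBufferStart, int charBufferLimit,
                     int[] charsRead,
                     byte[] byteBuffer, int byteBufferStart, int byteBufferLimit);
    }

    /** Drives the compressor over chars[0, len) using a scratch buffer of bufSize bytes. */
    static byte[] compressAll(ChunkCompressor c, char[] chars, int len, int bufSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] byteBuffer = new byte[bufSize];
        int[] charsRead = new int[1];
        int totalChars = 0;
        while (totalChars < len) {
            // Resume at the first character that has not been consumed yet.
            int written = c.compress(chars, totalChars, len, charsRead,
                                     byteBuffer, 0, bufSize);
            if (written == 0 && charsRead[0] == 0) {
                // Defensive: avoid spinning forever if a call makes no progress.
                throw new IllegalStateException("no progress; output buffer too small?");
            }
            out.write(byteBuffer, 0, written);
            totalChars += charsRead[0];
        }
        return out.toByteArray();
    }

    /** Toy stand-in (not SCSU): two bytes per char, consuming only what fits. */
    static final ChunkCompressor TWO_BYTES_PER_CHAR = (cb, cs, cl, read, bb, bs, bl) -> {
        int in = cs;
        int outPos = bs;
        while (in < cl && outPos + 2 <= bl) {
            bb[outPos++] = (byte) (cb[in] >>> 8); // high byte
            bb[outPos++] = (byte) cb[in];         // low byte
            in++;
        }
        if (read != null) {
            read[0] = in - cs;
        }
        return outPos - bs;
    };
}
```

With the real class, the same loop shape would presumably apply with a UnicodeCompressor instance in place of the stand-in, followed by reset() before the compressor is reused.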

reset

public void reset()
Reset the compressor to its initial state.

Status:
Stable ICU 2.4.


Copyright (c) 2009 IBM Corporation and others.