com.ibm.icu.text

Class UnicodeDecompressor

public final class UnicodeDecompressor extends Object implements SCSU

A decompression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

USAGE

The static methods on UnicodeDecompressor may be used in a straightforward manner to decompress simple strings:

  byte [] compressed = ... ; // get compressed bytes from somewhere
  String result = UnicodeDecompressor.decompress(compressed);
 

The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeDecompressor offers more powerful APIs allowing iterative decompression:

  // Decompress an array "bytes" of length "len" using a buffer of 512 chars
  // to the Writer "out"

  UnicodeDecompressor myDecompressor         = new UnicodeDecompressor();
  final int           BUFSIZE                = 512;
  char []             charBuffer             = new char [ BUFSIZE ];
  int                 charsWritten           = 0;
  int []              bytesRead              = new int [1];
  int                 totalBytesDecompressed = 0;
  int                 totalCharsWritten      = 0;

  do {
    // do the decompression
    charsWritten = myDecompressor.decompress(bytes, totalBytesDecompressed, 
                                             len, bytesRead,
                                             charBuffer, 0, BUFSIZE);

    // do something with the current set of chars
    out.write(charBuffer, 0, charsWritten);

    // update the no. of bytes decompressed
    totalBytesDecompressed += bytesRead[0];

    // update the no. of chars written
    totalCharsWritten += charsWritten;

  } while(totalBytesDecompressed < len);

  myDecompressor.reset(); // reuse decompressor
 

Decompression is performed according to the standard set forth in Unicode Technical Report #6.

Author: Stephen F. Booth

UNKNOWN: ICU 2.4

Constructor Summary
UnicodeDecompressor()
Create a UnicodeDecompressor.
Method Summary
static String decompress(byte[] buffer)
Decompress a byte array into a String.
static char[] decompress(byte[] buffer, int start, int limit)
Decompress a byte array into a Unicode character array.
int decompress(byte[] byteBuffer, int byteBufferStart, int byteBufferLimit, int[] bytesRead, char[] charBuffer, int charBufferStart, int charBufferLimit)
Decompress a byte array into a Unicode character array.
void reset()
Reset the decompressor to its initial state.

Constructor Detail

UnicodeDecompressor

public UnicodeDecompressor()
Create a UnicodeDecompressor. Sets all windows to their default values.

See Also: UnicodeDecompressor

UNKNOWN: ICU 2.4

Method Detail

decompress

public static String decompress(byte[] buffer)
Decompress a byte array into a String.

Parameters:
buffer - The byte array to decompress.

Returns: A String containing the decompressed characters.

See Also: decompress(byte[], int, int)

UNKNOWN: ICU 2.4

decompress

public static char[] decompress(byte[] buffer, int start, int limit)
Decompress a byte array into a Unicode character array.

Parameters:
buffer - The byte array to decompress.
start - The start of the byte run to decompress.
limit - The limit of the byte run to decompress.

Returns: A character array containing the decompressed characters.

See Also: decompress(byte[])

UNKNOWN: ICU 2.4

decompress

public int decompress(byte[] byteBuffer, int byteBufferStart, int byteBufferLimit, int[] bytesRead, char[] charBuffer, int charBufferStart, int charBufferLimit)
Decompress a byte array into a Unicode character array. This function will either completely fill the output buffer, or consume the entire input.

Parameters:
byteBuffer - The byte buffer to decompress.
byteBufferStart - The start of the byte run to decompress.
byteBufferLimit - The limit of the byte run to decompress.
bytesRead - A one-element array. If not null, on return its first element holds the number of bytes read from byteBuffer.
charBuffer - A buffer to receive the decompressed data. This buffer must be at least two characters in size.
charBufferStart - The starting offset at which to write decompressed data.
charBufferLimit - The limiting offset for writing decompressed data.

Returns: The number of Unicode characters written to charBuffer.

UNKNOWN: ICU 2.4
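
The contract of this method (each call either fills the output range or consumes all remaining input) can be exercised with a small driver loop. The following is a minimal sketch, not ICU code. PassThroughDecoder is a hypothetical stand-in that only mirrors the parameter layout of this method and handles just the bytes that SCSU passes through unchanged in its initial state (0x20 to 0xFF, plus NUL, TAB, LF and CR). ChunkedDecodeSketch and decodeInChunks are likewise illustrative names.

```java
final class ChunkedDecodeSketch {

    /** Hypothetical stand-in; mirrors the parameter layout of the instance decompress(). */
    static final class PassThroughDecoder {
        int decompress(byte[] byteBuffer, int byteBufferStart, int byteBufferLimit,
                       int[] bytesRead,
                       char[] charBuffer, int charBufferStart, int charBufferLimit) {
            int in = byteBufferStart;
            int out = charBufferStart;
            // Stop when the input run is consumed or the output range is full.
            while (in < byteBufferLimit && out < charBufferLimit) {
                int b = byteBuffer[in] & 0xFF;
                boolean literal = b >= 0x20 || b == 0x00 || b == 0x09
                        || b == 0x0A || b == 0x0D;
                if (!literal) {
                    // Assumption of this sketch: SCSU tag bytes are out of scope.
                    throw new IllegalArgumentException("tag byte not handled: " + b);
                }
                charBuffer[out++] = (char) b;
                in++;
            }
            if (bytesRead != null) {
                bytesRead[0] = in - byteBufferStart;
            }
            return out - charBufferStart;
        }
    }

    /** Decodes bytes[0, len) through a char buffer of bufSize chars. */
    static String decodeInChunks(byte[] bytes, int len, int bufSize) {
        if (bufSize < 2) {
            // The documented minimum for charBuffer is two characters.
            throw new IllegalArgumentException("bufSize must be at least 2");
        }
        PassThroughDecoder decoder = new PassThroughDecoder();
        StringBuilder out = new StringBuilder();
        char[] charBuffer = new char[bufSize];
        int[] bytesRead = new int[1];
        int totalBytes = 0;
        while (totalBytes < len) {
            int charsWritten = decoder.decompress(bytes, totalBytes, len, bytesRead,
                                                  charBuffer, 0, bufSize);
            out.append(charBuffer, 0, charsWritten);
            totalBytes += bytesRead[0]; // advance by what was actually consumed
        }
        return out.toString();
    }
}
```

The caller advances by bytesRead[0] rather than by the buffer size, because the number of input bytes consumed per call need not equal the number of characters produced.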

reset

public void reset()
Reset the decompressor to its initial state.

UNKNOWN: ICU 2.4
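
Why a reset matters can be seen in a toy model of the decoder state. The sketch below is an illustration under stated assumptions, not the ICU implementation: WindowStateSketch is a hypothetical class that tracks only the active dynamic window, understands only the SC0 to SC7 window-selection tags (0x10 to 0x17) and literal bytes, and uses the default dynamic window offsets listed in UTR #6. The real decompressor keeps more state than this.

```java
final class WindowStateSketch {
    // Default dynamic window offsets, per UTR #6.
    private static final int[] DEFAULT_OFFSETS = {
        0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
    };
    private static final int SC0 = 0x10; // SC0..SC7 select dynamic window 0..7

    private int activeWindow = 0; // state that survives between decode() calls

    /** Returns the toy decoder to its initial state. */
    void reset() {
        activeWindow = 0;
    }

    String decode(byte[] input) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : input) {
            int b = raw & 0xFF;
            if (b >= SC0 && b <= SC0 + 7) {
                activeWindow = b - SC0;               // switch window
            } else if (b >= 0x80) {
                // Upper-half bytes index into the active dynamic window.
                sb.append((char) (DEFAULT_OFFSETS[activeWindow] + (b - 0x80)));
            } else if (b >= 0x20 || b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D) {
                sb.append((char) b);                  // literal ASCII
            } else {
                // Assumption of this sketch: other tags are not handled.
                throw new IllegalArgumentException("tag byte not handled: " + b);
            }
        }
        return sb.toString();
    }
}
```

In this model, decoding the bytes 12 9C BE C1 BA B2 B0 selects the Cyrillic window and leaves it active. A following, unrelated stream that begins with the byte E9 would then be read relative to offset 0x0400 unless reset() is called first, in which case it is read relative to the initial offset 0x0080.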