org.tukaani.xz
Class SeekableXZInputStream

java.lang.Object
  extended by java.io.InputStream
      extended by org.tukaani.xz.SeekableInputStream
          extended by org.tukaani.xz.SeekableXZInputStream
All Implemented Interfaces:
java.io.Closeable

public class SeekableXZInputStream
extends SeekableInputStream

Decompresses a .xz file in random access mode. This supports decompressing concatenated .xz files.

Each .xz file consists of one or more Streams. Each Stream consists of zero or more Blocks. Each Stream contains an Index of the Stream's Blocks. The Indexes from all Streams are loaded in RAM by a constructor of this class. A typical .xz file has only one Stream, and parsing its Index will need only three or four seeks.

To make random access possible, the data in a .xz file must be split into multiple Blocks of reasonable size. Decompression can only start at a Block boundary. When seeking to an uncompressed offset that is not at a Block boundary, decompression starts at the beginning of the Block and throws away data until the target offset is reached. Thus, smaller Blocks mean faster seeks to arbitrary uncompressed offsets. On the other hand, smaller Blocks mean worse compression. So one has to make a compromise between random access speed and compression ratio.
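The cost of a seek described above can be estimated with a little arithmetic. The following is only a sketch under the simplifying assumption that all Blocks have the same uncompressed size; the class and method names are hypothetical and not part of this library.

```java
class SeekCostSketch {
    /** Uncompressed offset at which the Block containing target begins (equal-sized Blocks assumed). */
    static long blockStart(long target, long blockSize) {
        return (target / blockSize) * blockSize;
    }

    /** Number of uncompressed bytes that must be decoded and thrown away before target is reached. */
    static long bytesDiscarded(long target, long blockSize) {
        return target - blockStart(target, blockSize);
    }
}
```

For a target offset of 5,000,000, 1 MiB Blocks would mean discarding 805,696 bytes, while 64 KiB Blocks would mean discarding only 19,264 bytes, which illustrates why smaller Blocks give faster seeks.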

Implementation note: This class uses linear search to locate the correct Stream from the data structures in RAM. It was the simplest to implement and should be fine as long as there aren't too many Streams. The correct Block inside a Stream is located using binary search and thus is fast even with a huge number of Blocks.
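The two-level lookup in the implementation note can be illustrated roughly as follows. This is not the library's code: the class name, the array layout, and the return convention are assumptions made for the sketch. Streams are scanned linearly, and the Block inside the chosen Stream is found by binary search over cumulative uncompressed end offsets.

```java
class BlockLookupSketch {
    private final long[] streamStart; // uncompressed offset at which each Stream begins
    private final long[][] blockEnd;  // per Stream: cumulative end offset of each Block, relative to the Stream

    /** blockSizes[s][b] is the uncompressed size of Block b in Stream s (hypothetical layout). */
    BlockLookupSketch(long[][] blockSizes) {
        streamStart = new long[blockSizes.length];
        blockEnd = new long[blockSizes.length][];
        long total = 0;
        for (int s = 0; s < blockSizes.length; s++) {
            streamStart[s] = total;
            blockEnd[s] = new long[blockSizes[s].length];
            long sum = 0;
            for (int b = 0; b < blockSizes[s].length; b++) {
                sum += blockSizes[s][b];
                blockEnd[s][b] = sum;
            }
            total += sum;
        }
    }

    /** Returns {streamIndex, blockIndex} for target, or null if target is negative or past the end. */
    int[] locate(long target) {
        if (target < 0) {
            return null;
        }
        // Linear search over the Streams.
        for (int s = 0; s < streamStart.length; s++) {
            long[] ends = blockEnd[s];
            long streamLength = ends.length == 0 ? 0 : ends[ends.length - 1];
            if (target >= streamStart[s] + streamLength) {
                continue; // target lies in a later Stream (or this Stream is empty)
            }
            // Binary search for the first Block whose end offset is greater than the relative target.
            long rel = target - streamStart[s];
            int lo = 0;
            int hi = ends.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (ends[mid] > rel) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            return new int[] {s, lo};
        }
        return null;
    }
}
```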

Memory usage

The amount of memory needed for the Indexes is taken into account when checking the memory usage limit. Each Stream is calculated to need at least 1 KiB of memory and each Block 16 bytes of memory, rounded up to the next kibibyte. So unless the file has a huge number of Streams or Blocks, these don't take a significant amount of memory.
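As a rough illustration of the accounting rule above, the sketch below charges 1 KiB per Stream plus 16 bytes per Block, with the Block portion rounded up to whole kibibytes for each Stream. This is one plausible reading of the rule, not a copy of the library's calculation; the class and method names are hypothetical.

```java
class IndexMemorySketch {
    /**
     * Estimates Index memory in KiB. blocksPerStream[s] is the number of Blocks in Stream s.
     * Assumption: 1 KiB per Stream, plus 16 bytes per Block rounded up to the next KiB.
     */
    static long estimateKiB(long[] blocksPerStream) {
        long totalKiB = 0;
        for (long blocks : blocksPerStream) {
            long blockKiB = (16L * blocks + 1023) / 1024; // integer division rounds up here
            totalKiB += 1 + blockKiB;
        }
        return totalKiB;
    }
}
```

Under this reading, a single Stream with one million Blocks would be counted as 15,626 KiB, so the Index only becomes significant for files with a very large number of Blocks.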

Creating random-accessible .xz files

When using XZOutputStream, a new Block can be started by calling its endBlock method. If you know that the decompressor will need to seek only to certain offsets, it can be a good idea to start a new Block at (some of) these offsets (and perhaps only at these offsets to get better compression ratio).
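A sketch of how Blocks might be ended at known seek offsets follows. To stay self-contained it does not use XZOutputStream directly: BlockSink is a hypothetical stand-in that exposes only the write(byte[], int, int) and endBlock() calls, and RecordingSink is a test double that merely records Block sizes. The boundaries are assumed to be sorted in ascending order.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class BlockSplitSketch {
    /** Hypothetical stand-in for the two XZOutputStream methods used by this sketch. */
    interface BlockSink {
        void write(byte[] buf, int off, int len) throws IOException;
        void endBlock() throws IOException;
    }

    /** Writes data to sink, ending the current Block at each offset listed in boundaries. */
    static void writeWithBoundaries(byte[] data, int[] boundaries, BlockSink sink)
            throws IOException {
        int pos = 0;
        for (int boundary : boundaries) {
            // Skip offsets that would create an empty Block or that lie at or past the end.
            if (boundary <= pos || boundary >= data.length) {
                continue;
            }
            sink.write(data, pos, boundary - pos);
            sink.endBlock();
            pos = boundary;
        }
        if (pos < data.length) {
            sink.write(data, pos, data.length - pos); // the final Block ends when the stream is finished
        }
    }

    /** Test double that records the uncompressed size of each Block instead of compressing. */
    static final class RecordingSink implements BlockSink {
        final List<Integer> blockSizes = new ArrayList<>();
        private int current = 0;

        @Override
        public void write(byte[] buf, int off, int len) {
            current += len;
        }

        @Override
        public void endBlock() {
            blockSizes.add(current);
            current = 0;
        }

        /** Closes the last Block, if any data is pending. */
        void finish() {
            if (current > 0) {
                endBlock();
            }
        }
    }
}
```

With a real XZOutputStream, a thin adapter implementing BlockSink could forward both calls to the stream.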

liblzma in XZ Utils supports starting a new Block with LZMA_FULL_FLUSH. XZ Utils 5.1.1alpha added threaded compression which creates multi-Block .xz files. XZ Utils 5.1.1alpha also added the option --block-size=SIZE to the xz command line tool.

See Also:
SeekableFileInputStream, XZInputStream, XZOutputStream

Constructor Summary
SeekableXZInputStream(SeekableInputStream in)
          Creates a new seekable XZ decompressor without a memory usage limit.
SeekableXZInputStream(SeekableInputStream in, int memoryLimit)
          Creates a new seekable XZ decompressor with an optional memory usage limit.
 
Method Summary
 int available()
          Returns the number of uncompressed bytes that can be read without blocking.
 void close()
          Closes the stream and calls in.close().
 int getCheckTypes()
          Gets the types of integrity checks used in the .xz file.
 int getIndexMemoryUsage()
          Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks.
 long getLargestBlockSize()
          Gets the uncompressed size of the largest XZ Block in bytes.
 long length()
          Gets the uncompressed size of this input stream.
 long position()
          Gets the uncompressed position in this input stream.
 int read()
          Decompresses the next byte from this input stream.
 int read(byte[] buf, int off, int len)
          Decompresses into an array of bytes.
 void seek(long pos)
          Seeks to the specified absolute uncompressed position in the stream.
 
Methods inherited from class org.tukaani.xz.SeekableInputStream
skip
 
Methods inherited from class java.io.InputStream
mark, markSupported, read, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SeekableXZInputStream

public SeekableXZInputStream(SeekableInputStream in)
                      throws java.io.IOException
Creates a new seekable XZ decompressor without a memory usage limit.

Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
java.io.EOFException - fewer than 6 bytes of input were available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
java.io.IOException - may be thrown by in

SeekableXZInputStream

public SeekableXZInputStream(SeekableInputStream in,
                             int memoryLimit)
                      throws java.io.IOException
Creates a new seekable XZ decompressor with an optional memory usage limit.

Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
java.io.EOFException - fewer than 6 bytes of input were available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
java.io.IOException - may be thrown by in
Method Detail

getCheckTypes

public int getCheckTypes()
Gets the types of integrity checks used in the .xz file. Multiple checks are possible only if there are multiple concatenated XZ Streams.

The returned value has a bit set for every check type that is present. For example, if CRC64 and SHA-256 were used, the return value is (1 << XZ.CHECK_CRC64) | (1 << XZ.CHECK_SHA256).
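Decoding such a bit mask might look like the sketch below. To keep it self-contained, the check IDs are declared locally with the values used by the .xz format (None = 0, CRC32 = 1, CRC64 = 4, SHA-256 = 10), which the constants in the XZ class are understood to mirror; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CheckTypesSketch {
    // Local copies of the .xz check IDs; real code would use the XZ.CHECK_* constants instead.
    static final int CHECK_NONE = 0;
    static final int CHECK_CRC32 = 1;
    static final int CHECK_CRC64 = 4;
    static final int CHECK_SHA256 = 10;

    /** Lists the names of all check types whose bit is set in checkTypes. */
    static List<String> describe(int checkTypes) {
        List<String> names = new ArrayList<>();
        for (int id = 0; id <= 15; id++) { // the .xz format reserves check IDs 0 to 15
            if ((checkTypes & (1 << id)) == 0) {
                continue;
            }
            switch (id) {
                case CHECK_NONE:
                    names.add("None");
                    break;
                case CHECK_CRC32:
                    names.add("CRC32");
                    break;
                case CHECK_CRC64:
                    names.add("CRC64");
                    break;
                case CHECK_SHA256:
                    names.add("SHA-256");
                    break;
                default:
                    names.add("Unknown-" + id);
                    break;
            }
        }
        return names;
    }
}
```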


getIndexMemoryUsage

public int getIndexMemoryUsage()
Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks. This is usually useless information, but since it is calculated for the memory usage limit anyway, it is nice to make it available too.


getLargestBlockSize

public long getLargestBlockSize()
Gets the uncompressed size of the largest XZ Block in bytes. This can be useful if you want to check that the file doesn't have huge XZ Blocks which could make seeking to arbitrary offsets very slow. Note that huge Blocks don't automatically mean that seeking would be slow; for example, seeking to the beginning of any Block is always fast.


read

public int read()
         throws java.io.IOException
Decompresses the next byte from this input stream.

Specified by:
read in class java.io.InputStream
Returns:
the next decompressed byte, or -1 to indicate the end of the compressed stream
Throws:
CorruptedInputException
UnsupportedOptionsException
MemoryLimitException
XZIOException - if the stream has been closed
java.io.IOException - may be thrown by in

read

public int read(byte[] buf,
                int off,
                int len)
         throws java.io.IOException
Decompresses into an array of bytes.

If len is zero, no bytes are read and 0 is returned. Otherwise this will try to decompress len bytes of uncompressed data. Less than len bytes may be read only in the following situations:

Overrides:
read in class java.io.InputStream
Parameters:
buf - target buffer for uncompressed data
off - start offset in buf
len - maximum number of uncompressed bytes to read
Returns:
number of bytes read, or -1 to indicate the end of the compressed stream
Throws:
CorruptedInputException
UnsupportedOptionsException
MemoryLimitException
XZIOException - if the stream has been closed
java.io.IOException - may be thrown by in
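
Because read(byte[], int, int) may return fewer than len bytes, callers that need a fixed amount of data commonly wrap it in a loop. The helper below is a generic sketch that works with any java.io.InputStream; its class and method names are hypothetical and it is not part of this library.

```java
import java.io.IOException;
import java.io.InputStream;

class ReadLoopSketch {
    /**
     * Reads until len bytes have been stored in buf or the end of the stream is reached.
     * Returns the number of bytes actually read, which is less than len only at end of stream.
     */
    static int readUpTo(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }
}
```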

available

public int available()
              throws java.io.IOException
Returns the number of uncompressed bytes that can be read without blocking. The value is returned with an assumption that the compressed input data will be valid. If the compressed data is corrupt, CorruptedInputException may get thrown before the number of bytes claimed to be available have been read from this input stream.

Overrides:
available in class java.io.InputStream
Returns:
the number of uncompressed bytes that can be read without blocking
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Closes the stream and calls in.close(). If the stream was already closed, this does nothing.

Specified by:
close in interface java.io.Closeable
Overrides:
close in class java.io.InputStream
Throws:
java.io.IOException - if thrown by in.close()

length

public long length()
Gets the uncompressed size of this input stream. If there are multiple XZ Streams, the total uncompressed size of all XZ Streams is returned.

Specified by:
length in class SeekableInputStream

position

public long position()
              throws java.io.IOException
Gets the uncompressed position in this input stream.

Specified by:
position in class SeekableInputStream
Throws:
XZIOException - if the stream has been closed
java.io.IOException

seek

public void seek(long pos)
          throws java.io.IOException
Seeks to the specified absolute uncompressed position in the stream. This only stores the new position, so this function itself is always very fast. The actual seek is done when read is called to read at least one byte.

Seeking past the end of the stream is possible. In that case read will return -1 to indicate the end of the stream.

Specified by:
seek in class SeekableInputStream
Parameters:
pos - new uncompressed read position
Throws:
XZIOException - if pos is negative, or if the stream has been closed
java.io.IOException - if pos is negative or if a stream-specific I/O error occurs
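
The seek contract described above (the position is only stored, the work happens on the next read, and reading past the end returns -1) can be modelled with a small in-memory class. This is purely an illustration of the documented behaviour over a byte array; LazySeekModel is a hypothetical name and the code is not how this class is implemented.

```java
import java.io.IOException;

class LazySeekModel {
    private final byte[] data;
    private long pos = 0;

    LazySeekModel(byte[] data) {
        this.data = data;
    }

    /** Stores the new position only; no data is touched until read() is called. */
    void seek(long newPos) throws IOException {
        if (newPos < 0) {
            throw new IOException("Negative seek position: " + newPos);
        }
        pos = newPos;
    }

    /** Returns the byte at the current position, or -1 if the position is at or past the end. */
    int read() {
        if (pos >= data.length) {
            return -1;
        }
        return data[(int) pos++] & 0xFF;
    }

    long position() {
        return pos;
    }
}
```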