From version 0.4 onwards, BeleniX includes an On The Fly Decompression
feature that allows the filesystem on the CDROM to be compressed. The
data is decompressed as and when the OS and apps request disk blocks.
Compressing data on the CDROM has two benefits:
- Compression enables more software to be crammed onto
the CDROM. BeleniX uses zlib at the maximum compression level, which
allows 1.8 GB of data to be put on one CD.
- Compressing data reduces access time as more data is
transferred into RAM per I/O operation. It also minimizes seeking of
the CDROM head, which is an expensive operation. Because the same data
is now physically stored in a much smaller surface area on the CD, the
drive head has to move less, which reduces the time taken to boot and
start apps.
BeleniX uses a fairly simple compression technique. The zlib
decompression code is already available as the zmod kernel module in
OpenSolaris, which simplified the task of implementing this feature.
Since we are only concerned with reading data off the CDROM, the
compression is read-only: the compressed data cannot be written to
once it has been generated.
The lofi(7D) kernel module was modified to implement compression. lofi
is a pseudo loopback block device module that enables a file to be
viewed and accessed as a block device. So one can have a filesystem in
a file if it is managed via lofi. Lofi is commonly used in OpenSolaris
to mount ISO images and see their contents.
In BeleniX the entire contents of "/usr", which resides on the CDROM,
are compressed. The steps that BeleniX uses to generate and use the
compressed file are:
- Copy all the required files to "/usr" in a staging
area.
- Generate an ISO filesystem (hsfs) image of this
"/usr". Hsfs (High Sierra Filesystem) is used since it is
lightweight.
- Compress this ISO image using a custom utility that
generates a specially formatted compressed file.
- This compressed file is then included in the final
bootable ISO image of BeleniX.
- While BeleniX boots, the CDROM is mounted. The
compressed file is then added as a block device via lofi.
- This pseudo block device is then mounted onto "/usr"
as an hsfs filesystem.
The basic idea behind the compressed file format is as follows:
- Split the given file into fixed size segments
(typically 64K)
- Compress each segment individually and store them
sequentially in another file.
- An array is used to store the starting offset of each
compressed segment in the file.
- Finally a header, the array (also called the index) built in
the previous steps, and the compressed segments are copied into the
final compressed file.
- The header contains the following: an 8-byte
signature, the segment size, the number of segments (this is also the
array size), and the uncompressed size of the last segment, since the
file size may not be a multiple of the segment size.
- The array that follows the header is used as an index
to get to individual compressed segments. The size of an individual
compressed segment is derived by subtracting its start offset from the
start offset of the next segment. The array therefore contains an extra
sentinel entry at the end, which avoids a special check for the last
segment.
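
The steps above can be sketched in a few lines of Python. This is only
an illustration, not the BeleniX utility itself: the signature value,
the 32-bit width of the header fields, the values of the one-byte
segment flag, and the name pack_image are all assumptions made for the
example. The index is written with one entry per segment plus the
sentinel, and a segment is kept compressed only when it falls below
the 63K threshold described later in this section.

```python
import struct
import zlib

SIGNATURE = b"SKETCH01"   # hypothetical 8-byte signature, not the real one
SEG_SIZE = 64 * 1024      # fixed uncompressed segment size (a power of 2)
THRESHOLD = 63 * 1024     # keep the compressed form only below this size
RAW, DEFLATED = 0, 1      # assumed values for the one-byte segment header


def pack_image(data, seg_size=SEG_SIZE, threshold=THRESHOLD):
    """Return header + index + segments for the given bytes."""
    segments = []
    offsets = [0]  # on disk, the first segment starts at offset 0
    for start in range(0, len(data), seg_size):
        chunk = data[start:start + seg_size]
        packed = zlib.compress(chunk, 9)  # 9 = maximum zlib level
        if len(packed) < threshold:
            stored = bytes([DEFLATED]) + packed
        else:
            stored = bytes([RAW]) + chunk  # not worth compressing
        segments.append(stored)
        # After the last segment this append becomes the sentinel entry.
        offsets.append(offsets[-1] + len(stored))

    nsegs = len(segments)
    last_size = len(data) - (nsegs - 1) * seg_size if nsegs else 0
    # Assumed header layout: signature, then three big-endian uint32 fields.
    header = SIGNATURE + struct.pack(">III", seg_size, nsegs, last_size)
    index = struct.pack(">%dI" % len(offsets), *offsets)
    return header + index + b"".join(segments)
```

Packing 150000 bytes of input this way yields three segments and a
four-entry index, with the last header field set to 18928, the size of
the short final segment.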
[Figure: Compressed File Format]
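
To see why the sentinel entry is convenient, here is a small sketch of
how segment sizes fall out of the index. The offsets are made up for
illustration and segment_sizes is a hypothetical helper name.

```python
def segment_sizes(index):
    """Stored size of each segment, given n + 1 start offsets.

    The final entry is the sentinel: it marks where the last segment
    ends, so every segment is handled by the same subtraction.
    """
    return [nxt - cur for cur, nxt in zip(index, index[1:])]


index = [0, 1200, 1950, 66000]  # made-up offsets; 66000 is the sentinel
sizes = segment_sizes(index)    # three segments, three sizes
```

Without the sentinel, the size of the last segment would have to be
computed from the total file size as a special case.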
The following steps are taken by the modified lofi module to enable
reading from a compressed file:
- When lofiadm(1M) is used to add a file to be managed
via lofi, it ultimately results in a call to lofi_map_file in the lofi
module that does all the preprocessing necessary to open the file and
initialize data structures including a faked-up disk geometry.
- In addition to the above, the modified lofi also reads
the first 8 bytes of the file and checks for a signature.
- If a proper signature is found then it reads the
header components into memory and initializes the data structures. In
addition it also reads the entire index (array) into memory. This is
currently a series of uint32_t integers. The 4GB addressing range
provided by uint32_t is enough to store up to 12GB of data in one 4GB
compressed file (that is, at a compression ratio of about 3:1). The
array does not occupy much kernel memory: about 128 KB is required for
a compressed file whose uncompressed size is 2GB with 64K segments.
- Once lofi receives a request to read X bytes starting
at offset N in a compressed file, it first computes the starting and
ending segment numbers that contain the requested data. Since every
segment covers a fixed amount of uncompressed data, this is done easily
via integer division and modulus. As an added optimization, the segment
size is required to be a power of 2, so bitwise operations can be used
instead of division and modulus.
- The file offsets and ranges of the compressed
segments are extracted from the index array and the start offset is
aligned on a disk block boundary (512 bytes at present). These bytes
are then read into memory via segmap_fault in lofi_mapped_rdwr.
- The data read in at the previous step contains all the
compressed segments required. Lofi then loops through the compressed
segments in memory and uncompresses them one by one. Once a segment is
uncompressed, the portion of it that contains part of the requested
data is appended to the buffer provided to lofi by the caller. This
computation is a little tricky, as the required data may begin in the
middle of one uncompressed segment and end in the middle of another.
- Once the loop ends we return to the caller. Each
segment stored in the file has a one-byte segment header in its first
byte. This is currently used to indicate whether the segment is
compressed at all. A segment is stored compressed only if compression
reduces its size below a certain threshold, currently set at 63K. This
avoids the overhead of decompressing segments where there is little
gain from compression.
- The offset values in the index array and the header
values are stored on disk in network byte order. In addition, the
on-disk array places the first segment at offset 0 and the remaining
segments at the offsets that follow from there. Each entry is adjusted
by adding the header and index size when the array is read into memory.
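
The segment arithmetic described in the steps above can be illustrated
with a short Python sketch. The function names are hypothetical and the
64K segment size and 512-byte block size are the values quoted in the
text; the kernel code is of course written in C, so this only mirrors
the calculation.

```python
SEG_SIZE = 64 * 1024                   # must be a power of 2
SEG_SHIFT = SEG_SIZE.bit_length() - 1  # log2(SEG_SIZE), 16 for 64K
SEG_MASK = SEG_SIZE - 1
BLOCK_SIZE = 512                       # disk block size, also a power of 2


def segment_span(offset, length):
    """Map a byte range onto segments using shifts and masks.

    Returns (first_seg, last_seg, start_in_first, end_in_last), where
    end_in_last is one past the last wanted byte of the final segment.
    """
    last_byte = offset + length - 1
    first = offset >> SEG_SHIFT       # same as offset // SEG_SIZE
    last = last_byte >> SEG_SHIFT
    return first, last, offset & SEG_MASK, (last_byte & SEG_MASK) + 1


def align_down(file_offset, block=BLOCK_SIZE):
    """Round a file offset down to the start of its disk block."""
    return file_offset & ~(block - 1)


def index_bytes(uncompressed_size, entry_size=4):
    """Memory needed for an index of uint32 entries plus the sentinel."""
    nsegs = (uncompressed_size + SEG_SIZE - 1) >> SEG_SHIFT
    return (nsegs + 1) * entry_size
```

For example, a 100000-byte read at offset 70000 starts 4464 bytes into
segment 1 and ends 38928 bytes into segment 2, and index_bytes for a
2GB image comes to 131076 bytes, which matches the figure of about
128 KB given above.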
[Figure: Reading from compressed and normal files via lofi]
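
As a rough user-space model of the read path, the following Python
sketch loads the index, adjusts the offsets, and copies the relevant
part of each uncompressed segment into the result. It is not the lofi
code: the signature, the 32-bit header fields, the flag values and the
function names are assumptions for illustration, the whole image is
held in memory instead of being read via segmap_fault, and the
requested range is assumed to lie within the data.

```python
import struct
import zlib

SIGNATURE = b"SKETCH01"   # hypothetical 8-byte signature
HEADER_FMT = ">8sIII"     # assumed: signature, seg size, seg count, last size
HEADER_LEN = struct.calcsize(HEADER_FMT)
RAW, DEFLATED = 0, 1      # assumed values for the one-byte segment header


def load_index(image):
    """Parse the header and return (seg_size, absolute segment offsets)."""
    sig, seg_size, nsegs, _last = struct.unpack_from(HEADER_FMT, image, 0)
    if sig != SIGNATURE:
        raise ValueError("not a compressed image")
    count = nsegs + 1  # one entry per segment plus the sentinel
    on_disk = struct.unpack_from(">%dI" % count, image, HEADER_LEN)
    base = HEADER_LEN + 4 * count  # header and index precede the segments
    return seg_size, [base + off for off in on_disk]


def read_range(image, offset, length):
    """Return `length` uncompressed bytes starting at `offset`."""
    seg_size, index = load_index(image)
    shift = seg_size.bit_length() - 1  # seg_size is a power of 2
    first = offset >> shift
    last = (offset + length - 1) >> shift
    out = bytearray()
    for seg in range(first, last + 1):
        stored = image[index[seg]:index[seg + 1]]
        body = stored[1:]  # skip the one-byte segment header
        plain = zlib.decompress(body) if stored[0] == DEFLATED else body
        seg_start = seg << shift
        # Clip the request to the part that falls inside this segment.
        lo = max(offset, seg_start) - seg_start
        hi = min(offset + length, seg_start + len(plain)) - seg_start
        out += plain[lo:hi]
    return bytes(out)
```

The clipping with max and min is what handles a request that begins in
the middle of one segment and ends in the middle of another.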