Skip to content

Latest commit

 

History

History
174 lines (138 loc) · 5.02 KB

README.md

File metadata and controls

174 lines (138 loc) · 5.02 KB

libhuffman - The Huffman coding library

The Huffman library is a simple, pure C library for encoding and decoding data using a frequency-sorted binary tree. The implementation of this library is pretty straightforward, additional information regarding Huffman coding could be gained from the Wikipedia.

Installation

The build mechanism of the library is based on the CMake tool, so you could easily install it on your distribution in the following way:

$ sudo cmake install

By default the command above install the library into /usr/local/lib and all required headers into /usr/local/include. The installation process is organized using CMake. Just create a new directory build and generate required makefiles:

$ mkdir -p build
$ cmake ..

After that run the install target:

$ make install

Usage

Encoding

To encode the data, use either a file stream huf_fdopen or huf_memopen to use an in-memory stream. Consider the following example, where the input is a memory stream and output of the encoder is also memory buffer of 1MiB size.

void *bufin, *bufout = NULL;
huf_read_writer_t *input, *output = NULL;

// Allocate the necessary memory.
huf_memopen(&input, &bufin, HUF_1MIB_BUFFER);
huf_memopen(&output, &bufout, HUF_1MIB_BUFFER);

// Write the data for encoding to the input.
size_t input_len = 10;
input->write(input->stream, "0123456789", input_len);

Create a configuration used to encode the input string using Huffman algorithm:

huf_config_t config = {
   .reader = input,
   .length = input_len,
   .writer = output,
   .blocksize = HUF_64KIB_BUFFER,
};

huf_error_t err = huf_encode(&config);
printf("%s\n", huf_error_string(err));
  • reader - input ready for the encoding.
  • writer - output for the encoded data.
  • length - length of the data in bytes to encode.
  • blocksize - the length of each chunk in bytes (instead of reading the file twice libhuffman reads and encodes data by blocks).
  • reader_buffer_size - this is opaque reader buffer size in bytes, if the buffer size is set to zero, all reads will be unbuffered.
  • writer_buffer_size - this is opaque writer buffer size ib bytes, if the buffer size is set to zero, all writes will be unbuffered.

After the encoding, the output memory buffer could be automatically scaled to fit all necessary encoded bytes. To retrieve a new length of the buffer, use the following:

size_t out_size = 0;
huf_memlen(output, &out_size);

// The data is accessible through the `bufout` variable or using `read` function:
uint8_t result[10] = {0};
size_t result_len = 10;

// result_len is inout parameter, and will contain the length of encoding
// after the reading from the stream.
output->read(output->stream, result, &result_len);

Decoding

Decoding is similar to the encoding, except that reader attribute of the configuration should contain the data used to decode:

input->write(input->stream, decoding, decoding_len);

huf_config_t config = {
    .reader = input,
    .length = input_len,
    .writer = output,
    .blockize = HUF_64KIB_BUFFER,
};


// After the decoding the original data will be writter to the `output`.
huf_decode(&config);

Resource Deallocation

Once the processing of the encoding is completed, consider freeing the allocated memory:

// This does not free underlying buffer, call free for the buffer.
huf_memclose(&mem_out);

free(buf);

For more examples, please, refer to the tests directory.

Python Bindings

Python bindings for libhuffman library are distributed as PyPI package, to install that package, execute the following command:

pip install huffmanfile

You can use the libhuffman for performant compression and decompression of Huffman encoding. The interface of the Python library is similar to the interface of the bz2 and lzma packages from Python's standard library.

Examples of usage

Reading in a compressed file:

import huffmanfile
with huffmanfile.open("file.hm") as f:
    file_content = f.read()

Creating a compressed file:

import huffmanfile
data = b"Insert Data Here"
with huffmanfile.open("file.hm", "w") as f:
    f.write(data)

Compressing data in memory:

import huffmanfile
data_in = b"Insert Data Here"
data_out = huffmanfile.compress(data_in)

Incremental compression:

import huffmanfile
hfc = huffmanfile.HuffmanCompressor()
out1 = hfc.compress(b"Some data\n")
out2 = hfc.compress(b"Another piece of data\n")
out3 = hfc.compress(b"Even more data\n")
out4 = hfc.flush()
# Concatenate all the partial results:
result = b"".join([out1, out2, out3, out4])

Note, random data tends to compress poorly, while ordered, repetitive data usually yields a high compression ratio.

License

The Huffman library is distributed under MIT license, therefore you are free to do with code whatever you want. See the LICENSE file for full license text.