User Guide

PPM (prediction by partial matching) is a well-known compression technique based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream.
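The idea of predicting the next symbol from the preceding ones can be illustrated with a toy fixed-order context model. This is only a minimal sketch for intuition, not how PPMd is implemented: real PPM also falls back to shorter contexts when a context has not been seen and drives an arithmetic or range coder with the resulting probabilities. The function names build_context_model and predict_next are invented for this illustration.

```python
from collections import Counter, defaultdict


def build_context_model(data: bytes, order: int) -> dict:
    """Count which symbol follows each context of `order` previous symbols."""
    model = defaultdict(Counter)
    for i in range(order, len(data)):
        context = data[i - order:i]   # the `order` symbols before position i
        model[context][data[i]] += 1  # data[i] is the symbol that followed
    return model


def predict_next(model: dict, context: bytes):
    """Return the most frequent symbol seen after `context`, or None if unseen."""
    counts = model.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


model = build_context_model(b"abracadabra", order=2)
prediction = predict_next(model, b"ab")  # byte value of the predicted symbol
```

In "abracadabra" the context "ab" is always followed by "r", so the model predicts "r" with high confidence, and a coder can spend very few bits on it.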

PPMd is an implementation of PPMII by Dmitry Shkarin.

The ppmd-cffi package uses core C files from p7zip. The library provides only bare encoding and decoding functions and has no metadata/header handling functions. This means you need to know the compression parameters and the input/output data sizes yourself.

Getting started

Install

ppmd-cffi is written in Python and C, bound together with CFFI, and can be installed from PyPI (the Python Package Index) using the standard pip command as follows:

$ pip install ppmd-cffi

Command line

ppmd-cffi provides a command line script to handle .ppmd files.

To compress a file

$ ppmd target.dat

To decompress a .ppmd file

$ ppmd -x target.ppmd

To decompress to STDOUT

$ ppmd -x -c target.ppmd

Programming Interfaces

.ppmd file compression/decompression

The ppmd-cffi project provides two classes for compressing and decompressing .ppmd archive files. The PpmdCompressor class provides the compression method compress(), and the PpmdDecompressor class provides the extraction method decompress().

PpmdCompressor takes a version= argument whose default is 8, which means PPMd Ver. I; pass version=7 for PPMd Ver. H. It also takes target, fname and ftime arguments, which describe the target file and its properties. target should be a file-like object that has a write() method. fname and ftime are file properties that are stored in the archive as metadata: fname should be a string, and ftime should be a datetime object.

The order and mem_in_mb parameters can be varied; the examples below use order 6 with 16 MB for ver. H and 8 MB for ver. I.

Compression with PPMd ver. H

import pathlib
from datetime import datetime

from ppmd import PpmdCompressor  # assumes the package is importable as "ppmd"

targetfile = pathlib.Path('target.dat')
fname = 'target.dat'
ftime = datetime.utcfromtimestamp(targetfile.stat().st_mtime)
archivefile = pathlib.Path('archive.ppmd')  # must be a Path so that .open() works
order = 6
mem_in_mb = 16
with archivefile.open('wb') as target:
    with targetfile.open('rb') as src:
        with PpmdCompressor(target, fname, ftime, order, mem_in_mb, version=7) as compressor:
            compressor.compress(src)

Compression with PPMd ver. I

import pathlib
from datetime import datetime

from ppmd import PpmdCompressor  # assumes the package is importable as "ppmd"

targetfile = pathlib.Path('target.dat')
fname = 'target.dat'
ftime = datetime.utcfromtimestamp(targetfile.stat().st_mtime)
archivefile = pathlib.Path('archive.ppmd')  # must be a Path so that .open() works
order = 6
mem_in_mb = 8
with archivefile.open('wb') as target:
    with targetfile.open('rb') as src:
        with PpmdCompressor(target, fname, ftime, order, mem_in_mb, version=8) as compressor:
            compressor.compress(src)
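The two examples above differ only in mem_in_mb and in the version= value. If your own code prefers to talk about the PPMd variant by letter, a small lookup like the following may help. It is a sketch that relies only on the mapping stated in this guide (7 for ver. H, 8 for ver. I); the names PPMD_VERSION_BY_VARIANT and version_for_variant are hypothetical and are not part of ppmd-cffi.

```python
# Mapping taken from this guide: version=7 selects PPMd ver. H,
# version=8 selects PPMd ver. I.
PPMD_VERSION_BY_VARIANT = {"H": 7, "I": 8}


def version_for_variant(variant: str) -> int:
    """Translate a PPMd variant letter into the value passed as version=."""
    try:
        return PPMD_VERSION_BY_VARIANT[variant.upper()]
    except KeyError:
        raise ValueError(f"unsupported PPMd variant: {variant!r}") from None
```

The result can then be passed as the version= argument of PpmdCompressor.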

Decompression

When a PpmdDecompressor object is constructed, it reads the header from the specified archive file. The header holds compression parameters such as the PPMd version, order and memory size, as well as the file name and timestamp. PpmdDecompressor selects the proper decoder based on the header data. You need to handle the file name and timestamp yourself. The decompress() method writes data to the specified file-like object, which should have a write() method.

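import os
import pathlib
from datetime import timezone

from ppmd import PpmdDecompressor  # assumes the package is importable as "ppmd"

targetfile = pathlib.Path('target.ppmd')
target_size = targetfile.stat().st_size  # assumed here: size of the archive file in bytes
parent = targetfile.parent  # extract next to the archive
with targetfile.open('rb') as target:
    with PpmdDecompressor(target, target_size) as decompressor:
        extractedfile = parent.joinpath(decompressor.filename)
        with extractedfile.open('wb') as ofile:
            decompressor.decompress(ofile)
        # Restore the timestamp after the file is closed; ftime is assumed to be naive UTC.
        timestamp = decompressor.ftime.replace(tzinfo=timezone.utc).timestamp()
        os.utime(str(extractedfile), times=(timestamp, timestamp))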
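Since handling the file name and timestamp is left to the caller, it may be worth normalizing both before touching the file system. The helpers below are a hedged sketch, not part of ppmd-cffi: safe_output_path and utc_datetime_to_timestamp are hypothetical names, and the second one assumes that a naive ftime is in UTC, as it would be when the archive was created with datetime.utcfromtimestamp() as in the compression examples.

```python
import pathlib
from datetime import datetime, timezone


def safe_output_path(parent: pathlib.Path, stored_name: str) -> pathlib.Path:
    """Join stored_name onto parent, keeping only its final path component."""
    # Treat backslashes as separators too, then drop any directory part so a
    # stored name such as "../../x" cannot escape the output directory.
    name = pathlib.PurePosixPath(stored_name.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable file name in archive: {stored_name!r}")
    return parent / name


def utc_datetime_to_timestamp(dt: datetime) -> float:
    """Convert a datetime to a POSIX timestamp, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()
```

With these, the extraction path could be built as safe_output_path(parent, decompressor.filename), and the value passed to os.utime() as utc_datetime_to_timestamp(decompressor.ftime).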

Bare encoding/decoding PPMd data

There are several classes for handling bare PPMd data. Note: the mem parameter is given in bytes, not MB.

  • Ppmd7Encoder(dst, order, mem)

  • Ppmd7Decoder(src, order, mem)

  • Ppmd8Encoder(dst, order, mem, restore)

  • Ppmd8Decoder(src, order, mem, restore)
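Because bare PPMd data carries no header, the order, the memory size in bytes and the uncompressed length have to be stored somewhere by the caller. The following sketch shows one way to do that with the standard struct module. The 13-byte layout and the names BareParams, mb_to_bytes, pack_params and unpack_params are assumptions made for this illustration; this is not the .ppmd header format and not part of ppmd-cffi.

```python
import struct
from typing import NamedTuple

# Hypothetical sidecar layout (NOT the .ppmd header), little-endian:
# order as uint8, mem in bytes as uint32, uncompressed size as uint64.
_PARAMS = struct.Struct("<BIQ")


class BareParams(NamedTuple):
    order: int
    mem: int   # memory size in bytes, as the bare encoder/decoder classes expect
    size: int  # uncompressed length in bytes


def mb_to_bytes(mem_in_mb: int) -> int:
    """Convert a size in MB (as used by PpmdCompressor) to bytes."""
    return mem_in_mb << 20


def pack_params(params: BareParams) -> bytes:
    """Serialize the parameters into the 13-byte sidecar record."""
    return _PARAMS.pack(*params)


def unpack_params(blob: bytes) -> BareParams:
    """Parse a sidecar record produced by pack_params()."""
    return BareParams(*_PARAMS.unpack(blob))
```

For example, BareParams(order=6, mem=mb_to_bytes(16), size=len(data)) records the settings used for encoding, and the unpacked order and mem values can later be passed as the order and mem arguments of the matching decoder class.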