Compression Choices

William Silversmith edited this page Aug 4, 2022 · 11 revisions

CloudVolume has many codecs to choose from for each kind of data. This short guide (to be improved upon) offers some guidance on which one to choose.

Some encodings can be layered with a second-stage bitstream compression. We support gzip and brotli (br), mainly because those are what browsers (and hence Neuroglancer) support automatically. Support for other compressors, e.g. zstd, could be added in the future, but Neuroglancer would need a codec for them. Note that brotli is currently not supported for sharded data (Neuroglancer only has a gzip decompression JS module).
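To illustrate what a second-stage bitstream compressor does, here is a minimal sketch using only Python's standard `gzip` module. The chunk contents and dimensions are invented for illustration, and this is not a description of how CloudVolume invokes compression internally. Brotli is left out because it requires a third-party package.

```python
import gzip

# Hypothetical 64x64x64 uint8 chunk; a slow ramp stands in for real image data.
chunk = bytes((i // 4) % 256 for i in range(64 * 64 * 64))

# Second stage: the already-encoded bytes ("raw" in this case) are passed
# through a general-purpose compressor and restored bit-for-bit on read.
compressed = gzip.compress(chunk)
restored = gzip.decompress(compressed)

print(len(chunk), len(compressed))
```

Because the second stage is lossless, `restored` is identical to `chunk`; only the stored size changes.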

EM Images

Generally grayscale 8 or 16 bit electron or light microscopy images.

Choices: raw, raw+gzip, raw+br, png, jpeg

  • If you can tolerate lossy compression, jpeg will be very fast and give the best compression.
  • png will give the best lossless compression, by about 25%, but at the expense of speed.
  • raw+gzip and raw+br have slightly different performance profiles but will give similar compression at the default settings.
  • raw means uncompressed. It is very fast on SSD, much less so over remote networks, and untenable for large datasets.
  • jpeg does not support 16-bit images. (The format technically does, but only with a specially recompiled library, so in practice it is not an option.)
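The guidance above can be sketched as a small decision helper. The function name and its parameters are hypothetical, and the returned strings are this page's shorthand labels rather than literal CloudVolume arguments. Treating raw+gzip as the faster lossless fallback is an assumption drawn from the note that png trades speed for size.

```python
def choose_image_encoding(lossy_ok: bool, bits: int, prefer_speed: bool = False) -> str:
    """Hypothetical helper that mirrors the bullet points above."""
    if bits not in (8, 16):
        raise ValueError("this guide only covers 8 or 16 bit images")
    # jpeg is the lossy pick, but only for 8-bit data.
    if lossy_ok and bits == 8:
        return "jpeg"
    # Lossless: png compresses best but is slower (assumed faster fallback: raw+gzip).
    return "raw+gzip" if prefer_speed else "png"


print(choose_image_encoding(lossy_ok=True, bits=8))
print(choose_image_encoding(lossy_ok=True, bits=16))
```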

Segmentation

This is usually densely labeled uint32 or uint64 data.

Choices: raw, compressed_segmentation (cseg), compresso (each can be combined with +gzip or +br)

  • For smooth segmentation, generally go with compresso+br for the best compression ratio and almost top performance.
  • For noisy segmentation, go with cseg+br for the best compression and top performance.

Compresso and cseg are both codecs designed for connectomics data. Compresso is a novel high compression codec.
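As a rough sketch, the segmentation advice could be captured in a helper like the one below. The function and its `sharded` flag are hypothetical. The dictionary keys are meant to echo CloudVolume's `encoding` and `compress` settings as I understand them, so check them against the CloudVolume documentation before relying on them. The gzip fallback follows from the note above that brotli is not supported for sharded data.

```python
def choose_segmentation_settings(smooth: bool, sharded: bool = False) -> dict:
    """Hypothetical helper: map segmentation character to codec choices."""
    # Smooth labels favor compresso; noisy labels favor compressed_segmentation (cseg).
    encoding = "compresso" if smooth else "compressed_segmentation"
    # Brotli is not supported for sharded data, so fall back to gzip there.
    compress = "gzip" if sharded else "br"
    return {"encoding": encoding, "compress": compress}


print(choose_segmentation_settings(smooth=True))
print(choose_segmentation_settings(smooth=False, sharded=True))
```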

Voxel-Wise Affinities

Intermediate float32 xyz neighbor affinity predictions used for creating segmentation and region graphs. These are very heavy: three float32 channels come to 12 bytes per voxel, 12x bigger than an 8-bit base image. More information: https://github.com/seung-lab/cloud-volume/wiki/Advanced-Topic:-fpzip-and-kempressed-Encodings

Choices: raw, raw+gzip, raw+br, fpzip, kempressed
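The 12x figure for affinities follows from simple arithmetic on uncompressed sizes. The sketch below assumes three float32 channels (x, y, z) and compares them against an 8-bit or 16-bit single-channel base image. The constant and function names are illustrative only.

```python
CHANNELS = 3        # x, y, z neighbor affinities
BYTES_FLOAT32 = 4   # size of one float32 value
BYTES_UINT8 = 1     # assumed 8-bit base image


def affinity_ratio(image_bytes_per_voxel: int = BYTES_UINT8) -> float:
    """Uncompressed affinity size relative to the base image, per voxel."""
    return CHANNELS * BYTES_FLOAT32 / image_bytes_per_voxel


print(affinity_ratio())   # 12.0 for an 8-bit image
print(affinity_ratio(2))  # 6.0 for a 16-bit image
```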

Alignment Vectors

These are usually float32 images with an X and a Y component. Some older versions are int16, and this advice does not apply to them.

Choices: raw, raw+gzip, raw+br, fpzip, zfpc

  • The current best choice is raw+br.
  • zfpc is an experimental lossy compression choice that will likely be the go-to option in the future. Don't pick it for now unless you are in communication with Will Silversmith. It cannot be visualized in Neuroglancer yet.
  • The Seung Lab Neuroglancer fork can visualize fpzip: https://github.com/william-silversmith/neuroglancer/tree/wms_fpzip