This is a collection of scripts for visualizing plankton images from the Kaggle National Data Science Bowl competition. The images within each training class are compiled into a single mosaic image, and a bubble plot is created that groups the mosaics according to the provided taxonomy. Because the resulting image is very large, it is saved as a tile pyramid, which can be viewed with the included Polymaps viewer.
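The mosaic step can be sketched as follows. This is a minimal illustration, not the code in make_mosaics.py: it assumes each image has already been loaded as a 2-D uint8 grayscale array, and it centers every image in a grid cell sized to the largest one, on a white background.

```python
import math

import numpy as np


def make_mosaic(images, background=255):
    """Pack 2-D uint8 grayscale arrays into a roughly square grid.

    Each image is centered in a cell as large as the biggest image;
    unused space is filled with `background` (white by default, since
    the plankton images are dark objects on a light background).
    """
    if not images:
        raise ValueError("need at least one image")
    cell_h = max(img.shape[0] for img in images)
    cell_w = max(img.shape[1] for img in images)
    cols = math.ceil(math.sqrt(len(images)))   # near-square layout
    rows = math.ceil(len(images) / cols)
    mosaic = np.full((rows * cell_h, cols * cell_w), background, dtype=np.uint8)
    for i, img in enumerate(images):
        r, c = divmod(i, cols)                 # row-major cell position
        h, w = img.shape
        top = r * cell_h + (cell_h - h) // 2   # center vertically in the cell
        left = c * cell_w + (cell_w - w) // 2  # center horizontally in the cell
        mosaic[top:top + h, left:left + w] = img
    return mosaic
```

With Pillow, the input list for one class could be built by opening each file in the class directory with Image.open, converting it to grayscale with convert("L"), and wrapping it in np.asarray. A real implementation might instead sort images by size or pack them more tightly; the fixed-cell grid is chosen here only for simplicity.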
These scripts require numpy, scipy, PIL and matplotlib. Executing machine_setup.sh on a fresh Ubuntu 14.04 installation will install all necessary packages.
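To confirm the dependencies are present without running the full scripts, a short check like the one below can be used. It is a hypothetical helper, not part of this repository, and it only tests that each module can be found, not which version is installed. The PIL module name is provided by either the original PIL or its fork, Pillow.

```python
import importlib.util

# Top-level module names for the dependencies listed above.
REQUIRED = ["numpy", "scipy", "PIL", "matplotlib"]


def missing_modules(names=REQUIRED):
    """Return the names in `names` that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```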
Clone this repository into a directory adjacent to your data directory, which contains the training images in data/train. The expected layout is:
.
├── data
│ └── train
└── visualization
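Before running the scripts, a quick check of this layout can save a failed run. The sketch below is hypothetical (list_training_classes is not part of the repository) and assumes, as in the Kaggle training archive, that data/train holds one subdirectory per class.

```python
from pathlib import Path


def list_training_classes(data_dir):
    """Return the sorted class names found as subdirectories of <data_dir>/train."""
    train = Path(data_dir) / "train"
    if not train.is_dir():
        raise FileNotFoundError(f"expected training images in {train}")
    # Only directories count as classes; stray files are ignored.
    return sorted(p.name for p in train.iterdir() if p.is_dir())
```

Run from inside the visualization directory, list_training_classes("../data") should return one name per training class.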
- python make_mosaics.py will generate mosaic images in the mosaics subdirectory.
- python make_bubbleplot.py will generate the bubble plot and write tile images into the pyramid subdirectory.
- After generation, the bubble plot can be viewed by opening the file viewer/index.html in a web browser.
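A tile pyramid stores the bubble plot at several zoom levels, each cut into fixed-size tiles, so the viewer only needs to load the tiles that are on screen. The sketch below illustrates the idea; it is not the code in make_bubbleplot.py. The 256-pixel tile size, the (z, x, y) keys with z = 0 as the most zoomed-out level, and the white padding are assumptions borrowed from common web-map conventions.

```python
import math

import numpy as np

TILE = 256  # tile edge length in pixels (assumed; a common web-map default)


def downsample2(img):
    """Halve a 2-D uint8 image by 2x2 block averaging, padding odd edges with white."""
    h, w = img.shape
    padded = np.full((h + h % 2, w + w % 2), 255, dtype=np.float64)
    padded[:h, :w] = img
    blocks = padded.reshape(padded.shape[0] // 2, 2, padded.shape[1] // 2, 2)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)


def tile_pyramid(img, tile=TILE):
    """Return {(z, x, y): tile_array}, where z = 0 is the most zoomed-out level."""
    levels = [img]
    # Keep halving until the whole image fits in a single tile.
    while max(levels[-1].shape) > tile:
        levels.append(downsample2(levels[-1]))
    max_zoom = len(levels) - 1
    tiles = {}
    for depth, level in enumerate(levels):
        z = max_zoom - depth  # full resolution gets the highest zoom number
        h, w = level.shape
        for y in range(math.ceil(h / tile)):
            for x in range(math.ceil(w / tile)):
                block = np.full((tile, tile), 255, dtype=np.uint8)
                piece = level[y * tile:(y + 1) * tile, x * tile:(x + 1) * tile]
                block[:piece.shape[0], :piece.shape[1]] = piece  # pad edge tiles
                tiles[(z, x, y)] = block
    return tiles
```

Each tile could then be written with Pillow's Image.fromarray(tile).save(...) under a path matching the viewer's URL template; check viewer/index.html for the naming the included viewer actually expects. Note that this sketch keeps every level in memory at once, which is exactly the kind of cost discussed below.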
Memory usage is currently largely unoptimized, and at least 16GB of RAM is required to render the bubble plot; an Amazon EC2 m3.2xlarge instance is recommended. NumPy memmap can be used to reduce memory usage, but it greatly increases computation time. Mosaic creation (make_mosaics.py) is much less memory-intensive.
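The memmap trade-off can be illustrated with a small helper that allocates a large output canvas either in RAM or as a disk-backed np.memmap. This is a hypothetical sketch (make_canvas is not part of these scripts), and the backing file path is left to the caller.

```python
import numpy as np


def make_canvas(shape, path=None, fill=255):
    """Allocate a uint8 canvas filled with `fill`.

    With path=None the array lives in RAM; otherwise it is backed by a
    file on disk via np.memmap, so only the pages being touched need to
    be resident in memory.
    """
    if path is None:
        return np.full(shape, fill, dtype=np.uint8)
    canvas = np.memmap(path, dtype=np.uint8, mode="w+", shape=shape)
    canvas[:] = fill
    return canvas
```

A memmap behaves like an ordinary ndarray, so code that writes into the canvas can usually stay the same, but data is paged to and from disk as it is accessed, which is where the extra computation time comes from. Call flush() on the memmap to make sure all writes reach the file.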
[Screenshot: zoomed out completely]
[Screenshot: zooming in on one group]
[Screenshot: a single mosaic]