Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add files via upload #9

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 16 additions & 59 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,69 +1,26 @@
# Brut
# Brut-v1.1

This repository contains the code and manuscript text used in the paper
This repository contains an updated version of Brut (https://github.com/ChrisBeaumont/brut) used in the paper

*The Milky Way Project: Leveraging Citizen Science and Machine Learning to Detect Interstellar Bubbles. Beaumont, Goodman, Kendrew, Williams, Simpson 2014, ApJS, in press ([arXiv link](http://arxiv.org/abs/1406.2692))*
Assessing the Performance of a Machine Learning Algorithm in Identifying Bubbles in Dust Emission, ApJ in press ([arXiv link](https://arxiv.org/abs/1711.03480))*

The `v1` tag represents the state of the code at the time of publication.
We make slight changes on the modules that Brut import. The current version Brut can be successfully run with the following libraries

Data associated with this project is also archived at [The Dataverse](http://thedata.harvard.edu/dvn/dv/brut) (doi:10.7910/DVN/26463)
* astropy '2.0.2'
* h5py '2.7.0'
* matplotlib '2.0.2'
* numpy '1.13.3'
* scipy '1.0.0'
* skimage '0.13.0'
* sklearn '0.19.1'
* cloud '2.8.5'

## High level summary
We update the retrained model in models/ directory.

Brut uses a database of known bubbles (from the [Milky Way Project](http://www.milkywayproject.org/)) and Spitzer images from our galaxy to build an automatic bubble classifier. The classifier is based on the Random Forest algorithm, and uses the [WiseRF](http://docs.wise.io/wiserf_python.html) implementation of this algorithm.

The main question that Brut attempts to answer is "does this image contain a bubble?" The images presented to Brut are 2-color square postage stamps extracted from 8 and 24 micron Spitzer images of the Galactic plane.

The [picloud](http://www.picloud.com/) platform was used to perform some of the computation in parallel, in the cloud.

If you want to dig into the details of how the model is built, start with the Makefile in the scripts/ directory.

## Organization

### bubbly/
Contains the python library used to fit Random Forest classification models to Spitzer images

### figures/
Contains code to generate figures in the paper

### notebooks/
Contains several IPython notebooks in various states of organization -- some are polished documents describing aspects of the analysis, others are temporary workbooks.

### paper/
Contains the manuscript text itself

### scripts/
Python scripts to fit models and generate other derived data products


## Reproduction

This repository is MIT Licensed.

To reproduce the figures and models generated for the paper, type:

```
python setup.py develop
cd bubbly/data && make
cd ../../paper && make
```

Though I promise you you'll have to play with dependencies to get this all set up :)

## Dependencies

Brut is built on top of several python libraries, and uses data from the GLIMPSE and MIPSGAL surveys from the Spitzer Space Telescope. You'll need the following libraries

* aplpy
* astropy
* h5py
* IPython
* matplotlib
* numpy
* scipy
* skimage
* sklearn
* picloud
* WiseRF
### models/
Contains the original training model and the model retrained on synthetic images and the orignial traning set.
The synthetic bubble images can be found here (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OSMNDG).

In addition, you need to download the GLIMPSE and MIPSGAL mosaic data. The Makefile inside bubbly/data does this.
82 changes: 79 additions & 3 deletions bubbly/extractors.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
import numpy as np
from scipy.optimize import fmin_l_bfgs_b as minimize
from skimage.morphology import disk
from skimage.filter.rank import percentile_autolevel
from skimage.filters.rank import autolevel_percentile
from skimage.feature import daisy

from .field import get_field
Expand Down Expand Up @@ -60,6 +60,25 @@ def extract(self, lon, l, b, r, **kwargs):
rgb = self._preprocess_rgb(rgb)
return self._extract_rgb(rgb)


def extract_xd(self, lon, l, b, r, **kwargs):
kwargs.setdefault('limits', [1, 97])
kwargs.setdefault('shp', self.shp)
# kwargs.setdefault('i3', True)

rgb = get_field(lon).extract_stamp(l, b, r, **kwargs)

if rgb is None:
raise ValueError("Field is out of bounds")
elif (rgb[:, :, 1] == 0).mean() > 0.1:
raise ValueError("Field has no green channel")

# rgb = self._preprocess_rgb(rgb)
rgb = self._preprocess_rgb(rgb)
# return self._extract_rgb(rgb)
return rgb


def _extract_rgb(self, rgb):
raise NotImplementedError()

Expand Down Expand Up @@ -182,7 +201,22 @@ def _prepare_templates(self, shp):
ts *= (s / 2) / 20
self.rings = np.column_stack(np.exp(-(r - rr) ** 2 / tt ** 2).ravel()
for rr, tt in product(rs, ts))

def _normal_templates_coeff(self, shp):
if self._template_shp == shp:
return
self._template_shp = shp

s = shp[0]
y, x = np.mgrid[0:s, 0:s].astype(np.float)
r = np.hypot(y - s / 2, x - s / 2)

rs = np.linspace(1., s / 2, 7)
ts = np.array([2, 4, 6, 8, 10, 15, 20]).astype(np.float)
ts *= (s / 2) / 20
return np.column_stack(np.exp(-(r - rr) ** 2 / tt ** 2).ravel()
for rr, tt in product(rs, ts))


def _extract_rgb(self, rgb):
self._prepare_templates(rgb.shape)
Expand All @@ -197,13 +231,18 @@ def _extract_rgb(self, rgb):
np.dot(rnorm, self.rings),
np.dot(gnorm, self.rings),
np.dot(rnorm - gnorm, self.rings)])
# result = np.hstack([np.dot(r, self.rings),
# np.dot(rnorm, self.rings)])
return result.reshape(1, -1)


class DaisyExtractor(Extractor):
def _extract_rgb(self, rgb):
kwargs = dict(step=rgb.shape[0]/5, radius=rgb.shape[0] / 10, rings=2,
histograms=6, orientations=8)

self.daisyextractor_xd=np.hstack(daisy(rgb[:, :, i], **kwargs).ravel() for i in [0, 1])

return np.hstack(daisy(rgb[:, :, i], **kwargs).ravel() for i in [0, 1])


Expand All @@ -212,12 +251,49 @@ def __init__(self, orig):
self.orig = orig

def extract(self, lon, l, b, r):

return np.hstack((self.orig.extract(lon, l, b, r),
self.orig.extract(lon, l, b, r / 2),
self.orig.extract(lon, l, b, r * 2),
self.orig.extract(lon, l, b + r / 2, r)))


def extract_xd_mve(self, lon, l, b, r):

return self.orig.extract_xd(lon, l, b, r)

def extract_xd_daisy(self, lon, l, b, r):
methodname = '_extract_rgb'
a = DaisyExtractor()
method_1 = getattr(a, methodname)
return method_1(self.orig.extract_xd(lon, l, b, r))

def extract_xd_ring(self, lon, l, b, r):
methodname = '_extract_rgb'
a = RingExtractor()
method_1 = getattr(a, methodname)
return method_1(self.orig.extract_xd(lon, l, b, r))

def extract_xd_compression(self, lon, l, b, r):
methodname = '_extract_rgb'
a = CompressionExtractor()
method_1 = getattr(a, methodname)
return method_1(self.orig.extract_xd(lon, l, b, r))

def extract_xd_wavelet(self, lon, l, b, r):
methodname = '_extract_rgb'
a = MultiWaveletExtractor()
method_1 = getattr(a, methodname)
return method_1(self.orig.extract_xd(lon, l, b, r))

def extract_xd_ring_templates(self, lon, l, b, r):
methodname = '_normal_templates_coeff'
a = RingExtractor()
method_1 = getattr(a, methodname)
return method_1(self.orig.extract_xd(lon, l, b, r).shape)




class CompositeExtractor(Extractor):
composite_classes = []

Expand Down Expand Up @@ -258,5 +334,5 @@ def enhance_contrast(rgb):
s = rgb.shape
d = disk(s[0] / 5)
for i in range(3):
rgb[:, :, i] = percentile_autolevel(rgb[:, :, i], d, p0=.1, p1=.9)
rgb[:, :, i] = autolevel_percentile(rgb[:, :, i], d, p0=.1, p1=.9)
return rgb
13 changes: 9 additions & 4 deletions bubbly/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import logging

from skimage.transform import resize
from sklearn.metrics import recall_score, auc_score
from sklearn.metrics import recall_score, roc_auc_score
import numpy as np


Expand Down Expand Up @@ -56,9 +56,14 @@ def scale(x, mask=None, limits=None):
if mask is None:
lo, hi = np.percentile(x, limits)
else:
lo, hi = np.percentile(x[mask], limits)

if x[mask].size>0:
# print x[mask].shape
lo, hi = np.percentile(x[mask], limits)
else:
lo,hi=0,0

x = (np.clip(x, lo, hi) - lo) / (hi - lo)
# return x[mask].shape
return (np.sqrt(x) * 255).astype(np.uint8)


Expand All @@ -67,7 +72,7 @@ def resample(arr, shape):
# skimage's resize needs scaled data
lo, hi = np.nanmin(arr), np.nanmax(arr)
arr = (arr - lo) / (hi - lo)
result = resize(arr, shape, mode='nearest')
result = resize(arr, shape, mode='edge')
return result * (hi - lo) + lo


Expand Down
10 changes: 8 additions & 2 deletions bubbly/wiserf.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import os

import PyWiseRF
#import PyWiseRF
from sklearn.ensemble import RandomForestClassifier
import cloud
import numpy as np

Expand All @@ -14,7 +15,12 @@ def test():
clf = WiseRF().fit(x, y)
return clf

class WiseRF(PyWiseRF.WiseRF):
#class WiseRF(PyWiseRF.WiseRF):
# def decision_function(self, x):
# p = self.predict_proba(x)
# return p[:, 1] - p[:, 0]

class WiseRF(RandomForestClassifier):
def decision_function(self, x):
p = self.predict_proba(x)
return p[:, 1] - p[:, 0]
15 changes: 12 additions & 3 deletions models/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
# Models

These files contain model input and output data. They are included in
this repository, but are all generated from other scripts in the
``scripts/`` and ``notebooks/`` directory.
These files contain model input and output data.

* "full_classifier_retrain_xd_all_gini_noise_0722.dat": retrained on the synthetic bubbles with and without noise and the original training set from MWP
* "full_classifier_retrain_xd_all_nonnoise_gini_0722.dat": retrained on the synthetic bubbles without noise and the original training set from MWP
* "full_classifier_xd_only_sim_non_noi_1102.dat": retrained on the synthetic bubbles without noise
* "full_classifier_xd_only_simulation_1029.dat": retrained on the synthetic bubbles with and without noise
* "full_classifier_xd_reduceMWP_simulation_1025.dat": retrained on the synthetic bubbles with and without noise and half of the original training set from MWP
* "full_classifier.dat": original training




Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Loading