Upgrade to v1.0 (#18)
* Update to v1.0.0
yilaili authored Jun 10, 2021
1 parent 0c83707 commit a55c1f0
Showing 46 changed files with 793 additions and 243 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
*.pyc
Binary file added MicAssess_v1.0.png
68 changes: 48 additions & 20 deletions README.md
@@ -5,6 +5,29 @@ Tools to run user-free preprocessing of cryo-EM datasets: https://www.biorxiv.or

MicAssess and 2DAssess are incorporated into the COSMIC2 science gateway, which is freely available for academic research: https://cosmic2.sdsc.edu:8443/gateway/. Just upload your input files and you can run the jobs on the cloud!

**Updates (v1.0.0)**
We have updated the cryoassess package to 1.0! MicAssess has a major update, whereas 2DAssess is unchanged.
![MicAssess v1.0](./MicAssess_v1.0.png)
The major change in MicAssess is that it now predicts 6 labels instead of 2. The 6 labels are:
1. Great
2. Decent
3. Contamination, Aggregate, Crack, Breaking, Drifting
4. Empty (no ice)
5. Crystalline ice
6. Empty ice (vitreous ice but no particles)

You can find some exemplar images of these 6 labels in the "Examples" folder.

The first two labels are both considered "good" and will be written to a "micrographs_good.star" file. The micrographs with the first ("great") label will also be written to a "micrographs_great.star" file. We hope that the "great" label helps to select the most promising micrographs from a big dataset.

The last four labels are all considered "bad". We hope that MicAssess not only helps with micrograph curation, but also gives the user some information about why a micrograph is "bad".

The prediction uses a hierarchical classification model. For a new micrograph, the model will first decide whether this is a "good" or "bad" micrograph (validation accuracy ~93%). The tolerance threshold in this step can be tuned by the user with `--t1`. If the micrograph is classified as "good", the model will further classify whether it belongs to the "great" class or the "decent" class (validation accuracy ~75%). The tolerance threshold in this step can be tuned by the user with `--t2`. If the micrograph is "bad", the model will further classify which "bad" class it belongs to (validation accuracy ~80%).
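
The two-stage decision can be illustrated with a small sketch. This is not the cryoassess implementation: the function name `classify_micrograph`, the label strings, the probability inputs, and in particular the rule "probability >= threshold" are assumptions, chosen only so that raising `t1` or `t2` moves micrographs toward the lower class as the option descriptions say.

```python
# Hypothetical sketch: names and threshold semantics are assumptions,
# not taken from the cryoassess source.
BAD_LABELS = [
    "contamination/aggregate/crack/breaking/drifting",
    "empty (no ice)",
    "crystalline ice",
    "empty ice",
]


def classify_micrograph(p_good, p_great, bad_scores, t1=0.1, t2=0.1):
    """Return (coarse, fine) labels for one micrograph.

    p_good     -- assumed probability of "good" from the first-stage model
    p_great    -- assumed probability of "great" from the great/decent model
    bad_scores -- assumed scores for the four "bad" labels, in BAD_LABELS order
    """
    if p_good >= t1:
        # Second stage for good micrographs: raising t2 moves
        # borderline "great" micrographs into "decent".
        return "good", ("great" if p_great >= t2 else "decent")
    # Second stage for bad micrographs: report the highest-scoring reason.
    best = max(range(len(BAD_LABELS)), key=lambda i: bad_scores[i])
    return "bad", BAD_LABELS[best]


print(classify_micrograph(0.8, 0.05, [0.1, 0.2, 0.6, 0.1]))   # ('good', 'decent')
print(classify_micrograph(0.05, 0.9, [0.1, 0.2, 0.6, 0.1]))   # ('bad', 'crystalline ice')
```

Under these assumptions, the same micrograph with `p_good = 0.8` would be reported as "bad" once `t1` is raised above 0.8, which is the trade-off the thresholds expose.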

MicAssess v1.0 needs new model files, which consist of 4 different .h5 files. Model files can be downloaded upon request at https://cosmic-cryoem.org/software/cryo-assess/. For current users, we will simply update the files in the shared folder. Please do not change the file names of the models.

Note on 2DAssess: since Relion 4.0 will have its own 2D classification auto-selection tool, we do not plan to update 2DAssess in the future.

**Updates (v0.2.1)**
Credit to @DaniDelHoyo
1. Fixed a bug that caused the malfunction of the prediction results.
@@ -17,7 +40,6 @@ Credit to @DaniDelHoyo
**Note (5/8/2020)**
2DAssess gave a syntax error for some users. We have fixed the bug and it should run correctly now.


**Updates (3/7/2020, v0.1.0)**
1. MicAssess now supports micrographs from K3 as well as K2.
2. pip install now enabled. (Credit to @pconesa)
@@ -28,21 +50,22 @@ Credit to @DaniDelHoyo

Both MicAssess and 2DAssess are Python-based and need Anaconda installed to run. Anaconda can be downloaded and installed here: https://www.anaconda.com/distribution/

0. If you have an existing cryoassess conda environment from the previous version, you will need to remove it by
```
conda env remove --name cryoassess
```

1. Create an anaconda environment
```
conda create -n cryoassess -c anaconda python=3.6 pyqt=5 cudatoolkit=10.0.130 cudnn=7.6.0 intel-openmp=2019.4
conda create -n cryoassess -c anaconda -c conda-forge python=3.7 cudatoolkit=11.2 cudnn=8.1
```
2. Activate this conda environment by
```
conda activate cryoassess
```
3. Install cryoassess (this package) for cpu
```
pip install path-to-local-clone[cpu]
3. Install cryoassess (this package)
```
Alternatively, if using GPU:
```
pip install path-to-local-clone[gpu]
pip install path-to-local-clone
```

**Download .h5 model files:**
@@ -53,28 +76,30 @@ You will need the pre-trained model files to run MicAssess and 2DAssess. To down

Note: MicAssess currently works on micrographs from both K2 and K3 cameras.

Note: MicAssess currently does not support star file from Relion 3.1.

You will need to activate the conda environment by ```conda activate cryoassess``` before using MicAssess.

To run MicAssess:
```
micassess -i <a micrograph star file> -m <model file>
micassess -i <a micrograph star file, or any valid glob wildcard for mrc files> -m <model directory>
```

Optional arguments:

-d, --detector: Either "K2" or "K3". Default is "K2".
`-d`, `--detector`: Either "K2" or "K3". Default is "K2".

`-o`, `--output`: Name of the output directory. Default is MicAssess.

`-b`, `--batch_size`: Batch size used in prediction. Default is 32. Increasing this number will result in faster prediction, if your GPU memory allows. If memory error/warning appears, you should lower this number.

-o, --output: Name of the output star file. Default is good_micrographs.star.
`--t1`: Threshold for good/bad classification. Default is 0.1. A higher number will cause more good micrographs (both great and decent) to be classified as bad. Conversely, if you find good micrographs misclassified as bad, try lowering this number.

-b, --batch_size: Batch size used in prediction. Default is 32. Increasing this number will result in faster prediction, if your GPU memory allows. If memory error/warning appears, you should lower this number.
`--t2`: Threshold for great/decent classification. Default is 0.1. A higher number will cause more great micrographs to be classified as decent.

-t, --threshold: Threshold for classification. Default is 0.1. Higher number will cause more good micrographs being classified as bad.
`--threads`: Number of threads for conversion. Default is None, using the maximum allowed. If you get memory errors, please set it to a reasonable number (e.g. 10). This usually happens when you have super-resolution micrographs from K3.

--threads: Number of threads for conversion. Default is None, using mp.cpu_count(). If get memory error, set it to a reasonable number (e.g. 10). This usually happens when you have super-resolution microgarphs from K3.
`--gpus`: Specify which GPU(s) to use, e.g. 0,1,2,3. Default is 0, which uses only the first GPU.

--gpus: Specify which GPU(s) to use, e.g. 0,1,2,3. Default is 0, which uses only the first GPU.
`--dont_reset`: By default, MicAssess will first convert the mrc files to png format before prediction. Depending on your CPU threads and the number of micrographs, this step may be time-consuming. If you already have the mrc files converted (to png) with a previous run of MicAssess, you can skip the conversion step by using this flag.

The input of MicAssess could be a .star file with a header similar to this:
```
```
@@ -88,15 +113,18 @@ Note that the header must have the "\_rlnMicrographName". The star file must be

Optionally, input could be a folder where micrographs are, or a pattern where wildcards are accepted. (See https://docs.python.org/3.6/library/glob.html for more details)
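
As a rough illustration of the three input forms (star file, folder, wildcard pattern), the sketch below resolves any of them to a list of micrograph paths. The helper names `star_to_miclist` and `resolve_input` are hypothetical, the parser only handles simple whitespace-separated `loop_` tables, and the `*.mrc` suffix for folders is an assumption. The package ships its own reader (`star.star2miclist`, used in `mrc2png.py`), whose implementation is not shown in this diff and may differ.

```python
import glob
import os


def star_to_miclist(star_path):
    """Collect the _rlnMicrographName column from a simple star file."""
    columns = []        # loop_ column labels, in file order
    micrographs = []
    with open(star_path) as fh:
        for line in fh:
            tokens = line.split()
            if not tokens or tokens[0].startswith("#"):
                continue
            if tokens[0].startswith("data_"):
                columns = []            # a new data block starts a new header
            elif tokens[0] == "loop_":
                continue
            elif tokens[0].startswith("_"):
                columns.append(tokens[0])
            elif "_rlnMicrographName" in columns and len(tokens) >= len(columns):
                micrographs.append(tokens[columns.index("_rlnMicrographName")])
    return micrographs


def resolve_input(path):
    """Return micrograph paths from a star file, a folder, or a glob pattern."""
    if path.endswith(".star"):
        return star_to_miclist(path)
    if os.path.isdir(path):
        path = os.path.join(path, "*.mrc")   # assumed: folders hold .mrc files
    return sorted(glob.glob(path))
```

For example, `resolve_input("micrographs.star")` would read the `_rlnMicrographName` column, while `resolve_input("Micrographs/*.mrc")` would expand the wildcard directly.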

MicAssess will output a "good_micrographs.star" file in the same directory of the input star file. It will also create a MicAssess directory with all the predictions (converted to .jpg files), in case you want to check the performance.
MicAssess will output "micrographs_great.star" and "micrographs_good.star" files in the same directory as the input star file. It will also create a MicAssess directory with all the predictions (converted to png), in case you want to check the performance.

Note: if memory warning appears:
(W tensorflow/core/framework/allocator.cc:108] Allocation of 999571456 exceeds 10% of system memory.)
Reduce the batch size by adding ‘-b 16’, or even a smaller number (8 or 4). The default batch size is 32. You can also increase the batch size to a higher number like 64, if your memory allows. Higher batch size means faster.
Reduce the batch size by adding ‘-b 16’, or even a smaller number (8 or 4). The default batch size is 32. You can also increase the batch size to a higher number like 64, if your memory allows. Higher batch size is usually faster.

Note: We found in practice, the default threshold (0.1) will cause some empty images being misclassified to the "good" class. Increasing the threshold to 0.3 will help to solve this problem.
Note on parameter tuning:
- If you want to keep as much "good" data as possible without worrying too much about some bad data being misclassified as good, you should use a low `--t1` (0.1 or even lower).
- If you just want the "great" data to be as clean as possible and don't mind losing some "good/great" data, consider using a higher `--t1` (0.2 ~ 0.5). `--t2` can be left at its default, or set even higher to be more stringent.

**2DAssess:**
Note: Since Relion 4.0 will have its own 2D classification auto-selection tool, we do not plan to update 2DAssess in the future.

You will need to activate the conda environment by ```conda activate cryoassess``` before using 2DAssess.

6 changes: 3 additions & 3 deletions cryoassess/assess2d.py
@@ -21,9 +21,9 @@
from itertools import product
import re
from cryoassess.lib.check_center import checkCenter
from cryoassess.mrcs2jpg import mrcs2jpg
from cryoassess.lib import imgprep
from cryoassess.lib import utils
from cryoassess.lib.mrcs2jpg import mrcs2jpg
# from cryoassess.lib import imgprep
# from cryoassess.lib import utils


def setupParserOptions():
Binary file added cryoassess/lib/__pycache__/__init__.cpython-36.pyc
Binary file not shown.
Binary file added cryoassess/lib/__pycache__/imgprep.cpython-36.pyc
Binary file not shown.
Binary file added cryoassess/lib/__pycache__/utils.cpython-36.pyc
Binary file not shown.
136 changes: 136 additions & 0 deletions cryoassess/lib/fft.py
@@ -0,0 +1,136 @@
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K


def normalize(x):
    x /= 127.5
    x -= 1.
    return x


def fft2(x):
    return np.fft.fftshift(np.fft.fft2(np.fft.fftshift(x)))

def fft2_tf(x):
    x = tf.cast(x, tf.complex64)
    return tf.signal.fftshift(tf.signal.fft2d(tf.signal.fftshift(x)))

def power_spectrum(x):
    return np.abs(fft2(x))**2

def power_spectrum_tf(x):
    return tf.math.pow(tf.math.abs(fft2_tf(x)), 2)


def radial_avg(img):

    x0 = img.shape[1] // 2
    y0 = img.shape[0] // 2

    x,y = np.meshgrid(np.arange(img.shape[1]), np.arange(img.shape[0]))
    R = np.sqrt((x-x0)**2 + (y-y0)**2)

    # calculate the mean
    f = lambda r : img[(R >= r-.5) & (R < r+.5)].mean()
    img_r = min(img.shape[1], img.shape[0]) // 2
    r = np.linspace(1, img_r, num=img_r)
    mean = np.vectorize(f)(r)

    return r, mean
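
`radial_avg` builds one boolean mask per radius. The sketch below is a possible single-pass alternative, not part of this commit: the name `radial_avg_bincount` is hypothetical, and it assumes a 2D float array. It assigns each pixel to the ring `[r-0.5, r+0.5)` by rounding its distance from the centre, then averages per ring with `np.bincount`.

```python
import numpy as np


def radial_avg_bincount(img):
    """Radially average a 2D array around its centre in a single pass."""
    y0, x0 = img.shape[0] // 2, img.shape[1] // 2
    y, x = np.indices(img.shape)
    # floor(R + 0.5) == r exactly when r - 0.5 <= R < r + 0.5.
    ring = np.floor(np.hypot(x - x0, y - y0) + 0.5).astype(int)
    sums = np.bincount(ring.ravel(), weights=img.ravel())
    counts = np.bincount(ring.ravel())
    max_r = min(img.shape) // 2
    r = np.arange(1, max_r + 1)
    return r, sums[1:max_r + 1] / counts[1:max_r + 1]
```

The radii cover the same range as in `radial_avg` (1 up to half of the shorter image side), so the two functions should agree up to floating-point rounding.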

def radial_avg_tf(img):

    x0 = tf.math.divide(img.shape[2], 2)
    x0 = tf.cast(x0, tf.float32)
    y0 = tf.math.divide(img.shape[1], 2)
    y0 = tf.cast(y0, tf.float32)

    x,y = tf.meshgrid(tf.range(img.shape[2]), tf.range(img.shape[1]))
    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32)
    R = tf.math.sqrt(tf.math.square(x-x0) + tf.math.square(y-y0))
    # R = K.tile(K.expand_dims(R, axis=0),[tf.shape(img)[0], 1, 1])

    # calculate the mean
    # f = lambda r : tf.reduce_mean(img[(R >= r-.5) & (R < r+.5)])
    f = lambda r : tf.reduce_mean(tf.boolean_mask(img, (R >= r-.5) & (R < r+.5), axis=1), axis=1)

    img_r = tf.math.floordiv(tf.math.minimum(img.shape[2], img.shape[1]), 2)
    img_r = tf.cast(img_r, tf.float32)
    r = tf.linspace(1.0, img_r, num=tf.cast(img_r, tf.int32))
    mean = tf.map_fn(f, r)

    mean = tf.transpose(tf.squeeze(mean))

    return r, mean


def radavg_logps(x, normalize=False):

    if normalize:
        # The `normalize` argument shadows the module-level normalize() function,
        # so fetch the function explicitly instead of calling the boolean flag.
        x = globals()['normalize'](x)
    ps_x = power_spectrum(x)
    r, mean = radial_avg(np.log(ps_x))

    return mean

def radavg_logps_tf(x):
    '''
    No normalization.
    '''
    ps_x = power_spectrum_tf(x)
    r, mean = radial_avg_tf(tf.math.log(ps_x))
    # mean = K.expand_dims(mean, axis=0)

    return mean

def radavg_logps_sigmoid_tf(x):
    return tf.math.sigmoid(radavg_logps_tf(x))



# import cv2
# import matplotlib.pyplot as plt
# x = cv2.imread(r'C:\Users\Mutania\Desktop\micrograph.png', cv2.IMREAD_GRAYSCALE).astype('float64')
# x = normalize(x)
# fft_x = fft2(x)
# ps_x = power_spectrum(x)
# plt.imshow(np.log(ps_x), cmap='gray')

# r, mean = radial_avg(np.log(ps_x))
# plt.plot(r, mean)

# x = tf.convert_to_tensor(x)
# mean = radavg_logps_tf(x)



# import cv2, os
# import utils

# allimg_path = r'C:\Users\Mutania\Desktop\MicAssess_v1.0_data\micassess_train_k2\data'
# all_mean = []

# i = 0
# for f in os.listdir(allimg_path):
#     fname = os.path.join(allimg_path, f)
#     x = cv2.imread(fname, cv2.IMREAD_GRAYSCALE).astype('float64')
#     x = utils.preprocess(x)
#     mean = radavg_logps(x, normalize=False)
#     all_mean.append(mean)
#     i += 1
#     if i % 500 == 0:
#         print(i)
#     if i == 2000:
#         break

# avg_mean = np.mean(all_mean, axis=0)
# std_mean = np.std(all_mean, axis=0)
# np.save('radavg_mean.npy', avg_mean)
# np.save('radavg_std.npy', std_mean)

# max_mean = np.max(all_mean, axis=0)
# min_mean = np.min(all_mean, axis=0)
# np.save('radavg_max.npy', max_mean)
# np.save('radavg_min.npy', min_mean)
6 changes: 3 additions & 3 deletions cryoassess/lib/imgprep.py
@@ -3,7 +3,7 @@
'''
import numpy as np
from PIL import Image
from scipy import ndimage
# from scipy import ndimage
import pandas as pd
# from PIL import ImageOps

@@ -45,8 +45,8 @@ def scaleImage(img, height=494):
'''
Downsample image, scale the pixel value from 0-255 and save it as the Image object.
'''
img = downsample(img, height)
new_img = ((img-img.min())/((img.max()-img.min())+1e-7)*255).astype('uint8')
new_img = downsample(img, height)
new_img = ((new_img-new_img.min())/((new_img.max()-new_img.min())+1e-7)*255).astype('uint8')
new_img = Image.fromarray(new_img)
new_img = new_img.convert("L")
return new_img
70 changes: 70 additions & 0 deletions cryoassess/lib/mrc2png.py
@@ -0,0 +1,70 @@
#!/usr/bin/env python3
'''
Read mrc files, use FFT to downsample them to smaller png files.
The output png files will have height as 494 px and h/w ratio will be kept.
'''

import mrcfile
import os
import glob
import numpy as np
import argparse
from PIL import Image
import multiprocessing as mp
from pathlib import Path
from . import star

def setupParserOptions():
    ap = argparse.ArgumentParser()
    ap.add_argument('-i', '--input', help="Provide the path to the micrographs.star file.")
    ap.add_argument('-o', '--output', help='Provide the path to the output directory.')
    ap.add_argument('--threads', type=int, default=None,
                    help='Number of threads for conversion. Default is None, using mp.cpu_count(). If you get a memory error, set it to a reasonable number.')
    args = vars(ap.parse_args())
    return args

def downsample(img, height=494):
    '''
    Downsample a 2D array by cropping its Fourier transform.
    height is the output height in pixels; the width is chosen to keep
    the aspect ratio and is rounded to an even number.
    '''
    m,n = img.shape[-2:]
    ds_factor = m/height
    width = round(n/ds_factor/2)*2
    F = np.fft.rfft2(img)
    A = F[...,0:height//2,0:width//2+1]
    B = F[...,-height//2:,0:width//2+1]
    F = np.concatenate([A, B], axis=0)
    f = np.fft.irfft2(F, s=(height, width))
    return f
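
To make the Fourier-cropping arithmetic concrete, here is a small standalone sketch that mirrors `downsample` for a plain 2D array and checks the output shape. The name `fourier_downsample` and the 988 x 1024 test array are illustrative assumptions. The scaling remark follows from NumPy's default FFT normalization, not from anything stated in the source.

```python
import numpy as np


def fourier_downsample(img, height=494):
    """Downsample a 2D array by keeping only its low-frequency Fourier terms."""
    m, n = img.shape
    width = round(n / (m / height) / 2) * 2        # keep aspect ratio, even width
    F = np.fft.rfft2(img)
    top = F[:height // 2, :width // 2 + 1]          # non-negative row frequencies
    bottom = F[-height // 2:, :width // 2 + 1]      # negative row frequencies
    return np.fft.irfft2(np.concatenate([top, bottom], axis=0), s=(height, width))


rng = np.random.default_rng(0)
img = rng.random((988, 1024))
small = fourier_downsample(img)
print(small.shape)                                   # (494, 512)
# NumPy normalises the inverse FFT by the *output* size, so values are
# scaled by (988 * 1024) / (494 * 512) = 4 relative to the input.
print(np.isclose(small.mean(), 4 * img.mean()))      # True
```

The constant scale factor does no harm in this pipeline, because `scale_image` rescales the result to the 0-255 range by its own minimum and maximum afterwards.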

def scale_image(img, height=494):
    newImg = downsample(img, height)
    newImg = ((newImg - newImg.min()) / ((newImg.max() - newImg.min()) + 1e-7) * 255)
    newImg = Image.fromarray(newImg).convert('L')
    return newImg

def save_image(mrc_name, outdir, height=494):
    # print(mrc_name)
    try:
        micrograph = mrcfile.open(mrc_name, permissive=True).data
        micrograph = micrograph.reshape((micrograph.shape[-2], micrograph.shape[-1]))
        newImg = scale_image(micrograph, height)
        newImg.save(os.path.join(outdir, os.path.splitext(os.path.basename(mrc_name))[0] + '.png'))
    except ValueError:
        print('An error occurred when trying to save ', mrc_name)
        pass

def mrc2png(args):
    # os.chdir(args['output'])
    mic_list = star.star2miclist(args['input'])
    Path(os.path.join(args['output'], 'png', 'data')).mkdir(parents=True, exist_ok=True)
    threads = mp.cpu_count() if args['threads'] is None else args['threads']
    with mp.Pool(threads) as pool:
        print('Converting in %d parallel threads....' %threads)
        pool.starmap(save_image, ((mrc_name, os.path.join(args['output'], 'png', 'data')) for mrc_name in mic_list))
    print('Conversion finished.')

if __name__ == '__main__':
    args = setupParserOptions()
    mrc2png(args)