diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..cf48696
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,8 @@
+.Rhistory
+.RData
+.snakemake/
+.ipynb_checkpoints
+config.yml
+test/
+Notebooks/annotate_report_cache/
+Notebooks/benchmark_report_cache/
diff --git a/Benchmarking/.snakemake/log/2022-09-08T133151.890209.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T133151.890209.snakemake.log
deleted file mode 100644
index 341f44d..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T133151.890209.snakemake.log
+++ /dev/null
@@ -1,8 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T133151.890209.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
diff --git a/Benchmarking/.snakemake/log/2022-09-08T133238.758946.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T133238.758946.snakemake.log
deleted file mode 100644
index 9503b4a..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T133238.758946.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T133238.758946.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T141158.777553.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T141158.777553.snakemake.log
deleted file mode 100644
index 33c32ab..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T141158.777553.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T141158.777553.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T141421.312624.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T141421.312624.snakemake.log
deleted file mode 100644
index 179ec5f..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T141421.312624.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T141421.312624.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T141503.422813.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T141503.422813.snakemake.log
deleted file mode 100644
index 8c12be4..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T141503.422813.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T141503.422813.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T141620.098535.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T141620.098535.snakemake.log
deleted file mode 100644
index a4cc840..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T141620.098535.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T141620.098535.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T141718.523238.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T141718.523238.snakemake.log
deleted file mode 100644
index 018e55f..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T141718.523238.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T141718.523238.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T141852.782530.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T141852.782530.snakemake.log
deleted file mode 100644
index f3efa1a..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T141852.782530.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T141852.782530.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142121.740084.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142121.740084.snakemake.log
deleted file mode 100644
index 082e9bd..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142121.740084.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142121.740084.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142322.175430.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142322.175430.snakemake.log
deleted file mode 100644
index 10aa868..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142322.175430.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142322.175430.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142344.721772.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142344.721772.snakemake.log
deleted file mode 100644
index 9e2e818..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142344.721772.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142344.721772.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142356.762010.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142356.762010.snakemake.log
deleted file mode 100644
index a4e210d..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142356.762010.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142356.762010.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142404.703381.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142404.703381.snakemake.log
deleted file mode 100644
index feaadf1..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142404.703381.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142404.703381.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142436.949686.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142436.949686.snakemake.log
deleted file mode 100644
index e762d0d..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142436.949686.snakemake.log
+++ /dev/null
@@ -1,7 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142436.949686.snakemake.log
-Executing main workflow.
-Creating report...
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T142557.810546.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T142557.810546.snakemake.log
deleted file mode 100644
index 337bbf2..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T142557.810546.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T142557.810546.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T143045.014696.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T143045.014696.snakemake.log
deleted file mode 100644
index 9d4439a..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T143045.014696.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T143045.014696.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T143846.878355.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T143846.878355.snakemake.log
deleted file mode 100644
index 659a1e8..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T143846.878355.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T143846.878355.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T143915.339453.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T143915.339453.snakemake.log
deleted file mode 100644
index bf531d7..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T143915.339453.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T143915.339453.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding UMAP_embeddings.csv (0.18 MB).
-Adding UMAP_embeddings.csv (0.27 MB).
-Adding UMAP_embeddings.csv (0.53 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144001.049537.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144001.049537.snakemake.log
deleted file mode 100644
index 736f709..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144001.049537.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144001.049537.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding UMAP_embeddings.csv (0.18 MB).
-Adding UMAP_embeddings.csv (0.27 MB).
-Adding UMAP_embeddings.csv (0.53 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144242.060011.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144242.060011.snakemake.log
deleted file mode 100644
index ec8079d..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144242.060011.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144242.060011.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding UMAP_embeddings.csv (0.18 MB).
-Adding UMAP_embeddings.csv (0.27 MB).
-Adding UMAP_embeddings.csv (0.53 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144432.308860.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144432.308860.snakemake.log
deleted file mode 100644
index 41a75b5..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144432.308860.snakemake.log
+++ /dev/null
@@ -1,16 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144432.308860.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding UMAP_embeddings.csv (0.18 MB).
-Adding UMAP_embeddings.csv (0.27 MB).
-Adding UMAP_embeddings.csv (0.53 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144609.966021.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144609.966021.snakemake.log
deleted file mode 100644
index bfa5547..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144609.966021.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144609.966021.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144635.458546.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144635.458546.snakemake.log
deleted file mode 100644
index fb5b709..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144635.458546.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144635.458546.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144749.295035.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144749.295035.snakemake.log
deleted file mode 100644
index 97dba95..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144749.295035.snakemake.log
+++ /dev/null
@@ -1,10 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144749.295035.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144812.039084.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144812.039084.snakemake.log
deleted file mode 100644
index ada97f7..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144812.039084.snakemake.log
+++ /dev/null
@@ -1,16 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144812.039084.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding UMAP_embeddings.csv (0.18 MB).
-Adding UMAP_embeddings.csv (0.27 MB).
-Adding UMAP_embeddings.csv (0.53 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T144829.502825.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T144829.502825.snakemake.log
deleted file mode 100644
index c97d324..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T144829.502825.snakemake.log
+++ /dev/null
@@ -1,16 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T144829.502825.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding UMAP_embeddings.csv (0.18 MB).
-Adding UMAP_embeddings.csv (0.27 MB).
-Adding UMAP_embeddings.csv (0.53 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T145000.697165.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T145000.697165.snakemake.log
deleted file mode 100644
index a96b824..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T145000.697165.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T145000.697165.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T145117.724678.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T145117.724678.snakemake.log
deleted file mode 100644
index 38d74e7..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T145117.724678.snakemake.log
+++ /dev/null
@@ -1,14 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T145117.724678.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-WorkflowError in line 97 of /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/snakefile:
-Failed to resolve wildcards.
-AttributeError: 'Wildcards' object has no attribute 'sample'
- File "/project/kleinman/modules/software/miniconda/3.8/envs/snakemake/lib/python3.9/site-packages/snakemake/report/__init__.py", line 651, in auto_report
- File "/project/kleinman/modules/software/miniconda/3.8/envs/snakemake/lib/python3.9/site-packages/snakemake/report/__init__.py", line 632, in register_file
- File "/project/kleinman/modules/software/miniconda/3.8/envs/snakemake/lib/python3.9/site-packages/snakemake/report/__init__.py", line 213, in __init__
diff --git a/Benchmarking/.snakemake/log/2022-09-08T145147.458529.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T145147.458529.snakemake.log
deleted file mode 100644
index ee28b62..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T145147.458529.snakemake.log
+++ /dev/null
@@ -1,14 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T145147.458529.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-WorkflowError in line 97 of /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/snakefile:
-Failed to resolve wildcards.
-AttributeError: 'Wildcards' object has no attribute 'samples'
- File "/project/kleinman/modules/software/miniconda/3.8/envs/snakemake/lib/python3.9/site-packages/snakemake/report/__init__.py", line 651, in auto_report
- File "/project/kleinman/modules/software/miniconda/3.8/envs/snakemake/lib/python3.9/site-packages/snakemake/report/__init__.py", line 632, in register_file
- File "/project/kleinman/modules/software/miniconda/3.8/envs/snakemake/lib/python3.9/site-packages/snakemake/report/__init__.py", line 213, in __init__
diff --git a/Benchmarking/.snakemake/log/2022-09-08T145333.876334.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T145333.876334.snakemake.log
deleted file mode 100644
index 9d50714..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T145333.876334.snakemake.log
+++ /dev/null
@@ -1,8 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T145333.876334.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
diff --git a/Benchmarking/.snakemake/log/2022-09-08T145358.279667.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T145358.279667.snakemake.log
deleted file mode 100644
index 4fe8b70..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T145358.279667.snakemake.log
+++ /dev/null
@@ -1,8 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T145358.279667.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
diff --git a/Benchmarking/.snakemake/log/2022-09-08T150009.566381.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T150009.566381.snakemake.log
deleted file mode 100644
index 79b39ab..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T150009.566381.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T150009.566381.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T151241.745673.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T151241.745673.snakemake.log
deleted file mode 100644
index 47165f7..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T151241.745673.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T151241.745673.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T151446.425062.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T151446.425062.snakemake.log
deleted file mode 100644
index 73a8384..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T151446.425062.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T151446.425062.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T151659.197199.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T151659.197199.snakemake.log
deleted file mode 100644
index dfb79fd..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T151659.197199.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T151659.197199.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T152841.441632.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T152841.441632.snakemake.log
deleted file mode 100644
index cb217f9..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T152841.441632.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T152841.441632.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T153008.691643.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T153008.691643.snakemake.log
deleted file mode 100644
index 6d428d3..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T153008.691643.snakemake.log
+++ /dev/null
@@ -1,13 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T153008.691643.snakemake.log
-Executing main workflow.
-Creating report...
-Adding Prediction_Summary.tsv (0.33 MB).
-Adding Prediction_Summary.tsv (0.49 MB).
-Adding Prediction_Summary.tsv (0.95 MB).
-Adding Consensus.png (0.031 MB).
-Adding Consensus.png (0.023 MB).
-Adding Consensus.png (0.052 MB).
-Downloading resources and rendering HTML.
-Report created: report.html.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T153018.042895.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T153018.042895.snakemake.log
deleted file mode 100644
index f934821..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T153018.042895.snakemake.log
+++ /dev/null
@@ -1,24 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T153018.042895.snakemake.log
-Executing main workflow.
-Using shell: /usr/bin/bash
-Provided cores: 2
-Rules claiming more threads will be scaled down.
-Job counts:
- count jobs
- 1 all
- 1 concat
- 1 plot
- 3
-Select jobs to execute...
-
-[Thu Sep 8 15:30:18 2022]
-rule concat:
- input: /project/kleinman/hussein.lakkis/from_hydra/test/BT2016062, /project/kleinman/hussein.lakkis/from_hydra/test/P-1694_S-1694_multiome, /project/kleinman/hussein.lakkis/from_hydra/test/P-1701_S-1701_multiome
- output: /project/kleinman/hussein.lakkis/from_hydra/test/BT2016062/Prediction_Summary.tsv, /project/kleinman/hussein.lakkis/from_hydra/test/P-1694_S-1694_multiome/Prediction_Summary.tsv, /project/kleinman/hussein.lakkis/from_hydra/test/P-1701_S-1701_multiome/Prediction_Summary.tsv
- log: /project/kleinman/hussein.lakkis/from_hydra/test/BT2016062/Gatherpreds.log, /project/kleinman/hussein.lakkis/from_hydra/test/P-1694_S-1694_multiome/Gatherpreds.log, /project/kleinman/hussein.lakkis/from_hydra/test/P-1701_S-1701_multiome/Gatherpreds.log
- jobid: 1
-
-Terminating processes on user request, this might take some time.
-Cancelling snakemake on user request.
diff --git a/Benchmarking/.snakemake/log/2022-09-08T153714.016757.snakemake.log b/Benchmarking/.snakemake/log/2022-09-08T153714.016757.snakemake.log
deleted file mode 100644
index 9921f22..0000000
--- a/Benchmarking/.snakemake/log/2022-09-08T153714.016757.snakemake.log
+++ /dev/null
@@ -1,22 +0,0 @@
-Building DAG of jobs...
-Nothing to be done.
-Complete log: /project/kleinman/hussein.lakkis/from_hydra/scCoAnnotate/Benchmarking/.snakemake/log/2022-09-08T153714.016757.snakemake.log
-Executing main workflow.
-Using shell: /usr/bin/bash
-Provided cores: 2
-Rules claiming more threads will be scaled down.
-Job counts:
- count jobs
- 1 all
- 1 concat
- 1 plot
- 3
-Select jobs to execute...
-
-[Thu Sep 8 15:37:14 2022]
-rule concat:
- input: /project/kleinman/hussein.lakkis/from_hydra/test/BT2016062, /project/kleinman/hussein.lakkis/from_hydra/test/P-1694_S-1694_multiome, /project/kleinman/hussein.lakkis/from_hydra/test/P-1701_S-1701_multiome
- output: /project/kleinman/hussein.lakkis/from_hydra/test/BT2016062/Prediction_Summary.tsv, /project/kleinman/hussein.lakkis/from_hydra/test/P-1694_S-1694_multiome/Prediction_Summary.tsv, /project/kleinman/hussein.lakkis/from_hydra/test/P-1701_S-1701_multiome/Prediction_Summary.tsv
- log: /project/kleinman/hussein.lakkis/from_hydra/test/BT2016062/Gatherpreds.log, /project/kleinman/hussein.lakkis/from_hydra/test/P-1694_S-1694_multiome/Gatherpreds.log, /project/kleinman/hussein.lakkis/from_hydra/test/P-1701_S-1701_multiome/Gatherpreds.log
- jobid: 1
-
diff --git a/Config/config.default.yml b/Config/config.default.yml
new file mode 100644
index 0000000..788747d
--- /dev/null
+++ b/Config/config.default.yml
@@ -0,0 +1,58 @@
+
+SingleR:
+ threads: 1
+
+Correlation:
+ threads: 1
+
+scPred:
+ threads: 1
+ classifier: 'svmRadial'
+
+scClassify:
+ threads: 1
+
+SciBet:
+ threads: 1
+
+singleCellNet:
+ threads: 1
+
+scHPL:
+ threads: 1
+ classifier: 'svm'
+ dimred: 'False'
+ threshold: 0.5
+
+SVMlinear:
+ threads: 1
+ threshold: 0
+ classifier: 'SVMlinear'
+
+SVC:
+ threads: 1
+ classifier: 'rbf'
+ threshold: 0.5
+
+ACTINN:
+ threads: 1
+
+scLearn:
+ threads: 1
+
+scID:
+ threads: 1
+
+scAnnotate:
+ threads: 1
+ threshold: 0.5
+
+scNym:
+ threads: 1
+ threshold: 0.5
+
+CellTypist:
+ threads: 1
+ feature_selection: 'True'
+ majority_voting: 'True'
+ threshold: 0.5
diff --git a/Notebooks/annotate_report.Rmd b/Notebooks/annotate_report.Rmd
new file mode 100644
index 0000000..d44e134
--- /dev/null
+++ b/Notebooks/annotate_report.Rmd
@@ -0,0 +1,394 @@
+---
+title: "scCoAnnotate - `r params$sample`"
+output:
+ html_document:
+ df_print: paged
+ theme: flatly
+ toc: yes
+ toc_float: yes
+ toc_depth: 1
+ code_folding: hide
+params:
+ refs: ''
+ tools: ''
+ consensus: ''
+ output_dir: ''
+ sample: ''
+ threads: ''
+ marker_genes: ''
+ query: ''
+---
+
+```{r setup, knitr_options, echo=F}
+knitr::opts_chunk$set(message = FALSE, warning=FALSE)
+```
+
+```{r fig.show='hide', include=F}
+library(tidyverse)
+library(ComplexHeatmap)
+library(Seurat)
+library(MetBrewer)
+library(plotly)
+library(kableExtra)
+
+#empty plotly plot to make sure the other plotly plots get printed later
+plotly_empty()
+
+# format notebook parameters
+threads = as.numeric(params$threads)
+refs = strsplit(params$refs, split = ' ')[[1]]
+tools = c('Consensus',strsplit(params$tools, split = ' ')[[1]])
+marker_genes = strsplit(params$marker_genes, split = ' ')[[1]]
+```
+
+```{r}
+plot_tool_correlation_heatmap = function(seurat, tools){
+
+ mat = seurat@meta.data %>%
+ select(all_of(tools)) %>%
+ rownames_to_column('cell') %>%
+ pivot_longer(!cell) %>%
+ mutate(value = factor(value)) %>%
+ mutate(value = as.numeric(value)) %>%
+ pivot_wider(names_from = name, values_from = value) %>%
+ column_to_rownames('cell') %>%
+ cor()
+
+ mat[is.na(mat)] = 0
+
+ col_fun = circlize::colorRamp2(c(-1, 0, 1), c("#2274A5", "beige", "#F75C03"))
+
+ count = seurat@meta.data %>%
+ select(all_of(tools)) %>%
+ rownames_to_column('cell') %>%
+ pivot_longer(!cell) %>%
+ filter(value %in% c('Unresolved', 'No Consensus')) %>%
+ dplyr::count(name, .drop = F) %>%
+ mutate(freq = round(n/nrow(seurat@meta.data)*100)) %>%
+ select(!n) %>%
+ column_to_rownames('name')
+
+ count[setdiff(names(seurat@meta.data %>% select(tools)), rownames(count)),] = 0
+ count = count[order(match(rownames(count), colnames(mat))), , drop = FALSE]
+
+ ha = columnAnnotation('% Unresolved/No Consensus' = anno_barplot(count, border = F, gp = gpar(fill = '#596475', col = '#596475')))
+
+ h = ComplexHeatmap::Heatmap(mat,
+ name = 'Correlation',
+ col = col_fun,
+ width = ncol(mat)*unit(7, "mm"),
+ height = nrow(mat)*unit(7, "mm"),
+ rect_gp = gpar(col = "white", lwd = 2),
+ top_annotation = ha,
+ show_column_dend = F)
+ return(h)
+}
+
+create_color_pal = function(class, mb = 'Juarez'){
+ pal = sample(met.brewer(mb, length(class)))
+ names(pal) = class
+ pal['Unresolved'] = 'lightgrey'
+ pal['No Consensus'] = 'grey'
+ return(pal)
+}
+
+plot_bar_largest_group = function(seurat, meta_column = '', pal, fr = 0.1){
+
+df = seurat@meta.data %>%
+ count(seurat_clusters, .data[[meta_column]]) %>%
+ group_by(seurat_clusters) %>%
+ mutate(`%` = (n / sum(n))) %>%
+ mutate(meta = ifelse(`%` < fr, NA, .data[[meta_column]]))
+
+pal = pal[unique(na.omit(df$meta))]
+
+df = df %>%
+ mutate(meta = factor(meta, levels = c(NA, names(pal)), exclude = NULL))
+
+p1 = df %>%
+ ggplot(aes(x = seurat_clusters, y = `%`, fill = meta, text = sprintf(" %s<br>%s ",
+ meta,
+ scales::percent(`%`, scale = 100, accuracy = .1)))) +
+ geom_bar(stat = 'identity', position="fill") +
+ scale_fill_manual(values = pal, na.value = 'white', name = '') +
+ scale_y_continuous(expand = c(0,0), labels = scales::percent_format(scale = 100, accuracy = 1)) +
+ coord_flip() +
+ theme_bw() +
+ theme(text = element_text(size = 10),
+ axis.title = element_blank(),
+ axis.line = element_line(size = 0.5),
+ panel.border = element_blank(),
+ panel.grid = element_blank(),
+ strip.background = element_blank(),
+ aspect.ratio = 0.5)
+
+ p2 = ggplotly(p1, tooltip = c('text')) %>% toWebGL()
+
+ return(p2)
+}
+
+plot_percentage_predicted_consensus_class = function(seurat, tools){
+ col_fun = circlize::colorRamp2(c(0, 50, 100), c("#2274A5", "beige", "#F75C03"))
+
+x = seurat@meta.data %>%
+ select(c('Consensus', tools)) %>%
+ group_by(Consensus) %>%
+ pivot_longer(!c('Consensus')) %>%
+ group_by(Consensus, name) %>%
+ count(value, .drop = F) %>%
+ mutate(freq = n/sum(n)*100) %>%
+ filter(Consensus == value) %>%
+ select(name, freq, Consensus) %>%
+ pivot_wider(values_from = 'freq', names_from = 'Consensus',values_fill = 0) %>%
+ column_to_rownames('name')
+
+ h = ComplexHeatmap::Heatmap(x,
+ name = '%',
+ col = col_fun,
+ width = ncol(x)*unit(4, "mm"),
+ height = nrow(x)*unit(4, "mm"),
+ rect_gp = gpar(col = "white", lwd = 2), row_names_side = 'left',
+ show_row_dend = F, column_names_gp = gpar(fontsize = 7))
+
+ return(h)
+}
+
+color_class_seurat = function(seurat, meta_column, pal){
+ list = list()
+ pal['Unsure'] = 'red'
+ pal['No Consensus'] = 'red'
+ Idents(seurat) = meta_column
+ class = (table(seurat@meta.data[[meta_column]]) %>% as.data.frame() %>% filter(Freq > 20))$Var1
+
+ for(c in class){
+ lab = names(Idents(seurat)[Idents(seurat) == c])
+ p = DimPlot(seurat, cells.highlight = lab, cols = 'lightgrey', cols.highlight = pal[c], pt.size = 1) + umap_theme + ggtitle(c)
+ list[[c]] = p
+ }
+
+ return(list)
+}
+
+feature_plot_seurat = function(seurat, genes){
+ list = list()
+
+ genes = genes[genes %in% rownames(seurat@assays$RNA)]
+ for(g in genes){
+ p = FeaturePlot(seurat, features = g, cols = c("#F2EFC7", "#BC412B"), order = T) + umap_theme + ggtitle(g)
+ list[[g]] = p
+ }
+ return(list)
+}
+
+umap_plotly = function(seurat, meta_column, pal){
+
+ p1 = cbind(seurat@reductions$umap@cell.embeddings, seurat@meta.data) %>%
+ slice(sample(1:n())) %>%
+ ggplot(aes(UMAP_1, UMAP_2, color = .data[[meta_column]], text = .data[[meta_column]])) +
+ geom_point(alpha = 0.8) +
+ scale_color_manual(values = pal) +
+ theme_bw() + umap_theme + theme(legend.position = 'right')
+
+ p2 = ggplotly(plot = p1, tooltip = c('text')) %>% layout(autosize = F, width = 550, height = 450) %>% toWebGL()
+
+ return(p2)
+}
+
+calculate_percentage_unsure = function(pred, order){
+ warn = pred %>%
+ select(order) %>%
+ pivot_longer(order) %>%
+ mutate(value = factor(value)) %>%
+ group_by(name) %>%
+ count(value, .drop = F) %>%
+ mutate(frac = n/sum(n)*100) %>%
+ filter(!(name != 'Consensus' & value == 'No Consensus')) %>%
+ filter(value %in% c('No Consensus', 'Unsure')) %>%
+ mutate(warn = case_when(frac >= 70 ~ 'HIGH',
+ frac < 70 & frac > 30 ~ 'MEDIUM',
+ frac <= 30 ~ 'LOW'))
+
+ warn = data.frame(TOOL = warn$name,
+ LABEL = warn$value,
+ PERCENTAGE = warn$frac,
+ FLAG = warn$warn) %>%
+ mutate(TOOL = factor(TOOL, levels = order))
+
+ warn = warn[order(warn$TOOL),]
+
+ warn$FLAG = cell_spec(warn$FLAG,
+ bold = T,
+ background = case_when(warn$FLAG == 'HIGH' ~ "red",
+ warn$FLAG == 'MEDIUM' ~ "yellow",
+ warn$FLAG == 'LOW' ~ "green"))
+
+ warn$TOOL = cell_spec(warn$TOOL,
+ bold = ifelse(warn$TOOL == 'Consensus', T, F),
+ background = ifelse(warn$TOOL == 'Consensus', 'black', 'white'),
+ color = ifelse(warn$TOOL == 'Consensus', 'white', 'black'))
+ return(warn)
+}
+
+umap_theme = theme(aspect.ratio = 1,
+ text = element_text(size = 10),
+ axis.title = element_blank(),
+ axis.text = element_blank(),
+ axis.ticks = element_blank(),
+ axis.line = element_blank(),
+ panel.border = element_rect(colour = "grey", fill=NA, size=0.5),
+ panel.grid.major = element_blank(),
+ panel.grid.minor = element_blank(),
+ panel.background = element_blank(),
+ legend.position = "none")
+```
+
+
+```{r}
+# read prediction summary for each reference
+list = list()
+
+for(r in refs){
+ list[[r]]$lab = data.table::fread(paste0(params$output_dir, '/model/', r, '/labels.csv'), header = T)
+
+ list[[r]]$pred = data.table::fread(paste0(params$output_dir, '/', params$sample, '/', r, '/Prediction_Summary.tsv')) %>%
+ harmonize_unsure(., list[[r]]$lab)
+
+ # create reference pal
+ list[[r]]$pal = create_color_pal(list[[r]]$lab$label)
+
+ #save(list[[r]]$pal, file = paste0(params$output_dir, '/model/', r, '/class_pal.Rda'))
+}
+
+# read expression matrix for sample
+query = data.table::fread(paste0(params$query),
+ nThread=threads,
+ header=T,
+ data.table=F) %>%
+ column_to_rownames('V1')
+```
+
+```{r, results='hide'}
+# create seurat object from expression matrix
+set.seed(12345)
+query = t(query)
+query = CreateSeuratObject(query, row.names = colnames(query))
+
+query = query %>%
+ NormalizeData() %>%
+ FindVariableFeatures() %>%
+ ScaleData() %>%
+ RunPCA() %>%
+ FindNeighbors(dims = 1:30) %>%
+ FindClusters(resolution = 0.5) %>%
+ RunUMAP(dims = 1:30)
+```
+
+```{r fig.width=8,fig.height=8,echo=FALSE,message=FALSE,results="asis"}
+set.seed(12345)
+cat("\n")
+
+for(r in refs){
+
+ query = AddMetaData(query, list[[r]]$pred)
+
+ cat(" \n#", r, "{.tabset} \n")
+
+ cat(" \n## Sample \n")
+
+ cat("\nClusters\n")
+
+ p = umap_plotly(query, 'seurat_clusters', unname(list[[r]]$pal))
+ print(htmltools::tagList(p))
+
+ cat("\n")
+
+ cat("Expression of selected genes\n")
+
+ if(!length(marker_genes) == 0){
+ l = feature_plot_seurat(query, marker_genes)
+ if(length(l) < 9){
+ cowplot::plot_grid(plotlist = l[1:9], ncol = 3) %>% print()
+ }else{
+ for(i in seq(from = 1, by = 9, length.out = ceiling(length(l)/9))){
+ cowplot::plot_grid(plotlist = l[i:(i+8)], ncol = 3) %>% print()
+ }
+ }
+ }
+
+ cat("\n")
+
+ cat(" \n## Prediction QC \n")
+
+ cat("Percentage Unsure\n")
+
+ calculate_percentage_unsure(list[[r]]$pred, order = tools) %>%
+ kbl(escape = FALSE, row.names = F) %>%
+ kable_styling(position = "center") %>%
+ print()
+
+ cat("\n")
+
+ cat("Correlation between tools\n")
+
+ h = plot_tool_correlation_heatmap(query, tools = tools)
+ draw(h)
+
+ cat("\n")
+
+ cat("Percentage overlap between tools and consensus\n")
+
+ h = plot_percentage_predicted_consensus_class(query, tools = tools)
+ draw(h)
+
+ cat("\n")
+
+ cat(" \n## Prediction {.tabset} \n")
+
+ for(t in tools){
+ cat(" \n### ", t , " \n")
+
+ cat("Top class per cluster\n")
+
+ p = plot_bar_largest_group(query, t, fr = 0.1, pal = list[[r]]$pal)
+ print(htmltools::tagList(p))
+
+ cat("UMAP\n")
+
+ cat("\n")
+
+ p = umap_plotly(query, t, list[[r]]$pal)
+ print(htmltools::tagList(p))
+
+ cat("\n")
+
+ cat("UMAP per class\n")
+
+ l = color_class_seurat(query, t, list[[r]]$pal)
+ if(length(l)< 9){
+ cowplot::plot_grid(plotlist = l[1:9], ncol = 3) %>% print()
+ }else{
+ for(i in seq(from = 1, by = 9, length.out = ceiling(length(l)/9))){
+ cowplot::plot_grid(plotlist = l[i:(i+8)], ncol = 3) %>% print()
+ }
+ }
+
+ cat("\n")
+ }
+}
+```
+
+# Report Info
+
+## Parameters
+
+```{r echo=FALSE,message=FALSE,results="asis"}
+for(p in names(params)){
+ cat(" \n -",p,": ", params[[p]], " \n")
+}
+```
+
+## Session
+
+```{r}
+sessionInfo()
+```
diff --git a/Notebooks/benchmark_report.Rmd b/Notebooks/benchmark_report.Rmd
new file mode 100644
index 0000000..c27b1b5
--- /dev/null
+++ b/Notebooks/benchmark_report.Rmd
@@ -0,0 +1,244 @@
+---
+title: "scCoAnnotate - Benchmarking"
+output:
+ html_document:
+ df_print: paged
+ theme: flatly
+ toc: yes
+ toc_float: yes
+ toc_depth: 1
+ code_folding: hide
+params:
+ tools: ''
+ ref_name: ''
+ pred_path: ''
+ fold: ''
+---
+
+```{r, echo=FALSE}
+knitr::opts_chunk$set(message = FALSE, warning=FALSE)
+```
+
+```{r}
+set.seed(1234)
+library(tidyverse)
+library(caret)
+library(ComplexHeatmap)
+```
+
+```{r}
+get_pred = function(pred, tool, true){
+ pred %>%
+ select(tool) %>%
+ mutate(label = .data[[tool]],
+ label = ifelse(!label %in% true$label, NA, label),
+ label = factor(label, levels = levels(true$label), ordered = TRUE)) %>%
+ return()
+}
+
+# Plot confusion matrix as a heatmap
+plot_cm = function(cm_table){
+ col_fun = circlize::colorRamp2(c(range(cm_table)[1],
+ range(cm_table)[2]/2,
+ range(cm_table)[2]),
+ c("#5C80BC", "#F2EFC7", "#FF595E"))
+
+ h = Heatmap(cm_table,
+ name = 'Counts',
+ col = col_fun,
+ width = ncol(cm_table)*unit(2, "mm"),
+ height = nrow(cm_table)*unit(2, "mm"),
+ cluster_rows = F,
+ cluster_columns = F,
+ row_names_gp = gpar(fontsize = 7),
+ column_names_gp = gpar(fontsize = 7),
+ column_title = 'True Class',
+ row_title = 'Predicted Class')
+
+ return(h)
+}
+
+# Plot class stat per fold (F1 etc) as barplot
+plot_stat = function(cm_byclass, stat){
+p = cm_byclass %>%
+ as.data.frame() %>%
+ rownames_to_column('class') %>%
+ separate(class, into = c(NA, 'class'), sep = ': ') %>%
+ ggplot(aes(reorder(class, -.data[[stat]]), .data[[stat]])) +
+ geom_bar(stat = 'identity', col = 'white', fill = 'lightgrey') +
+ theme_bw() +
+ theme(text = element_text(size = 10),
+ axis.title.x = element_blank(),
+ axis.line = element_line(size = 0.5),
+ panel.border = element_blank(),
+ panel.grid.major = element_blank(),
+ panel.grid.minor = element_blank(),
+ axis.text.x = element_text(angle = 45,
+ vjust = 1,
+ hjust=1),
+ aspect.ratio = 0.5) +
+ scale_y_continuous(expand = c(0, 0)) +
+ geom_hline(yintercept = c(1, 0.5), linetype = 'dotted', color = 'red')
+
+return(p)
+}
+
+# plot F1 across folds for each class as a boxplot
+plot_stat_boxplot = function(list, tool, stat){
+
+df = lapply(list[[tool]], get_stat, stat = stat)
+
+bind_rows(df) %>%
+ ggplot(aes(reorder(class, -.data[[stat]], mean), .data[[stat]])) +
+ geom_boxplot() +
+ theme_bw() +
+ theme(text = element_text(size = 10),
+ axis.title.x = element_blank(),
+ axis.line = element_line(size = 0.5),
+ panel.border = element_blank(),
+ panel.grid.major = element_blank(),
+ panel.grid.minor = element_blank(),
+ axis.text.x = element_text(angle = 45,
+ vjust = 1,
+ hjust=1),
+ aspect.ratio = 0.5) +
+ scale_y_continuous(limits = c(0, 1),
+ expand = c(0, 0)) +
+ geom_hline(yintercept = c(1, 0.5), linetype = 'dotted', color = 'red')
+}
+
+# plot average stat for all tools
+plot_mean_tool = function(list, stat, tools){
+
+df = lapply(list, function(x){lapply(x, get_stat, stat = stat) %>% bind_rows()})
+
+df = bind_rows(df) %>%
+ group_by(class, tool) %>%
+ mutate(mean = mean(.data[[stat]])) %>%
+ distinct(class, tool, mean) %>%
+ pivot_wider(names_from = 'class', values_from = mean) %>%
+ column_to_rownames('tool')
+
+df[is.na(df)] = 0
+
+col_fun = circlize::colorRamp2(c(0,
+ range(df)[2]/2,
+ range(df)[2]),
+ c("#3B5B91", "#F2EFC7", "#CC0007"))
+
+split = c('Consensus', rep('tools', length(tools)-1))
+
+h = Heatmap(df,
+ name = paste('Mean', stat),
+ col = col_fun,
+ width = ncol(df)*unit(4, "mm"),
+ height = nrow(df)*unit(6, "mm"),
+ row_names_side = 'left',
+ row_names_gp = gpar(fontsize = 12),
+ show_column_dend = F,
+ show_row_dend = F,
+ row_split = split,
+ cluster_row_slices = F,
+ row_title = NULL)
+
+return(h)
+}
+
+#--------- HELPER FUNCTIONS ----------------
+
+# gets stat for each fold and returns data frame
+get_stat = function(x, stat){
+ x$byClass %>%
+ as.data.frame() %>%
+ rownames_to_column('class') %>%
+ separate(class, into = c(NA, 'class'), sep = ': ') %>%
+ select(class, .data[[stat]]) %>%
+ mutate(fold = x$fold,
+ tool = x$tool)
+}
+#-------------------------------------------
+```
+
+```{r}
+tools = c('Consensus', strsplit(params$tools, split = ' ')[[1]])
+fold = as.numeric(params$fold)
+```
+
+```{r}
+# Read prediction and true labels for each tool and each fold and calculate confusion matrix and stats
+# Save everything in a list object with hierarchy TOOL > FOLD > STATS
+list = list()
+for(n in 1:fold){
+
+ # read true labels
+ true = data.table::fread(paste0(params$pred_path, '/fold', n, '/test_labels.csv'), header = T) %>%
+ column_to_rownames('V1') %>%
+ mutate(label = factor(label, ordered = TRUE))
+
+ # read prediction summary for fold
+ pred = data.table::fread(paste0(params$pred_path, '/fold', n, '/Prediction_Summary.tsv'), header = T)%>%
+ column_to_rownames('cellname')
+
+ for(t in tools){
+
+ tmp = get_pred(pred, t, true)
+
+ list[[t]][[n]] = confusionMatrix(data = tmp$label, reference = true$label, mode = 'everything')
+ list[[t]][[n]]$fold = paste0('fold', n)
+ list[[t]][[n]]$tool = t
+ }
+}
+```
+
+```{r fig.width=10,echo=FALSE,message=FALSE,results="asis"}
+cat(" \n#", params$ref_name , "{.tabset} \n")
+
+cat(" \n## Summary \n")
+cat("Average F1 score per tool and class\n")
+
+plot_mean_tool(list, 'F1', tools)
+
+cat("\n")
+
+for(t in tools) {
+ cat(" \n##", t, "{.tabset} \n")
+
+ print(plot_stat_boxplot(list, t, 'F1'))
+
+ cat("\n")
+
+ for(n in 1:fold){
+ cat(" \n###", paste0('Fold ', n), " \n")
+
+ cat("Confusion Matrix\n")
+
+ draw(plot_cm(list[[t]][[n]]$table))
+
+ cat("F1\n")
+
+ print(plot_stat(list[[t]][[n]]$byClass, 'F1'))
+
+ cat("\n")
+ }
+}
+```
+
+# Report Info
+
+## Parameters
+```{r echo=FALSE,message=FALSE,results="asis"}
+for(p in names(params)){
+ cat(" \n -",p,": ", params[[p]], " \n")
+}
+```
+
+
+## Session
+
+```{r}
+sessionInfo()
+```
+
+
+
+
diff --git a/README.md b/README.md
index 6fd72e8..9066f90 100644
--- a/README.md
+++ b/README.md
@@ -1,237 +1,653 @@
# scCoAnnotate
-# Summary
+Snakemake pipeline for consensus prediction of cell types in single-cell RNA sequencing (scRNA-seq) data. The pipeline allows users to run up to 15 different reference-based annotation tools (statistical models and machine learning approaches) to predict cell type labels for multiple scRNA-seq samples. It then outputs a consensus of the predictions, which combines the strengths of the different approaches and has been found in benchmarking experiments to be more accurate than the individual predictions alone.
-scRNA-seq based prediction of cell-types using a fast and efficient Snakemake pipeline to increase automation and reduce the need to run several scripts and experiments. The pipeline allows the user to select what single-cell annotation tools they want to run on a selected reference to annotate a list of query datasets. It then outputs a consensus of the predictions across tools selected. This pipeline trains classifiers on genes common to the reference and all query datasets.
+The pipeline is automated and running it does not require prior knowledge of machine learning. It also features parallelization options to exploit available computational resources for maximal efficiency. This pipeline trains classifiers on genes common to the reference and all query datasets.
-The pipeline also features parallelization options to exploit the computational resources available.
+Two different workflows can be run as part of scCoAnnotate. The annotation workflow takes both a reference dataset and query samples, with the aim of annotating the query samples. The benchmarking workflow takes only the reference and performs M-fold cross-validation.
-# Installation and Dependencies
+
-Install [Snakemake](https://snakemake.readthedocs.io/en/stable/) in your linux environment.
+See the snakemake rule graph for a more detailed description of the annotation workflow:
+[Annotation Workflow](rulegraph.annotation.pdf)
-You need to have have [R](https://www.r-project.org/) Version 4.0.5 and Python 3.6.5.
+# :running_woman: Quickstart tutorial
-```bash
-$ conda activate base
-$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
-```
+1. [Clone repository and install dependencies](#1-clone-repository-and-install-dependencies)
+2. [Prepare reference](#2-prepare-reference)
+3. [Prepare query samples](#3-prepare-query-samples)
+4. [Prepare config file](#4-prepare-config-file)
+5. [Prepare HPC submission script](#5-prepare-hpc-submission-script)
+### 1. Clone repository and install dependencies
+This step is not necessary if you are part of the Kleinman group!
-You need to also install all the dependancies for the tools you plan on using. You have to copy everything present in this repository and not break paths because it would disrupt the dependancies. One note is to change the paths in run_ACTINN.py to match your own directories when you clone the repository. The paths are in lines 44,45,49.
+Clone git repository in appropriate location:
+```bash
+git clone https://github.com/fungenomics/scCoAnnotate.git
+```
+Install R packages and Python modules as specified in [Installation and Dependencies](#gear-installation-and-dependencies).
+If you are part of the Kleinman group, you only need to load the module on Narval or Hydra:
-Current version of snakemake is snakemake/5.32.0
+```bash
+module load scCoAnnotate/2.0
+```
-# Quickstart
+### 2. Prepare reference
-Using snakemake is straight forward and simple. The rules and processes are arranged as per this rule graph:
+The input format for the references is a **cell x gene matrix** (.csv) of raw counts and a **cell x label matrix** (.csv).
-Rule preprocess gets the common genes and creates temporary reference and query datasets based ob the common genes. Rule concat appends all predictions into one tab seperate file (prediction_summary.tsv) and gets the consensus prediction
+In both the **cell x gene matrix** and the **cell x label matrix**, the first column must contain the cell names and have an empty column name. The cells must be listed in the same order in both files.
+**cell x gene matrix**
+```bash
+'',gene1,gene2,gene3
+cell1,1,24,30
+cell2,54,20,61
+cell3,0,12,0
+cell4,1,13,17
+```
-![dag](https://user-images.githubusercontent.com/59002771/191146873-5c680bbd-d11c-418c-ae96-7662ee7f99ed.png)
+ **cell x label matrix**
+```bash
+'',label
+cell1,label1
+cell2,label1
+cell3,label3
+cell4,label2
+```
+### 3. Prepare query samples
+The input format for the query samples is a **cell x gene matrix** (.csv) of raw counts.
-You need to set everything up in a config file and then run the following command:
+The first column must contain the cell names and have an empty column name.
+**cell x gene matrix**
```bash
-snakemake --use-conda --configfile config.yml --cores 3
+'',gene1,gene2,gene3
+cell1,27,1,34
+cell2,0,12,56
+cell3,0,17,12
+cell4,54,20,61
```
-## Config File:
+### 4. Prepare config file
+
+For each set of query samples, a config file needs to be prepared with information about the samples, the reference, the tools you want to run and how to calculate the consensus.
+
+Multiple references can be specified, each with a unique **reference name**.
+
+Full list of available tools can be found here: [Available tools](#hammer-and-wrench-available-tools)
+Make sure that the names of the selected tools have the same capitalization and format as this list.
+
+The consensus method selected in **consensus_tools** can either be 'all' (which uses all the tools in **tools_to_run**) or a list of tools to include.
+
+See: [Example Config](example.config.yml)
+
```yaml
-# target directory
-output_dir:
+# target directory
+output_dir: