Goal of the paper:
Develop a Python module (or R package?) to generate informative visualizations of the Physcraper results.
Explore: Do we really need a module, or would a workflow/tutorial/vignette be enough? See the ETE cookbooks, e.g., NCBI taxonomic database query and taxonomic tree visualization, and reproducing an analysis from 1998.
We want people to be able to seamlessly:
- swap taxonomic names on output trees,
- compare tips on original and updated trees,
- visualize new branches in updated tree
- visualize conflict between original and updated tree
- for this we need a Python function that uses the conflict API
- conflict can only go vs synth tree, so compare original and updated tree vs synth
- then compare the two comparisons :p
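A minimal sketch of the "compare the two comparisons" step, in plain Python. It assumes each conflict result has already been fetched and is a dict keyed by node id, where each value carries a "status" field; that shape, the status names, and the node ids and witnesses below are assumptions made for illustration, not output from a real query. Because node ids will generally differ between the original and the updated tree, the sketch compares counts per status instead of matching nodes one to one.

```python
from collections import Counter


def summarize_conflict(conflict):
    """Count nodes per conflict status for one tree-vs-synth comparison."""
    return Counter(node["status"] for node in conflict.values())


def compare_conflict(original, updated):
    """Return the per-status change in node counts (updated minus original)."""
    orig_counts = summarize_conflict(original)
    upd_counts = summarize_conflict(updated)
    statuses = sorted(set(orig_counts) | set(upd_counts))
    return {status: upd_counts[status] - orig_counts[status] for status in statuses}


# Hypothetical conflict results (made-up node ids and witnesses).
original_vs_synth = {
    "node1": {"status": "supported_by", "witness": "mrcaott1ott2"},
    "node2": {"status": "conflicts_with", "witness": "ott3"},
    "node3": {"status": "terminal", "witness": "ott4"},
}
updated_vs_synth = {
    "node1": {"status": "supported_by", "witness": "mrcaott1ott2"},
    "node2": {"status": "supported_by", "witness": "ott3"},
    "node3": {"status": "terminal", "witness": "ott4"},
    "node4": {"status": "conflicts_with", "witness": "ott5"},
    "node5": {"status": "terminal", "witness": "ott6"},
}

diff = compare_conflict(original_vs_synth, updated_vs_synth)
print(diff)
```

A positive number means the updated tree has more nodes with that status than the original tree; the function that actually calls the conflict API would supply the two input dicts.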
Name of the module: TBD
Resources:
- Physcraper code repository on GitHub
- Examples of runs written in Jupyter notebooks
- Plotting results with R
- Physcraper code documentation
- Physcraper published paper
- Collab paper first notes on google docs
- Authorship guidelines
- Ideas for research questions
- Tutorial to set up git and ssh
TODO:
- Demo how we create visualizations of the results using R
- Writing an ms with backwards design
- Think of good names for the module: physcraper-plots
- https://www.youtube.com/watch?v=euNvxWaRQMY save tables as pdf with Python
- Demo: setting up an opentree kernel, instructions here
- Check mantidae run for Randy, alignment assertion error
- Q: How to use new version of physcraper in our jupyter notebook?
- Code review with EJM: Integrating the conflict API
- adding documentation to functions
- Add a module to the Physcraper repo called viz.py:
- worked on an independent branch, viz-module
- create a virtualenv for this new branch, and install with pip install -e .
- added a test to tests/test_viz.py
- Homework:
- Lucia: Remove box around tree
- Luna: Add color option to new tip labels
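To color new tip labels we first need to know which tips are new. The sketch below is one possible way to get that list with the standard library only. The function names and the two Newick strings are hypothetical, and the regular expression is a simplification that assumes unquoted tip labels without spaces; a real implementation would more likely read the trees with a tree library.

```python
import re


def tip_labels(newick):
    """Return the set of tip labels in a simple Newick string.

    Simplification: labels must be unquoted and contain no whitespace.
    """
    # A tip label directly follows "(" or ","; internal node labels and
    # branch lengths follow ")" or ":" and are therefore not matched.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def compare_tips(original, updated):
    """Split tip labels into shared, new (updated only), and missing (original only)."""
    orig, upd = tip_labels(original), tip_labels(updated)
    return {
        "shared": sorted(orig & upd),
        "new": sorted(upd - orig),
        "missing": sorted(orig - upd),
    }


# Hypothetical example trees.
original_tree = "((A:0.1,B:0.2):0.05,C:0.3);"
updated_tree = "(((A:0.1,D:0.1):0.02,B:0.2):0.05,(C:0.3,E:0.1):0.1);"

result = compare_tips(original_tree, updated_tree)
print(result["new"])
```

The "new" list is what a plotting function could take as the set of labels to draw in a different color.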
- Demo: creating a function that reads, plots and saves a tree as pdf
- Reading and summarizing information from csv files:
python bin/tree_comparison.py -d OUTPUT/PATH/AND/DIR/NAME -o tmp
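For the notebook, a generic way to summarize a csv file with the standard library could look like the sketch below. It does not reproduce what bin/tree_comparison.py does; the column names and the sample rows are hypothetical and only stand in for a real Physcraper output file.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value of the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical csv content; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "tip_label,status\n"
    "A,original\n"
    "B,original\n"
    "D,new\n"
)

counts = count_by_column(sample, "status")
print(counts)
```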
- Results from last week's challenge
- Started a jupyter notebook on this repo, to show results from the challenge.
- New challenge: finish adding code from results of last week's challenge into the new jupyter notebook
- Everybody has their personal cluster repo to work on
- Finish explaining the Physcraper results structure.
- Cover the outputs folder: tree files
- Show figures from paper
- Challenge:
- First part: look for Python modules that plot trees
- Part two: write the steps (functions) needed to read, plot, and save as pdf one of the output trees
- Hint: use pseudocode if needed
- Fixed permission issues from last session
- Decided we will no longer work on a shared repo on the cluster
- Cloned repo on our own cluster account
- Started exploring and explaining the Physcraper results structure.
- Reviewed what Physcraper does
- Covered the inputs folder: taxon ids, search group
- Covered the runs folder: blast results, alignments
- Overview of curating chronograms in opentree
- Cluster Ticket: git pull permissions for Randy and Kiana, and the rest of us
- EJM: error reading a DNA file from the database with everybody’s run
- EJM and cluster:
[kbeheshtian@mrcdlogin collab-paper]$ git pull
Warning: fetch updated the current branch head.
Warning: fast-forwarding your working tree from
Warning: commit e00cebf80107895335c440f57ba3b48707b5fe87.
Already up-to-date.
- Md syntax:
- create dropdown stuff
- add emojis
- Q: Is there a way to increase the run time in the cluster while the job is still running? Unfortunately, no: "this ability is reserved for system admins only. If you attempt to update the time limit of a running job, you will receive this error: Access/permission denied for job <job id>. You can update the time limit of a pending job only: scontrol update JobId=<job id> TimeLimit=<dd-hh:mm:ss>."
- Creating personal workflow files as jupyter notebooks or markdown files.
- Markdown files - Success!
- Jupyter notebook - will work on these later if needed for the Python module.
- Overview of cluster errors:
- Not finding muscle (Lucia) - solved: do not use conda
- Error accessing collab-paper (Randy) - solved: give full path of genbank database
- Everybody starts a run on the cluster - success!
- Lucia - Primates
- Randy - Mantis
- Kiana - Podarcis
- Luna - Felis
- Summary of Physcraper
- EJM helped figure out errors and we tried running Physcraper again!
- ssh authentication locally
- Git cloning the collab-paper project repository
- Overview of Physcraper
- Choose a paper subject - Success!
- Python module to generate figures
- Running the Physcraper examples and a taxon of your interest on the cluster - People got some errors again
- Will fix them with EJM and try again next week!
- Running Physcraper on the Merced cluster with conda
- Run Lucia’s example:
- Download alignment
- Work on the jupyter notebook: Explain how to modify the alignment to run on Physcraper
- Run this on a physcraper docker locally:
time physcraper_run.py \
  --study_id pg_2407 \
  --tree_id tree5076 \
  -db local_blast_db_OLD \
  --search_taxon ott:913935 \
  --alignment /project/linked_dir/alignments/M585.nex --aln_schema nexus \
  --bootstrap_reps 2 \
  --output /project/linked_dir/luna_lucia_primates
- Figure out why it is not running on the cluster - office hours?
- How to run Physcraper with a local BLAST database
- The -db argument
- See Kiana’s and others’ research questions
- Work on the collab-paper repo
- Linking the alignment data folder to physcraper docker
- Next: Jasper reaching out to Haemosporida experts
- run docker on the UC server -> we do not need the physcraper docker, EJM is making the install with venv and conda so we can all run it