McTavishLab/collab-paper

Welcome to the first McTavish Lab collab paper repo!

Goal of the paper:

Develop a Python module (or R package?) to generate informative visualizations of the Physcraper results.

Explore: Do we really need a module, or would a workflow, tutorial, or vignette be enough? See the ETE cookbooks, e.g., NCBI taxonomic database query and taxonomic tree visualization, and reproducing an analysis from 1998.

We want people to be able to seamlessly:

  • swap taxonomic names on output trees,
  • compare tips on original and updated trees,
    • visualize new branches in updated tree
  • visualize conflict between original and updated tree
    • for this we need a Python function that uses the conflict API
    • conflict can only be computed against the synthetic tree, so compare both the original and the updated tree against synth
    • then compare the two comparisons :p
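
The tip comparison and the "compare the two comparisons" step can be sketched without committing to a tree library. This is a minimal sketch, not the module's implementation. `new_tips` and `compare_conflict` are hypothetical names. Tip labels are assumed to be already extracted into lists, and each conflict result is assumed to be reduced to a mapping from node id to a status string. The status strings in the example are illustrative and may not match what the conflict API actually returns.

```python
from collections import Counter


def new_tips(original_tips, updated_tips):
    """Return the labels found in the updated tree but not in the original."""
    return sorted(set(updated_tips) - set(original_tips))


def compare_conflict(original_vs_synth, updated_vs_synth):
    """Compare two conflict summaries, each computed against the synthetic tree.

    Both arguments are assumed to map node id -> status string.
    Node ids differ between the two trees, so only the number of nodes
    per status is compared. Returns status -> (updated count - original count).
    """
    original_counts = Counter(original_vs_synth.values())
    updated_counts = Counter(updated_vs_synth.values())
    statuses = sorted(set(original_counts) | set(updated_counts))
    return {s: updated_counts[s] - original_counts[s] for s in statuses}


# Illustrative data only; not taken from a real Physcraper run.
original = {"node1": "supported_by", "node2": "conflicts_with", "node3": "supported_by"}
updated = {
    "node1": "supported_by",
    "node2": "supported_by",
    "node3": "resolves",
    "node4": "conflicts_with",
}

added = new_tips(["A", "B"], ["B", "C", "A", "D"])
change = compare_conflict(original, updated)
print(added)
print(change)
```

A positive value in the result means the updated tree has more nodes with that status than the original tree does. Comparing counts rather than individual nodes is a design choice forced by the assumption that the two trees do not share node ids.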

Name of the module: TBD

Resources:

  1. Physcraper code repository on GitHub
  2. Examples of runs written in Jupyter notebooks
  3. Plotting results with R
  4. Physcraper code documentation
  5. Physcraper published paper
  6. Collab paper first notes on Google Docs
  7. Authorship guidelines
  8. Ideas for research questions
  9. Tutorial to set up git and ssh

Project log

TODO:

May 4

  • Check mantidae run for Randy, alignment assertion error

April 27, 2022

  • Q: How to use the new version of Physcraper in our Jupyter notebook?
  • Code review with EJM: Integrating the conflict API
  • Adding documentation to functions

April 20, 2022

  • Add a module to the Physcraper repo called viz.py:
    • worked on an independent branch, viz-module
    • created a virtualenv for this new branch and installed with pip install -e .
    • added a test to tests/test_viz.py
  • Homework:
    • Lucia: Remove box around tree
    • Luna: Add color option to new tip labels

April 14, 2022

  • Demo: creating a function that reads, plots and saves a tree as pdf
  • Reading and summarizing information from csv files:
    • python bin/tree_comparison.py -d OUTPUT/PATH/AND/DIR/NAME -o tmp
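
As a rough illustration of the "reading and summarizing" step, the sketch below counts rows per value of one CSV column using only the standard library. It is not what bin/tree_comparison.py does. The function name `count_by_column`, the column names `otu_id` and `status`, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value of `column`.

    `csv_file` is any open text file (or file-like object) with a header row.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical table contents; real Physcraper CSV files may use other columns.
sample = "otu_id,status\notu1,original\notu2,new\notu3,new\n"

counts = count_by_column(io.StringIO(sample), "status")
for value, n in counts.most_common():
    print(f"{value}: {n}")
```

For a file on disk, the same function can be called inside `with open(path, newline="") as handle:` and passed `handle`.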

April 6, 2022

  • Results from last week's challenge
  • Started a jupyter notebook on this repo, to show results from the challenge.
  • New challenge: finish adding code from results of last week's challenge into the new jupyter notebook

March 30, 2022

  • Everybody has their personal cluster repo to work on
  • Finish explaining the Physcraper results structure.
    • Cover the outputs folder: tree files
    • Show figures from paper
    • Challenge:
      • First part: look for Python modules that plot trees
      • Part two: write the steps (functions) needed to read, plot, and save as pdf one of the output trees
      • Hint: use pseudocode if needed

March 23, 2022 - Spring break

March 16, 2022 - Cancelled

March 9, 2022

  • Fixed permission issues from last session
  • Decided we will no longer work on a shared repo on the cluster
    • Cloned repo on our own cluster account
  • Started exploring and explaining the Physcraper results structure.
    • Reviewed what Physcraper does
    • Covered the inputs folder: taxon ids, search group
    • Covered the runs folder: blast results, alignments
    • Overview of curating chronograms in opentree

March 2, 2022

[kbeheshtian@mrcdlogin collab-paper]$ git pull
Warning: fetch updated the current branch head.
Warning: fast-forwarding your working tree from
Warning: commit e00cebf80107895335c440f57ba3b48707b5fe87.
Already up-to-date.
  • Markdown syntax:
    • create dropdown sections
    • add emojis

Feb 23, 2022

  • Q: Is there a way to extend a job's time limit on the cluster while the job is still running? Unfortunately, no: "this ability is reserved for system admins only. If you attempt to update the time limit of a running job, you will receive this error: Access/permission denied for job <job id>. You can update the time limit of a pending job only: scontrol update JobId=<job id> TimeLimit=<dd-hh:mm:ss>."

  • Creating personal workflow files as jupyter notebooks or markdown files.

    • Markdown files - Success!
    • Jupyter notebook - will work on these later if needed for the Python module.
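
For the pending-job case quoted in the Q&A above, a small helper can build the time limit string in the dd-hh:mm:ss form and assemble the scontrol arguments. This is a sketch under assumptions: `format_time_limit` and `pending_job_update_command` are hypothetical helper names, and the job id is made up. The code only builds the command and does not run it.

```python
def format_time_limit(days, hours, minutes, seconds=0):
    """Format a time limit as dd-hh:mm:ss."""
    if days < 0 or not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("time components out of range")
    return f"{days:02d}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def pending_job_update_command(job_id, time_limit):
    """Build the argument list for updating a pending job's time limit."""
    return ["scontrol", "update", f"JobId={job_id}", f"TimeLimit={time_limit}"]


# Made-up job id, for illustration only.
limit = format_time_limit(days=2, hours=6, minutes=30)
command = pending_job_update_command(123456, limit)
print(" ".join(command))
```

The argument list can be handed to `subprocess.run(command, check=True)` on a login node. As the quoted answer says, this works only while the job is still pending.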

Feb 16, 2022

  • Overview of cluster errors:
    • Not finding muscle (Lucia) - solved: do not use conda
    • Error accessing collab-paper (Randy) - solved: give full path of genbank database
  • Everybody starts a run on the cluster - success!
    • Lucia - Primates
    • Randy - Mantis
    • Kiana - Podarcis
    • Luna - Felis

Feb 9, 2022

  • Summary of Physcraper
  • EJM helped figure out errors and we tried running Physcraper again!
  • ssh authentication locally
  • Git cloning the collab-paper project repository

Feb 2, 2022

  • Overview of Physcraper
  • Choose a paper subject - Success!
    • Python module to generate figures
  • Running the Physcraper examples and a taxon of your interest on the cluster - People got some errors again
    • Will fix them with EJM and try again next week!

Dec meeting:

  • Running Physcraper on the Merced cluster with conda
  • Run Lucia’s example:
    • Download alignment
    • Work on the jupyter notebook: Explain how to modify the alignment to run on Physcraper
  • Run this on a physcraper docker locally:
time physcraper_run.py \
  --study_id pg_2407 \
  --tree_id tree5076 \
  -db local_blast_db_OLD \
  --search_taxon ott:913935 \
  --alignment /project/linked_dir/alignments/M585.nex --aln_schema nexus \
  --bootstrap_reps 2 \
  --output /project/linked_dir/luna_lucia_primates
  • Figure out why it is not running on the cluster - office hours?

  • How to run Physcraper with a local BLAST database

    • The -db argument
  • See Kiana's and others' research questions

  • Work on the collab-paper repo

  • Linking the alignment data folder to physcraper docker

  • Next: Jasper reaching out to Haemosporida experts

Nov meeting:

  • Run docker on the UC server -> we do not need the physcraper docker; EJM is setting up the install with venv and conda so we can all run it