(WIP) Re-organize the codebase #68

Open
qiyunzhu opened this issue May 27, 2017 · 0 comments

Plan for Re-organizing the WGS-HGT codebase

Rationale

The original WGS-HGT was a workflow for benchmarking multiple currently available HGT detection tools. Since then, the goal of the project has evolved and expanded considerably. Meanwhile, some coding techniques and standards have been updated as well. I am therefore planning to re-organize the codebase to keep it up to date.

Here we define WGS-HGT as a loosely organized repository that hosts all relevant code under the larger framework of the "Web of Life" project. Code lives here until it is migrated to a more suitable repository.

Naming

The phrase "WGS-HGT" isn't easy to pronounce, and doesn't precisely describe the current plan of the whole project. People have suggested two candidates:

  1. "weboflife", the same as the project name and the least confusing, but it looks a bit awkward when lowercased and merged into one word.
  2. "horizomer", which reflects the idea that the complete set of horizontally acquired genes in a genome should be called a "horizome", and that the goal of the software package is to identify them.

What do you think? Any new ideas are welcome!

Structure

The codebase shall be divided into the following second-level directories under wgshgt:

  • wrapper: Code for running third-party programs, reformatting their inputs and parsing their outputs.
    • Each program gets its own subdirectory, for modularity. The subdirectory should contain one Python script that provides a programming interface for communicating with the program, plus Bash scripts if necessary.
    • Code that automates installation of the programs should be included too, but its actual content can be migrated to conda recipes, leaving only the interface here.
  • data: Code for constructing or retrieving data (e.g., a random gene shuffler, genome evolution simulator, or genome downloader), the datasets themselves (if small), and descriptions of large external test datasets (if not automatically retrievable).
  • reference: Code for building reference databases, including the genome pool, gene family pool, species tree, gene tree, etc., or just descriptions of the reference databases.
    • If a script has to call external programs to fulfill its function, it should call the wrappers rather than launching the programs itself, unless the programs are very generic (e.g., a GNU tool).
  • predict: Code for inferring HGT and other evolutionary events in individual input genomes. This is what end users run to analyze their own datasets.
  • render: Code for visualizing trees, networks and other display items.
  • benchmark: Code for benchmarking HGT-prediction methods and other tools.
  • misc: Code that does not fit into the existing categories, or that has not been sufficiently engineered to live in other directories.
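
To make the wrapper idea more concrete, here is a minimal sketch of what the single Python script in one wrapper subdirectory might look like. The program name `hgtfinder`, its `-i`/`-o`/`-t` flags, and the tab-delimited output format are all hypothetical and chosen only for illustration; a real wrapper would follow the actual interface of the tool it wraps.

```python
import subprocess
from typing import List, Tuple


def build_command(input_fp: str, output_fp: str,
                  threads: int = 1) -> List[str]:
    """Assemble the command line for the (hypothetical) `hgtfinder` tool."""
    return ['hgtfinder', '-i', input_fp, '-o', output_fp, '-t', str(threads)]


def parse_output(text: str) -> List[Tuple[str, float]]:
    """Parse hypothetical tab-delimited output: gene ID, then a score.

    Blank lines and lines starting with '#' are skipped.
    """
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        gene, score = line.split('\t')[:2]
        results.append((gene, float(score)))
    return results


def run(input_fp: str, output_fp: str,
        threads: int = 1) -> List[Tuple[str, float]]:
    """Launch the program, then read and parse its output file."""
    subprocess.run(build_command(input_fp, output_fp, threads), check=True)
    with open(output_fp) as f:
        return parse_output(f.read())
```

Keeping command construction and output parsing in separate functions means both can be unit-tested without the third-party program being installed; only `run` needs the real executable.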

Each directory may contain a tests directory to host unit test scripts. Each tests directory may contain a data directory to store small data files for unit tests. The unit test code may also access datasets in the first-level data directory.
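
As a rough sketch of how a test script could locate both kinds of data, the helpers below resolve paths relative to the test file itself. The function names are made up for illustration, and the assumed layout (`wgshgt/<module>/tests/test_*.py`, with the shared data directory at `wgshgt/data`) is only one reading of the structure described above.

```python
import os


def get_data_dir(test_file: str) -> str:
    """Return the `data` directory sitting next to a test script."""
    here = os.path.dirname(os.path.abspath(test_file))
    return os.path.join(here, 'data')


def get_shared_data_dir(test_file: str) -> str:
    """Return the shared `data` directory two levels above `tests`.

    Assumes the layout wgshgt/<module>/tests/<test script>.
    """
    here = os.path.dirname(os.path.abspath(test_file))
    return os.path.normpath(os.path.join(here, '..', '..', 'data'))
```

Inside a test case, calling `get_data_dir(__file__)` in `setUp` would give the module-specific files, while `get_shared_data_dir(__file__)` would reach the larger shared datasets.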

Because individual steps for predicting, rendering and benchmarking may have to be executed in different work environments, most scripts should have a command-line interface (via click).
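
For illustration, a script with a click-based command-line interface could look roughly like the sketch below. The command name `predict`, its options, and the message it prints are placeholders rather than part of any existing code in the repository.

```python
import click


@click.command()
@click.option('-i', '--input-fp', required=True, type=click.Path(),
              help='Input genome file (hypothetical).')
@click.option('-o', '--output-fp', required=True, type=click.Path(),
              help='Output file for predictions (hypothetical).')
@click.option('-t', '--threads', default=1, show_default=True, type=int,
              help='Number of threads to use.')
def predict(input_fp, output_fp, threads):
    """Hypothetical entry point for a single prediction step."""
    # A real script would call into the library code here.
    click.echo(f'Predicting on {input_fp} -> {output_fp} '
               f'({threads} threads)')
```

Adding the usual `if __name__ == '__main__': predict()` guard at the bottom makes the script directly executable, so each step can be run on its own in whatever environment it needs.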

Please share your thoughts on this plan. Thank you!

@ekopylova @wasade @RNAer @mortonjt @sjanssen2 @antgonza @tkosciol

qiyunzhu changed the title from "(WIP) Reorganize WGS-HGT codebase" to "(WIP) Re-organize the codebase" on May 27, 2017