
Clean up ion channel data and make 'ingest' more maintainable #319

Closed
mwatts15 opened this issue Aug 16, 2017 · 3 comments

@mwatts15
Contributor

mwatts15 commented Aug 16, 2017

Ion channel data was recently added to PyOpenWorm, but a few things should be done to make the translation more maintainable: the source data is still subject to change, so future imports are likely.

  • Replace magic numbers for columns with descriptive names (e.g., what's special about column 101 in the neurons-channels spreadsheet?)
  • Detect and drop 'n/a', 'None', '' and similar values in channel descriptions and expression patterns. In PyOpenWorm, absence already indicates a lack of data, so the 'n/a' is superfluous.
  • The neuron-channel relationships in the data set appear to be between either neurons or classes of neurons, but PyOpenWorm distinguishes between the two. Code should be added to detect when a column is a neuron class (i.e., keep a list of which columns are neuron classes and check against it) and to create Neurons only for the remaining columns.
  • Normalize the expression pattern data: the expression_pattern column appears to contain some '|'-separated values. Defining an expression pattern type isn't necessary yet, but at least the multiple entities embedded in that text should be broken out into separate values.
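A minimal sketch of the first, second, and fourth items, assuming the cells arrive as plain strings. `MISSING_TOKENS`, `FIRST_NEURON_COLUMN`, `clean_text`, and `split_expression_pattern` are hypothetical names, not existing PyOpenWorm code, and the issue doesn't say what column 101 actually holds, so the constant's name is only a placeholder for whatever it really means.

```python
# Hypothetical set of placeholder values treated as "no data"; the real sheets may use others.
MISSING_TOKENS = {"", "n/a", "na", "none", "-"}

# Hypothetical descriptive name for the magic column index 101. The issue doesn't say
# what that column holds; replace this name with whatever it actually means.
FIRST_NEURON_COLUMN = 101


def clean_text(value):
    """Return the stripped cell text, or None if it is a placeholder for missing data."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped.lower() in MISSING_TOKENS:
        return None
    return stripped


def split_expression_pattern(value):
    """Break a '|'-separated expression_pattern cell into entries, dropping blanks and placeholders."""
    cleaned = clean_text(value)
    if cleaned is None:
        return []
    parts = (clean_text(part) for part in cleaned.split("|"))
    return [part for part in parts if part is not None]
```

Returning `None` (or an empty list) rather than the literal 'n/a' lets the import code simply skip creating a statement, which matches the point that absence already means "no data" in PyOpenWorm.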

Data

  1. Neurons to channels
  2. Muscles to channels
  3. https://github.com/openworm/PyOpenWorm/blob/dev/OpenWormData/aux_data/ion_channel.csv
mwatts15 changed the title from "Make ion channel data 'ingest' more maintainable" to "Clean up ion channel data and make 'ingest' more maintainable" on Aug 16, 2017
@slarson
Member

slarson commented Aug 16, 2017

Great stuff! One note: we can drop the 'classes of neurons' info, because the columns that specify ion channel to neuron relationships build on the 'classes of neurons' data, so having both is redundant.
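One way to drop the class columns, sketched under the assumption that the spreadsheet header lists class names and individual neuron names side by side. `NEURON_CLASS_NAMES` and `neuron_columns` are hypothetical; the class names below are only examples, and the real list would have to be built from the spreadsheet itself.

```python
# Hypothetical list of header entries that name neuron classes rather than single neurons.
# The entries here are examples only; the real list must come from the spreadsheet.
NEURON_CLASS_NAMES = {"AVA", "AVB", "ADA"}


def neuron_columns(header, first_column=0):
    """Return (index, name) pairs for header columns that name individual neurons.

    Columns before `first_column`, empty header cells, and neuron-class
    columns are skipped, so only individual neurons are turned into Neurons.
    """
    return [
        (index, name.strip())
        for index, name in enumerate(header)
        if index >= first_column
        and name.strip()
        and name.strip() not in NEURON_CLASS_NAMES
    ]
```

For example, with a header of `["channel", "AVA", "AVAL", "AVAR"]` and `first_column=1`, only the `AVAL` and `AVAR` columns would be kept.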

On modifying the TSV: for the simplicity of the data sources page, can we keep the TSVs as raw dumps from Google Spreadsheets and handle skipping headers in the code? I get what you're saying about cleaner code, but we get better reproducibility if there isn't an extra step to modify the TSV after it comes straight out of our spreadsheet.
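Skipping the headers in code could look roughly like this, assuming the export has a fixed number of non-data rows at the top. `HEADER_ROWS` and `read_raw_tsv` are hypothetical names, and the row count of 2 is a guess; it would need to match the actual export.

```python
import csv

# Hypothetical: number of non-data rows at the top of the raw spreadsheet export.
HEADER_ROWS = 2


def read_raw_tsv(lines, header_rows=HEADER_ROWS):
    """Yield data rows from an unmodified tab-separated export.

    `lines` can be an open file or any iterable of strings. The first
    `header_rows` rows are skipped, as are rows whose cells are all blank.
    """
    reader = csv.reader(lines, delimiter="\t")
    for row_number, row in enumerate(reader):
        if row_number < header_rows:
            continue
        if not any(cell.strip() for cell in row):
            continue  # spreadsheet exports sometimes contain empty rows
        yield row
```

Usage, with a hypothetical path: `with open(path, newline="") as f: rows = list(read_raw_tsv(f))`. The TSV stays exactly as downloaded, and the header-skipping rule lives in one place in the code.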

Thanks!!

@slarson
Member

slarson commented Sep 18, 2017

@mwatts15 Did this ever get done in the course of GSoC / PRs by @shubhsingh594 ?

@mwatts15
Contributor Author

@slarson No, things are as they were.

mwatts15 added a commit that referenced this issue Sep 20, 2017
mwatts15 self-assigned this Sep 20, 2017
mwatts15 added a commit that referenced this issue Oct 24, 2017
- Making updates from the #319 branch
- Removing SQLite references
- Removing unneeded 'requirements' dependency
- Adding a couple of gitignores

2 participants