Update original csv #7

Open
wants to merge 28 commits into
base: main

Conversation

GuiraudTresor
Contributor

@GuiraudTresor GuiraudTresor commented Aug 25, 2022

Split the genus names from the species names in the species column. All unclassified species, i.e. sp., were replaced with an empty space; they will be removed from the database later on.

@sjanssen2
Member

image

looks like latest changes regarding formatting of pH_Optimum values are not yet merged

@sjanssen2
Member

image

@sjanssen2
Member

here is my suggestion for critical Species names:
image
all other species name issues should be solved by two rules:

  1. strip white spaces
  2. if ' subsp. ' is in the species name, split on ' subsp. ', take the left part, and delete the genus name
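
The two rules above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the function name `clean_species` and its signature are hypothetical, and it assumes the genus name is passed in alongside the raw species string.

```python
def clean_species(genus: str, species: str) -> str:
    """Apply the two suggested rules to one species name (hypothetical helper)."""
    # Rule 1: strip surrounding white space.
    name = species.strip()
    # Rule 2: if ' subsp. ' is present, keep the part to its left
    # and remove a leading genus name from that part.
    if " subsp. " in name:
        name = name.split(" subsp. ")[0].strip()
        prefix = genus.strip() + " "
        if name.startswith(prefix):
            name = name[len(prefix):]
    return name


print(clean_species("Bacillus", "Bacillus safensis subsp. osmophilus"))
print(clean_species("Aphanizomenon", " flos-aquae "))
```

Names without ' subsp. ', such as `Anaerosalibacter sp.`, pass through rule 2 unchanged, so the critical names would still need to be handled separately.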

@sjanssen2
Member

to allow copy and paste, same table as above

| old_Genus | old_Species | new_Genus | new_Species | 0 | stefans suggestion |
|---|---|---|---|---|---|
| Anaerosalibacter | Anaerosalibacter sp. | Anaerosalibacter | sp | 1 | Anaerosalibacter sp. |
| Aphanizomenon | flos-aquae | Aphanizomenon | aquae | 7 | flos-aquae |
| Bacillus | Bacillus safensis subsp. osmophilus | Bacillus | safensis | 7 | safensis |
| Brevundimonas | Brevundimonas sp. | Brevundimonas | sp | 1 | Brevundimonas sp. |
| Dolichospermum | flos-aquae | Dolichospermum | aquae | 7 | flos-aquae |
| Finegoldia | Finegoldia sp. | Finegoldia | sp | 1 | Finegoldia sp. |
| Halomonas | denitri<81>cans | Halomonas | a5 | 8 | denitrificans |
| Methanothermobacter | Methanothermobacter sp. | Methanothermobacter | sp | 2 | Methanothermobacter sp. |
| Mucilaginibacter | rigui Mucilaginibacter rigui | Mucilaginibacter | rigui rigui | 6 | rigui |
| Mycobacterium | gordonae Mycobacterium paragordonae | Mycobacterium | gordonae paragordonae | 3 | paragordonae |
| Oscillatoria | nigro-viridis | Oscillatoria | viridis | 4 | nigro-viridis |
| Paulownia | witches-broom | Paulownia | broom | 3 | witches-broom |
| Plasticicumulans | not yet known; article proposed:Plasticicumulans lactativoran | Plasticicumulans | yet | 5 | lactativoran |
| Pseudoclavibacter | Pseudoclavibacter sp. | Pseudoclavibacter | sp | 14 | Pseudoclavibacter sp. |
| Ruania | albidi<89>ava | Ruania | a5 | 8 | albidiflava |
| Selenomonas | Selenomonas sp. | Selenomonas | sp | 1 | Selenomonas sp. |
| Sphingobacterium | Sphingobacterium composti [homonym] | Sphingobacterium | composti | 7 | composti |
| Thalassospira | A40-3 | Thalassospira | 3 | 1 | A40-3 |
| Thermovibrio | ammoni<81>cans | Thermovibrio | a5 | 10 | ammonificans |
| archaeon | GW2011_AR10 | archaeon | AR10 | 3 | GW2011_AR10 |
| archaeon | GW2011_AR20 | archaeon | AR20 | 3 | GW2011_AR20 |
| haloarchaeon | 3A1-DGR | haloarchaeon | DGR | 3 | 3A1-DGR |
| olei | IMMIBHF-1T | olei | 1T | 2 | IMMIBHF-1T |

@GuiraudTresor
Contributor Author

GuiraudTresor commented Sep 1, 2022 via email

@sjanssen2
Member

sjanssen2 commented Sep 1, 2022 via email

@GuiraudTresor
Contributor Author

GuiraudTresor commented Sep 1, 2022 via email

@sjanssen2
Member

there are still some errors left, e.g.
image

@sjanssen2
Member

The automatic tests are starting to become useful, i.e. you might want to start reading their error messages:

AssertionError: False is not true : all Genus and Species names should be non-emptry:
                Genus       Species          trait source          val
370478            NaN  Streptomyces  Gram_positive    JGI     1.000000
144     Acaryochloris           NaN     GC_content   NCBI    46.716700
145     Acaryochloris           NaN    Gene_number   NCBI  6849.000000
146     Acaryochloris           NaN      Genome_Mb   NCBI     7.254700
1097      Acetobacter           NaN     Copies_16S  rrnDB     5.000000
...               ...           ...            ...    ...          ...
370239      bacterium           NaN      Genome_Mb   NCBI     1.089570
370240      bacterium           NaN      Genome_Mb   NCBI     1.089560
370241      bacterium           NaN      Genome_Mb   NCBI     0.943189
370242      bacterium           NaN      Genome_Mb   NCBI     0.916044
370243      bacterium           NaN      Genome_Mb   NCBI     0.761814
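
A check of this kind could look roughly like the sketch below. It is not the project's actual test: the helper names `is_missing` and `rows_with_missing_names` are hypothetical, rows are modelled as plain dictionaries instead of a pandas DataFrame, and the third sample row is made up for illustration. The first two sample rows are taken from the error output above.

```python
import math


def is_missing(value) -> bool:
    """True for None, NaN, or a string that is empty after stripping."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def rows_with_missing_names(rows):
    """Return every row whose 'Genus' or 'Species' field is missing."""
    return [
        row for row in rows
        if is_missing(row.get("Genus")) or is_missing(row.get("Species"))
    ]


rows = [
    # Two rows from the error message above.
    {"Genus": float("nan"), "Species": "Streptomyces",
     "trait": "Gram_positive", "source": "JGI", "val": 1.0},
    {"Genus": "Acaryochloris", "Species": float("nan"),
     "trait": "GC_content", "source": "NCBI", "val": 46.7167},
    # Hypothetical complete row, invented for this example.
    {"Genus": "Bacillus", "Species": "safensis",
     "trait": "Genome_Mb", "source": "NCBI", "val": 3.7},
]

bad = rows_with_missing_names(rows)
for row in bad:
    print(row["Genus"], row["Species"], row["trait"])
```

Listing the offending rows rather than only asserting on them makes it easier to see which source and trait the empty names come from.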

@sjanssen2
Member

Why are you doing that: "all unclassified species i.e sp. were substituted to an empty space"? They are not unclassified; the species and genus names are identical, hence the suffix sp.

You should NOT remove rows in this PR!

@GuiraudTresor
Contributor Author

Sp. does not stand for the species name; it is just the abbreviation for "species". Because the exact name of the species is not known, Sp. was written in the species column. That is why it is considered unclassified. Guitar did the same in his R code, as he excluded rows with Sp.
