Error in `check_no_rs_snp()` when joining with duplicate key values #176
Labels: bug
Hi,
Alright, I've been struggling with an edge case again for a few days, and I think I have found the reason for it. At least I hope so...
If you think this should be implemented differently, I'd gladly submit a PR once we figure something out, for example by simply matching against the row index.
Edit: I guess this is a duplicate of #164
1. Bug description
Whenever many entries in the `SNP` column are NA, using the join-by-key feature from data.table, e.g. in `check_no_rs_snp()`, will result in data.table matching all `NA` entries in `sumstats_dt` to all `NA`s in `miss_rs`, leading to a quickly inflated number of rows.
For example, when `imputation_ind = TRUE`, the following segment will cause a crash (line 415):
Console output
Expected behaviour
I think having NA as the `SNP` entry is rare, and often CHR:BP:REF:ALT or something similar is used (I guess GWAS Catalog and openGWAS do this?). I am currently munging data from various other sources, and my current problems come from the GBMI summary statistics, a large and prominent GWAS meta-analysis consortium that sadly decided against using one of the suggested standard formats.
Still, it might be desirable to handle this case more robustly.
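To illustrate the mechanism and the index-matching idea, here is a minimal sketch on made-up toy data (not the GBMI file, and not the actual code from `check_no_rs_snp()`). The column `idx` and the objects `joined_on_snp` and `joined_on_idx` are hypothetical names introduced only for this example, and the `grepl("^rs", ...)` filter is an assumption about how rows without an rsID might be selected.

```r
library(data.table)

# Toy stand-in for the summary statistics: three rows have no SNP ID at all
sumstats_dt <- data.table(
  SNP = c(NA, NA, NA, "rs123"),
  CHR = c(1L, 1L, 2L, 2L),
  BP  = c(100L, 200L, 300L, 400L)
)
sumstats_dt[, idx := .I]  # hypothetical row-index column, not part of the package

# Rows without an rsID (here: the three NA rows)
miss_rs <- sumstats_dt[is.na(SNP) | !grepl("^rs", SNP)]

# Joining on SNP: each NA row in miss_rs pairs with each NA row in sumstats_dt.
# allow.cartesian = TRUE is set only so the inflated result can be inspected
# instead of stopping with an error.
joined_on_snp <- sumstats_dt[miss_rs, on = "SNP", allow.cartesian = TRUE]

# Joining on the row index instead keeps one row per entry of miss_rs
joined_on_idx <- sumstats_dt[miss_rs, on = "idx"]

print(c(by_snp = nrow(joined_on_snp), by_idx = nrow(joined_on_idx)))
```

On real data with many missing IDs the `SNP` join grows roughly with the square of the number of NA rows, which would explain the crash. Whether a row index, a CHR:BP:REF:ALT key, or dropping NA rows before the join is the better fix is left open here.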
2. Reproducible example
Code
Data
A summary statistics file that leads to errors can be downloaded from here: https://docs.google.com/spreadsheets/d/1sSU_JfPKs6EZLcY9t3gXsSGPZA1LsP_z/edit
3. Session info
(Add output of the R function `utils::sessionInfo()` below. This helps us assess version/OS conflicts which could be causing bugs.)