Error in cDict[[chrom_col]] : subscript out of bounds #83

mkoromina · 2022-03-08T16:39:52Z

## 1. Bug description

Upon running the finemap_loci() function, the above mentioned error message occurs.
(this is somehow a continuation from a thread in issue #74.)

### Console output

Full error message from console:
+ Extracting relevant variants from fullSS...
+ Query Method: tabix
+ QUERY: Chromosome = 22 ; Min position = 41143879 ; Max position = 41163879
echotabix:: Converting full summary stats file to tabix format for fast querying...
echotabix:: Ensuring file is bgzipped.
echotabix:: Tabix-indexing file.
Error in cDict[[chrom_col]] : subscript out of bounds
In addition: Warning messages:
1: In readLines(con = large_file, n = n) :
  line 1 appears to contain an embedded nul
2: In readLines(con = large_file, n = n) :
  line 2 appears to contain an embedded nul
Fine-mapping complete in:
Time difference of 5.5 secs
Returning results as nested list.

Expected behaviour

I'd expect the finemap_loci() function to run efficiently.
To note also, that upon running the script below, the munged sumstats (.parquet) are further zipped (i.e., .parquet.bgz), which I am not sure if it is somehow related to the error message.

##update: same error message produced when using .tsv files with MungeSumStats package.

## 2. Reproducible example
##Code

##all echoverse sub packages are loaded 

##import top SNPs (to be fine-mapped)
top_SNPs <- import_topSNPs(topSS = file.path("/path/to/my/top-snps"),
show_table = T, 
chrom_col = "CHR",
position_col = "BP", 
snp_col= "SNP",
pval_col= "P",
effect_col= "OR",
locus_col = "SNP",
grouping_vars = c("SNP"))
      
##path to GWAS summary stats
fullSS_path <- file.path("/path/to/my/parquet/file")

##run the fine-mapping function
my.results<- finemap_loci(top_SNPs = top_SNPs, 
                                          results_dir = file.path("path/to/results/folder"),
                                          loci = c("rs13044225","rs5758064"),#top_SNPs$Locus,
                                          dataset_name = "mydataset",
                                          dataset_type = "GWAS",  
                                          force_new_subset = TRUE,
                                          force_new_LD = FALSE,
                                          force_new_finemap = TRUE,
                                          remove_tmps = FALSE,
                                          
                                          # Munge full sumstats first
                                          munged = FALSE,
                                          
                                          # SUMMARY STATS ARGUMENTS
                                          fullSS_path = fullSS_path,
                                          fullSS_genome_build = "hg19",
                                          query_by = "tabix",  
                                          MAF_col = "calculate",   
                                         
                                          # FILTERING ARGUMENTS
                                          bp_distance = 10000,
                                          min_MAF = 0.001,  
                                          trim_gene_limits = FALSE,
                                         
                                          # FINE-MAPPING ARGUMENTS
                                          ## General
                                          finemap_methods = c("ABF","FINEMAP","SUSIE","POLYFUN_SUSIE"), 
                                          n_causal = 5,
                                          PP_threshold = .95, 
                                         
                                          # LD ARGUMENTS 
                                          LD_reference = "1KGphase1",#"UKB",
                                          superpopulation = "EUR",
                                          download_method = "axel",
                                          verbose = TRUE)

### Data
Unfortunately, I cannot upload the data, but here is a short description:
-My loci to be fine mapped are saved in a standard excel format with the following header:
SNP | CHR | BP | P | OR | SE.

-My GWAS sumstats have been munged with the munge_polyfun_sumstats.py and they are saved in a .parquet format.

## 3. Session info

R version 4.1.2 (2021-11-01)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] echotabix_0.99.2 echoplot_0.99.1 echoLD_0.99.1 echofinemap_0.99.0 echoconda_0.99.4 echodata_0.99.6 echoannot_0.99.3 echolocatoR_2.0.0

loaded via a [1] utf8_1.2.2 [6] AnnotationDbi_1.56.2 [11] munsell_0.5.0 [16] OrganismDbi_1.36.0 [21] DescTools_0.99.44 [26] rprojroot_2.0.2 [31] BiocFileCache_2.2.1 [36] cachem_1.0.6 [41] scales_1.1.1 [46] ggbio_1.42.0 [51] rtracklayer_1.54.0 [56] BiocManager_1.30.16 [61] GenomicFeatures_1.46.5 [66] tools_4.1.2 [71] proxy_0.4-26 [76] base64enc_0.1-3 [81] prettyunits_1.1.1 [86] ggrepel_0.9.1 [91] data.table_1.14.2 [96] ProtGenerics_1.26.0 [101] jpeg_0.1-9 [106] tibble_3.1.6 [111] Formula_1.2-4 [116] DBI_1.1.2 [121] Matrix_1.4-0 [126] parallel_4.1.2 [131] RCircos_1.2.2 [136] stringr_1.4.0 [141] htmlTable_2.4.0 [146] rjson_0.2.21 [151] viridisLite_0.4.0 [156] lattice_0.20-45 [161] survival_3.3-1 [166] Rgraphviz_2.38.0 [171] latticeExtra_0.6-29 [176] ape_5.6-2 namespace (and not attached):
reticulate_1.24 R.utils_2.11.0 tidyselect_1.1.2 RSQLite_2.2.10
htmlwidgets_1.5.4 grid_4.1.2 BiocParallel_1.28.3 XGR_1.1.7
DT_0.21 colorspace_2.0-3 Biobase_2.54.0 filelock_1.0.2
knitr_1.37 supraHex_1.32.0 rstudioapi_0.13 stats4_4.1.2
MatrixGenerics_1.6.0 GenomeInfoDbData_1.2.7 mixsqp_0.3-43 bit64_4.0.5
vctrs_0.3.8 generics_0.1.2 xfun_0.30 biovizBase_1.42.0
R6_2.5.1 GenomeInfoDb_1.30.1 AnnotationFilter_1.18.0 bitops_1.0-7
reshape_0.8.8 DelayedArray_0.20.0 assertthat_0.2.1 BiocIO_1.4.0
nnet_7.3-17 rootSolve_1.8.2.3 gtable_0.3.0 lmom_2.8
ensembldb_2.18.3 rlang_1.0.2 clisymbols_1.2.0 splines_4.1.2
lazyeval_0.2.2 dichromat_2.0-0 hexbin_1.28.2 checkmate_2.0.0
yaml_2.3.5 reshape2_1.4.4 crosstalk_1.2.0 snpStats_1.44.0
ggnetwork_0.5.10 backports_1.4.1 Hmisc_4.6-0 RBGL_1.70.0
ggplot2_3.3.5 ellipsis_0.3.2 jquerylib_0.1.4 RColorBrewer_1.1-2
BiocGenerics_0.40.0 coloc_5.1.1 Rcpp_1.0.8 plyr_1.8.6
progress_1.2.2 zlibbioc_1.40.0 purrr_0.3.4 RCurl_1.98-1.6
rpart_4.1.16 viridis_0.6.2 S4Vectors_0.32.3 SummarizedExperiment_1.24.0
cluster_2.1.2 here_1.0.1 fs_1.5.2 magrittr_2.0.2
dnet_1.1.7 openxlsx_4.2.5 gh_1.3.0 mvtnorm_1.1-3
matrixStats_0.61.0 hms_1.1.1 patchwork_1.1.1 XML_3.99-0.9
IRanges_2.28.0 gridExtra_2.3 compiler_4.1.2 biomaRt_2.50.3
crayon_1.5.0 R.oo_1.24.0 htmltools_0.5.2 tzdb_0.2.0
tidyr_1.2.0 expm_0.999-6 Exact_3.1 lubridate_1.8.0
dbplyr_2.1.1 MASS_7.3-55 rappdirs_0.3.3 boot_1.3-28
readr_2.1.2 piggyback_0.1.1 cli_3.2.0 R.methodsS3_1.8.1
igraph_1.2.11 GenomicRanges_1.46.1 pkgconfig_2.0.3 GenomicAlignments_1.30.0
foreign_0.8-82 xml2_1.3.3 bslib_0.3.1 XVector_0.34.0
VariantAnnotation_1.40.0 digest_0.6.29 graph_1.72.0 Biostrings_2.62.0
gld_2.6.4 restfulr_0.0.13 curl_4.3.2 Rsamtools_2.10.0
seqminer_8.4 lifecycle_1.0.1 nlme_3.1-155 jsonlite_1.8.0
BSgenome_1.62.0 fansi_1.0.2 susieR_0.11.92 pillar_1.7.0
GGally_2.1.2 KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2
glue_1.6.2 zip_2.2.0 png_0.1-7 bit_4.0.4
sass_0.4.0 class_7.3-20 stringi_1.7.6 blob_1.2.2
memoise_2.0.1 dplyr_1.0.8 irlba_2.3.5 e1071_1.7-9

The text was updated successfully, but these errors were encountered:

mkoromina · 2022-03-08T16:42:28Z

May I just say that I am not sure how the bug label was added in there, but please feel free to remove it and leave it as an issue to be solved! Thanks a lot!

bschilder · 2022-03-16T23:11:56Z

Seems like there's at least several things going on here.

Most importantly, echolocatoR doesn't support sumstats provided in parquet file format (yet). When you write "/path/to/my/parquet/file" do you literally mean a parquet file? Because that won't work. As I mentioned here, you need to convert it to a format that data.table can recognize (e.g. tsv.gz). There's a number of ways to do this, but if you prefer to stay in R, you can use arrow.

dat <- arrow::read_parquet(filepath)
data.table::fwrite(dat, newfilepath, sep="\t")

PS - I set the bug label to be automatic, so don't worry about it!

mkoromina · 2022-03-17T13:38:04Z

Hi @bschilder, thank you very much for this! I experienced though the same issue when using .tsv files created by MungeSumstats. I will try though your way with arrow package and come back to you if needed. Thanks a lot once again!

bschilder · 2022-03-17T15:49:37Z

Ok, keep me posted!
I should also mention I'm looking into some issues with munging and tabix indexing PGC files such as the BD2021 dataset.
https://github.com/neurogenomics/MungeSumstats/issues/91

mkoromina · 2022-03-24T21:09:01Z

Hey, Just an update in here, so as to let you know that .parquet format won't work (even when reformatted via arrow::read_parquet to a .tsv.gz file). The main issue is actually error messages occurring and which have to do with its appropriate tabix indexing in downstream analysis.

mkoromina added the bug Something isn't working label Mar 8, 2022

bschilder closed this as completed Mar 16, 2022

bschilder added this to 🦇🦇 echoverse 🦇🦇 Aug 26, 2022

bschilder moved this to Done in 🦇🦇 echoverse 🦇🦇 Aug 26, 2022

bschilder self-assigned this Nov 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in cDict[[chrom_col]] : subscript out of bounds #83

Error in cDict[[chrom_col]] : subscript out of bounds #83

mkoromina commented Mar 8, 2022 •

edited by bschilder

Loading

mkoromina commented Mar 8, 2022 •

edited

Loading

bschilder commented Mar 16, 2022 •

edited

Loading

mkoromina commented Mar 17, 2022

bschilder commented Mar 17, 2022

mkoromina commented Mar 24, 2022

Error in cDict[[chrom_col]] : subscript out of bounds #83

Error in cDict[[chrom_col]] : subscript out of bounds #83

Comments

mkoromina commented Mar 8, 2022 • edited by bschilder Loading

Expected behaviour

mkoromina commented Mar 8, 2022 • edited Loading

bschilder commented Mar 16, 2022 • edited Loading

mkoromina commented Mar 17, 2022

bschilder commented Mar 17, 2022

mkoromina commented Mar 24, 2022

mkoromina commented Mar 8, 2022 •

edited by bschilder

Loading

mkoromina commented Mar 8, 2022 •

edited

Loading

bschilder commented Mar 16, 2022 •

edited

Loading