Error in standardize_subset #109

PhoebeGuo97 · 2022-09-16T17:40:14Z

1. Bug description

Due to an issue as described in https://github.com/neurogenomics/MungeSumstats/issues/117, I ran MSS with tabix_index=FALSE and then ran tabix.createIndex(fullSS_path). After that, running finemap_loci got an error: Error in standardize_subset(locus = locus, top_SNPs = top_SNPs, fullSS_genome_build = fullSS_genome_build, :
Could not find any rows in full data that matched query :(

Any help will be appreciated!

Expected behaviour

(A clear and concise description of what you expected to happen.)

2. Reproducible example

Code

fullSS_path <- "/mnt/belinda_local/fangfei/data/Jansen2019/AD_sumstats_Jansenetal_2019sept.txt.gz"
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path, ref_genome = "GRCH37",tabix_index=FALSE,force_new=TRUE, save_path='/mnt/belinda_local/fangfei/data/Jansen2019/formatted_Jansen2019.tsv.gz')

tabix.createIndex(fullSS_path)

Jansen_2019.results <- finemap_loci(
  # GENERAL ARGUMENTS
  top_SNPs = top_SNPs,
  #  It's best to give absolute paths
  results_dir = "/mnt/belinda_local/fangfei/data/echolocatoR",
  loci = top_SNPs$Locus,
  dataset_name = "Jansen_2019",
  dataset_type = "GWAS",
  force_new_subset = FALSE,
  force_new_LD = FALSE,
  force_new_finemap = TRUE,
  remove_tmps = FALSE,
  
  # Munge full sumstats first
  munged = TRUE,
  
  # SUMMARY STATS ARGUMENTS
  fullSS_path = fullSS_path,
  fullSS_genome_build = "hg19",
  query_by ="tabix",
  MAF_col = "EAF",
  effect_col = "BETA",
  stderr_col = "SE",
  sample_size = "Nsum",
  N_cases_col = "Neff",
  A1_col = "A1",
  A2_col = "A2",
  
  # FILTERING ARGUMENTS
  ## It's often desirable to use a larger window size
  ## (e.g. 2Mb which is bp_distance=500000*2),
  ## but we use a small window here to speed up the process.
  bp_distance = 10000,#500000*2,
  min_MAF = 0.001,
  trim_gene_limits = FALSE,
  
  # FINE-MAPPING ARGUMENTS
  ## General
  finemap_methods = c("ABF","FINEMAP","SUSIE","POLYFUN_SUSIE"),
  n_causal = 5,
  PP_threshold = .95,
  
  # LD ARGUMENTS
  LD_reference = "UKB",#"UKB",
  #superpopulation = "EUR",
  download_method = "axel",
  
  # PLOT ARGUMENTS
  ## general
  plot.types = c("simple"),
  ## Generate multiple plots of different window sizes;
  ### all SNPs, 4x zoomed-in, and a 50000bp window
  plot.zoom = c("all","4x","10x"),
  ## XGR
  # plot.XGR_libnames=c("ENCODE_TFBS_ClusteredV3_CellTypes"),
  ## Roadmap
  plot.Roadmap = FALSE,
  plot.Roadmap_query = NULL,
  # Nott et al. (2019)
  plot.Nott_epigenome = F,
  plot.Nott_show_placseq = F,
  
  verbose = TRUE
)

Console output

[1] "+ CONDA:: 'echoR' conda environment not found. Using default 'base' instead."
[1] "Checking for tabix installation..."
[1] "Checking for bcftools installation..."
[1] "+ Assuming sumstats have already been processed with MungeSumstats"

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
1 (1 / 29)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "+ Extracting relevant variants from fullSS..."
[1] "+ Query Method: tabix"
[1] "+ QUERY: Chromosome = 1 ; Min position = 161145392 ; Max position = 161165392"
[1] "TABIX:: Copying existing tabix file ==> /mnt/belinda_local/fangfei/data/Jansen2019/formatted_Jansen2019.tsv.gz"
[1] "Determining chrom type from file header"
[1] "Chromosome format = 1"
[1] "TABIX:: Extracting subset of sum stats"
[1] "+ TABIX:: Returning 0 x 0 data.table"
[1] "++ Saving query ==> /mnt/belinda_local/fangfei/data/echolocatoR/GWAS/Jansen_2019/1/1_Jansen_2019_subset.tsv.gz"
[1] "LD:: Standardizing summary statistics subset."
Error in standardize_subset(locus = locus, top_SNPs = top_SNPs, fullSS_genome_build = fullSS_genome_build,  : 
  Could not find any rows in full data that matched query :(
In addition: Warning messages:
1: In data.table::fwrite(dat, file = subset_path, nThread = nThread,  :
  Input has no columns; creating an empty file at '/mnt/belinda_local/fangfei/data/echolocatoR/GWAS/Jansen_2019/1/1_Jansen_2019_subset.tsv.gz' and exiting.
2: In data.table::fread(subset_path, nrows = 2) :
  File '/mnt/belinda_local/fangfei/data/echolocatoR/GWAS/Jansen_2019/1/1_Jansen_2019_subset.tsv.gz' has size 0. Returning a NULL data.table.

Data

https://ctg.cncr.nl/documents/p1651/AD_sumstats_Jansenetal_2019sept.txt.gz

3. Session info

(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] seqminer_8.4        GenomeInfoDb_1.33.7 IRanges_2.31.2      S4Vectors_0.35.3    BiocGenerics_0.43.4

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.26             R.utils_2.12.0              tidyselect_1.1.2           
  [5] RSQLite_2.2.17              AnnotationDbi_1.59.1        htmlwidgets_1.5.4           grid_4.2.1                 
  [9] BiocParallel_1.31.12        XGR_1.1.7                   munsell_0.5.0               codetools_0.2-18           
 [13] interp_1.1-3                DT_0.25                     colorspace_2.0-3            OrganismDbi_1.39.1         
 [17] Biobase_2.57.1              filelock_1.0.2              knitr_1.40                  supraHex_1.35.0            
 [21] rstudioapi_0.14             DescTools_0.99.46           MatrixGenerics_1.9.1        GenomeInfoDbData_1.2.8     
 [25] mixsqp_0.3-43               bit64_4.0.5                 echoconda_0.99.6            basilisk_1.9.10            
 [29] vctrs_0.4.1                 generics_0.1.3              xfun_0.33                   biovizBase_1.45.0          
 [33] BiocFileCache_2.5.0         R6_2.5.1                    AnnotationFilter_1.21.0     bitops_1.0-7               
 [37] cachem_1.0.6                reshape_0.8.9               DelayedArray_0.23.2         assertthat_0.2.1           
 [41] BiocIO_1.7.1                scales_1.2.1                nnet_7.3-17                 rootSolve_1.8.2.3          
 [45] gtable_0.3.1                lmom_2.9                    ggbio_1.45.0                ensembldb_2.21.4           
 [49] rlang_1.0.5                 MungeSumstats_1.5.13        echodata_0.99.12            splines_4.2.1              
 [53] lazyeval_0.2.2              rtracklayer_1.57.0          gargle_1.2.1                dichromat_2.0-0.1          
 [57] hexbin_1.28.2               checkmate_2.1.0             BiocManager_1.30.18         yaml_2.3.5                 
 [61] reshape2_1.4.4              snpStats_1.47.1             GenomicFeatures_1.49.6      ggnetwork_0.5.10           
 [65] backports_1.4.1             Hmisc_4.7-1                 RBGL_1.73.0                 tools_4.2.1                
 [69] echoplot_0.99.3             ggplot2_3.3.6               ellipsis_0.3.2              RColorBrewer_1.1-3         
 [73] proxy_0.4-27                coloc_5.2.0                 Rcpp_1.0.9                  plyr_1.8.7                 
 [77] base64enc_0.1-3             progress_1.2.2              zlibbioc_1.43.0             purrr_0.3.4                
 [81] RCurl_1.98-1.8              basilisk.utils_1.9.3        prettyunits_1.1.1           rpart_4.1.16               
 [85] deldir_1.0-6                viridis_0.6.2               SummarizedExperiment_1.27.3 ggrepel_0.9.1              
 [89] cluster_2.1.4               fs_1.5.2                    magrittr_2.0.3              data.table_1.14.2          
 [93] echotabix_0.99.7            dnet_1.1.7                  openxlsx_4.2.5              mvtnorm_1.1-3              
 [97] ProtGenerics_1.29.0         matrixStats_0.62.0          patchwork_1.1.2             hms_1.1.2                  
[101] XML_3.99-0.10               echolocatoR_2.0.0           jpeg_0.1-9                  readxl_1.4.1               
[105] gridExtra_2.3               compiler_4.2.1              biomaRt_2.53.2              tibble_3.1.8               
[109] crayon_1.5.1                R.oo_1.25.0                 htmltools_0.5.3             echoannot_0.99.7           
[113] tzdb_0.3.0                  Formula_1.2-4               tidyr_1.2.1                 expm_0.999-6               
[117] Exact_3.1                   DBI_1.1.3                   dbplyr_2.2.1                MASS_7.3-58.1              
[121] rappdirs_0.3.3              boot_1.3-28                 Matrix_1.5-1                readr_2.1.2                
[125] piggyback_0.1.4             cli_3.4.0                   R.methodsS3_1.8.2           echofinemap_0.99.2         
[129] parallel_4.2.1              igraph_1.3.4                GenomicRanges_1.49.1        pkgconfig_2.0.3            
[133] GenomicAlignments_1.33.1    dir.expiry_1.5.1            RCircos_1.2.2               foreign_0.8-82             
[137] xml2_1.3.3                  XVector_0.37.1              echoLD_0.99.6               stringr_1.4.1              
[141] VariantAnnotation_1.43.3    digest_0.6.29               graph_1.75.0                Biostrings_2.65.6          
[145] cellranger_1.1.0            htmlTable_2.4.1             gld_2.6.5                   restfulr_0.0.15            
[149] curl_4.3.2                  Rsamtools_2.13.4            rjson_0.2.21                lifecycle_1.0.2            
[153] nlme_3.1-159                jsonlite_1.8.0              viridisLite_0.4.1           BSgenome_1.65.2            
[157] fansi_1.0.3                 susieR_0.12.27              downloadR_0.99.3            pillar_1.8.1               
[161] lattice_0.20-45             GGally_2.1.2                KEGGREST_1.37.3             fastmap_1.1.0              
[165] httr_1.4.4                  survival_3.4-0              googleAuthR_2.0.0           glue_1.6.2                 
[169] remotes_2.4.2               zip_2.2.1                   png_0.1-7                   bit_4.0.4                  
[173] Rgraphviz_2.41.1            class_7.3-20                stringi_1.7.8               blob_1.2.3                 
[177] latticeExtra_0.6-30         memoise_2.0.1               dplyr_1.0.10                irlba_2.3.5                
[181] e1071_1.7-11                ape_5.6-2

The text was updated successfully, but these errors were encountered:

bschilder · 2022-09-20T10:16:34Z

Hi @PhoebeGuo97, it looks like you're following the documentation from echolocatoR 1.0 with echolocatoR 2.0. Try running the example at the bottom of the help page that appears when you use: ?echolocatoR::finemap_loci

I realize this is a bit confusing as the documentation website has not yet been updated (i have to get it passing all GitHub Actions checks first). So in the meantime, please follow the documentation via the ?<function_name> help pages instead.

PhoebeGuo97 · 2022-09-22T19:25:37Z

Hi @bschilder, I tried using echolocatoR 2.0 and running the example. I got a new error: Error: object 'find_executables_remote' is not exported by 'namespace:echoconda'.

bschilder · 2022-10-29T10:47:11Z

Hi @bschilder, I tried using echolocatoR 2.0 and running the example. I got a new error: Error: object 'find_executables_remote' is not exported by 'namespace:echoconda'.

Hi @PhoebeGuo97, sorry for the delay. This was due to a update in echoconda where the function name had changed. Can you try updating all the echoverse packages and try again?

devtools::install_github("RajLabMSSM/echolocatoR", dependencies = TRUE)

Best,
Brian

bschilder · 2022-10-29T12:52:10Z

After reinstalling, can you try the following example @PhoebeGuo97 ? This seems to work for me.

#### Munge sumstats ####
fullSS_path <- MungeSumstats::format_sumstats(path = "https://ctg.cncr.nl/documents/p1651/AD_sumstats_Jansenetal_2019sept.txt.gz",
                                              ref_genome = "GRCH37",
                                              tabix_index=FALSE,
                                              force_new=TRUE)
#### Prepare topSNPs dataframe ####
topSNPs <- echodata::import_topSNPs(topSS = "https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0311-9/MediaObjects/41588_2018_311_MOESM3_ESM.xlsx",
                                    sheet = "Table S2",
                                    startRow=5,
                                    cols=1:13,
                                    munge = TRUE)
#### Run echolocatoR #### 
res <- echolocatoR::finemap_loci(
  fullSS_path = fullSS_path,
  topSNPs = topSNPs[1:2,], # test just the first 2 loci
  finemap_methods = c("ABF","FINEMAP","SUSIE"),
  dataset_name = "Jansen2019",
  fullSS_genome_build = "hg19",
  bp_distance = 1000,
  munged = TRUE)

PhoebeGuo97 · 2022-11-03T17:37:48Z

After reinstalling, can you try the following example @PhoebeGuo97 ? This seems to work for me.

#### Munge sumstats ####
fullSS_path <- MungeSumstats::format_sumstats(path = "https://ctg.cncr.nl/documents/p1651/AD_sumstats_Jansenetal_2019sept.txt.gz",
                                              ref_genome = "GRCH37",
                                              tabix_index=FALSE,
                                              force_new=TRUE)
#### Prepare topSNPs dataframe ####
topSNPs <- echodata::import_topSNPs(topSS = "https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0311-9/MediaObjects/41588_2018_311_MOESM3_ESM.xlsx",
                                    sheet = "Table S2",
                                    startRow=5,
                                    cols=1:13,
                                    munge = TRUE)
#### Run echolocatoR #### 
res <- echolocatoR::finemap_loci(
  fullSS_path = fullSS_path,
  topSNPs = topSNPs[1:2,], # test just the first 2 loci
  finemap_methods = c("ABF","FINEMAP","SUSIE"),
  dataset_name = "Jansen2019",
  fullSS_genome_build = "hg19",
  bp_distance = 1000,
  munged = TRUE)

Hi @bschilder, thanks for the update. I can successfully run the example. However, if I add "LD_reference = "UKB" in fine map_loci(), I will get ModuleNotFoundError: No module named 'scipy' at Step 2 ▶▶▶ Extract Linkage Disequilibrium and other steps won't be finished. Could you please let me know how to fix it? Thanks!

PhoebeGuo97 added the bug Something isn't working label Sep 16, 2022

bschilder self-assigned this Sep 20, 2022

bschilder added this to 🦇🦇 echoverse 🦇🦇 Sep 20, 2022

bschilder moved this to Todo in 🦇🦇 echoverse 🦇🦇 Sep 20, 2022

bschilder moved this from Todo to In Progress in 🦇🦇 echoverse 🦇🦇 Oct 29, 2022

bschilder closed this as completed Nov 1, 2022

Repository owner moved this from In Progress to Done in 🦇🦇 echoverse 🦇🦇 Nov 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in standardize_subset #109

Error in standardize_subset #109

PhoebeGuo97 commented Sep 16, 2022

bschilder commented Sep 20, 2022

PhoebeGuo97 commented Sep 22, 2022

bschilder commented Oct 29, 2022

bschilder commented Oct 29, 2022 •

edited

Loading

PhoebeGuo97 commented Nov 3, 2022

Error in standardize_subset #109

Error in standardize_subset #109

Comments

PhoebeGuo97 commented Sep 16, 2022

1. Bug description

Expected behaviour

2. Reproducible example

Code

Console output

Data

3. Session info

bschilder commented Sep 20, 2022

PhoebeGuo97 commented Sep 22, 2022

bschilder commented Oct 29, 2022

bschilder commented Oct 29, 2022 • edited Loading

PhoebeGuo97 commented Nov 3, 2022

bschilder commented Oct 29, 2022 •

edited

Loading