Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in standardize_subset #109

Closed
PhoebeGuo97 opened this issue Sep 16, 2022 · 5 comments
Closed

Error in standardize_subset #109

PhoebeGuo97 opened this issue Sep 16, 2022 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@PhoebeGuo97
Copy link

1. Bug description

Due to an issue as described in https://github.com/neurogenomics/MungeSumstats/issues/117, I ran MSS with tabix_index=FALSE and then ran tabix.createIndex(fullSS_path). After that, running finemap_loci got an error: Error in standardize_subset(locus = locus, top_SNPs = top_SNPs, fullSS_genome_build = fullSS_genome_build, :
Could not find any rows in full data that matched query :(

Any help will be appreciated!

Expected behaviour

(A clear and concise description of what you expected to happen.)

2. Reproducible example

Code

fullSS_path <- "/mnt/belinda_local/fangfei/data/Jansen2019/AD_sumstats_Jansenetal_2019sept.txt.gz"
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path, ref_genome = "GRCH37",tabix_index=FALSE,force_new=TRUE, save_path='/mnt/belinda_local/fangfei/data/Jansen2019/formatted_Jansen2019.tsv.gz')

tabix.createIndex(fullSS_path)

Jansen_2019.results <- finemap_loci(
  # GENERAL ARGUMENTS
  top_SNPs = top_SNPs,
  #  It's best to give absolute paths
  results_dir = "/mnt/belinda_local/fangfei/data/echolocatoR",
  loci = top_SNPs$Locus,
  dataset_name = "Jansen_2019",
  dataset_type = "GWAS",
  force_new_subset = FALSE,
  force_new_LD = FALSE,
  force_new_finemap = TRUE,
  remove_tmps = FALSE,
  
  # Munge full sumstats first
  munged = TRUE,
  
  # SUMMARY STATS ARGUMENTS
  fullSS_path = fullSS_path,
  fullSS_genome_build = "hg19",
  query_by ="tabix",
  MAF_col = "EAF",
  effect_col = "BETA",
  stderr_col = "SE",
  sample_size = "Nsum",
  N_cases_col = "Neff",
  A1_col = "A1",
  A2_col = "A2",
  
  # FILTERING ARGUMENTS
  ## It's often desirable to use a larger window size
  ## (e.g. 2Mb which is bp_distance=500000*2),
  ## but we use a small window here to speed up the process.
  bp_distance = 10000,#500000*2,
  min_MAF = 0.001,
  trim_gene_limits = FALSE,
  
  # FINE-MAPPING ARGUMENTS
  ## General
  finemap_methods = c("ABF","FINEMAP","SUSIE","POLYFUN_SUSIE"),
  n_causal = 5,
  PP_threshold = .95,
  
  # LD ARGUMENTS
  LD_reference = "UKB",#"UKB",
  #superpopulation = "EUR",
  download_method = "axel",
  
  # PLOT ARGUMENTS
  ## general
  plot.types = c("simple"),
  ## Generate multiple plots of different window sizes;
  ### all SNPs, 4x zoomed-in, and a 50000bp window
  plot.zoom = c("all","4x","10x"),
  ## XGR
  # plot.XGR_libnames=c("ENCODE_TFBS_ClusteredV3_CellTypes"),
  ## Roadmap
  plot.Roadmap = FALSE,
  plot.Roadmap_query = NULL,
  # Nott et al. (2019)
  plot.Nott_epigenome = F,
  plot.Nott_show_placseq = F,
  
  verbose = TRUE
)

Console output

[1] "+ CONDA:: 'echoR' conda environment not found. Using default 'base' instead."
[1] "Checking for tabix installation..."
[1] "Checking for bcftools installation..."
[1] "+ Assuming sumstats have already been processed with MungeSumstats"

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
1 (1 / 29)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "+ Extracting relevant variants from fullSS..."
[1] "+ Query Method: tabix"
[1] "+ QUERY: Chromosome = 1 ; Min position = 161145392 ; Max position = 161165392"
[1] "TABIX:: Copying existing tabix file ==> /mnt/belinda_local/fangfei/data/Jansen2019/formatted_Jansen2019.tsv.gz"
[1] "Determining chrom type from file header"
[1] "Chromosome format = 1"
[1] "TABIX:: Extracting subset of sum stats"
[1] "+ TABIX:: Returning 0 x 0 data.table"
[1] "++ Saving query ==> /mnt/belinda_local/fangfei/data/echolocatoR/GWAS/Jansen_2019/1/1_Jansen_2019_subset.tsv.gz"
[1] "LD:: Standardizing summary statistics subset."
Error in standardize_subset(locus = locus, top_SNPs = top_SNPs, fullSS_genome_build = fullSS_genome_build,  : 
  Could not find any rows in full data that matched query :(
In addition: Warning messages:
1: In data.table::fwrite(dat, file = subset_path, nThread = nThread,  :
  Input has no columns; creating an empty file at '/mnt/belinda_local/fangfei/data/echolocatoR/GWAS/Jansen_2019/1/1_Jansen_2019_subset.tsv.gz' and exiting.
2: In data.table::fread(subset_path, nrows = 2) :
  File '/mnt/belinda_local/fangfei/data/echolocatoR/GWAS/Jansen_2019/1/1_Jansen_2019_subset.tsv.gz' has size 0. Returning a NULL data.table.

Data

https://ctg.cncr.nl/documents/p1651/AD_sumstats_Jansenetal_2019sept.txt.gz

3. Session info

(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] seqminer_8.4        GenomeInfoDb_1.33.7 IRanges_2.31.2      S4Vectors_0.35.3    BiocGenerics_0.43.4

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.26             R.utils_2.12.0              tidyselect_1.1.2           
  [5] RSQLite_2.2.17              AnnotationDbi_1.59.1        htmlwidgets_1.5.4           grid_4.2.1                 
  [9] BiocParallel_1.31.12        XGR_1.1.7                   munsell_0.5.0               codetools_0.2-18           
 [13] interp_1.1-3                DT_0.25                     colorspace_2.0-3            OrganismDbi_1.39.1         
 [17] Biobase_2.57.1              filelock_1.0.2              knitr_1.40                  supraHex_1.35.0            
 [21] rstudioapi_0.14             DescTools_0.99.46           MatrixGenerics_1.9.1        GenomeInfoDbData_1.2.8     
 [25] mixsqp_0.3-43               bit64_4.0.5                 echoconda_0.99.6            basilisk_1.9.10            
 [29] vctrs_0.4.1                 generics_0.1.3              xfun_0.33                   biovizBase_1.45.0          
 [33] BiocFileCache_2.5.0         R6_2.5.1                    AnnotationFilter_1.21.0     bitops_1.0-7               
 [37] cachem_1.0.6                reshape_0.8.9               DelayedArray_0.23.2         assertthat_0.2.1           
 [41] BiocIO_1.7.1                scales_1.2.1                nnet_7.3-17                 rootSolve_1.8.2.3          
 [45] gtable_0.3.1                lmom_2.9                    ggbio_1.45.0                ensembldb_2.21.4           
 [49] rlang_1.0.5                 MungeSumstats_1.5.13        echodata_0.99.12            splines_4.2.1              
 [53] lazyeval_0.2.2              rtracklayer_1.57.0          gargle_1.2.1                dichromat_2.0-0.1          
 [57] hexbin_1.28.2               checkmate_2.1.0             BiocManager_1.30.18         yaml_2.3.5                 
 [61] reshape2_1.4.4              snpStats_1.47.1             GenomicFeatures_1.49.6      ggnetwork_0.5.10           
 [65] backports_1.4.1             Hmisc_4.7-1                 RBGL_1.73.0                 tools_4.2.1                
 [69] echoplot_0.99.3             ggplot2_3.3.6               ellipsis_0.3.2              RColorBrewer_1.1-3         
 [73] proxy_0.4-27                coloc_5.2.0                 Rcpp_1.0.9                  plyr_1.8.7                 
 [77] base64enc_0.1-3             progress_1.2.2              zlibbioc_1.43.0             purrr_0.3.4                
 [81] RCurl_1.98-1.8              basilisk.utils_1.9.3        prettyunits_1.1.1           rpart_4.1.16               
 [85] deldir_1.0-6                viridis_0.6.2               SummarizedExperiment_1.27.3 ggrepel_0.9.1              
 [89] cluster_2.1.4               fs_1.5.2                    magrittr_2.0.3              data.table_1.14.2          
 [93] echotabix_0.99.7            dnet_1.1.7                  openxlsx_4.2.5              mvtnorm_1.1-3              
 [97] ProtGenerics_1.29.0         matrixStats_0.62.0          patchwork_1.1.2             hms_1.1.2                  
[101] XML_3.99-0.10               echolocatoR_2.0.0           jpeg_0.1-9                  readxl_1.4.1               
[105] gridExtra_2.3               compiler_4.2.1              biomaRt_2.53.2              tibble_3.1.8               
[109] crayon_1.5.1                R.oo_1.25.0                 htmltools_0.5.3             echoannot_0.99.7           
[113] tzdb_0.3.0                  Formula_1.2-4               tidyr_1.2.1                 expm_0.999-6               
[117] Exact_3.1                   DBI_1.1.3                   dbplyr_2.2.1                MASS_7.3-58.1              
[121] rappdirs_0.3.3              boot_1.3-28                 Matrix_1.5-1                readr_2.1.2                
[125] piggyback_0.1.4             cli_3.4.0                   R.methodsS3_1.8.2           echofinemap_0.99.2         
[129] parallel_4.2.1              igraph_1.3.4                GenomicRanges_1.49.1        pkgconfig_2.0.3            
[133] GenomicAlignments_1.33.1    dir.expiry_1.5.1            RCircos_1.2.2               foreign_0.8-82             
[137] xml2_1.3.3                  XVector_0.37.1              echoLD_0.99.6               stringr_1.4.1              
[141] VariantAnnotation_1.43.3    digest_0.6.29               graph_1.75.0                Biostrings_2.65.6          
[145] cellranger_1.1.0            htmlTable_2.4.1             gld_2.6.5                   restfulr_0.0.15            
[149] curl_4.3.2                  Rsamtools_2.13.4            rjson_0.2.21                lifecycle_1.0.2            
[153] nlme_3.1-159                jsonlite_1.8.0              viridisLite_0.4.1           BSgenome_1.65.2            
[157] fansi_1.0.3                 susieR_0.12.27              downloadR_0.99.3            pillar_1.8.1               
[161] lattice_0.20-45             GGally_2.1.2                KEGGREST_1.37.3             fastmap_1.1.0              
[165] httr_1.4.4                  survival_3.4-0              googleAuthR_2.0.0           glue_1.6.2                 
[169] remotes_2.4.2               zip_2.2.1                   png_0.1-7                   bit_4.0.4                  
[173] Rgraphviz_2.41.1            class_7.3-20                stringi_1.7.8               blob_1.2.3                 
[177] latticeExtra_0.6-30         memoise_2.0.1               dplyr_1.0.10                irlba_2.3.5                
[181] e1071_1.7-11                ape_5.6-2

@PhoebeGuo97 PhoebeGuo97 added the bug Something isn't working label Sep 16, 2022
@bschilder bschilder self-assigned this Sep 20, 2022
@bschilder
Copy link
Member

Hi @PhoebeGuo97, it looks like you're following the documentation from echolocatoR 1.0 with echolocatoR 2.0. Try running the example at the bottom of the help page that appears when you use: ?echolocatoR::finemap_loci

I realize this is a bit confusing as the documentation website has not yet been updated (i have to get it passing all GitHub Actions checks first). So in the meantime, please follow the documentation via the ?<function_name> help pages instead.

@PhoebeGuo97
Copy link
Author

Hi @bschilder, I tried using echolocatoR 2.0 and running the example. I got a new error: Error: object 'find_executables_remote' is not exported by 'namespace:echoconda'.

@bschilder
Copy link
Member

Hi @bschilder, I tried using echolocatoR 2.0 and running the example. I got a new error: Error: object 'find_executables_remote' is not exported by 'namespace:echoconda'.

Hi @PhoebeGuo97, sorry for the delay. This was due to a update in echoconda where the function name had changed. Can you try updating all the echoverse packages and try again?

devtools::install_github("RajLabMSSM/echolocatoR", dependencies = TRUE)

Best,
Brian

@bschilder
Copy link
Member

bschilder commented Oct 29, 2022

After reinstalling, can you try the following example @PhoebeGuo97 ? This seems to work for me.

#### Munge sumstats ####
fullSS_path <- MungeSumstats::format_sumstats(path = "https://ctg.cncr.nl/documents/p1651/AD_sumstats_Jansenetal_2019sept.txt.gz",
                                              ref_genome = "GRCH37",
                                              tabix_index=FALSE,
                                              force_new=TRUE)
#### Prepare topSNPs dataframe ####
topSNPs <- echodata::import_topSNPs(topSS = "https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0311-9/MediaObjects/41588_2018_311_MOESM3_ESM.xlsx",
                                    sheet = "Table S2",
                                    startRow=5,
                                    cols=1:13,
                                    munge = TRUE)
#### Run echolocatoR #### 
res <- echolocatoR::finemap_loci(
  fullSS_path = fullSS_path,
  topSNPs = topSNPs[1:2,], # test just the first 2 loci
  finemap_methods = c("ABF","FINEMAP","SUSIE"),
  dataset_name = "Jansen2019",
  fullSS_genome_build = "hg19",
  bp_distance = 1000,
  munged = TRUE) 

@bschilder bschilder moved this from Todo to In Progress in 🦇🦇 echoverse 🦇🦇 Oct 29, 2022
Repository owner moved this from In Progress to Done in 🦇🦇 echoverse 🦇🦇 Nov 1, 2022
@PhoebeGuo97
Copy link
Author

After reinstalling, can you try the following example @PhoebeGuo97 ? This seems to work for me.

#### Munge sumstats ####
fullSS_path <- MungeSumstats::format_sumstats(path = "https://ctg.cncr.nl/documents/p1651/AD_sumstats_Jansenetal_2019sept.txt.gz",
                                              ref_genome = "GRCH37",
                                              tabix_index=FALSE,
                                              force_new=TRUE)
#### Prepare topSNPs dataframe ####
topSNPs <- echodata::import_topSNPs(topSS = "https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0311-9/MediaObjects/41588_2018_311_MOESM3_ESM.xlsx",
                                    sheet = "Table S2",
                                    startRow=5,
                                    cols=1:13,
                                    munge = TRUE)
#### Run echolocatoR #### 
res <- echolocatoR::finemap_loci(
  fullSS_path = fullSS_path,
  topSNPs = topSNPs[1:2,], # test just the first 2 loci
  finemap_methods = c("ABF","FINEMAP","SUSIE"),
  dataset_name = "Jansen2019",
  fullSS_genome_build = "hg19",
  bp_distance = 1000,
  munged = TRUE) 

Hi @bschilder, thanks for the update. I can successfully run the example. However, if I add "LD_reference = "UKB" in fine map_loci(), I will get ModuleNotFoundError: No module named 'scipy' at Step 2 ▶▶▶ Extract Linkage Disequilibrium and other steps won't be finished. Could you please let me know how to fix it? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

No branches or pull requests

2 participants