Error: scanTabix: '4' not present in tabix index
#33
Comments
I find that rtracklayer::import can import your data. As you note, the seqlevelsStyle is UCSC. A GRanges without this style can never succeed. After
|
For scanTabix, with a correct seqlevelsStyle in the query,
I don't understand that. |
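For anyone landing here with the same message: the mismatch described above can be illustrated in base R. This is only a sketch. `match_index_seqnames()` is a hypothetical helper, not part of Rsamtools or GenomeInfoDb, and the `index_names` vector is an assumed example of what a UCSC-style index might contain.

```r
# Hypothetical helper (not an Rsamtools/GenomeInfoDb function): rename query
# sequence names so they match the names stored in a tabix index.
match_index_seqnames <- function(query, index_names) {
  # Names already present in the index are kept unchanged.
  ok <- query %in% index_names
  # Try the other common convention: add or drop a leading "chr".
  flipped <- ifelse(startsWith(query, "chr"),
                    sub("^chr", "", query),
                    paste0("chr", query))
  use_flipped <- !ok & flipped %in% index_names
  out <- query
  out[use_flipped] <- flipped[use_flipped]
  # Names that match under neither convention are returned as NA; querying
  # them would give "not present in tabix index".
  out[!ok & !use_flipped] <- NA_character_
  out
}

# Assumed index contents for a UCSC-style file.
index_names <- c("chr1", "chr4", "chrX")
print(match_index_seqnames(c("4", "chrX", "MT"), index_names))
```

With a real file, the index names would presumably come from `Rsamtools::seqnamesTabix()`, and for a GRanges query the usual route is `GenomeInfoDb::seqlevelsStyle(gr) <- "UCSC"` rather than string manipulation.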
This is from samtools / htslib with an update to the Bioconductor version required; see #32 (comment) |
A long due update! See Bioconductor/Rhtslib#4 and #8 (comment). Should we make this a priority for BioC 3.16? |
Getting this error as well now with VCFs from the 1000 Genomes Project, which I previously didn't have any issues with. This seems to have the effect of breaking Here are several examples that currently work to varying degrees.
target_path <- file.path(
"ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/",
"ALL.chr4.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz")
param <- GenomicRanges::GRanges("4:14737349-16737284")
## This produces the error message but successfully returns the header
header <- Rsamtools::headerTabix(file = target_path)
## This does NOT produce the error message and successfully returns the header
header <- VariantAnnotation::scanVcfHeader(file = target_path)
## These both produce the error message but return only 468 variants from all of chrom 4
## Was previously returning many many thousands of variants.
vcf <- VariantAnnotation::readVcf(target_path)
vcf <- Rsamtools::scanTabix(file = target_path)
## Produces the same "Read block operation failed" error message as the other two methods,
## but then fails with an error in R, thus returning no output:
### Error in read.table(con, sep = "\t", ...) :
### incomplete final line found by readTableHeader on 'gzcon(ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521//ALL.chr4.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz)'
tbx <- Rsamtools::TabixFile(target_path)
out <- rtracklayer::import(tbx)
## These both produce the error message but return 0 variants
## Was previously returning 24,376 variants.
vcf <- VariantAnnotation::readVcf(target_path, param=param)
## Warning: Takes a very long time
vcf <- Rsamtools::scanTabix(file = target_path, param=param)
## Produces the same error message (3 times) but returns 0 variants,
## and then throws an error indicating that there's no input to parse.
tbx <- Rsamtools::TabixFile(target_path)
out <- rtracklayer::import(tbx, which=param)
|
I'll look into this more closely; I also remember the 1000 genomes VCFs working. Unrelated, but |
thanks @mtmorgan. ah, hadn't thought of that with URLs, i'll be more careful with those in the future. |
Just a heads up, this also affects I've updated the reprex above to demonstrate that the same error occurs with this method. |
Interestingly, Tagging the seqminer reprex
target_path <- paste(
"ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/",
"ALL.chr4.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz",
sep="/")
out <- seqminer::tabix.read.table(target_path, tabixRange = "4:14737349-16737284")
dim(out)
## [1] 28228 1101
|
@bschilder would you consider forking the relevant Bioconductor packages and adding unit tests that exhibit the problems you have identified, and adding these unit tests that will fail until these conditions are fixed? |
@lawremi |
I'm afraid this is a bit more than I can commit to atm, but please feel free to use the example I provided above. I think that should contain all of the information you need. @vjcitn |
Just checking in, has there been progress on fixing this? @hpages @mtmorgan |
Progress is being made but we need to release 3.15 of the whole ecosystem. After that I will try to deal with this. |
An update on this: This is done for Linux and Mac (see Bioconductor/Rhtslib#4 (comment)). We still need to make sure things work properly on Windows. |
thanks for the update @hpages Updating
|
VariantAnnotation needs to be re-installed (it calls Rsamtools' C code from C code). It has had a version bump, so it should be installable via BiocManager either later today or later on Sunday, all being well... It looks like Rhtslib is available via BiocManager: https://bioconductor.org/packages/3.16/bioc/html/Rhtslib.html. Packages need to be installed in the correct order (which BiocManager::install() takes care of, once updated versions have successfully propagated...): first Rhtslib, then Rsamtools, then VariantAnnotation. If you installed Rsamtools (against a previous version of Rhtslib) and only then installed the new Rhtslib, Rsamtools will still be statically linked to the previous version of Rhtslib, which explains why you see the same behavior. If you install Rhtslib and then Rsamtools but don't re-install VariantAnnotation, readVcf() etc. will result in a segfault because VariantAnnotation is expecting a different version of the Rsamtools C code. I'm not completely familiar with the macOS build system, but in general it is important that the same compiler and compiler settings are used for each library, so one would want to install either everything from source or everything as binaries. |
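A rough way to check where a given library stands before re-installing, as a sketch only: `installed_versions()` and `needs_update()` are hypothetical helpers, and the only minimum version taken from this thread is Rsamtools 2.13.2. The install call is left commented out because it needs network access and may rebuild from source.

```r
# Install order described above: Rhtslib, then Rsamtools, then VariantAnnotation.
install_order <- c("Rhtslib", "Rsamtools", "VariantAnnotation")

# Hypothetical helper: version string of each package, or NA if not installed.
# system.file() and packageVersion() read metadata without loading the package.
installed_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Hypothetical helper: TRUE if a package is missing or older than `minimum`.
needs_update <- function(installed, minimum) {
  is.na(installed) || package_version(installed) < package_version(minimum)
}

versions <- installed_versions(install_order)
print(versions)

if (needs_update(versions[["Rsamtools"]], "2.13.2")) {
  message("Rsamtools is missing or older than 2.13.2; re-install all three in order.")
  # BiocManager::install(install_order, force = TRUE)
}
```

Passing `force = TRUE` is meant to re-install packages that BiocManager already considers up to date, which seems to be the relevant case when Rsamtools or VariantAnnotation were built against the older Rhtslib.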
The latest Rsamtools (2.13.2) was updated to work with the new Rhtslib (based on htslib 1.15.1). It is now available in BioC 3.16 (current devel) via Can someone confirm that this issue is gone with Rsamtools 2.13.2? We want to make sure that this is tested on Windows before we close. Thanks! H. |
I confirm that
from #33 (comment) succeeds with
|
Excellent. Thanks Vince! |
Hello,
So I seem to be having some issues with querying remote tabix files (e.g. from ENCODE), though I'm not sure whether this is strictly related to the file being remote or to some other difference in how the file is formatted.
Reprex
Main example
Extended examples
Session info