Running past the end of a BED #8

Open
rulixxx opened this issue Nov 17, 2021 · 1 comment

rulixxx commented Nov 17, 2021

Expected Behavior

Terminate correctly when iterating over a BED file for intersecting intervals.

Current Behavior

An error is raised when the iterator tries to read past the end of the stream.

Possible Solution / Implementation

This worked for me:

I added an extra condition, !eof(iter.reader.state.stream), to the inner while loop of the function Indexes.done:

function Indexes.done(iter::Indexes.TabixOverlapIterator, state)
    buffer = BioGenerics.IO.stream(iter.reader)
    source = buffer.stream
    if state.chunkid == 0
        if isempty(state.chunks)
            return true
        end
        state.chunkid += 1
        seek(source, state.chunks[state.chunkid].start)
    end
    while state.chunkid ≤ lastindex(state.chunks)
        chunk = state.chunks[state.chunkid]
        # `virtualoffset(source)` is not synchronized with the current reading position because data are buffered in `buffer` for parsing text,
        # so we check not only `virtualoffset` but also `bytesavailable(buffer)`, which returns the size of the data currently buffered.
        # The added `!eof(...)` check stops the loop at the end of the stream, so `read!` is never called past it.
        while !eof(iter.reader.state.stream) && (bytesavailable(buffer) > 0 || BGZFStreams.virtualoffset(source) < chunk.stop)
            read!(iter.reader, state.record)
            c = Indexes.icmp(state.record, iter.interval)
            if c == 0  # overlapping
                return false
            elseif c > 0
                # no more overlapping records in this chunk
                break
            end
        end
        state.chunkid += 1
        if state.chunkid ≤ lastindex(state.chunks)
            seek(source, state.chunks[state.chunkid].start)
        end
    end
    # no more overlapping records
    return true
end
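
For anyone who wants to see the idea in isolation, here is a minimal sketch of the same guard on a plain in-memory stream. It assumes nothing from Indexes or BGZFStreams: `read_values` and the `stop` argument are hypothetical stand-ins for the record loop and `chunk.stop`, and only Base functions are used.

```julia
# Hypothetical helper, not part of any BioJulia package.
# `stop` plays the role of `chunk.stop`: a position bound that may overshoot the real end of the stream.
function read_values(io::IO, stop::Integer)
    out = UInt8[]
    # Check `eof(io)` first: without it, `read(io, UInt8)` throws an EOFError
    # once the data run out while `position(io) < stop` still holds.
    while !eof(io) && position(io) < stop
        push!(out, read(io, UInt8))
    end
    return out
end

io = IOBuffer(UInt8[1, 2, 3])
vals = read_values(io, 10)  # `stop` is larger than the three bytes available
```

With the `eof` check the loop ends cleanly after the third byte. Dropping it makes the fourth `read` fail, which is the kind of failure described above.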

Steps to Reproduce (for bugs)

Sorry, I encountered this some time ago, so I no longer have the BED files. It may have been triggered by working with concatenated bgzipped files.

@CiaranOMara
Member

Thanks for this report.

If the files were bgzipped, there is a known issue that affects the calculation of the virtual offset. It occurs when multiple threads are in use.

I think this issue will be addressed upstream with BioJulia/BGZFStreams.jl#27.
