This repository was started before Heng Li published his article "Fast high-level programming languages", which includes a native Nim implementation (see klib below). That implementation is about as fast as the one here (depending on whether memory is reused) and can simply be used instead.
A Nim wrapper for Heng Li's kseq/readfq, an efficient and fast parser for FastQ and Fasta files. nimreadfq reads FastQ and Fasta files from stdin (use "-") or from gzipped or flat files, and is fast (see benchmark below).
The main function is readFQ(), an iterator that yields FQRecord(s). An alternative is readFQPtr(), which yields FQRecordPtr(s). The difference is that the latter uses ptr char instead of strings and is thus potentially faster, but its memory is reused between iterations, so anything you need beyond the current iteration must be copied first.
See example.nim and tests/tester.nim for code examples.
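As a rough illustration of the iterator described above, the following minimal sketch writes a two-record FastQ file to a temporary location and counts records and bases with readFQ(). It makes several assumptions that are not confirmed by this text: that the module is imported as readfq, that readFQ() takes a file path as a string, and that FQRecord exposes the bases in a field named sequence. The temporary file name is made up for the example; check example.nim for the authoritative usage.

```nim
import std/os
import readfq  # assumed module name of the nimreadfq package

# Write a tiny two-record FastQ file so the sketch does not need external data.
# The file name is hypothetical and only used for this example.
let path = getTempDir() / "nimreadfq_demo.fq"
writeFile(path, "@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\nIII\n")

var nSeqs = 0
var nBases = 0
for rec in readFQ(path):
  # `sequence` is an assumed field name on FQRecord
  inc nSeqs
  nBases += rec.sequence.len

echo nSeqs, " records, ", nBases, " bases"

# Clean up the temporary file.
removeFile(path)
```

Because readFQ() yields records backed by strings, values such as rec.sequence can be stored and used after the loop. With readFQPtr() the same pattern would need an explicit copy inside the loop body, since the underlying memory is reused.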
The initial Nim integration (and hard work) was done by Haibao Tang as part of his bio-pipeline repo. Haibao generously granted full rights to his code base, after which I started this separate package called nimreadfq for integration into nimble.
nimreadfq is significantly faster than most packages with similar functionality (klib is comparable). Below are example timings for reading 5,682,010 sequences from M_abscessus_HiSeq.fq (source; see also ./benchmark/get_fq.sh), run on my MacBook Pro 2019:
fastq:
- readfqPtr: 2.3s
- klib: 7.0s
- readfq: 7.6s
- fastx: 39.6s
- bioseq: 42.1s
fastq.gz:
- readfq gz: 15.6s
- klib gz: 15.8s
- bioseq gz: 150.0s
How to reproduce results:
cd ./benchmark
nimble build
./benchmark