Extending mzR
The mzR R/Bioconductor package provides a unified API to the common open and community-driven file formats and parsers available for mass spectrometry data, namely mzXML, mzML and mzData (see vignette for details). It uses C and C++ code from other third party open-source projects and heavily relies on the Rcpp package to, notably, provide a direct mapping from R to C++ infrastructure.
Currently, mzR provides two actual backends to read raw mass spectrometry data:
- netCDF, which reads, as the name implies, netCDF data
- RAMP, which reads mzData and mzXML via the ISB RAMP parser. This backend can also read mzML through the proteowizard RAMPadapter around the proteowizard infrastructure, but this interface is limited to the lowest common denominator between the mzXML/mzData/mzML formats.
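
The division of labour between the two backends can be sketched as a small dispatch function. This is only an illustration of the mapping described above, not mzR's actual code: the names `Backend`, `fileExtension` and `chooseBackend` are hypothetical, and selecting by file extension (including the assumed `cdf`/`nc` extensions for netCDF) is an assumption made for the sake of the example.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical names throughout: this is not mzR's actual dispatch code.
enum class Backend { NetCDF, Ramp };

// Return the lower-cased extension of a file name (text after the last '.'),
// or an empty string if the final path component has no extension.
std::string fileExtension(const std::string& path) {
    const std::size_t slash = path.find_last_of("/\\");
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash)) {
        return "";
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext;
}

// Map a file to the backend that, per the list above, can read it.
Backend chooseBackend(const std::string& path) {
    const std::string ext = fileExtension(path);
    // Assumed netCDF extensions; real files may use others.
    if (ext == "cdf" || ext == "nc") return Backend::NetCDF;
    // mzData and mzXML go through the RAMP parser; mzML goes through the
    // RAMPadapter and is therefore limited to the common subset of fields.
    if (ext == "mzxml" || ext == "mzdata" || ext == "mzml") return Backend::Ramp;
    throw std::invalid_argument("unsupported file type: " + path);
}
```

Under these assumptions, `chooseBackend("sample.mzML")` selects the RAMP backend, which is exactly the case where the richer content of an mzML file is not exposed.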
This project is intended to add several related backends to mzR, by providing a direct wrapper around -- and full access to -- the proteowizard msdata object. The candidate will interact closely with Laurent Gatto and Steffen Neumann, and the proteowizard and Rcpp communities.
The pwiz/mzML backend should be a drop-in replacement and should also pass the unit tests of the Bioconductor XCMS and MSnbase packages. Any XCMS and MSnbase modifications required will be done by Steffen Neumann and Laurent Gatto respectively. Secondly, the pwiz/mzML backend should provide access to the <chromatogram> elements stored in an mzML file (Martens et al. 2011).
The project also aims at facilitating access to identification data in the mzIdentML data format (Jones et al. 2012) through the proteowizard framework. A backend similar to the one currently available for raw mass spectrometry files (mzXML, mzML, mzData) will be developed for mzIdentML files.
At the end of the project, the candidate will be familiar with the major mass-spectrometry data formats and main MS toolkits used in proteomics and metabolomics. After successful completion of the project, the candidate will be added to the list of mzR contributors.
- Difficulty: medium to difficult, depending on experience and C++ fluency.
- Skills needed: intermediate R programming, knowledge of package development helpful, good knowledge of C and especially C++ essential. The candidate will have to familiarise herself with the mass-spectrometry data, the respective data formats and the proteowizard code base.
- Deliverable: pwiz and identification backends to be added to the mzR package.
- Mentors: Laurent Gatto and Steffen Neumann, with additional Rcpp support from Dirk Eddelbuettel.
- References: see project description.