All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- ✨ `io.alphadia`: Read support for AlphaDIA `precursors.tsv` (#103 by @rodvrees)
- ✨ `io.fragpipe`: Read support for FragPipe `psm.tsv` (#103 by @rodvrees)
- ✨ `io.diann`: Read support for DIA-NN TSV (#103 by @rodvrees)
- 💥 `psm_list`: When returning a PSM property across the full PSMList (e.g., `psm_list["peptidoform"]`), `np.fromiter` is now used instead of `np.array`. This fixes an issue where, if all peptidoforms had the same length, a 3D array of parsed sequences (amino acids and modifications) was returned instead of an array of `Peptidoform` objects. However, this does mean that all resulting arrays will have the `object` dtype instead of the previously coerced dtypes. This might lead to issues downstream. (#102)
- ➖ `io.idxml`: Make pyOpenMS an optional dependency, working around OpenMS/OpenMS#7600 for now. For `idxml` support, install psm_utils with the `idxml` extra dependencies. (#107 by @paretje)
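In plain NumPy terms, the difference between the two constructors can be sketched as follows. Plain residue strings stand in for the (amino acid, modifications) tuples of a real parsed sequence, so the coerced array here is 2D rather than 3D. `dtype=object` support in `np.fromiter` requires NumPy 1.23 or later.

```python
import numpy as np

# Stand-ins for parsed sequences: two peptides of equal length (3 residues each).
parsed = [["P", "E", "P"], ["T", "I", "D"]]

# np.array detects equal-length nested sequences and builds a 2D string array,
# so the one-element-per-peptide structure is lost.
coerced = np.array(parsed)

# np.fromiter with dtype=object stores each list as a single element.
kept = np.fromiter(parsed, dtype=object, count=len(parsed))

print(coerced.shape, coerced.dtype)  # (2, 3) <U1
print(kept.shape, kept.dtype)        # (2,) object
```

Because every element is now a Python object, numeric properties such as scores also come back with `object` dtype; calling `.astype(float)` on the result is one way to recover a numeric array where that is needed.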
- 🐛 `io.pepxml`: Fix modification location and mass parsing. The position had an off-by-one error, and the reported mass was the sum of the residue and modification masses instead of the modification mass alone. (fixes #100, #104)
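The pepXML fix comes down to two conversions. A minimal sketch, assuming a `mod_aminoacid_mass` entry with a 1-based `position` and a `mass` equal to residue plus modification; `pepxml_mod_to_delta` and the two-entry residue table are hypothetical and not part of psm_utils:

```python
# Monoisotopic residue masses (Da) for the residues used in this sketch only.
RESIDUE_MASS = {"C": 103.009185, "M": 131.040485}


def pepxml_mod_to_delta(sequence, position, modified_mass):
    """Hypothetical helper, not part of psm_utils.

    Converts a pepXML mod_aminoacid_mass entry (1-based position, mass of the
    residue plus its modification) into a 0-based index and the mass of the
    modification alone.
    """
    index = position - 1  # pepXML counts residues from 1
    residue = sequence[index]
    delta = modified_mass - RESIDUE_MASS[residue]
    return index, residue, round(delta, 6)


# Carbamidomethylated cysteine at pepXML position 2 of "ACDK".
print(pepxml_mod_to_delta("ACDK", 2, 160.030649))  # (1, 'C', 57.021464)
```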
- `io`: Fix Sage filename pattern for automatic file type inference
- `io.flashlfq`: Fix writing PSMs without protein accession
- `io.flashlfq`: Fix column names `Peptide Monoisotopic Mass` and `Protein Accession`.
- `io.idxml`: Fix parsing if spectra file name not present (#92)
- `Peptidoform`: Add `modified_sequence` property to return the modified sequence in ProForma format, but without charge state.
- `io`: Add support for reading and writing FlashLFQ generic TSV files.
- `io.percolator`: Fix and improve ScanNr inference and writing
- `io.percolator`: Infer style from file extension if not provided (enables dynamic style determination in, for instance, the `convert` function).
- Peptidoform: Allow comparison between a peptidoform and a peptidoform string; allow direct indexing with square brackets, which indexes or slices parsed_sequence (in #89)
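Comparison with strings and square-bracket indexing are typically implemented with `__eq__` and `__getitem__`. The class below is a hypothetical toy, not the psm_utils `Peptidoform`: it treats every character as an unmodified residue and does no real ProForma parsing.

```python
class ToyPeptidoform:
    """Hypothetical stand-in illustrating string comparison and indexing."""

    def __init__(self, proforma):
        self.proforma = proforma
        # Naive parsing: one (amino acid, modification) tuple per character.
        self.parsed_sequence = [(aa, None) for aa in proforma]

    def __eq__(self, other):
        # Accept either a plain ProForma string or another peptidoform.
        if isinstance(other, str):
            return self.proforma == other
        if isinstance(other, ToyPeptidoform):
            return self.proforma == other.proforma
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ alone would make instances unhashable.
        return hash(self.proforma)

    def __getitem__(self, key):
        # Integers and slices are both delegated to parsed_sequence.
        return self.parsed_sequence[key]


pep = ToyPeptidoform("ACDK")
print(pep == "ACDK")  # True
print(pep[0])         # ('A', None)
print(pep[1:3])       # [('C', None), ('D', None)]
```

Because `str.__eq__` returns `NotImplemented` for unknown types, Python falls back to the reflected `__eq__`, so `"ACDK" == pep` also evaluates to `True`.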
- TSV: Avoid flooding logs when reading a different file format by raising exception when three consecutive rows could not be parsed (in #88)
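The consecutive-failure rule can be sketched with a counter that resets on every successful row. This is a minimal illustration, assuming a tab-separated file read with `csv.DictReader`; `parse_rows`, `parse_score_row` and `TooManyParseErrors` are hypothetical names, not the psm_utils TSV reader.

```python
import csv
import io
import logging

logger = logging.getLogger(__name__)

# Threshold taken from the changelog entry above.
MAX_CONSECUTIVE_FAILURES = 3


class TooManyParseErrors(Exception):
    """Hypothetical exception type used only in this sketch."""


def parse_rows(handle, parse_row):
    """Yield parsed TSV rows (hypothetical helper).

    Single bad rows are logged and skipped; once MAX_CONSECUTIVE_FAILURES rows
    in a row fail, the file is assumed to be in a different format.
    """
    failures = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            parsed = parse_row(row)
        except (KeyError, ValueError) as e:
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                raise TooManyParseErrors(
                    f"{failures} consecutive rows could not be parsed"
                ) from e
            logger.error("Could not parse row: %s", e)
            continue
        failures = 0  # a successful row resets the counter
        yield parsed


def parse_score_row(row):
    return row["spectrum_id"], float(row["score"])


mixed = "spectrum_id\tscore\ns1\t1.5\ns2\tx\ns3\t2.0\n"
print(list(parse_rows(io.StringIO(mixed), parse_score_row)))  # [('s1', 1.5), ('s3', 2.0)]
```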
- `io.xtandem`: Fix parsing PSMs and complete protein names in XTandem (by @julianu in #83)
- `io.tsv`: Fix warning formatting when parsing TSV (by @paretje in #85)
- `io`: Fix support for mzIdentML and pepXML files from Comet (by @paretje in #87)
- `io`: Read and write support for Apache Parquet, for efficient storage of PSM lists.
- `io.sage`: Support for Sage results in Parquet format (new `SageParquetReader`, renamed `SageReader` to `SageTSVReader`).
- Upgrade Pydantic dependency to v2. The PSM `spectrum_id` field is now always coerced to a string.
- `io.proteoscape`: Use pyarrow to iteratively read from Parquet instead of first reading an entire dataframe with Pandas.
- `io.sage`: Update compatibility to Sage v0.14
- Remove temporary patch for caching Proforma modification resolvers (now in Pyteomics v4.7.2).
- Speed up mass calculation for large datasets by caching Proforma modification resolvers. Temporary patch until implemented in Pyteomics (see levitsky/pyteomics#147).
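Caching a resolver amounts to memoization: each distinct modification is resolved once, and repeated lookups are served from the cache. The sketch below uses `functools.lru_cache` with a hypothetical `resolve_modification_mass` function and a two-entry lookup table; it is not the actual Pyteomics or psm_utils patch.

```python
from functools import lru_cache

# Hypothetical lookup table standing in for a slow controlled-vocabulary query.
_MOD_MASSES = {"Oxidation": 15.994915, "Carbamidomethyl": 57.021464}


@lru_cache(maxsize=None)
def resolve_modification_mass(name):
    """Hypothetical resolver: the expensive lookup runs once per distinct name."""
    return _MOD_MASSES[name]


# Many PSMs share the same few modifications, so most calls hit the cache.
names = ["Oxidation", "Carbamidomethyl"] * 1000
total = sum(resolve_modification_mass(n) for n in names)
info = resolve_modification_mass.cache_info()
print(info.hits, info.misses)  # 1998 2
```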
- `io.proteoscape`: Parse filename into PSM `run` field.
- `io.proteoscape`: Allow `ProteoScapeReader` instantiation from a Pandas DataFrame and access to PSMs by index.
- Remove accidental print statement.
- `io.idxml`: Fixed parenthesis in type hint
- `io.idxml`: Filter OPENMS_DATA_PATH warnings (see compomics/ms2rescore#129 and OpenMS/OpenMS#7418)
- `io.proteoscape`: Rename module from TIMScore to ProteoScape.
- `io.proteoscape`: Use correct search engine score (`x_corr_score` instead of `tims_score`)
- `io.timscore`: Add support for TIMScore Parquet files.
- Fixed `_csv.Error: field larger than field limit (131072)` for very large fields when reading CSV-based PSM files.
- Pinned Pyteomics version to avoid pickling issues in multithreading (to be investigated)
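The `_csv.Error` above is raised because Python's `csv` module caps each field at 131072 characters by default. A common workaround, sketched here under the assumption that a process-wide change is acceptable, raises the cap with `csv.field_size_limit`; `raise_csv_field_limit` is a hypothetical helper name.

```python
import csv
import io
import sys


def raise_csv_field_limit():
    """Raise the csv module's per-field size limit as far as the platform allows.

    sys.maxsize can overflow the C long used internally on some platforms
    (notably Windows), so the value is reduced until it is accepted.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


big_field = "A" * 200_000  # larger than the default limit of 131072
data = f"spectrum_id\tprotein_list\ns1\t{big_field}\n"

raise_csv_field_limit()
rows = list(csv.DictReader(io.StringIO(data), delimiter="\t"))
print(len(rows[0]["protein_list"]))  # 200000
```

Note that `csv.field_size_limit` changes a module-level setting, so it affects every CSV reader in the process.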
- `Peptidoform`: Support adding and applying global terminal modifications. For now using a workaround while waiting for official support and an implementation in Pyteomics. See HUPO-PSI/ProForma#6.
- `io.xtandem`: Parse double mass modifications as two modifications instead of merging and summing the mass shifts into a single modification.
- `io.xtandem`: Avoid float formatting issues when parsing modification mass label.
- `io.xtandem`: Parse all proteins into `protein_list` instead of only the first one.
- `io.tsv`: Log error instead of raising exception when a TSV row cannot be parsed.
- `io.xtandem`: Fixed bug when extracting run name (introduced in v0.7.0)
- Tests: Added tests for _format_number_as_string function
- Tests: Added more test cases for `peptidoform.rename_modifications` for mass modifications
- `io.xtandem`: To parse `run` value, fall back to PSM file name if run name cannot be parsed from `label` field
- `peptidoform.rename_modifications`: Fixed mapping of negative mass modifications
- `io.xtandem`: Fixed regular expression to parse `run` value from XML `label` field
- `io.idxml`: Fix handling multiple types in `rescoring_features` when writing (fixes #60)
- `io.idxml`: Write support for idXML files, including merging an existing idXML with a `PSMList` 🎉
- `io.xtandem`: New argument `score_key` to select which score to parse as `PSM.score`.
- `io.xtandem`: Parse `run` name from X!Tandem PSM files
- Docs: Add intersphinx links to other package documentation pages.
- `io.idxml`: Use pyOpenMS instead of Pyteomics for reading idXML (~5x faster⚡)
- Fix reading of pepXML files without RT
- Fixed Black formatting throughout project
- `io.pepxml`: Fix reading pepXML files without retention time information.
- `io`: Added new `io.pepxml` reader
- Docs: Add ionbot to README.rst, fix order in API docs
- `Peptidoform`: Added support for `iter()` and `len()` methods
- `Peptidoform`: Added support for initialization from a `pyteomics.proforma.ProForma` object
- `PSM`: Add `precursor_mz_error` property
- `PSMList`: Added support for `append()` and `extend()` methods.
- `io`: Added new `io.ionbot` reader
- `io`: Added support for Proteome Discoverer MSF files
- `io.mzid`: Parse inverse reduced ion mobility from mzid files (e.g., from PEAKS)
- `io.mzid`: Allow users to define a custom score key
- `io.mzid`: Add `Proteome Discoverer Delta Score` to known scores (with spaces, no colons)
- `io.mzid`: Allow inconsistent presence of score in PSMs in a single mzid file
- `PSM`: Values of the `rescoring_features` dictionary are now coerced to floats
- `io`: Raise `PSMUtilsIOException` when the passed filetype is not known
- `io`: Make io reader `read_file` method inheritable (code cleanup)
- `io.mzid`: Log a warning instead of raising an error when no known score can be parsed from an mzid file
- `io.mzid`: Move spectrum-level parsing of RT and ion mobility to a separate function
- `io.mzid`: Give `PeptideShaker PSM score` priority over other potential search engine scores (required for correct PeptideShaker mzid parsing)
- `io.percolator`: Add option to write `PSMScore` and `ChargeN` as features to PIN file. Default is now `False`.
- Formatting: Increase max line length to 99 (code formatting)
- `PSMList`: Fix issue where `psm_list["protein_list"]` resulted in a Numpy error due to the inconsistent shape of the lists.
- `io.tsv`: Throw more descriptive `PSMUtilsIOException` when handling TSV errors
- `io.msamanda`: Fix support for N/C-terminal modifications
- `io.percolator.PercolatorTabWriter`: Allow rescoring features that are not in `feature_names` (`extrasaction` is now specified in `DictWriter`)
- Use raw strings for escape characters where needed
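The `extrasaction` parameter of `csv.DictWriter` controls what happens when a row contains keys missing from `fieldnames`: the default `"raise"` raises `ValueError`, while `"ignore"` drops them. A minimal sketch with a PIN-like column layout and hypothetical feature names; the changelog does not state which value psm_utils passes, so `"ignore"` is an assumption here.

```python
import csv
import io

# Columns the writer knows about (feature_a is a hypothetical feature name).
fieldnames = ["SpecId", "Label", "ScanNr", "feature_a", "Peptide", "Proteins"]

row = {
    "SpecId": "s1",
    "Label": 1,
    "ScanNr": 0,
    "feature_a": 0.5,
    "feature_b": 1.2,  # not listed in fieldnames
    "Peptide": "K.ACDK.E",
    "Proteins": "P12345",
}

buffer = io.StringIO()
# With the default extrasaction="raise", writerow would raise ValueError for
# "feature_b"; "ignore" silently drops keys that are not in fieldnames.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", extrasaction="ignore")
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```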
- Fix compatibility with sqlalchemy 2.0 (move of `declarative_base`)
- online: Remove useless `== True`
- docs: Set newer `build>os` configuration for readthedocs.org
- CI: Upgrade GitHub action versions
- `PSMList`: Revert comparison operator change from v0.4.0 that resulted in a broken `calculate_qvalues()` method (E711; Numpy array, not singleton)
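E711 is the pycodestyle check that flags `== None`. On a NumPy array, however, `== None` is evaluated element-wise, whereas `is None` tests whether the array object itself is `None`, so following the lint suggestion changes the result. A minimal illustration with a hypothetical array of q-values:

```python
import numpy as np

# An object array in which some q-values are still missing.
qvalues = np.array([None, 0.01, None], dtype=object)

# `==` is evaluated element-wise on a NumPy array and returns a boolean mask.
missing = qvalues == None  # noqa: E711 (intentional element-wise comparison)

# `is` compares object identity: the array itself is never None.
print(missing)          # [ True False  True]
print(qvalues is None)  # False
```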
- Add `+` operator support for `PSMList`
- Add utility functions for m/z-mass conversion in new module `psm_utils.utils`
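For an ion carrying z protons, the conversion is m/z = (M + z·m_p) / z, with M the neutral monoisotopic mass and m_p the proton mass. A minimal sketch assuming [M + zH]z+ ions; the function names are hypothetical and may not match those in `psm_utils.utils`.

```python
# CODATA 2018 proton mass in daltons.
PROTON_MASS = 1.007276466621


def mass_to_mz(mass, charge, adduct_mass=PROTON_MASS):
    """Neutral monoisotopic mass -> m/z of the [M + zH]z+ ion (hypothetical helper)."""
    return (mass + charge * adduct_mass) / charge


def mz_to_mass(mz, charge, adduct_mass=PROTON_MASS):
    """m/z of an [M + zH]z+ ion -> neutral monoisotopic mass (inverse of mass_to_mz)."""
    return (mz - adduct_mass) * charge


print(mass_to_mz(1000.0, 2))
print(mz_to_mass(mass_to_mz(1000.0, 2), 2))
```

For a neutral mass of 1000 Da at charge 2, `mass_to_mz` gives an m/z of about 501.0073, and `mz_to_mass` recovers the original mass up to floating-point rounding.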
- `peptidoform`: Catch `ProFormaError` and reraise `PeptidoformException` with invalid peptidoform in message
- `io.msamanda`: Changed `REQUIRED_COLUMNS` to include new features from the MS Amanda output CSV file
- `io.peptide_record`: Catch the `IndexError` when a modification has a position that is out of range for the peptide, and raise an `InvalidPeprecModificationError` instead.
- Rename optional dependency `doc` to `docs`
- Implement "raise from e" when applicable throughout package
- Added missing `io.msamanda` API docs
- `io.sage`: Change `spectrum_fdr` to `spectrum_q` (cf. lazear/sage#64).
- Add reader for Sage PSM files.
- `io.mzid`: Add reading/writing of PEP and q-values
- `psm`: The default values of `PSM.provenance_data`, `PSM.metadata` and `PSM.rescoring_features` are now `dict()` instead of `None`.
- `PSMList`: Also allow Numpy integers for indexing a single PSM
- `io.mzid.MzidReader`: Attempt to parse `retention time` or `scan start time` cvParams from both the SpectrumIdentificationResult and SpectrumIdentificationItem levels. Note that according to the mzIdentML specification document (v1.1.1), neither cvParam is expected to be present at either level.
- `io.mzid.MzidReader`: Prefer `spectrum title` cvParam over `spectrumID` attribute for `PSM.spectrum_id`, as these titles always match the peak list files. In this case, `spectrumID` is saved in `metadata["mzid_spectrum_id"]`. Fall back to `spectrumID` if `spectrum title` is absent.
- `io.mzid.MzidWriter`: `PSM.retention_time` is now written as cvParam `retention time` instead of `scan start time`, and to the `SpectrumIdentificationItem` level instead of the `SpectrumIdentificationResult` level, as in psm_utils multiple PSMs for the same spectrum can theoretically have different values for `retention_time`.
- `io.mzid.MzidWriter`: Write PSM score as cvParam `search engine specific score` instead of userParam `score`.
- `io.percolator.PercolatorTabWriter`: For PIN-style files, use `SpecId` instead of `PSMId` and write `PSMScore` and `ChargeN` columns by default.
- Filter warnings from `psims.mzmlb` on import, as `mzmlb` is not used
- `psm`: Fix missing qvalue and pep in docstring
- `peptidoform`: ProForma mass modifications are now correctly parsed within the `rename_modifications` function.
- `io.maxquant.MSMSReader`: Correctly parse empty `Proteins` column to `None`
- `io.percolator.PercolatorTabReader`: Correctly parse Percolator peptidoform notation if no leading or trailing amino acids are present (e.g., `.ACDK.` instead of `K.ACDK.E`).
- `io.percolator.PercolatorTabWriter`: ScanNr is now correctly written as an integer counting from the first PSM in the file.
- `io.percolator.PercolatorTabWriter`: If no protein information is present, write the peptidoform preceded by `PEP_` to the Proteins column.
- `io.idxml`: Read metadata as strings
- `io.mzid.MzidReader`: Set `PSM.retention_time` to `None` instead of `float('nan')` if missing from the PSM file.
- `io.mzid`: Fix reading of file if charge is missing
- `io.mzid`: Fix writing if `protein_list` is `None`
- `io.mzid`: Consider all `PeptideEvidence` entries for a `SpectrumIdentificationItem` to determine `is_decoy`
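Percolator files write peptides with one flanking residue on each side (commonly `-` at a protein terminus), separated by dots, and the `PercolatorTabReader` fix covers the case where the flanks are empty. A minimal regex sketch, assuming at most one flanking character per side; `strip_flanking` is a hypothetical helper, not the psm_utils parser.

```python
import re

# Optional single flanking character, a dot, the peptide, a dot, and an
# optional flanking character. The greedy middle group keeps dots that occur
# inside mass-shift brackets such as [57.02].
_FLANKED = re.compile(r"^[^.]?\.(.+)\.[^.]?$")


def strip_flanking(peptide):
    """Hypothetical helper: remove Percolator-style flanking residues if present."""
    match = _FLANKED.match(peptide)
    return match.group(1) if match else peptide


for notation in ["K.ACDK.E", ".ACDK.", "-.ACDK.-", "K.AC[57.02]DK.E", "ACDK"]:
    print(notation, "->", strip_flanking(notation))
```

The greedy middle group matters: modification masses such as `[57.02]` contain dots, and only the outermost pair is treated as flank separators.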
- `io.mzid`: Fix handling of mzIdentML files when `is_decoy` field is not present (fixes #30)
- `io.tsv`: Raise `PSMUtilsIOException` with clear error message when TSV `protein_list` cannot be read
- Fix bug in `io._base_classes` (introduced in v0.2.2)
- Fix bug in TSVReader for reading TSV files with empty protein_list
- `io.peptide_record`: Fix bug where provenance item `filename` was not a string
- Various minor fixes after linting
- `Peptidoform`: Add `is_modified` property
- `io.mzid`: Fix issues when parsing Comet or MSAmanda-generated mzIdentML files and certain fields are missing.
- `PSM`: Add `ion_mobility` field
- `PSMList`: Allow slicing with bool arrays (e.g., `psm_df[psm_df["retention_time"] < 2000]`)
- `rename_modifications`: Add support for fixed modifications
- Add example files
- Online: Add support for GZipped files
- Online: Add support for logarithmic score (e.g. e-values)
- Docs: Extend contributing with example contributions
- Docs: Add notes to `PSM.get_usi()` method
- Docs: Extend quickstart on PSMList
- Docs: Add "psm_utils tags" for file formats, as used in high-level read/write/convert functions
- Docs: Peptide Record: add notes on unsupported modification types; add example for C-terminal modification
- Docs: More clearly document conversion to DataFrame
- Docs: Add bioconda install instructions
- Docs: Add citation for preprint
- Tests: Added tests for PSMList `set_ranks` and `get_rank1_psms` methods
- `PSMList`: Refactor `set_ranks` and `get_rank1_psms` methods
- Update `.vscode/settings.json`
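Ranking PSMs per spectrum and keeping the best one can be sketched with plain dictionaries. This is an illustration only, assuming higher scores are better unless the hypothetical `lower_score_better` flag is set; the real `set_ranks` and `get_rank1_psms` methods operate on `PSMList` objects, and their grouping keys and tie handling may differ.

```python
from collections import defaultdict


def set_ranks(psms, lower_score_better=False):
    """Hypothetical sketch: rank PSMs per spectrum by score (1 = best).

    Each PSM is a dict with "spectrum_id" and "score" keys; a "rank" key is added.
    """
    by_spectrum = defaultdict(list)
    for psm in psms:
        by_spectrum[psm["spectrum_id"]].append(psm)
    for group in by_spectrum.values():
        # Sorting the per-spectrum group leaves the order of `psms` untouched.
        group.sort(key=lambda p: p["score"], reverse=not lower_score_better)
        for rank, psm in enumerate(group, start=1):
            psm["rank"] = rank
    return psms


def get_rank1_psms(psms):
    """Return only the best-ranked PSM for each spectrum, in input order."""
    return [psm for psm in psms if psm.get("rank") == 1]


psms = [
    {"spectrum_id": "s1", "peptide": "ACDK", "score": 10.0},
    {"spectrum_id": "s1", "peptide": "ACEK", "score": 12.5},
    {"spectrum_id": "s2", "peptide": "PEPTIDE", "score": 7.1},
]
set_ranks(psms)
print([p["peptide"] for p in get_rank1_psms(psms)])  # ['ACEK', 'PEPTIDE']
```

With the stable sort used here, tied scores receive consecutive ranks in input order.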
- Typing: Replace `Union` with OR operator `|`
- Online: Use percentiles instead of randomly sampling for PP plot
- Docs: Force TOC-tree max depth
- Tests: Expand unit tests in general
- `PSMList`: Truncate `__repr__` to first five entries only, avoiding crashing notebook output
- `Peptidoform`: Minor typing fix
- `add_fixed_modifications`: Allow input as dict as well as list of tuples
- `io`: Fix issue where the `NamedTemporaryFile` for `_supports_write_psm` was seen as an invalid Percolator file
- `io.convert`: Pass `progressbar` argument to class, not `write_file`
- `io.mzid`: Add more supported MS-GF score names, make SpecEValue default
- `io.peptide_record`: `spec_id` is now a required column (`spectrum_id` is also required in PSM)
- `io.peptide_record`: Fix parsing of C-terminal modifications from ProForma to PEPREC
- `io.percolator`: Fix Percolator peptide notation writing (fixes #18)
- `io.tsv`: Fix issue where `TSVReader` would not use string type for metadata
- `io.xtandem`: Fix issue where optional arguments were not accepted by `XTandemReader`
- `io.xtandem`: Do not split spectrum title on space
- `io.xtandem`: Fix issue where optional arguments were not accepted by `XTandemReader`
- Online: Fix pi-0 diagonal calculation
- Remove obsolete to do comments in code
- Initial version