- Fix for handling missing metadata keys #223. Thanks @andrewdavidsmith
- Fix for handling ENA urls for paired end data
- Fix for handling ENA urls
- Migrated to pyproject.toml
- Add support for Biosamples and bioproject #199
- Use retmode xml for GEO search #200
- Documentation fixes
- Fix for [gse-to-srp] returning unrequested GSEs #186
- Fix for [download] using [public_urls]
- Fix for [gsm-to-srx] returning false positives #165
- Fix for delimiter not being consistent when metadata is printed on terminal #147
- ENA search is currently broken because of an API change
- Fix for [gse-to-srp] to handle cases where a project is missing but SRXs are returned #186
- Fix gse-to-gsm #187
- Fix for [pysradb download] - using [public_url]
- Fix for SRX -> SRR and related conversions #183
- BREAKING change: Overhaul of how urls and associated metadata are returned (not backward compatible); all column names are lower cased by default
- Fix extra space in "organism_taxid" column
- Added support for Experiment attributes #89
- Fix ENA fastq fetching #163
- Fix for fetching alternative URLs
- Added ability to fetch alternative URLs (GCP/AWS) for metadata #161
- Fix for xmltodict 0.13.0 no longer defaulting to OrderedDict #159
- Fix for missing experiment model and description in metadata #160
- Add [study_title] to [--detailed] flag (#152)
- Fix [KeyError] in [metadata] where some new IDs do not have any metadata (#151)
- Do not exit if a query returns no hits (#149)
- Fixed [gsm-to-gse] failure (#128)
- Fixed case sensitivity bug for ENA search (#144)
- Fixed publication date bug for search (#146)
- Added support for downloading data from GEO [pysradb download -g GSE] (#129)
- Dropped Python 3.6 since pandas 1.2 is not supported
- Retired [metadb] and [SRAdb] based search through CLI - everything defaults to [SRAweb]
- [SRAweb] now supports search
- [N/A] is now replaced with [pd.NA]
- Two new fields in `--detailed`: [instrument_model] and [instrument_model_desc] #75
- Updated documentation
- [library_layout] is now outputted in metadata #56
- [--detailed] unifies columns for ENA fastq links instead of appending _x/_y #59
- bugfix for parsing namespace in xml outputs #65
- XML errors from NCBI are now handled more gracefully #69
- Documentation and dependency updates
- [pysradb download] now supports multiple threads for parallel downloads
- [pysradb download] also supports ultra fast downloads of FASTQs from ENA using aspera-client
- Added test cases for SRAweb
- API limit exceeding errors are automagically handled
- Bug fixes for GSE <=> SRR
- Bug fix for metadata - supports multiple SRPs
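The threaded downloads mentioned above can be pictured with a minimal sketch. This is not the pysradb implementation: `fetch` and `download_all` are hypothetical names, and `fetch` is a stand-in that returns a label instead of touching the network.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url):
    # Hypothetical stand-in for a real download; it only returns a label
    # so that the sketch runs without network access.
    return f"downloaded:{url}"


def download_all(urls, threads=4, fetch_one=fetch):
    """Run fetch_one over urls using a pool of worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads suit this kind of work because each download spends most of its time waiting on I/O rather than on the CPU.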
Contributors
- Dibya Gautam
- Marius van den Beek
- Bug fix: Handle API-rate limit exceeding => Retries
- Enhancement: 'Alternatives' URLs are now part of [--detailed]
- Bug fix: Handle Python3.6 for capture_output in subprocess.run
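As an illustration of the retry-on-rate-limit behaviour noted above, here is a minimal sketch with exponential backoff. The names `RateLimitError` and `call_with_retries` are hypothetical, and the backoff schedule is an assumption, not the strategy pysradb itself uses.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error for a remote API reporting too many requests."""


def call_with_retries(func, retries=3, delay=0.5, sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == retries:
                # Out of retries: surface the error to the caller.
                raise
            # Wait delay, 2*delay, 4*delay, ... between attempts.
            sleep(delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.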
- All the subcommands (srx-to-srr, srx-to-srs) will now print additional columns where the first two columns represent the relevant conversion
- Fixed a bug when fetching entries with a single efetch record
- Major fix: some SRRs would go missing as the experiment dict was being created only once per SRR (See #15)
- Features: More detailed metadata by default in the SRAweb mode
- See notebook: https://colab.research.google.com/drive/1C60V-
- Feature: instrument, run size and total spots are now printed in the metadata by default (SRAweb mode only)
- Issue: Fixed an issue with srapath failing on SRP. srapath is now run on individual SRRs.
- Introduced [SRAweb] to perform queries over the web if the SQLite file is missing or does not contain the relevant record.
- This release completely changes the command line interface replacing click with argparse (#3)
- Removed stale Python 2 compatibility code
- `srr-to-gsm`: convert SRR to GSM
- SRAmetadb.sqlite.gz file is deleted by default after extraction
- When SRAmetadb is not found, confirmation is requested before downloading
- Confirmation option before SRA downloads
- download() works with wget
- [--out_dir] is now [--out-dir]
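For readers unfamiliar with argparse, the sketch below shows how a hyphenated option on a subcommand is exposed as an underscore attribute. The parser layout and the program name `example` are assumptions for illustration only and do not reproduce the actual pysradb parser.

```python
import argparse


def build_parser():
    # Hypothetical parser with a single subcommand.
    parser = argparse.ArgumentParser(prog="example")
    subparsers = parser.add_subparsers(dest="command")
    download = subparsers.add_parser("download", help="hypothetical subcommand")
    # argparse stores "--out-dir" under the attribute name "out_dir".
    download.add_argument("--out-dir", default=".", help="output directory")
    return parser


args = build_parser().parse_args(["download", "--out-dir", "data"])
print(args.command, args.out_dir)
```

The hyphenated spelling is what the user types, while code keeps reading `args.out_dir`.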
Important: Python2 is no longer supported. Please consider moving to Python3.
- Included docs in the index which were missed in the previous release
- `gsm-to-srr`: convert GSM to SRR
- `gsm-to-srx`: convert GSM to SRX
- `gsm-to-gse`: convert GSM to GSE
The following command line options have been renamed and the changes are not compatible with the 0.6.0 release:
- [sra-metadata] -> [metadata].
- [sra-search] -> [search].
- [srametadb] -> [metadb].
- Fixed bugs introduced in 0.5.0 with API changes where multiple redundant columns were output in [sra-metadata]
- [download] now allows piped inputs
- Support for filtering by SRX Id for SRA downloads.
- `srr_to_srx`: Convert SRR to SRX/SRP
- `srp_to_srx`: Convert SRP to SRX
- Stripped down [sra-metadata] to give minimal information
- Added [--assay], [--desc], [--detailed] flag for [sra-metadata]
- Improved table printing on terminal
- Fixed unicode error in tests for Python2
- Added a new [BASEdb] class to handle common database connections
- Initial support for GEOmetadb through GEOdb class
- Initial support for a command line interface:
- download: Download SRA project (SRPnnnn)
- gse-metadata: Fetch metadata for GEO ID (GSEnnnn)
- gse-to-gsm: Get GSM(s) for GSE
- gsm-metadata: Fetch metadata for GSM ID (GSMnnnn)
- sra-metadata: Fetch metadata for SRA project (SRPnnnn)
- Added three separate notebooks for SRAdb, GEOdb, CLI usage
- [sample_attribute] and [experiment_attribute] are now included by default in the df returned by [sra_metadata()]
- [expand_sample_attribute_columns()]: expand the metadata dataframe based on attributes in the [sample_attribute] column
- New methods to guess cell/tissue/strain: [guess_cell_type()]/[guess_tissue_type()]/[guess_strain_type()]
- Improved README and usage instructions
- [search_sra()] allows full text search on SRA metadata.
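To make the idea of expanding the [sample_attribute] column concrete, here is a minimal sketch on plain dictionaries. It assumes attributes are stored as `key: value` pairs joined by ` || `; that layout, and the names `parse_sample_attribute` and `expand_rows`, are assumptions for illustration and not the library's own code.

```python
def parse_sample_attribute(text, sep=" || ", kv_sep=": "):
    """Split an attribute string into a dict (assumed 'key: value || ...' layout)."""
    fields = {}
    for item in text.split(sep):
        key, found, value = item.partition(kv_sep)
        if found:
            # Normalise keys so they can serve as column names.
            fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return fields


def expand_rows(rows, column="sample_attribute"):
    """Return copies of rows with one extra key per parsed attribute."""
    expanded = []
    for row in rows:
        new_row = dict(row)
        new_row.update(parse_sample_attribute(row.get(column) or ""))
        expanded.append(new_row)
    return expanded
```

A row such as `{"run": "SRR1", "sample_attribute": "source_name: liver || cell line: HepG2"}` would gain `source_name` and `cell_line` keys while keeping its original fields.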
The following methods have been renamed and the changes are not compatible with the 0.1.0 release:
- [get_query()] -> [query()].
- [sra_convert()] -> [sra_metadata()].
- [get_table_counts()] -> [all_row_counts()].
- [download_sradb_file()] makes fetching [SRAmetadb.sqlite] file easy; wget is no longer required.
- [ftp] protocol is now supported besides [fasp], and hence [aspera-client] is now optional. However, we strongly recommend [aspera-client] for faster downloads.
- Silenced [SettingWithCopyWarning] by explicitly doing operations on a copy of the dataframe instead of the original.
Besides these, all methods now have [numpydoc]-compatible documentation.
- First release on PyPI.