SSH Open Marketplace Data Library

This repository contains a Python library for downloading and processing the SSH Open Marketplace dataset, together with a set of notebooks that provide examples and use cases for the library.

The library was designed for the SSH Open Marketplace Editorial Team and provides a set of ad hoc functions that can be used in Python notebooks or programs. The notebooks included in this repository allow any user to gain an overview of the SSH Open Marketplace (notebook 2) and allow authenticated users to write specific curation information back to the SSH Open Marketplace. See the SSH Open Marketplace user documentation for more details.

Usage

To use the library's functionality:

A - Create an instance of mplib.MPData and load the Marketplace (MP) data locally. The function:

getMPItems(category: str, local: bool) -> DataFrame

downloads the MP dataset and stores it locally. The data is provided as a DataFrame, i.e. it is organized in tabular form and the columns are labeled with the names of the attributes in the MP data model.

Example:

 from sshmarketplacelib import MPData as mpd

 # Create the data accessor, then load the publications into a DataFrame
 mpdata = mpd()
 ts_df = mpdata.getMPItems("publications", True)

The data is returned as a DataFrame:

| id | category | label | persistentId | lastInfoUpdate | status | description | contributors | properties | externalIds |
|---|---|---|---|---|---|---|---|---|---|
| 10414 | publication | 3D-ICONS -- 3D Digitisation of Icons of Europe... | jOum8c | 2021-06-23T17:03:55+0000 | approved | 3D-ICONS was a pilot project funded under the ... | [] | [{'id': 41261, 'type': {'code': 'language', 'l... | [] |
| 7454 | publication | 4 Default Text Structure - The TEI Guidelines | Y3Vmhy | 2021-06-22T13:30:43+0000 | approved | No description provided. | [] | [{'id': 41094, 'type': {'code': 'language', 'l... | [] |
| 10738 | publication | 9 Dictionaries - The TEI Guidelines | vQ7Bvs | 2021-06-23T17:04:34+0000 | approved | No description provided. | [] | [{'id': 41163, 'type': {'code': 'language', 'l... | [] |
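
Once the items are loaded, the DataFrame can be explored with ordinary pandas operations. The sketch below is only illustrative: instead of calling the library, it rebuilds a small stand-in frame from the three sample rows shown above (a subset of the columns, with truncated values kept as they appear) and then filters and counts them. The variable names are assumptions made for this example.

```python
import pandas as pd

# Stand-in for the frame returned by getMPItems: three rows copied from the
# sample output, restricted to a few columns. Truncated values are kept as shown.
items = pd.DataFrame(
    {
        "id": [10414, 7454, 10738],
        "category": ["publication", "publication", "publication"],
        "label": [
            "3D-ICONS -- 3D Digitisation of Icons of Europe...",
            "4 Default Text Structure - The TEI Guidelines",
            "9 Dictionaries - The TEI Guidelines",
        ],
        "status": ["approved", "approved", "approved"],
        "description": [
            "3D-ICONS was a pilot project funded under the ...",
            "No description provided.",
            "No description provided.",
        ],
    }
)

# Items whose description is only the placeholder text.
missing_description = items[items["description"] == "No description provided."]

# Number of items for each (category, status) pair.
status_counts = items.groupby(["category", "status"]).size()

print(missing_description[["id", "label"]])
print(status_counts)
```

The same filtering and grouping should apply unchanged to the full frame, provided its columns match the sample output.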

B - Use the helper functions to analyse the Marketplace data. For example, the function below returns the number of null values for every property in each item category:

getNullValues() -> DataFrame

Example:

 from sshmarketplacelib import helper as hlpr

 # Create the helper object, then count the null values per property and category
 utils = hlpr.Util()
 nv_df = utils.getNullValues()

Returns:

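Number of missing values for each property (rows) in each item category (columns):

| property | dataset | publication | tool-or-service | training-material | workflow |
|---|---|---|---|---|---|
| accessibleAt | 1 | 7 | 475 | 14 | 1 |
| composedOf | 305 | 137 | 1671 | 321 | 0 |
| concept.candidate | 46 | 5 | 157 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |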
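
To make the shape of this result concrete, here is a minimal pandas sketch of how a table of missing values per property and category could be computed. It is not the library's implementation of getNullValues: the toy data, the example.org URLs and the variable names are all invented for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for this example only.
items = pd.DataFrame(
    {
        "category": ["dataset", "dataset", "publication", "workflow"],
        "accessibleAt": ["https://example.org/a", None, None, "https://example.org/w"],
        "composedOf": [None, None, None, "step-1"],
    }
)

# Mark missing cells, sum them within each category, then transpose so that
# properties become rows and categories become columns.
null_counts = (
    items.set_index("category")
    .isna()
    .groupby(level="category")
    .sum()
    .T
)
null_counts.index.name = "property"

print(null_counts)
```

Each cell holds the number of items in that category for which the property is missing, which is the layout of the table above.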

The notebook LibTest.ipynb shows how to use the library in a notebook.

The complete documentation is being created... [TBD]

Installation

It is recommended to install the library in a virtual environment to avoid dependency clashes. To install it, enter the cloned directory and install via pip, passing the project's requirements.txt explicitly:

  • Clone the repository, enter the directory and install requirements:
git clone https://github.com/SSHOC/marketplace-curation.git
cd marketplace-curation
pip install ./ -r ./requirements.txt
  • Edit the config.yaml.template file and set the values, then rename the file to config.yaml

  • Create a folder called 'data' in the same folder as your notebooks/programs
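
The last two steps can also be scripted. The sketch below is a hypothetical helper, not part of the library: it assumes that config.yaml.template sits in the same directory as your notebooks, creates the 'data' folder there, and copies the template to config.yaml only if no config.yaml exists yet. The values in config.yaml still have to be edited by hand.

```python
import shutil
from pathlib import Path


def prepare_workspace(base: Path) -> Path:
    """Create the 'data' folder under base and seed config.yaml from the template.

    Hypothetical helper: assumes config.yaml.template lives directly in base.
    An existing config.yaml is never overwritten.
    """
    data_dir = base / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    template = base / "config.yaml.template"
    config = base / "config.yaml"
    if template.exists() and not config.exists():
        # Copy rather than rename, so the template stays available.
        shutil.copy(template, config)

    return data_dir
```

Calling `prepare_workspace(Path.cwd())` from the notebook directory prepares the current folder. Copying instead of renaming is a design choice of this sketch, so that the template remains in place for later reference.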