
stelar-eu/klms-deploy


Overview

This repository provides instructions, documentation, and examples for deploying the Knowledge Lake Management System (KLMS) developed by the STELAR project. The STELAR KLMS supports a holistic approach to FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready (high-quality, reliably labeled) data. It (semi-)automatically turns a raw data lake into a knowledge lake by: (a) enhancing the data lake with a knowledge layer; and (b) developing and integrating a set of data management tools and workflows. The knowledge layer comprises: (a) a data catalog that offers automatically enhanced metadata for the raw data assets in the lake; and (b) a knowledge graph that semantically describes and interlinks these data assets using suitable domain ontologies and vocabularies. The provided STELAR tools and workflows offer novel functionality for: (a) data discovery and quality management; (b) data linking and alignment; and (c) data annotation and synthetic data generation.


KLMS core components

  • Keycloak is used for Identity and Access Management.

  • The Data Catalog of datasets in the KLMS is deployed as a CKAN site. Metadata about published datasets (i.e., CKAN packages and resources) is stored in a PostgreSQL database.

  • A Knowledge Graph is deployed via Ontop, which uses mappings to expose the database as a virtual RDF graph according to the KLMS ontology.

  • MinIO serves as the storage layer for the files in the data lake.

  • The Stelar Operator is needed to design and implement workflows inside the STELAR KLMS using the Apache Airflow workflow engine.

  • An instance of MLflow maintains metadata about all executions in the same PostgreSQL database that the Data Catalog uses.

  • Dashboards offer a quick overview of the datasets, workflows, and tasks managed by the KLMS.

  • A RESTful Data API is used for managing and searching resources in the KLMS.
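
Because the Data Catalog is a CKAN site, published datasets should in principle also be reachable through CKAN's standard Action API. The following is a minimal sketch rather than part of the KLMS itself: the base URL `https://klms.example.org` is a placeholder, the assumption that CKAN is served at the root of that host may not hold for a given deployment, and the sample response is made-up data in the shape that CKAN's `package_search` action returns. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (an assumption); replace with your deployment's CKAN site.
CATALOG_URL = "https://klms.example.org"


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API URL for a full-text dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract (name, title) pairs from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return [(pkg["name"], pkg.get("title", ""))
            for pkg in payload["result"]["results"]]


# Made-up sample response, used here so the sketch runs without a live catalog.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {"name": "food-safety-incidents", "title": "Food safety incidents"},
        ],
    },
})

if __name__ == "__main__":
    print(package_search_url(CATALOG_URL, "food safety"))
    print(dataset_titles(SAMPLE_RESPONSE))
```

Against a live deployment, the URL would be fetched with an HTTP client and the response body passed to `dataset_titles`. Access may require a token issued through Keycloak, depending on how the deployment is configured.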

The STELAR KLMS supports two alternative workflow engines:

  • In its Community Edition, it supports Apache Airflow, a popular open-source workflow orchestration platform.

  • In its Professional and Enterprise editions, it supports RapidMiner Studio & AI Hub, a widely used commercial platform for machine learning and data science workflows.

KLMS tools

Examples

  • Orchestration of several KLMS components for entity extraction and linking over unstructured food safety data, using the Airflow workflow engine and the Data API for publishing and searching in the Data Catalog.

License

The contents of this project are licensed under the GPL-2.0 license.

About

Deployment related code and configuration artifacts
