This repository provides instructions, documentation, and examples regarding deployment of the Knowledge Lake Management System (KLMS) developed by the STELAR project. The STELAR KLMS supports and facilitates a holistic approach for FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready (high-quality, reliably labeled) data. It makes it possible to (semi-)automatically turn a raw data lake into a knowledge lake by: (a) enhancing the data lake with a knowledge layer; and (b) developing and integrating a set of data management tools and workflows. The knowledge layer comprises: (a) a data catalog that offers automatically enhanced metadata for the raw data assets in the lake; and (b) a knowledge graph that semantically describes and interlinks these data assets using suitable domain ontologies and vocabularies. The provided STELAR tools and workflows offer novel functionalities for: (a) data discovery and quality management; (b) data linking and alignment; and (c) data annotation and synthetic data generation.
The STELAR KLMS deployment integrates the following components:

- Keycloak is used for Identity and Access Management.
- A Data Catalog of the datasets in the KLMS, deployed as a CKAN site. Metadata about published datasets (i.e., CKAN packages and resources) is stored in a PostgreSQL database.
- A Knowledge Graph, deployed via Ontop, which employs mappings from the database to a virtual RDF graph according to the KLMS ontology.
- MinIO serves as the storage layer for the files in the data lake.
- The STELAR Operator, needed to design and implement workflows inside the STELAR KLMS using the Apache Airflow workflow engine.
- An instance of MLflow maintains metadata about all executions in the same PostgreSQL database used by the Data Catalog.
- Dashboards offer a quick overview of the datasets, workflows, and tasks managed by the KLMS.
- A RESTful Data API is used for managing and searching resources in the KLMS (see the search sketch after this list).
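Since the Data Catalog is deployed as a CKAN site, its standard action API can also be used to search the published datasets. The minimal sketch below assumes a deployment-specific base URL (the `CKAN_URL` value is a placeholder, not part of this repository) and relies on CKAN's `package_search` action:

```python
import requests

# Base URL of the CKAN-backed Data Catalog; placeholder value, adjust to your deployment.
CKAN_URL = "https://klms.example.org/catalog"

def search_datasets(query, rows=10):
    """Search published datasets (CKAN packages) by free-text query."""
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_search",
        params={"q": query, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["result"]
    return result["count"], result["results"]

if __name__ == "__main__":
    count, packages = search_datasets("food safety")
    print(f"{count} matching datasets")
    for pkg in packages:
        print(f"- {pkg['name']}: {pkg.get('title', '')}")
```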
The STELAR KLMS supports two alternative workflow engines:
- In its Community Edition, it supports Apache Airflow, a very popular open-source platform for this purpose.
- In its Professional and Enterprise editions, it supports RapidMiner Studio & AI Hub, a widely used commercial platform for machine learning and data science workflows.
The following STELAR tools are provided:

- Synopses Data Engine for Extreme-Scale Analytics-as-a-Service.
- GeoTriples for publishing geospatial data as Linked Geospatial Data in RDF.
- pyJedAI for Schema Matching and Entity Linking.
- JedAI-spatial for computing topological relations between datasets with geometric entities.
- Correlation Detective (CorDet) for finding interesting multivariate correlations in vector datasets.
- Data Profiler, a library for profiling different types of data and files.
- Data Selection interface for searching, ranking, and comparing datasets available in the KLMS Data Catalog.
- GenericNER for named entity recognition (NER) on input texts.
- FoodNER, a service for detecting and extracting Named Entities from Food Science text files.
- Synthetic Data Generation for textual data in the agri-food domain.
- Hazard classification from incidents reported in the agri-food domain.
- Orchestration of several KLMS components for entity extraction and linking over unstructured food safety data, employing the Airflow workflow engine and the Data API for publishing and searching in the Data Catalog (a minimal sketch follows below).
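As a rough illustration of such an orchestration in the Community Edition, the sketch below defines a minimal Airflow DAG with two placeholder tasks. The DAG name and the task bodies are hypothetical; in an actual workflow they would invoke the corresponding KLMS tools (e.g., an NER service) and the Data API for publishing results to the Data Catalog.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_entities(**context):
    # Placeholder: invoke an entity extraction tool on the input files.
    print("running entity extraction ...")


def publish_to_catalog(**context):
    # Placeholder: register the produced resources in the Data Catalog.
    print("publishing results to the Data Catalog ...")


with DAG(
    dag_id="food_safety_entity_linking",  # hypothetical workflow name
    start_date=datetime(2024, 1, 1),
    schedule=None,  # triggered manually
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_entities", python_callable=extract_entities)
    publish = PythonOperator(task_id="publish_to_catalog", python_callable=publish_to_catalog)

    # Publish to the catalog only after extraction has completed.
    extract >> publish
```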
The contents of this project are licensed under the GPL-2.0 license.