Updating dev code #15

Open: wants to merge 6 commits into base: dev

32 changes: 10 additions & 22 deletions README.md
@@ -25,20 +25,6 @@ The OpenLXP-XIA implements five core workflows after extracting metadata from th
 `Python >=3.7` : Download and install python from here [Python](https://www.python.org/downloads/).


-## Environment Variables
-
-To run this package along with your project, you will need to add the following environment variables to your .env file
-
-
-`BUCKET_NAME` - S3 Bucket name where schema files are stored
-
-`AWS_ACCESS_KEY_ID` - AWS access keys
-
-`AWS_SECRET_ACCESS_KEY` - AWS access password
-
-`AWS_DEFAULT_REGION` - AWS region
-
-
 # Installation

 $ python -m pip install OpenLXP-XIA (use the latest package version)
@@ -61,13 +47,12 @@ INSTALLED_APPS = [

 `Xis metadata api endpoint`: API endpoint for XIS where metadata will get stored.

-Example:
+Example:
 `Xis metadata api endpoint`: http://localhost:8080/api/metadata/

 `Xis supplemental api endpoint`: API endpoint for XIS where supplemental metadata will get stored.

-Example:
-
+Example:
 `Xis supplemental api endpoint`: http://openlxp-xis:8020/api/supplemental-data/

 (Note: Replace localhost with the XIS Host)
@@ -76,14 +61,17 @@ INSTALLED_APPS = [
 3. `Add xia configuration` : Configure Experience Index Agents(XIA):

 `Publisher`: Agent Name
+
+`Xss api`: API endpoint for XSS where schemas will be retrieved from.
+
+Example:
+`Xss api`: https://localhost:8000/api/

-`Source metadata schema`: Schema file name for source metadata validation
-
-`Source target mapping`: Schema file name for source to target mapping schema file
+`Source metadata schema`: Schema iri or name for source metadata validation

-`Target metadata schema`: Schema file name for target metadata validation
+`Target metadata schema`: Schema iri or name for target metadata validation

-(Note: Please make sure to upload schema files in the Experience Schema Server (XSS). In this case, upload schema files into the S3 bucket. )
+(Note: Please make sure to upload schema files in the Experience Schema Server (XSS). )


 4. `Add metadata field overwrite`: Here, we can add new fields and their values or overwrite values for existing fields.
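The `Add metadata field overwrite` step described in this README diff has two effects: it can add fields that are missing, and it can replace the values of existing fields. The following is a minimal sketch of those semantics under stated assumptions, not the package's implementation: `FieldOverwrite` and `apply_overwrites` are hypothetical names, records are assumed to be flat dictionaries, the `Field type` setting is ignored, and a missing field is assumed to be added regardless of the `Overwrite` checkbox.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class FieldOverwrite:
    """Hypothetical stand-in for one 'metadata field overwrite' admin entry."""
    field_name: str
    field_value: Any
    overwrite: bool = False


def apply_overwrites(record: Dict[str, Any],
                     rules: Iterable[FieldOverwrite]) -> Dict[str, Any]:
    """Return a copy of `record` with the overwrite rules applied.

    Assumed semantics (not confirmed by the diff): a field missing from
    the record is always added; an existing field is replaced only when
    the rule's `overwrite` flag is True.
    """
    result = dict(record)  # shallow copy so the caller's dict is untouched
    for rule in rules:
        if rule.field_name not in result or rule.overwrite:
            result[rule.field_name] = rule.field_value
    return result


if __name__ == "__main__":
    record = {"title": "Intro", "provider": "ABC"}
    rules = [
        FieldOverwrite("provider", "XYZ"),                   # kept: overwrite off
        FieldOverwrite("language", "en"),                    # added: field missing
        FieldOverwrite("title", "Intro 101", overwrite=True),  # replaced
    ]
    print(apply_overwrites(record, rules))
```

Returning a copy rather than editing `record` in place keeps the original extracted metadata available for logging or re-validation.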
113 changes: 112 additions & 1 deletion app/README.md
@@ -1 +1,112 @@
-# openlxp-xia
+
+# OPENLXP-XIA
+## What is it?
+OpenLXP-XIA is a Python package that validates metadata extracted from a source, transforms it into target metadata, and loads it into indexing services. It does not include the metadata extraction step, because extraction methods differ from source to source. Once metadata has been extracted, OpenLXP-XIA runs the V-T-V-L (Validate-Transform-Validate-Load) cycle.
+
+The schema files used for validation and transformation are stored on the Experience Schema Service (XSS), which OpenLXP-XIA queries through the configured `Xss api` endpoint.
+
+Below are the workflows that OpenLXP-XIA performs after the package is installed.
+
+
+## Workflows
+OpenLXP-XIA implements five core workflows once metadata has been extracted from a specific source:
+
+1. `Validate`: Compares extracted learning experience metadata against the configured source metadata reference schema stored in the Experience Schema Service (XSS).
+
+2. `Transform`: Transforms extracted and validated source learning experience metadata to the configured target schema using the "XSR-to-Target" transformation map stored in the Experience Schema Service (XSS).
+
+3. `Validate`: Compares transformed learning experience metadata against the configured target metadata reference schema stored in the Experience Schema Service (XSS).
+
+4. `Load`: Pushes transformed and validated learning experience metadata to the target Experience Index Service (XIS) for further processing.
+
+5. `Log`: Records error, warning, informational, and debug events, which can be reviewed and monitored.
+
+## Prerequisites
+`Python >=3.7` : Download and install Python from [Python](https://www.python.org/downloads/).
+
+
+## Installation
+
+$ python -m pip install OpenLXP-XIA (use the latest package version)
+
+Add `openlxp_xia` to `INSTALLED_APPS` in your project's `settings.py`:
+
+INSTALLED_APPS = [
+...
+
+'openlxp_xia',
+
+....
+]
+
+## Configuration
+
+1. On the Admin page, log in with the admin credentials.
+
+2. `Add xis configuration`: Configure the Experience Index Service (XIS):
+
+`Xis metadata api endpoint`: API endpoint for XIS where metadata will be stored.
+
+Example:
+`Xis metadata api endpoint`: http://localhost:8080/api/metadata/
+
+`Xis supplemental api endpoint`: API endpoint for XIS where supplemental metadata will be stored.
+
+Example:
+`Xis supplemental api endpoint`: http://openlxp-xis:8020/api/supplemental-data/
+
+(Note: Replace localhost with the XIS host.)
+
+
+3. `Add xia configuration`: Configure Experience Index Agents (XIA):
+
+`Publisher`: Agent name
+
+`Xss api`: API endpoint of the XSS from which schemas are retrieved.
+
+Example:
+`Xss api`: https://localhost:8000/api/
+
+`Source metadata schema`: Schema IRI or name for source metadata validation (values beginning with `xss:` are treated as IRIs)
+
+`Target metadata schema`: Schema IRI or name for target metadata validation (values beginning with `xss:` are treated as IRIs)
+
+(Note: Make sure the schema files are uploaded to the Experience Schema Service (XSS).)
+
+
+4. `Add metadata field overwrite`: Add new fields and their values, or overwrite the values of existing fields.
+
+`Field name`: Name of the new or existing field
+
+`Field type`: Data type of the field
+
+`Field value`: Corresponding value
+
+`Overwrite`: Check the box if existing values need to be overwritten.
+
+## Running the ETL Pipeline
+
+The ETL, or EVTVL (Extract-Validate-Transform-Validate-Load), pipeline can be run in two ways:
+
+1. Through an API endpoint:
+To run the ETL tasks, call the following API:
+
+http://localhost:8000/api/xia-workflow
+(Note: Replace localhost with the XIA host.)
+
+2. Periodically through Celery beat:
+On the admin page, add a periodic task and its schedule. The Celery task then runs at the selected interval.
+
+
+## Logs
+To monitor Celery tasks, check the logs of the application and Celery containers.
+
+## Documentation
+
+## Troubleshooting
+
+
+## License
+
+This project uses the [MIT](http://www.apache.org/licenses/LICENSE-2.0) license.
+
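The Workflows section of this README describes a fixed Validate-Transform-Validate-Load sequence with logging around it. One way to picture that control flow is a loop that runs each record through four pluggable stages and skips records that fail either validation. This is a sketch only: `run_vtvl` and its stage parameters are hypothetical, and in the actual package the schemas come from XSS and loading posts to XIS, neither of which is modelled here.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

Record = Dict[str, Any]
Validator = Callable[[Record], bool]


def run_vtvl(records: List[Record],
             validate_source: Validator,
             transform: Callable[[Record], Record],
             validate_target: Validator,
             load: Callable[[Record], None]) -> int:
    """Run Validate-Transform-Validate-Load over each record (hypothetical).

    Returns the number of records that were loaded. Records failing
    either validation step are logged and skipped.
    """
    loaded = 0
    for index, record in enumerate(records):
        if not validate_source(record):          # 1. Validate (source schema)
            logger.warning("record %d failed source validation", index)
            continue
        target = transform(record)               # 2. Transform
        if not validate_target(target):          # 3. Validate (target schema)
            logger.warning("record %d failed target validation", index)
            continue
        load(target)                             # 4. Load
        loaded += 1
        logger.info("record %d loaded", index)   # 5. Log
    return loaded


if __name__ == "__main__":
    sink: List[Record] = []
    count = run_vtvl(
        [{"title": "A"}, {"name": "no title"}],
        validate_source=lambda r: "title" in r,
        transform=lambda r: {"CourseTitle": r["title"]},
        validate_target=lambda r: bool(r["CourseTitle"]),
        load=sink.append,
    )
    print(count, sink)
```

Passing the stages in as functions keeps the sketch testable without network access; for example, `load=sink.append` simply collects the loaded records in a list.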
10 changes: 4 additions & 6 deletions app/openlxp_xia/admin.py
@@ -22,13 +22,11 @@ def unmarked_default(MetadataFieldOverwriteAdmin, request, queryset):
 @admin.register(XIAConfiguration)
 class XIAConfigurationAdmin(admin.ModelAdmin):
     list_display = (
-        'publisher',
+        'publisher', 'xss_api',
         'source_metadata_schema',
-        'source_target_mapping',
         'target_metadata_schema',)
-    fields = ['publisher',
-              'source_metadata_schema',
-              ('source_target_mapping',
+    fields = ['publisher', 'xss_api',
+              ('source_metadata_schema',
                'target_metadata_schema')]

     def delete_queryset(self, request, queryset):
@@ -42,7 +40,7 @@ class XISConfigurationAdmin(admin.ModelAdmin):
     list_display = ('xis_metadata_api_endpoint',
                     'xis_supplemental_api_endpoint',)
     fields = ['xis_metadata_api_endpoint',
-              'xis_supplemental_api_endpoint']
+              'xis_supplemental_api_endpoint', 'xis_api_key']


 @admin.register(MetadataFieldOverwrite)
45 changes: 45 additions & 0 deletions app/openlxp_xia/management/utils/model_help.py
@@ -0,0 +1,45 @@
+import bleach
+import logging
+from confusable_homoglyphs import categories, confusables
+
+logger = logging.getLogger('dict_config_logger')
+
+
+def bleach_data_to_json(rdata):
+    """Recursive function to bleach/clean HTML tags from string
+    data and return dictionary data.
+
+    :param rdata: dictionary to clean.
+    WARNING rdata will be edited
+    :return: dict"""
+
+    # iterate over dict
+    for key in rdata:
+        # if string, clean
+        if isinstance(rdata[key], str):
+            rdata[key] = bleach.clean(rdata[key], tags={}, strip=True)
+        # if dict, enter dict
+        if isinstance(rdata[key], dict):
+            rdata[key] = bleach_data_to_json(rdata[key])
+
+    return rdata
+
+
+def confusable_homoglyphs_check(data):
+    """Checks for dangerous homoglyphs."""
+
+    data_is_safe = True
+    for key in data:
+
+        # if string, Check homoglyph
+        if isinstance(data[key], str) and bool(confusables.
+                                               is_dangerous(data[key])):
+            data_is_safe = False
+            logger.info("Homoglyphs does not have the expected prefered alias")
+            logger.error(categories.unique_aliases(data[key]))
+        # if dict, enter dict
+        if isinstance(data[key], dict):
+            ret_val = confusable_homoglyphs_check(data[key])
+            if not ret_val:
+                data_is_safe = False
+    return data_is_safe
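Both helpers in `model_help.py` only descend into nested dictionaries, so a string inside a list is neither bleached nor checked for homoglyphs, and `bleach_data_to_json` mutates its argument (as its docstring warns). If metadata values can be lists (an assumption; this diff does not say), a non-mutating walker along these lines would cover them. `map_strings` and `all_strings` are hypothetical helpers; the example uses `str.strip` and `str.isalpha` so it runs without third-party packages, whereas in this module the callbacks would wrap the existing `bleach.clean(...)` call and a negated `confusables.is_dangerous(...)` check.

```python
from typing import Any, Callable


def map_strings(data: Any, clean: Callable[[str], str]) -> Any:
    """Return a new structure with `clean` applied to every string.

    Hypothetical helper: recurses into dicts, lists and tuples; other
    values are returned unchanged. The input is never modified.
    """
    if isinstance(data, str):
        return clean(data)
    if isinstance(data, dict):
        return {key: map_strings(value, clean) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        cleaned = [map_strings(item, clean) for item in data]
        return cleaned if isinstance(data, list) else tuple(cleaned)
    return data


def all_strings(data: Any, is_safe: Callable[[str], bool]) -> bool:
    """Return True only if `is_safe` holds for every nested string."""
    if isinstance(data, str):
        return is_safe(data)
    if isinstance(data, dict):
        return all(all_strings(value, is_safe) for value in data.values())
    if isinstance(data, (list, tuple)):
        return all(all_strings(item, is_safe) for item in data)
    return True  # numbers, None, etc. carry no text to check


if __name__ == "__main__":
    metadata = {"title": " Intro ", "tags": [" a ", {"note": " b "}], "n": 3}
    print(map_strings(metadata, str.strip))
    print(all_strings({"tags": ["ok", "fine"]}, str.isalpha))
```

Note that `all` stops at the first failing string, so unlike `confusable_homoglyphs_check` this version does not log every offending value; an explicit loop is needed if that report matters.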
17 changes: 15 additions & 2 deletions app/openlxp_xia/management/utils/xis_client.py
@@ -1,6 +1,7 @@
 import logging

 import requests
+from requests.auth import AuthBase

 from openlxp_xia.models import XISConfiguration

@@ -32,7 +33,8 @@ def posting_metadata_ledger_to_xis(renamed_data):
     headers = {'Content-Type': 'application/json'}

     xis_response = requests.post(url=get_xis_metadata_api_endpoint(),
-                                 data=renamed_data, headers=headers)
+                                 data=renamed_data, headers=headers,
+                                 auth=TokenAuth())
     return xis_response


@@ -43,5 +45,16 @@ def posting_supplemental_metadata_to_xis(renamed_data):

     xis_response = requests.post(
         url=get_xis_supplemental_metadata_api_endpoint(), data=renamed_data,
-        headers=headers)
+        headers=headers, auth=TokenAuth())
     return xis_response
+
+
+class TokenAuth(AuthBase):
+    """Attaches HTTP Authentication Header to the given Request object."""
+
+    def __call__(self, r, token_name='token'):
+        # modify and return the request
+
+        r.headers['Authorization'] = token_name + ' ' + \
+            XISConfiguration.objects.first().xis_api_key
+        return r
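`TokenAuth.__call__` looks up `XISConfiguration.objects.first().xis_api_key` on every request. If no configuration row exists, `.first()` returns `None` and the attribute access raises `AttributeError`; if the key can be left empty (the model definition is not part of this diff), concatenating `None` raises `TypeError`. A minimal sketch of a clearer failure, assuming the caller looks the key up and passes it in: `build_token_header` and `MissingApiKeyError` are hypothetical names, and the `token <key>` format simply mirrors what `TokenAuth` already sends.

```python
from typing import Dict, Optional


class MissingApiKeyError(RuntimeError):
    """Raised when no XIS API key is configured (hypothetical error type)."""


def build_token_header(api_key: Optional[str],
                       token_name: str = "token") -> Dict[str, str]:
    """Build the Authorization header that TokenAuth attaches.

    Fails with a descriptive error instead of an AttributeError or
    TypeError when the key is missing or blank.
    """
    if not api_key or not api_key.strip():
        raise MissingApiKeyError(
            "XIS API key is not configured; set it in the XIS configuration "
            "on the admin page")
    return {"Authorization": f"{token_name} {api_key}"}


if __name__ == "__main__":
    print(build_token_header("example-key"))
```

Inside `TokenAuth.__call__`, the result could be applied with `r.headers.update(build_token_header(key))`, where `key` is the value read from `XISConfiguration` after checking that a row exists.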
58 changes: 36 additions & 22 deletions app/openlxp_xia/management/utils/xss_client.py
@@ -1,30 +1,42 @@
-import json
 import logging
-import os

-import boto3
+import requests

 from openlxp_xia.management.utils.xia_internal import dict_flatten
 from openlxp_xia.models import XIAConfiguration

 logger = logging.getLogger('dict_config_logger')


-def get_aws_bucket_name():
-    """function returns the source bucket name"""
-    bucket = os.environ.get('BUCKET_NAME')
-    return bucket
-
-
-def read_json_data(file_name):
-    """Setting file path for json files and ingesting as dictionary values """
-    s3 = boto3.resource('s3')
-    bucket_name = get_aws_bucket_name()
-    # Read json file and store as a dictionary for processing
-    json_path = s3.Object(bucket_name, file_name)
-    json_content = json_path.get()['Body'].read().decode('utf-8')
-    data_dict = json.loads(json_content)
-    return data_dict
+def xss_get():
+    """Function to get xss configuration value"""
+    conf = XIAConfiguration.objects.first()
+    return conf.xss_api
+
+
+def read_json_data(source_schema_ref, target_schema_ref=None):
+    """get schema from xss and ingest as dictionary values"""
+    xss_host = xss_get()
+    request_path = xss_host
+    if(target_schema_ref is not None):
+        if(target_schema_ref.startswith('xss:')):
+            request_path += 'mappings/?targetIRI=' + target_schema_ref
+        else:
+            request_path += 'mappings/?targetName=' + target_schema_ref
+        if(source_schema_ref.startswith('xss:')):
+            request_path += '&sourceIRI=' + source_schema_ref
+        else:
+            request_path += '&sourceName=' + source_schema_ref
+        schema = requests.get(request_path, verify=True)
+        json_content = schema.json()['schema_mapping']
+    else:
+        if(source_schema_ref.startswith('xss:')):
+            request_path += 'schemas/?iri=' + source_schema_ref
+        else:
+            request_path += 'schemas/?name=' + source_schema_ref
+        schema = requests.get(request_path, verify=True)
+        json_content = schema.json()['schema']
+    return json_content


 def get_source_validation_schema():
@@ -108,10 +120,12 @@ def get_target_metadata_for_transformation():
     """Retrieve target metadata schema from XIA configuration """
     logger.info("Configuration of schemas and files for transformation")
     xia_data = XIAConfiguration.objects.first()
-    target_metadata_schema = xia_data.source_target_mapping
-    if not target_metadata_schema:
-        logger.warning("Target metadata schema field name is empty!")
+    target_metadata_schema = xia_data.target_metadata_schema
+    source_metadata_schema = xia_data.source_metadata_schema
+    if not target_metadata_schema or not source_metadata_schema:
+        logger.warning("Metadata schema field name is empty!")
     logger.info("Reading schema for transformation")
     # Read source transformation schema as dictionary
-    target_mapping_dict = read_json_data(target_metadata_schema)
+    target_mapping_dict = read_json_data(
+        source_metadata_schema, target_metadata_schema)
     return target_mapping_dict
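`read_json_data` concatenates schema references straight into the query string, so a name containing `&`, `#`, `+` or a space would corrupt the request. `get_target_metadata_for_transformation` also only logs a warning when a reference is empty or missing and still passes it to `read_json_data`, where it can produce the wrong request or raise `AttributeError` on `None.startswith`. A hedged sketch of the same URL selection using `urllib.parse.urlencode`, with explicit checks for empty references: `build_xss_schema_url` is a hypothetical name, the `schemas/` and `mappings/` paths, the parameter names and the `xss:` prefix test are copied from the diff, and the configured `Xss api` value is assumed to end with `/`, as in the README example.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def _ref_params(ref: str, iri_key: str, name_key: str) -> Dict[str, str]:
    """References starting with 'xss:' are IRIs; anything else is a name."""
    return {iri_key: ref} if ref.startswith("xss:") else {name_key: ref}


def build_xss_schema_url(xss_host: str, source_ref: Optional[str],
                         target_ref: Optional[str] = None) -> str:
    """Build the XSS URL that read_json_data would request (hypothetical).

    With only a source reference the URL targets 'schemas/'; with a
    target reference as well it targets 'mappings/'. Values are
    percent-encoded by urlencode. Assumes xss_host ends with '/'.
    """
    if not source_ref:
        raise ValueError("source metadata schema reference is empty")
    if target_ref is None:
        params = _ref_params(source_ref, "iri", "name")
        return xss_host + "schemas/?" + urlencode(params)
    if not target_ref:
        raise ValueError("target metadata schema reference is empty")
    # Same parameter order as read_json_data: target first, then source.
    params = _ref_params(target_ref, "targetIRI", "targetName")
    params.update(_ref_params(source_ref, "sourceIRI", "sourceName"))
    return xss_host + "mappings/?" + urlencode(params)


if __name__ == "__main__":
    print(build_xss_schema_url("https://localhost:8000/api/", "my schema"))
    print(build_xss_schema_url("https://localhost:8000/api/", "src", "xss:t"))
```

Passing the same dictionary through the `params` argument of `requests.get` would also handle the encoding. Standard query-string decoding on the server should recover the original reference, but that is an assumption about XSS rather than something this diff shows.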