
ADX-959 Make schemas in unaids-data-specification compatible with upstream frictionless/ckanext-validation #56

Closed
wants to merge 3 commits into from


@ChasNelson1990 ChasNelson1990 commented Feb 1, 2023

Description

https://fjelltopp.atlassian.net/browse/ADX-959

Note: keeping this as draft until our scheming and validation changes are ready to be merged

This updates our table schemas to match the latest Frictionless Framework v5 schemas: https://specs.frictionlessdata.io/table-schema/#language

Not all files required changes; those that did have been updated using the existing versioning method.

Most changes were fields with a missing type but there was also a primary key typo and other small bits and bobs.
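As a rough illustration of the kinds of problem described above, here is a minimal sketch of a check for fields without a "type" and for a primary key that names no declared field. The function name `lint_schema` and the example field names are hypothetical, and the primary key typo fixed in this PR may have been of a different kind. This uses only the standard library and is not a substitute for validating with frictionless.

```python
def lint_schema(schema):
    """Return a list of human-readable problems found in a Table Schema dict."""
    problems = []
    fields = schema.get("fields", [])
    names = [field.get("name") for field in fields]

    # Flag every field that does not declare a type explicitly.
    for field in fields:
        if "type" not in field:
            problems.append(f"field {field.get('name')!r} has no type")

    # primaryKey may be a single field name or a list of field names.
    primary_key = schema.get("primaryKey", [])
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    for key in primary_key:
        if key not in names:
            problems.append(f"primaryKey {key!r} is not a declared field")
    return problems


# Hypothetical schema with one untyped field and a misspelt primary key.
example = {
    "fields": [
        {"name": "area_id", "type": "string"},
        {"name": "year"},
    ],
    "primaryKey": ["area_idd"],
}

for problem in lint_schema(example):
    print(problem)
```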

Testing

So... this is up for discussion. Currently we don't do any testing and actually I found at least one typo and several missing "type" declarations in our existing schemas.

The files I tested against were available in dev at: https://dev.adr.fjelltopp.org/country-estimates-23/united-kingdom-country-estimates-2023

I have been testing this with the below script (see code block at bottom) - we could easily turn this into an automated test that runs through a GitHub action on PR?

Checklist

  • The Jira ticket for this issue has been updated to "Ready to Review" or equivalent.
  • I have developed these changes in discussion with the appropriate project manager.
  • My code follows the general Fjelltopp documentation (see Confluence).
  • I have made corresponding changes to the Fjelltopp documentation (see Confluence).
  • I have rebased this branch with master.
  • New dependency changes have been committed.
  • I have added automated tests that prove my fix is effective or that my feature works.
  • New and existing tests pass locally with my changes.
  • My changes generate no new warnings.
  • I have performed a self-review of my own code.
  • I have assigned at least one reviewer.
"""
Helper script to validate table schemas and test data files.
To use this script set up an environment with frictionless installed.
Change "ROOT_FOLDER" to point to the folder containing the table schemas.
Run the script - like this it will validate the latest schema in each subfolder.
However, you can also add a test data file (csv or geojson) to any subfolder
and the script will validate that data file both with and without the schema definition.
"""

import os
from frictionless import validate
from pprint import pprint

ROOT_FOLDER = "./unaids_data_specifications/table_schemas/"

green_terminal_fg = "\033[32m"
red_terminal_fg = "\033[31m"
restore_terminal_fg = "\033[m"

if __name__ == "__main__":
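    # only subfolders hold schemas; skip any stray files in ROOT_FOLDER
    table_schemas = [
        f for f in os.listdir(ROOT_FOLDER) if os.path.isdir(os.path.join(ROOT_FOLDER, f))
    ]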
    for schema in table_schemas:
        # get latest schema version
        latest_version = sorted(
            [
                v
                for v in os.listdir(os.path.join(ROOT_FOLDER, schema))
                if v.endswith(".json")
            ]
        )[-1]

        print(f"Validating schema: {schema}, using version: {latest_version};")

        # is schema valid?
        schema_report = validate(
            os.path.join(ROOT_FOLDER, schema, latest_version), type="schema"
        )
        if schema_report.valid:
            print(f"\t Schema is {green_terminal_fg}valid{restore_terminal_fg}")
        else:
            print(f"\t Schema is {red_terminal_fg}invalid{restore_terminal_fg}")
            pprint(schema_report.stats)
            with open(
                os.path.join(ROOT_FOLDER, schema, "0_schema_report.json"), "w"
            ) as f:
                f.write(schema_report.to_json())

        # get test data
        data_files = sorted(
            [
                v
                for v in os.listdir(os.path.join(ROOT_FOLDER, schema))
                if not v.endswith(".json")
            ]
        )
        data_file = data_files[0] if len(data_files) else None
        if data_file and schema_report.valid:
            # is data valid?
            data_report = validate(os.path.join(ROOT_FOLDER, schema, data_file))
            if data_report.valid:
                print(f"\t Data is {green_terminal_fg}valid{restore_terminal_fg}")
            else:
                print(f"\t Data is {red_terminal_fg}invalid{restore_terminal_fg}")
                pprint(data_report.stats)
                with open(
                    os.path.join(ROOT_FOLDER, schema, "0_data_report.json"), "w"
                ) as f:
                    f.write(data_report.to_json())

            if data_report.valid:
                # is resource valid?
                resource_report = validate(
                    os.path.join(ROOT_FOLDER, schema, data_file),
                    schema=os.path.join(ROOT_FOLDER, schema, latest_version),
                    type="resource",
                )
                if resource_report.valid:
                    print(
                        f"\t Resource is {green_terminal_fg}valid{restore_terminal_fg}"
                    )
                else:
                    print(
                        f"\t Resource is {red_terminal_fg}invalid{restore_terminal_fg}"
                    )
                    pprint(resource_report.stats)
                    with open(
                        os.path.join(ROOT_FOLDER, schema, "0_resource_report.json"), "w"
                    ) as f:
                        f.write(resource_report.to_json())

@ChasNelson1990 ChasNelson1990 added the enhancement New feature or request label Feb 1, 2023
@ChasNelson1990 ChasNelson1990 requested a review from fulior February 1, 2023 15:00
@ChasNelson1990 ChasNelson1990 self-assigned this Feb 1, 2023

fulior commented Feb 2, 2023

Could we add some kind of a "linter" to validate those schemas? And run it with GitHub Actions? If you know the tool (frictionless cli?) then I can implement it.

edit: just realized that there's a script attached below our checklist, I think I can start with it :)

@ChasNelson1990

Superseded by #58
