diff --git a/docs/Projects/TAD/reporting_standard.md b/docs/Projects/TAD/reporting_standard.md new file mode 100644 index 00000000..6275f45d --- /dev/null +++ b/docs/Projects/TAD/reporting_standard.md @@ -0,0 +1,398 @@ +# TAD Reporting Standard + +**Version**: 0.1a1 + +This document describes the **T**ransparency of **A**lgorithmic **D**ecision making (TAD) Reporting Standard. + +For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a +reporting standard so that information about an algorithmic system can be shared. This reporting standard +describes how information about the different phases of an algorithm's life cycle can be reported. +It contains, among other things, descriptive information combined with information about the technical +tests and assessments applied. + +!!! warning "Disclaimer" + + The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal + and will change significantly in future versions. + +## Introduction + +Inspired by [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993) +and [Papers with Code Model Index](https://github.com/paperswithcode/model-index) this standard almost +[^1] [^2] [^3] [^4] extends the [Hugging Face model card metadata specification](https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1) +to allow for: + +1. More finegrained information on performance metrics, by extending the `metrics_field` from the Hugging +Face metadata specification. +2. Capturing additional *measurements* on fairness and bias, which can be partitioned into bar plot like +measurements (such as mean absolute SHAP values) and graph plot like measurements (such as partial dependence). This is achieved +by defining an new field `measurements`. +3. Capturing *assessments* (such as [IAMA](https://www.rijksoverheid.nl/documenten/rapporten/2021/02/25/impact-assessment-mensenrechten-en-algoritmes) +and [ALTAI](https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment)). +This is achieved by defining a new field `assessments`. + +Following Hugging Face, this proposed standard will be written in yaml. + +This standard does not contain all fields present in the Hugging Face metadata specification. The fields that +are optional in the HuggingFace specification and are specific to the HugginFace interface are ommited. + +Another difference is that we devide our implementation into three seperate parts. + +1. `system_card`, containing information about a group of ML-models which accomplish a specific task. +2. `model_card`, containing information about a specific data science model. +3. `assessment_card`, containing information about a regulatory assessment. + +!!! note "Include statements" + + These `model_card`s and `assessment_card`s can be included verbatim into a `system_card`, + or referenced with an `!include` statement, allowing for minimal cards to be compact in a single + file. Extensive cards can be split up for readability and maintainability. Our standard allows for + the `!include` to be used anywhere. + +## Specification of the standard + +The standard will be written in yaml. Example yaml files are given in the next section. The standard defines +three cards: a `system_card`, a `model_card` and an `assessment_card`. A `system_card` contains information +about an algorithmic system. It can have mutiple models and each of these models should have a `model_card`. +Regulatory assessments can be processed in an `assessment_card`. Note that `model_card`'s and +`assessment_card`'s can be included directly into the `system_card` or can be included as seperate yaml +files with help of a yaml-include mechanism. For clarity the latter is preffered and is also used in +the examples in the next section. + +### `system_card` + +A `system_card` contains the following information. + +1. `schema_version` (REQUIRED, string). Version of the schema used, for example "0.1a1". +2. `name` (OPTIONAL, string). Name used to describe the system. +3. `upl` (OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, + it should contain a URI from the [Uniform Product List](https://standaarden.overheid.nl/owms/oquery/UPL-actueel.plain). +4. `owners` (list). There can be multiple owners. For each owner the following fields are present. + + 1. `oin` (OPTIONAL, string). If applicable the [Organisatie-identificatienummer (OIN)](https://oinregister.logius.nl/oin-register). + 2. `organization` (OPTIONAL, string). Name of the organization that owns the model. If `ion` is + NOT provided this field is REQUIRED. + 3. `name` (OPTIONAL, string). Name of a contact person within the organisation. + 4. `email` (OPTIONAL, string). Email address of the contact person or organization. + 5. `role` (OPTIONAL, string). Role of the contact person. This field should only be set when the `name` field + is set. + +5. `description` (OPTIONAL, string). A short description of the system. +6. `labels` (OPTIONAL, list). This fields allows to store meta information about a system. There +can be multiple labels. For each label the following fields are present. + + 1. `name` (OPTIONAL, string). Name of the label. + 2. `value` (OPTIONAL, string). Value of the label. + +7. `status` (OPTIONAL, string). The status of the system. For example the status can be "production". +8. `publication_category` (OPTIONAL, enum[string]). The publication category of the algorithm should +be chosen from `["high_risk", other"]`. +9. `begin_date` (OPTIONAL, string). The first date the system was used. +Date should be given in [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) format, i.e. YYYY-MM-DD. +10. `end_date` (OPTIONAL, string). The last date the system was used. +Date should be given in [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) format, i.e. YYYY-MM-DD. +11. `goal_and_impact` (OPTIONAL, string). The purpose of the system and the impact it has on citizens +and companies. +12. `considerations` (OPTIONAL, string). The pro's and con's of using the system. +13. `risk_management` (OPTIONAL, string). Description of the risks associated with the system. +14. `human_intervention` (OPTIONAL, string). A description to want extend there is human involvement +in the system. +15. `legal_base` (OPTIONAL, list). If there exists a legal base for the process the system is embedded +in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each +legal base the following fields are present. + 1. `name` (OPTIONAL, string). Name of the law. + 2. `link` (OPTIONAL, string). URI pointing towards the contents of the law. +16. `used_data` (OPTIONAL, string). An overview of the data that is used in the system. +17. `technical_design` (OPTIONAL, string). Description on how the system works. +18. `external_providers` (OPTIONAL, list[string]). Name of an external provider, if relevant. There can +be multiple external providers. +19. `references` (OPTIONAL, list[string]). Additional reference URI's that point information about the system +and are relevant. + +#### 1. Models + +1. `models` (OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or `!include`s of a yaml +file containing a model card. This model card can for example be a model card described in the next section +or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used. + +#### 2. Assessments + +1. `assessments` (OPTIONAL, list[AssesmentCard]). A list of assessment cards (as defined below) or `!include`s of a yaml +file containing a assessment card. This assessment card is an assessment card described in the next section. +There can be multiple assessment cards, meaning multiple assessment were performed. + +### `model_card` + +A `model_card` contains the following information. + +1. `language` (OPTIONAL, list[string]). If relevant, the natural languages the model supports in [ISO 639](https://www.iso.org/iso-639-language-code). + There can be multiple languages. +2. `license`(REQUIRED, string). Any license from the [open source license list](https://opensource.org/license) +[^1]. If the license is NOT present in the license list this field must be set to 'other' and the following +two fields will be REQUIRED. + + 1. `license_name` (string). An id for the license. + 2. `license_link` (string). A link to a file of that name inside the repo, or a URL to a remote file containing the license + contents. + +3. `tags` (OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags. +4. `owners` (list). There can be multiple owners. For each owner the following fields are present. + + 1. `oin` (OPTIONAL, string). If applicable the [Organisatie-identificatienummer (OIN)](https://oinregister.logius.nl/oin-register). + 2. `organization` (OPTIONAL, string). Name of the organization that owns the model. If `ion` is + NOT provided this field is REQUIRED. + 3. `name` (OPTIONAL, string). Name of a contact person within the organisation. + 4. `email` (OPTIONAL, string). Email address of the contact person or organization. + 5. `role` (OPTIONAL, string). Role of the contact person. This field should only be set when the `name` field + is set. + +#### 1. Model Index + +There can be multiple models. For each model the following fields are present. + +1. `name` (REQUIRED, string). The name of the model. +2. `model` (REQUIRED, string). A URI pointing to a repository containing the model file. +3. `artifacts` (OPTIONAL, list[string]). A list of URI's where each URI refers to a relevant model +artifact, that cannot be captured by any other field, but are relevant to model. +4. `parameters` (list). There can be multiple parameters. For each parameter the following fields are present. + + 1. `name` (REQUIRED, string). The name of the parameter, for example "epochs". + 2. `dtype` (OPTIONAL, string). The datatype of the parameter, for example "int". + 3. `value` (OPTIONAL, string). The value of the parameter, for example 100. + 4. `labels` (list). This field allows to store meta information about a parameter. + There can be multiple labels. For each label the following fields are present. + + 1. `name` (OPTIONAL, string). The name of the label. + 2. `dtype` (OPTIONAL, string). The datatype of the feature. If `name` is set, this field + is REQUIRED. + 3. `value` (OPTIONAL, string). The value of the feature. If `name` is set, this field is REQUIRED. + +5. `results` (list). There can be multiple results. For each result the following fields are present. + + 1. `task` (OPTIONAL, list). + + 1. `task_type` (REQUIRED, string). The task of the model, for example "object-classifcation". + 2. `task_name` (OPTIONAL, string). A pretty name fo the model taks, for example "Object Classification". + + 2. `datasets` (list). There can be multiple datasets [^2]. For each dataset the following fields are present. + + 1. `type` (REQUIRED, string). The type of the dataset, can be a dataset id from [HuggingFace datasets](https://hf.co/datasets) + or any other link to a repository containing the dataset[^3], for example "common_voice". + 2. `name` (REQUIRED, string). Name pretty name for the dataset, for example "Common Voice (French)". + 3. `split` (OPTIONAL, string). The split of the dataset, for example "train". + 4. `features` (OPTIONAL, list[string]). List of feature names. + 5. `revision` (OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb. + + 3. `metrics` (list). There can be multiple metrics. For each metric the following fields are present. + + 1. `type` (REQUIRED, string). A metric-id from [Hugging Face metrics](https://hf.co//metrics)[^4], for example accuracy. + 2. `name` (REQUIRED, string). A descriptive name of the metric. For example "false positive rate" is + not a descriptive name, but "training false positive rate w.r.t class x" is. + 3. `dtype` (REQUIRED, string). The data type of the metric, for example `float`. + 4. `value` (REQUIRED, string). The value of the metric. + 5. `labels` (list). This field allows to store meta information about a metric. For example, + metrics can be computed for example on subgroups of specific features. + For example, one can compute the accuracy for examples where the feature "gender" is set to "male". + There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. + There can be multiple labels. For each label the following fields are present. + + 1. `name` (OPTIONAL, string). The name of the feature. For example: "gender". + 2. `type` (OPTIONAL, string). The type of the label. Can for example be set to "feature" or "output_class". + If `name` is set, this field is REQUIRED. + 3. `dtype` (OPTIONAL, string). The datatype of the feature, for example `float`. If `name` is set, this field + is REQUIRED. + 4. `value` (OPTIONAL, string). The value of the feature. If `name` is set, this field is REQUIRED. + For example: "male". + + 4. `measurements`. + + 1. `bar_plots` (list). The purpose of this field is to capute bar plot like measurements, for example SHAP values. + There can be multiple bar plots. For each bar plot the following fields are present. + + 1. `type` (REQUIRED, string). The type of bar plot, for example "SHAP". + 2. `name` (OPTIONAL, string). A pretty name for the plot, for example "Mean Absolute SHAP Values". + 3. `results` (list). The contents of the bar plot. A result represents a bar. There can be mutiple results. + For each result the following fields are present. + 1. `name` (REQUIRED, string). The name of bar. + 2. `value` (REQUIRED, float). The value of the corresponding bar. + + 2. `graph_plots` (list). The purpose of this field is to capture graph plot like measurements, such as partial dependence + plots. + There can be multiple graph plots. For each graph plot the following fields are present. + + 1. `type` (REQUIRED, string). The type of the graph plot, for example "partial_dependence". + 2. `name` (OPTONAL, string). A pretty name of the graph, for example "Partial Dependence Plot". + 3. `results` (list). Results contains the graph plot data. Each graph can depend on a specific output + class and feature. There can be multiple results. For each result the following fields are present. + 1. `class` (OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. + This field is not always present. + 2. `feature` (REQUIRED, string). The feature the graph corresponds to. This is required, since all + relevant graphs are dependend on features. + 3. `data` (list) + 1. `x_value` (REQUIRED, float). The $x$-value of the graph. + 2. `y_value` (REQUIRED, float). The $y$-value of the graph. + +### `assessment_card` + +An `assessment_card` contains the following information. + +1. `name` (REQUIRED, string). The name of the assessment. +2. `date` (REQUIRED, string). The date at which the assessment is completed. +Date should be given in [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) format, i.e. YYYY-MM-DD. +3. `contents` (list). There can be multiple items in contents. For each item the following fields are present: + + 1. `question` (REQUIRED, string). A question. + 2. `answer` (REQUIRED, string). An answer. + 3. `remarks` (OPTIONAL, string). A field to put relevant discussion remarks in. + 4. `authors`. There can be multiple names. For each name the following field is present. + 1. `name` (OPTIONAL, string). The name of the author of the question. + 5. `timestamp` (OPTIONAL, string). A timestamp of the date and time of the answer. + +## Example + +### System Card + +```yaml +version: {system_card_version} # Optional. Example: "0.1a1" +name: {system_name} # Optional. Example: "AangifteVertrekBuitenland" +upl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland +owners: +- oin: {oin} # Optional. Example: 00000001003214345000 + organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK + name: {owner_name} # Optional. Example: John Doe + email: {owner_email} # Optional. Example: johndoe@email.com + role: {owner_role} # Optional. Example: Data Scientist. +description: {system_description} # Optional. Short description of the system. +labels: # Optional labels to store metadata about the system. +- name: {label_name} # Optional. + value: {label_value} # Optional. +status: {system_status} # Optional. Example "production". +publication_category: {system_publication_cat} # Optional. Example: "impactfull_algorithm". +begin_date: {system_begin_date} # Optional. Example: 2025-1-1. +end_date: {system_end_date} # Optional. Example: 2025-12-1. +goal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system. +considerations: {system_considerations} # Optional. Considerations about the system. +risk_management: {system_risk_management} # Optional. Description of risks associated with the system. +human_intervention: {system_human_intervention} # Optional. Description of uman involvement in the system. +legal_base: +- name: {law_name} # Optional. Example: "AVG". + link: {law_uri} # Optional. Example: "https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046". +used_data: {system_used_data} # Optional. Description of the data used by the system. +technical_design: {technical_design} # Optional. Description of the technical design of the system. +external_providers: +- {system_external_provider} # Optional. Reference to used external providers. +references: +- {reference_uri} # Optional. Example: URI to codebase. + +models: + - !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml. + +assessments: +- !include {assessment_card_uri} # Required. Example: iama.yaml. +``` + +### Model Card + +```yaml +language: + - {lang_0} # Optional. Example nl. +license: {licence} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or "other". +license_name: {licence_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0' +license_link: {license_link} # Optional if license != other, Required otherwise. Specify "LICENSE" or "LICENSE.md" to link to a file of that name inside the repo, or a URL to a remote file. +tags: +- {tag_0} # Optional. Example: audio +- {tag_1} # Optional. Example: automatic-speech-recognition +owners: +- organization: {organization_name} # Required. Example: BZK + oin: {oin} # Optional. Example: 00000001003214345000 + name: {owner_name} # Optional. Example: John Doe + email: {owner_email} # Optional. Example: johndoe@email.com + role: {owner_role} # Optional. Example: Data Scientist. + +model-index: +- name: {model_id} # Required. Example: CatClassifier. + model: {model_uri} # Required. URI to a repository containing the model file. + artifacs: + - {model_artifact} # Optional. URI to relevant model artifacts, if applicable. + parameters: + - name: {parameter_name} # Optional. Example: "epochs". + dtype: {parameter_dtype} # Optional. Example "int". + value: {parameter_value} # Optional. Example 100. + labels: + - name: {label_name} # Optional. Example: "gender". + dtype: {label_type} # Optional. Example: "string". + value: {label_value} # Optional. Example: "female". + results: + - task: + type: {task_type} # Required. Example: image-classification. + name: {task_name} # Optional. Example: Image Classification. + datasets: + - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset + name: {dataset_name} # Required. Example: "Common Voice (French)". A pretty name for the dataset. + split: {split} # Optional. Example "train". + features: + - {feature_name} # Optional. Example: "gender". + revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb + metrics: + - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics. + name: {metric_name} # Required. Example: "FPR wrt class 0 restricted to feature gender:0 and age:21". + dtype: {metric_dtype} # Required. Example: "float". + value: {metric_value} # Required. Example: 0.75. + labels: + - name: {label_name} # Optional. Example: "gender". + type: {label_type} # Optional. Exmple "feature". + dtype: {label_type} # Optional. Example: "string". + value: {label_value} # Optional. Example: "female". + measurements: + # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify. + bar_plots: + - type: {measurement_type} # Required. Example: "SHAP". + name: {measurement_name} # Optional. Example: "Mean Absolute Shap Values". + results: + - name: {bar_name} # Required. The name of a bar. + value: {bar_value} # Required. The corresponding value. + # Graph plots should be able to capture graph based measurements such as partial dependence and accumalated local effect. + graph_plots: + - type: {measurement_type} # Required. Example: "partial_dependence". + name: {measurement_name} # Optional. Example: "Partial Dependence Plot". + # Results store the graph plot data. So far all plots are depenendend on a combination of a specific class (sometimes) and feature (always). + # For example partial dependence plots are made for each feature and class. + results: + - class: {class_name} # Optional. Name of the output class the graph depends on. + feature: {feature_name} # Required. Name of the feature the graph depends on. + data: + - x_value: {x_value} # Required. The x value of the graph data. + y_value: {y_value} # Required. The y value of the graph data. + +``` + +### Assessment Card + +```yaml +name: {assessment_name} # Required. Example: IAMA. +date: {assessment_date} # Required. Example: 25-03-2025. +contents: + - question: {question_text} # Required. Example: "Question 1: ...". + answer: {answer_text} # Required. Example: "Answer: ...". + remarks: {remarks_text} # Optional. Example: "Remarks: ...". + authors: # Optional. Example: "['John', 'Peter']". + - name: {author_name} + timestamp: {timestamp} # Optional. Example: 1711630721. +``` + +## Schema + +JSON schema will be added when we publish the first beta version. + +[^1]: Deviation from the HuggingFace specification is in the Licence field. HuggingFace only accepts +dataset id's from [HuggingFace license list](https://huggingface.co/docs/hub/repositories-licenses) while we +accept any license from [Open Source License List](https://opensource.org/license). + +[^2]: Deviation from the HuggingFace specification is in the `model_index:results:dataset` field. HuggingFace only accepts +one dataset, while we accept a list of datasets. + +[^3]: Deviation from the HuggingFace specification is in the Dataset Type field. HuggingFace only accepts +dataset id's from [HuggingFace datasets](https://hf.co/datasets) while we also allow for any url pointing to the dataset. + +[^4]: For this extension to work relevent metrics (such as for example false positive rate) have to be added to the +[Hugging Face metrics](https://hf.co//metrics), possibly this can be done in our organizational namespace. diff --git a/mkdocs.yml b/mkdocs.yml index 593d756e..4d36a1d5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -33,6 +33,7 @@ markdown_extensions: - pymdownx.tabbed: alternate_style: true - pymdownx.superfences + - footnotes - pymdownx.details - admonition - attr_list