Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tad reporting standard #93

Merged
merged 14 commits into from
Apr 12, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
305 changes: 305 additions & 0 deletions docs/Projects/TAD/reporting_standard.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,305 @@
# TAD Reporting Standard

This document describes the Transparancy of Algorithmic Decision Making (TAD) reporting standard. Assessing
fairness and bias in Machine Learning models can be done through various technical tests, but also by answering
regulatory assessments. The goal of this reporting standard is to capture these technical tests and assessments
in a standardised way.

## Introduction

Inspired by [Papers with Code Model Index](https://github.com/paperswithcode/model-index) and
[Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993) this standard almost [^1] [^2] [^3] [^4]
extends the [Hugging Face model card metadata specification](https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1)
to allow for:

1. More finegrained information on performance metrics, by extending the `metrics_field` from the Hugging
Face metadata specification.
2. Capturing additional *measurements* on fairness and bias, which can be partitioned into bar plot like
measurements (such as mean absolute SHAP values) and graph plot like measurements (such as partial dependence). This is achieved
by defining an new field `measurements`.
3. Capturing *assessments* (such as [IAMA](https://www.rijksoverheid.nl/documenten/rapporten/2021/02/25/impact-assessment-mensenrechten-en-algoritmes)
and [ALTAI](https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment)).
This is achieved by defining a new field `assessments`.

This standard does not contain all fields present in the Hugging Face metadata specification. The fields that
are optional in the HuggingFace specification AND are specific to the HugginFace interface are ommited.

Following Hugging Face, this proposed standard will be written in yaml.

Another difference is that we devide our implementation into two parts.

1. `systems card`, containing information about a group of ML-models which accomplish a specific task.
2. `model card`, containing information about a specific ML-model.

## Intended usage

The reporting standard defined in this document is intented to be given as a preamble in a README, similar to
the Hugging Face metadata.

## Specification of the standard

The standard will be written in yaml. An example yaml is given in the next section. The standard contains the
following information.

### `system_card`

A `system_card` contains the following information.

1. `upl` (OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government,
it should contain a URI from the [Uniform Product List](https://standaarden.overheid.nl/owms/oquery/UPL-actueel.plain).
2. `owners` (list). There can be multiple owners. For each owner the following fields are present.

1. `organization` (REQUIRED, string). Name of the organization that owns the model.
2. `oin` (OPTIONAL, string). If applicable the [Organisatie-identificatienummer (OIN)](https://www.logius.nl/domeinen/toegang/organisatie-identificatienummer/wat-is-het).
3. `name` (OPTIONAL, string). Name of a contact person within the organisation.
4. `email` (OPTIONAL, string). Email address of the contact person or organization.
5. `role` (OPTIONAL, string). Role of the contact person. This field should only be set when the `name` field
is set.

#### 2. Models

1. `models` (OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or `!include`s of a yaml
file containing a model card. This model card can for example be a model card described in the next section
or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.

#### 3. Assessments

There can be multiple assessments. For each assessment the following fields are present:

1. `name` (REQUIRED). The name of the assessment.
2. `date` (REQUIRED). The date at which the assessment is completed.
3. `contents`. There can be multiple items in contents. For each item the following fields are present:

1. `question` (REQUIRED). A question.
2. `answer` (REQUIRED). An answer.
3. `remarks` (OPTIONAL). A field to put relevant discussion remarks in.
4. `authors`. There can be multiple names. For each name the following field is present.
1. `name` (OPTIONAL). The name of the author of the question.
5. `timestamp` (OPTIONAL). A timestamp of the date and time of the answer.

### `model_card`

A `model_card` contains the following information.

1. `algorithm-registers` (OPTIONAL, list[string]). If this algorithm is registered in an algorithm register,
this field should contain a URI to the corresponding record in the register. There can be multiple registers.
A URI could look like `https://huggingface.co/org/modelid` or `https://algoritmes.overheid.nl/nl/algoritme/modelid`.
2. `language` (OPTIONAL, list[string]). If relevant, the natural languages the model supports in [ISO 639](https://www.iso.org/iso-639-language-code).
There can be multiple languages.
3. `license`(REQUIRED, string). Any license from the [open source license list](https://opensource.org/license)
[^1]. If the license is NOT present in the license list this field must be set to 'other' and the following
two fields will be REQUIRED.

1. `license_name` (string). An id for the license.
2. `license_link` (string). A link to a file of that name inside the repo, or a URL to a remote file containing the license
contents.

4. `library_name` (OPTIONAL, string). Any library from the [library list](https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/model-libraries.ts).
If the library is NOT present in the library list, the following fields CAN be set.
1. `library` (OPTIONAL, string). If a library is used that is not in the above list, you can provide
the name of the library used here.
2. `library_link` (OPTIONAL, string). If a libary is used that is not in the above list, you can
provide a URI to the repo of the library used here.
5. `tags` (OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.

### 2. `model_index`

There can be multiple models. For each model the following fields are present.

1. `name` (REQUIRED, string). The name of the model.
2. `model` (REQUIRED, string). A URI pointing to a repository containing the model file.
3. `artifacts` (OPTIONAL, list[string]). A list of URI's where each URI refers to a relevant model
artifact, that cannot be captured by any other field, but are relevant to model.
4. `parameters` (list). There can be multiple parameters. For each parameter the following fields are present.

1. `name` (REQUIRED, string). The name of the parameter, for example "epochs".
2. `dtype` (OPTIONAL, string). The datatype of the parameter, for example "int".
3. `value` (OPTIONAL, string). The value of the parameter, for example 100.

5. `results` (list). There can be multiple results. For each result the following fields are present.

1. `task` (list).

1. `task_type` (REQUIRED, string). The task of the model, for example "object-classifcation".
2. `task_name` (OPTIONAL, string). A pretty name fo the model taks, for example "Object Classification".

2. `dataset` (list). There can be multiple results [^2]. For each result the following fields are present.

1. `type` (REQUIRED, string). The type of the dataset, can be a dataset id from [HuggingFace datasets](https://hf.co/datasets)
or any other link to a repository containing the dataset[^3], for example "common_voice".
2. `name` (REQUIRED, string). Name pretty name for the dataset, for example "Common Voice (French)".
3. `split` (OPTIONAL, string). The split of the dataset, for example "train".
4. `features` (OPTIONAL, list[string]). List of feature names.
5. `revision` (OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.

3. `metrics` (list). There can be multiple metrics. For each metric the following fields are present.

1. `type` (REQUIRED, string). A metric-id from [Hugging Face metrics](https://hf.co//metrics)[^4], for example accuracy.
2. `name` (REQUIRED, string). A descriptive name of the metric. For example "false positive rate" is
not a descriptive name, but "training false positive rate w.r.t class x" is.
3. `dtype` (REQUIRED, string). The data type of the metric, for example `float`.
4. `value` (REQUIRED, float). The value of the metric.
5. `class` (OPTIONAL, string/int/float/bool). Some metrics (such as false positive rate for multiclass classification)
depend on a specific output class. In this field the output class can be specified. It must only be set for metrics
for which it makes sense.
6. `labels` (list). Metrics can be computed for example on subgroups of specific features.
For example one can compute the accuracy for examples where the feature "gender" is set to "male".
There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups.
For each subgroup the following fields are present.

1. `name` (OPTIONAL, string). The name of the feature. For example: "gender".
2. `type` (OPTIONAL, string). The datatype of the feature, for example `float`. If `name` is set, this field
must be set as well.
3. `value` (OPTIONAL, string). The value of the feature. If `name` is set, this field must be set as wel.
For example: "male".

4. `measurements`.

1. `bar_plots` (list). The purpose of this field is to capute bar plot like measurements, for example SHAP values.
There can be multiple bar plots. For each bar plot the following fields are present.

1. `type` (REQUIRED, string). The type of bar plot, for example "SHAP".
2. `name` (OPTIONAL, string). A pretty name for the plot, for example "Mean Absolute SHAP Values".
3. `results` (list). The contents of the bar plot. A result represents a bar. There can be mutiple results.
For each result the following fields are present.
1. `name` (REQUIRED, string). The name of bar.
2. `value` (REQUIRED, float). The value of the corresponding bar.

2. `graph_plots` (list). The purpose of this field is to capture graph plot like measurements, such as partial dependence
plots.
There can be multiple graph plots. For each graph plot the following fields are present.

1. `type` (REQUIRED, string). The type of the graph plot, for example "partial_dependence".
2. `name` (OPTONAL, string). A pretty name of the graph, for example "Partial Dependence Plot".
3. `results` (list). Results contains the graph plot data. Each graph can depend on a specific output
class and feature. There can be multiple results. For each result the following fields are present.
1. `class` (OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to.
This field is not always present.
2. `feature` (REQUIRED, string). The feature the graph corresponds to. This is required, since all
relevant graphs are dependend on features.
3. `data` (list)
1. `x_value` (REQUIRED, float). The $x$-value of the graph.
2. `y_value` (REQUIRED, float). The $y$-value of the graph.

## Example

### System Card

```yaml
upl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland
owners:
- organization: {organization_name} # Required. Example: BZK
oin: {oin} # Optional. Example: 00000001003214345000
name: {owner_name} # Optional. Example: John Doe
email: {owner_email} # Optional. Example: [email protected]
role: {owner_role} # Optional. Example: Data Scientist.

# Model Cards
models:
- !include {model_card_uri} # Optional. Example: model_card.yaml.

# Assessments like IAMA.
assessments:
- name: {assessment_name} # Required. Example: IAMA.
date: {assessment_date} # Required. Example: 25-03-2025.
contents:
- question: {question_text} # Required. Example: "Question 1: ...".
answer: {answer_text} # Required. Example: "Answer: ...".
remarks: {remarks_text} # Optional. Example: "Remarks: ...".
authors: # Optional. Example: "['John', 'Peter']".
- name: {author_name}
timestamp: {timestamp} # Optional. Example: 1711630721.
```

### Model Card

```yaml
algorithm-registers:
- {id} # Optional. Example: https://algoritmes.overheid.nl/nl/algoritme/45991543.
language:
- {lang_0} # Optional. Example nl.
license: {licence} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or "other".
license_name: {licence_name} # Required. if license = other, specify an id for the licence. Example: 'my-license-1.0'
license_link: {license_link} # Required. if license = other, specify "LICENSE" or "LICENSE.md" to link to a file of that name inside the repo, or a URL to a remote file.
library_name: {library_name} # Optional. Example: keras or any library from https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/model-libraries.ts.
library: {library_name} # Optional. Example: mylibrary
library_link: {library_link} # Optional. URI to user provided library.
tags:
- {tag_0} # Optional. Example: audio
- {tag_1} # Optional. Example: automatic-speech-recognition
owners:
- organization: {organization_name} # Required. Example: BZK
oin: {oin} # Optional. Example: 00000001003214345000
name: {owner_name} # Optional. Example: John Doe
email: {owner_email} # Optional. Example: [email protected]
role: {owner_role} # Optional. Example: Data Scientist.

model-index:
- name: {model_id} # Required. Example: CatClassifier.
model: {model_uri} # Required. URI to a repository containing the model file.
artifacs:
- {model_artifact} # Optional. URI to relevant model artifacts, if applicable.
parameters:
- name: {parameter_name} # Optional. Example: "epochs".
dtype: {parameter_dtype} # Optional. Example "int".
value: {parameter_value} # Optional. Example 100.
results:
- task:
type: {task_type} # Required. Example: image-classification.
name: {task_name} # Optional. Example: Image Classification.
dataset:
- type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset
name: {dataset_name} # Required. Example: "Common Voice (French)". A pretty name for the dataset.
split: {split} # Optional. Example "train".
features:
- {feature_name} # Optional. Example: "gender".
revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb
metrics:
- type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.
name: {metric_name} # Required. Example: "FPR wrt class 0 restricted to feature gender:0 and age:21".
dtype: {metric_dtype} # Required. Example: "float".
value: {metric_value} # Required. Example: 0.75.
class: {metric_output_class} # Optional. Example: "1". Only relevant for certain metrics such as for example false-positive-rate positive rate.
labels:
- name: {feature_name} # Optional. Example: "gender".
type: {feature_type} # Optional. Example: "string".
value: {feature_value} # Optional. Example: "female".
measurements:
# Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.
bar_plots:
- type: {measurement_type} # Required. Example: "SHAP".
name: {measurement_name} # Optional. Example: "Mean Absolute Shap Values".
results:
- name: {bar_name} # Required. The name of a bar.
value: {bar_value} # Required. The corresponding value.
# Graph plots should be able to capture graph based measurements such as partial dependence and accumalated local effect.
graph_plots:
- type: {measurement_type} # Required. Example: "partial_dependence".
name: {measurement_name} # Optional. Example: "Partial Dependence Plot".
# Results store the graph plot data. So far all plots are depenendend on a combination of a specific class (sometimes) and feature (always).
# For example partial dependence plots are made for each feature and class.
results:
- class: {class_name} # Optional. Name of the output class the graph depends on.
feature: {feature_name} # Required. Name of the feature the graph depends on.
data:
- x_value: {x_value} # Required. The x value of the graph data.
y_value: {y_value} # Required. The y value of the graph data.

```

## Schema

JSON schema will be added here.

[^1]: Deviation from the HuggingFace specification is in the Licence field. HuggingFace only accepts
dataset id's from [HuggingFace license list](https://huggingface.co/docs/hub/repositories-licenses) while we
accept any license from [Open Source License List](https://opensource.org/license).

[^2]: Deviation from the HuggingFace specification is in the `model_index:results:dataset` field. HuggingFace only accepts
one dataset, while we accept a list of datasets.

[^3]: Deviation from the HuggingFace specification is in the Dataset Type field. HuggingFace only accepts
dataset id's from [HuggingFace datasets](https://hf.co/datasets) while we also allow for any url pointing to the dataset.

[^4]: For this extension to work relevent metrics (such as for example false positive rate) have to be added to the
[Hugging Face metrics](https://hf.co//metrics), possibly this can be done in our organizational namespace.
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ markdown_extensions:
- pymdownx.tabbed:
alternate_style: true
- pymdownx.superfences
- footnotes
- pymdownx.details
- admonition
- attr_list
Expand Down