May28 changes #254

Merged
merged 8 commits into from
May 29, 2020
Changes from 2 commits
55 changes: 26 additions & 29 deletions docs/submission/dictionary-overview.md
@@ -1,58 +1,55 @@
---
id: dictionary-overview
title: Dictionary Overview
title: Data Dictionary
---

The ICGC ARGO [Data Dictionary](/dictionary) expresses the details of the ARGO data model, which adheres to specific formats and restrictions to ensure a standard of data quality. The Data Dictionary defines a set of files, each related to a clinical concept, that can be submitted to the ARGO Data Platform.

To follow the evolution of the data dictionary, check out the [Dictionary Release Notes] (../release-notes/dictionary-releases).
For recent updates, check the [Dictionary Release Notes](../release-notes/dictionary-releases).

## Understanding the Data Dictionary Table View
## Understanding the Data Dictionary

The [dictionary table view](/dictionary) lists all of the clinical fields that the ARGO Data Platform accepts, separated by clinical TSV file.

The first list is for the Sample Registration file, which is the only file to be uploaded in the **Register Samples** section. All other lists outline the fields in the clinical TSV files that will be uploaded in the **Submit Clinical Data** section. A list of links on the right side of the page makes it very simple to navigate the clinical files. Each file starts with the name, field count, description, and a file name example to help ensure proper names are used.
Field listings can be filtered by Data Tier and Attribute, which can help identify which fields are necessary for [clinical data completion](clinical-data-validation-rules).

The lists can be filtered by Data Tier and Attribute, which can help a data submitter identify which fields are necessary for [clinical data completion](clinical-data-validation-rules).
You can explore previous dictionary versions using the dropdown at the top of the dictionary. Using the latest version of the dictionary is required during data submission.

The dictionary version appears in a dropdown above the lists, as well as a date of when it was last updated. You can explore previous dictionary versions by choosing another version number from the dropdown. This dictionary version will also be reflected in the names of the file templates that are downloaded from the Platform. It is important to use the most current files for the clinical validation to run smoothly.
### Field Descriptors

### Field Name & Description
Each field has a name and a description, the name being the same label that appears in the headers of each TSV file.
Each field has a data tier and an attribute classification, which reflects the importance of the field in terms of clinical data completion.

### Data Tier & Attributes
Each field has a data tier and an attribute classification, which reflects the importance of the field in terms of clinical data completion.
![ID](/assets/submission/dictionary-id.svg) attribute indicates:

#### ID Fields
An ![ID](/assets/submission/dictionary-id.svg) field is a unique identifier that is used for cross file validation. Most ids have a ![Required](/assets/submission/dictionary-required.svg) attribute, which means they are required to be provided in an uploaded clinical file in order for a submission to be valid.
- An ID field is a unique identifier that is used for cross-file validation.
- This field is a primary or foreign key.

#### Core Fields
![Core](/assets/submission/dictionary-core.svg) clinical fields are required for each donor in order to be accepted as an ICGC ARGO case. The set of core clinical fields were defined by the [Tissue & Clinical Annotations Working Group](http://www.icgc-argo.org/page/84/tissue-clinical-annotation-working-group) which involved regular discussions with members of the working group and ARGO Programs.
![Conditional](/assets/submission/dictionary-conditional.svg) attribute indicates:

These core clinical fields are commonly acquired in cohort-based studies and clinical trials and are required to address clinically relevant topics by cross entity analyses, and therefore constitute a critical element in the analysis of diverse ARGO Programs.
- Field must meet certain conditions, depending on the value of another field.
- Conditions described in the data dictionary scripts & notes.

As seen in the dictionary, most core fields are ![Required](/assets/submission/dictionary-required.svg) for [clinical data completion](clinical-data-validation-rules), with the exception of some conditional fields that are dependent on the values of other fields. Upon upload, a validation error will occur, such as *"vital_status is a required field"*, if a core field is missing.
![Required](/assets/submission/dictionary-required.svg) attribute indicates:

If the field is ![Core](/assets/submission/dictionary-core.svg) + ![Conditional](/assets/submission/dictionary-conditional.svg), then it is required for clinical data completion, only if certain conditions are met. The conditions for the permissible value will be described in the notes column for the conditional field.
- Field must be filled in the submitted TSV file.
- When paired with the `Conditional` attribute, the field is only required if conditional requirements are met.

#### Extended Fields
![Extended](/assets/submission/dictionary-extended.svg) fields are not required for clinical data completion but it is strongly encouraged to provide as many extended fields as possible to help ensure data quality. In most cases, extended fields have a blank attribute, which means they are not required to be submitted.
![Core](/assets/submission/dictionary-core.svg) attribute indicates:

If a field is classified as ![Extended](/assets/submission/dictionary-extended.svg) + ![ Conditional](/assets/submission/dictionary-conditional.svg), this means you can provide a value for this field only if the condition in the notes is met.
- Field is part of the mandatory minimum set of clinical data that must be submitted.
- The set of core clinical fields was defined by the [Tissue & Clinical Annotations Working Group](http://www.icgc-argo.org/page/84/tissue-clinical-annotation-working-group), which involved regular discussions with members of the working group and ARGO Programs.
- Core clinical fields are commonly acquired in cohort-based studies and clinical trials and are required to address clinically relevant topics by cross entity analyses, and therefore constitute a critical element in the analysis of diverse ARGO Programs.

### Field Type
A field can be of type: Text, Integer, Number, or an Array of any of these types (array values are to be separated with a comma ","). An error will occur, such as *"The value is not permissible for this field"* if the incorrect field type is provided when uploading a file.
![Extended](/assets/submission/dictionary-extended.svg) attribute indicates:

### Permissible Values
Some fields will only accept certain values from a list that is provided in the permissible values column of the dictionary tables. It is mandatory that the value is written exactly as it is in the dictionary. Each value list is provided in alphabetical order and some are collapsed because they are quite long; please click the "# more" link to see the full list.

Other fields, such as IDs, are required to be written in a certain format. In this case, a regular expression is provided in this column. Some examples are also provided that link out to a regular expression resource that can be used to test if your value meets the regular expression.
- Field is not required for clinical data completion.
- It is _strongly encouraged_ to provide as many extended fields as possible as collected by your research project to provide valuable data.

The error will occur, "The value is not permissible for this field" if you do not provide a correct permissible value or your value does not meet the provided regular expression.
### Permissible Values

### Notes & Scripts
This column includes important notes about certain fields for further conditions and descriptions. Some notes contain a "View Script" button, which opens a window with the script restrictions for that field. This code will provide the validations that are being done on this field, so you can test with your clinical values.
- Some fields will only accept certain values from a list that is provided in the permissible values column of the dictionary tables. Terms must match the dictionary spelling exactly, but can be submitted case-insensitive.

- Other fields must meet a `regular expression` for their value.
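
The two checks described above can be sketched as follows. This is only an illustration, not the Platform's validation code: the function name `is_permissible`, the value list `GENDER_VALUES`, and the pattern `ID_PATTERN` are hypothetical stand-ins for what a dictionary field might define.

```python
import re


def is_permissible(value, allowed=None, pattern=None):
    """Check a value against a permissible-value list or a regular expression.

    List terms must match the dictionary spelling, but case is ignored.
    Regex fields must match the whole value, not just part of it.
    """
    if allowed is not None:
        return value.casefold() in {term.casefold() for term in allowed}
    if pattern is not None:
        return re.fullmatch(pattern, value) is not None
    # No restriction defined for this field.
    return True


# Hypothetical examples; real lists and patterns come from the dictionary.
GENDER_VALUES = ["Female", "Male", "Other"]
ID_PATTERN = r"[A-Za-z0-9\-._]{1,64}"
```

Under these assumptions, `is_permissible("female", allowed=GENDER_VALUES)` passes because only case differs, while a misspelling such as `"Femal"` does not.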

## Dictionary Reference Databases

45 changes: 24 additions & 21 deletions docs/submission/registering-samples.md
@@ -3,57 +3,60 @@ id: registering-samples
title: Registering Samples
---

In the ARGO Data Platform, clinical and molecular data objects are assigned ARGO Identifiers (**ARGO IDs**) used to track the data through the Platform. Each **Donor**, **Specimen**, and **Sample** entity will be assigned an **ARGO ID** that maps to your program's internal identifier.
It is important that the relationships between entities are maintained across all data submissions, as they are fundamental to data integrity across the ARGO Data Platform. Thus, during sample registration, each **Donor**, **Specimen**, and **Sample** entity will be assigned an **ARGO ID** that maps to your program's internal identifier (also referred to as **submitter_id**).

It is important that the relationships between entities are maintained across all data submissions, as they are fundamental to data integrity across the ARGO Data Platform.

> Registration is the first step in the data submission life cycle. You **must** register samples before submitting any clinical or molecular data.
> Registration is the first step in the data submission life cycle. You **must** register samples before submitting any clinical or molecular data.

The basic set of data that must be registered for each sample consists of:
* `program_id`
* `submitter_donor_id`
* `gender`
* `submitter_specimen_id`
* `specimen_tissue_source`
* `tumour_normal_designation`
* `specimen_type`
* `submitter_sample_id`
* `sample_type`

During sample registration, **ARGO IDs** will be assigned to your program entities. Any attempts to submit data that does not refer to a registered donor or sample will result in an error. You will be prompted to complete sample registration before any clinical or molecular data is submitted to your program.
- `program_id`
- `submitter_donor_id`
- `gender`
- `submitter_specimen_id`
- `specimen_tissue_source`
- `tumour_normal_designation`
- `specimen_type`
- `submitter_sample_id`
- `sample_type`

During sample registration, **ARGO IDs** will be assigned to your program entities. Any attempts to submit data that does not refer to a registered donor, specimen, or sample will result in an error. You will be prompted to complete sample registration before any clinical or molecular data is submitted to your program.
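
Before uploading, a submitter could check locally that a file's header carries every field in the list above. The following is a minimal sketch under the assumption that the first row of the TSV is the header; the helper name `missing_columns` is hypothetical and not part of the Platform.

```python
import csv
import io

# Field names taken from the registration list above.
REQUIRED_COLUMNS = [
    "program_id",
    "submitter_donor_id",
    "gender",
    "submitter_specimen_id",
    "specimen_tissue_source",
    "tumour_normal_designation",
    "specimen_type",
    "submitter_sample_id",
    "sample_type",
]


def missing_columns(tsv_text):
    """Return the required columns that are absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]
```

An empty list means all required columns are present. It says nothing about whether the values in each row are valid.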

## Multiple Data Submitters
## Multiple Data Submitters

There is only one Sample Registration workspace for each program. All program data submitters will be using the same workspace, and you will see which member has been working in this space by looking at the file upload info above the preview table. Please communicate with your team if you see a sample registration in progress.
There is only one Sample Registration workspace for each program. You can check if sample registration is in progress on the [Dashboard > Program Workspace](../submission/submitted-data#program-workspace-status) card. All program data submitters will be using the same workspace, and you will see which member has been working in this space by looking at the file upload info above the preview table. Please communicate with your team if you see a sample registration in progress.

![Multiple Data Submitters](/assets/submission/registration-multiple-submitters.png)
![Multiple Data Submitters](/assets/submission/registration-multiple-submitters.png)

## How to Register Samples

### Step 1: Download and Format the Sample Registration File

![Download and Format File](/assets/submission/register-1-download.png)

1. When logged in, navigate to the **Program Services** area in the top menu.
1. Click on the **Register Samples** section in the left menu for your program.
1. Download the **TSV Template** for the sample registration file and format it according to the current [Data Dictionary](/dictionary) specifications.

For help with formatting this file, please refer to [Tips for Formatting your TSV files]( submitting-clinical-data#tips-for-formatting-tsv-files)
For help with formatting this file, please refer to [Tips for Formatting your TSV files](submitting-clinical-data#tips-for-formatting-tsv-files)

### Step 2: Upload Sample Registration TSV File

![Upload Files](/assets/submission/register-2-upload.png)

1. Once your file is formatted, click the **Upload File** button and select your file from the browser window. Only TSV file types are supported, and the file name must begin with the *sample-registration* and end with _.tsv_.
1. Once your file is formatted, click the **Upload File** button and select your file from the browser window. Only the TSV file type is supported, and the file name must begin with _sample-registration_ and end with _.tsv_.
1. Upon uploading, if there are any errors in your file they will be displayed within the Sample Registration workspace. The error report will also be available for download. You must fix all of the errors that are listed within your sample registration file and then reupload it.
1. Valid files will be available for preview in the Sample Registration workspace. You can review the new samples (purple star) versus previously registered samples (grey star) by filtering the table on the star column.
1. Valid files will be available for preview in the Sample Registration workspace. You can review the new samples (purple star) versus previously registered samples (grey star) by filtering the table on the star column.
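
The file-naming rule from the upload step can be pre-checked with a few lines of Python. This is a sketch only: the function name is hypothetical, and treating the match as case-sensitive is an assumption, since the text does not say how the Platform handles case.

```python
from pathlib import PurePath


def is_valid_registration_filename(path):
    """Check that the file name starts with 'sample-registration' and ends with '.tsv'."""
    name = PurePath(path).name  # ignore any directory part
    # Assumption: the comparison is case-sensitive.
    return name.startswith("sample-registration") and name.endswith(".tsv")
```

For example, a file named `sample-registration-2020.tsv` would pass this check, while `samples.tsv` or `sample-registration.csv` would not.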

### Step 3: Register Samples

![Register Samples](/assets/submission/register-3-register.png)

1. Once you have reviewed the file preview, click on the **Register Samples** button to submit your samples.

Once registered, donors, along with specimen and sample counts, will be visible on your [Program Dashboard](submitted-data).

## Correcting Already Registered Data

Once samples are registered and data processing and analysis proceeds, it can be difficult to correct the data manually.

If you have made an error with registered sample data, please [contact the DCC](https://platform.icgc-argo.org/contact) and they will assist in correcting your registered data.
If you have made an error with registered sample data, please [contact the DCC](https://platform.icgc-argo.org/contact) and they will assist in correcting your registered data.
6 changes: 3 additions & 3 deletions docs/submission/submission-overview.md
@@ -13,17 +13,17 @@ Participating programs submit clinical data through the **[ARGO Data Platform](h

## Data Dictionary

To support the gathering of high-quality data, a clinical data dictionary has been developed that performs rigorous validation on submitted data at the time of submission. Clinical data is defined in logical groupings, which will be submitted as TSV files in the ARGO Program Services section.
To support the gathering of high-quality data, a clinical data dictionary defines the ARGO data model, as well as some data validations that are performed at the time of submission. Clinical data is defined in logical groupings, which will be submitted as TSV files in the ARGO Program Services section.

Explore the details of the ARGO clinical dataset in the **[ARGO Data Dictionary Viewer](/dictionary).**

## Data Submission and Release Process

The ARGO Data Platform has been optimized to ensure that clinical and molecular data upload is intuitive and efficient for data submitters. ARGO data submission and release happens in 4 major steps:

1. **Register Molecular Samples:** [Registering samples](registering-samples) within the Program Services section in the ARGO Data Platform initializes the ARGO primary identifiers that will be assigned to submitted data entities. Registering samples with specimen and donor identifiers upfront maintains data integrity across the data submission and processing pipelines.
1. **Register Molecular Samples:** [Registering samples](registering-samples) within the Program Services section in the ARGO Data Platform initializes the ARGO primary identifiers that will be assigned to submitted data entities. Registering samples and associated specimen and donor identifiers upfront maintains data integrity across the data submission and processing pipelines.
1. **Submit Donor Clinical and Molecular Data:** Clinical Data Submitters can use the Program Services section in the ARGO Data Platform to submit clinical data. [Submitting clinical data](submitting-clinical-data) is easy and intuitive, facilitated by a guided clinical submission interface. In parallel, Molecular Data Submitters can begin submitting molecular data. [Submitting molecular data](submitting-molecular-data), using the Song and Score clients, is fast and secure.
1. **Molecular Data Processing:** Once raw molecular data has been submitted, [analytic workflows](../analysis-workflows/dna-pipeline) will be automatically kicked off for uniform analysis of all donor samples.
1. **Molecular Data Processing:** Once raw molecular data has been submitted, [analytic workflows](../analysis-workflows/analysis-overview) will be automatically kicked off for uniform analysis of all donor samples.
1. **Data QC and Release:** Once **both** core clinical data and raw molecular data have been submitted, and analysis is complete for a donor, the donor data will be released to the program for quality control and the [Data Release](../release-notes/data-releases) process will begin. After an embargo period, donor data will be publicly available on the ARGO Data Platform.

![ARGO Submission Process](/assets/submission/ARGO-submission-process.svg)
2 changes: 1 addition & 1 deletion docs/submission/submitting-clinical-data.md
@@ -15,7 +15,7 @@ There is only one Clinical Submission workspace for each program. All program da

## How to Submit Clinical Data

### Step 1: Download and Format Clinical Files
### Step 1: Download Templates and Format Clinical Files

![Download and Format Files](/assets/submission/clinical-1-dowload-templates.png)
