Documentation walkthrough: feedback part 1 #249

junjun-zhang · 2020-05-28T21:36:00Z

Page: https://content--argo-docs.netlify.app/docs/submission/submission-overview

To support the gathering of high-quality data, a clinical data dictionary has been developed that performs rigorous validation on submitted data at the time of submission.

Please consider revising this sentence. The dictionary's role is to define the data model, referential integrity, and other rules.

Registering samples with specimen and donor identifiers upfront maintains data integrity

Consider: Registering samples and associated specimen and donor with identifiers upfront

Once raw molecular data has been submitted, analytic workflows will be automatically kicked off for uniform analysis of all donor samples

Should the hyperlink for analytic workflows be docs/analysis-workflows/analysis-overview?

Page: https://content--argo-docs.netlify.app/docs/submission/dictionary-overview

Probably don't need much text here. Adding a link to the actual Data Dictionary page (/dictionary) would be helpful.

Page: https://content--argo-docs.netlify.app/docs/submission/registering-samples

In the ARGO Data Platform, clinical and molecular data objects are assigned ARGO Identifiers (ARGO IDs) used to track the data through the Platform.

Consider rewriting this sentence.

Each Donor, Specimen, and Sample entity will be assigned an ARGO ID that maps to your program's internal identifier.

... that maps to your program's internal identifier (also referred to as the submitter ID).

Any attempts to submit data that does not refer to a registered donor or sample will result in an error

... not refer to a registered donor, specimen or sample will result in an error

Please communicate with your team if you see a sample registration in progress.

What does this entail?
Also, how will the program team members know what entities have been registered in addition to an in-progress registration (I imagine the latter would be brief in existence)?

Only TSV file types are supported

Only the TSV file type is supported
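
To illustrate the TSV-only constraint, a submitter-side pre-check could look something like the sketch below. The column names (`submitter_donor_id`, `submitter_specimen_id`, `submitter_sample_id`) and the file names are hypothetical placeholders, not the actual registration template headers.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of row dicts keyed by the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def looks_like_tsv(filename, text):
    """Rough pre-check: a .tsv extension and at least one tab in the header line."""
    header = text.splitlines()[0] if text else ""
    return filename.lower().endswith(".tsv") and "\t" in header


# Hypothetical registration content; real templates define their own columns.
sample = (
    "submitter_donor_id\tsubmitter_specimen_id\tsubmitter_sample_id\n"
    "DO-1\tSP-1\tSA-1\n"
)
rows = read_tsv(sample)
```

This only checks the file shape; it does not replace the validation performed by the platform at registration time.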

Page: https://content--argo-docs.netlify.app/docs/submission/submitting-clinical-data

Step 1: Download and Format Clinical Files

Consider changing "Download and Format Clinical Files" to "Download Templates and Format Clinical Files".

If you have made any updates to already submitted data

Deletion of clinical data is not explicitly mentioned. Maybe another sentence could be added to inform submitters to contact the DCC if they'd like to delete any submitted clinical data. I imagine there would be such use cases, although maybe not at the beginning.

Page: https://content--argo-docs.netlify.app/docs/submission/submitting-molecular-data

Molecular data will be submitted to your local Regional Data Processing Centre (RDPC).

This guide will describe how to submit molecular data to the ARGO Data Platform.

The two sentences above could be confusing to read together. What is the relationship between the RDPC and the Data Platform? It would be helpful to have some sort of diagram illustrating how they are related.
RB - removed extraneous information for clarity.

Score securely and quickly manages upload and download of files to cloud repositories.

Consider: ... files to cloud repositories managed by RDPCs.

Once you have unzipped the tarball, update the /conf/application.yaml configuration file

Change /conf/application.yaml to conf/application.yaml.
The leading '/' denotes the root directory, and 'conf' is obviously not directly under root.
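
A minimal sketch of why the leading slash matters, using Python's `pathlib`. The install directory `/home/user/client` is a hypothetical location for the unzipped tarball.

```python
from pathlib import PurePosixPath


def resolve_config(install_dir, config_path):
    """Join config_path onto install_dir.

    With POSIX path semantics, an absolute config_path discards install_dir.
    """
    return PurePosixPath(install_dir) / config_path


# Hypothetical install directory for the unzipped client.
relative = resolve_config("/home/user/client", "conf/application.yaml")
absolute = resolve_config("/home/user/client", "/conf/application.yaml")

print(relative)  # stays inside the install directory
print(absolute)  # points at the filesystem root instead
```

Written without the leading slash, the path stays relative to wherever the tarball was unpacked, which is what the documentation intends.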

update the /conf/application.properties

Change /conf/application.properties to conf/application.properties.

Song accepts data in JSON format, which is validated against

Consider: Song accepts metadata (also referred to as the Song payload) in JSON format

**experimental_strategy: Descriptor of the read domain experiment method

Not sure what this means. Maybe: Descriptor of primary experimental method. For sequencing data it refers to how sequencing library was made. Permissible values: WGS, WXS, RNA-Seq, Bisulfite-Seq etc.

**platform_unit:

According to SAM format specification: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier. (not really a good definition)

Consider: **platform_unit: a sequencing unit in a sequencing machine, typically a lane in a flowcell. Each lane has a unique identifier. Platform unit and read group have a one-to-one relationship.

**file_r1: Name of the read group file to be submitted.

Consider: Name of the sequencing file containing reads from the first end of a sequencing run.

file_r2: Name of the read group file to be submitted.

Consider: Name of the sequencing file containing reads from the second end of a paired end sequencing run.

read_length_r1: Average length of reads in file_r1.

Consider: Length of sequencing reads in file_r1; this corresponds to the number of sequencing cycles of the first end.

Note: all reads for the same end have the same length.

read_length_r2: Average length of reads in file_r2.

Similar to read_length_r1.

insert_size:

Consider: For paired-end sequencing, the average size of the sequence between the two sequencing ends. Required for paired-end sequencing.

sample_barcode:

Consider: This value is the expected barcode bases as read by the sequencing machine in the absence of errors.
Note: this wording follows the SAM specification.

**library_name:

Consider: name of a sequencing library made from a molecular sample or a sample pool (multiplex sequencing).
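
The paired-end suggestions above (file_r2, read_length_r2, and insert_size applying to paired-end sequencing) could be expressed as a check like the following. This is a sketch only: the `is_paired_end` flag and the example values are assumptions made for illustration, not confirmed parts of the Song schema.

```python
# Fields the suggestions above tie to paired-end sequencing.
PAIRED_END_FIELDS = ("file_r2", "read_length_r2", "insert_size")


def missing_paired_end_fields(read_group):
    """Return paired-end fields that are absent or None.

    Single-end read groups (is_paired_end falsy; hypothetical flag) need none.
    """
    if not read_group.get("is_paired_end"):
        return []
    return [name for name in PAIRED_END_FIELDS if read_group.get(name) is None]


# Hypothetical read groups used only to exercise the check.
single_end = {"is_paired_end": False, "file_r1": "reads_1.fq.gz", "read_length_r1": 150}
incomplete_pair = {
    "is_paired_end": True,
    "file_r1": "reads_1.fq.gz",
    "file_r2": "reads_2.fq.gz",
    "read_length_r1": 150,
}

print(missing_paired_end_fields(single_end))
print(missing_paired_end_fields(incomplete_pair))
```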

exception of - , . , and _

exception of - , . , and _
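
Assuming the rule being quoted is that a value may contain only alphanumeric characters with the exception of `-`, `.`, and `_` (the full sentence is not reproduced in this thread, so this reading is an assumption), a check could be sketched as:

```python
import re

# Assumed rule: letters, digits, and the three excepted characters only.
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_value(value):
    """True when the whole value consists of allowed characters."""
    return ALLOWED.fullmatch(value) is not None


print(is_valid_value("lib_1.A-2"))
print(is_valid_value("lib 1"))
print(is_valid_value("lib\\_1"))  # a stray backslash is rejected
```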

If multiple read groups were sequenced, then multiple files should be listed as objects in the payload.

Consider removing this sentence. One single BAM file can contain multiple read groups, submitting one BAM is valid and likely a typical scenario for many submitters.
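
To make the single-BAM scenario concrete, here is a sketch of a payload fragment in which two read groups point at the same file. The overall layout (`read_groups`, `files`, `fileName`) and the values are assumptions for illustration; only `platform_unit` and `file_r1` are taken from the fields discussed in this issue.

```python
from collections import defaultdict

# Hypothetical payload fragment: one BAM holding two read groups.
payload = {
    "read_groups": [
        {"platform_unit": "FLOWCELL1.1", "file_r1": "sample.bam"},
        {"platform_unit": "FLOWCELL1.2", "file_r1": "sample.bam"},
    ],
    "files": [{"fileName": "sample.bam"}],
}


def read_groups_per_file(payload):
    """Map each file named in file_r1 to the platform units that reference it."""
    grouped = defaultdict(list)
    for read_group in payload["read_groups"]:
        grouped[read_group["file_r1"]].append(read_group["platform_unit"])
    return dict(grouped)


def platform_units_unique(payload):
    """Platform unit and read group are one-to-one, so units must not repeat."""
    units = [rg["platform_unit"] for rg in payload["read_groups"]]
    return len(units) == len(set(units))


print(read_groups_per_file(payload))
print(platform_units_unique(payload))
```

Here the number of read groups (two) differs from the number of file objects (one), which is why the quoted sentence can mislead.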

Compression of FASTQ files is encouraged

Compression of FASTQ files is required.
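
If compression is required, submitters may want to verify it before upload. The sketch below detects gzip or bzip2 from a file's leading bytes; which compression formats are actually accepted is not stated in this issue, so treating these two as the candidates is an assumption.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream
BZIP2_MAGIC = b"BZh"      # first three bytes of any bzip2 stream


def detect_compression(head):
    """Return 'gzip', 'bzip2', or None based on the leading bytes of a file."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return None


# A single FASTQ record used as in-memory test data.
fastq = b"@read1\nACGT\n+\nIIII\n"
print(detect_compression(gzip.compress(fastq)))
print(detect_compression(bz2.compress(fastq)))
print(detect_compression(fastq))
```

For a file on disk, pass the first few bytes, for example `open(path, "rb").read(4)`.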

**dataType: Set to submitted_reads

submitted_reads => Submitted Reads

"dataType": "submitted_reads"

"dataType": "Submitted Reads"

Wrote manifest file 'tumor_manifest.txt' for analysisId 'a4142a01-1274-45b4-942a-01127465b422'

Change tumor_manifest.txt to manifest.txt.

Once your sequencing_experiment analysis has been successfully submitted

Consider: Once your sequencing_experiment analysis has been successfully submitted and published.

rosibaj · 2020-05-29T01:36:06Z

Moved Image changes to this ticket:
#252

rosibaj · 2020-05-29T02:20:53Z

Deletion of clinical data is not explicitly mentioned. Maybe another sentence could be added to inform submitters to contact the DCC if they'd like to delete any submitted clinical data. I imagine there would be such use cases, although maybe not at the beginning.

Is this something that we want to encourage or write about, or should we handle that as one-off scenarios?

rosibaj · 2020-05-29T02:31:42Z

exception of - , . , and _

@junjun-zhang not sure what you are saying here?

rosibaj · 2020-05-29T02:35:35Z

#253

Moved note about seq-tools to its own ticket

junjun-zhang · 2020-05-29T13:19:51Z

exception of - , . , and \_

There is a backslash in front of _ that needs to be removed.

rosibaj added this to the Donkey Kong - Sprint 27 [PRODUCTION KANBAN] milestone May 29, 2020

rosibaj closed this as completed Jun 1, 2020
