Documentation walkthrough: feedback part 1 #249

Closed
32 of 35 tasks
junjun-zhang opened this issue May 28, 2020 · 5 comments

junjun-zhang commented May 28, 2020

Page: https://content--argo-docs.netlify.app/docs/submission/submission-overview

To support the gathering of high-quality data, a clinical data dictionary has been developed that performs rigorous validation on submitted data at the time of submission.

  • Please consider revising this sentence. The dictionary defines the data model, referential integrity, and other rules.

Registering samples with specimen and donor identifiers upfront maintains data integrity

  • Consider: Registering samples and associated specimen and donor with identifiers upfront

Once raw molecular data has been submitted, analytic workflows will be automatically kicked off for uniform analysis of all donor samples

  • Should the hyperlink for "analytic workflows" be docs/analysis-workflows/analysis-overview?

Page: https://content--argo-docs.netlify.app/docs/submission/dictionary-overview

  • Probably don't need much text here. Adding a link to the actual Data Dictionary page (/dictionary) would be helpful.

Page: https://content--argo-docs.netlify.app/docs/submission/registering-samples

In the ARGO Data Platform, clinical and molecular data objects are assigned ARGO Identifiers (ARGO IDs) used to track the data through the Platform.

  • Consider rewriting this sentence.

Each Donor, Specimen, and Sample entity will be assigned an ARGO ID that maps to your program's internal identifier.

  • ... that maps to your program's internal identifier (also referred to as the submitter ID).

Any attempts to submit data that does not refer to a registered donor or sample will result in an error

  • ... not refer to a registered donor, specimen or sample will result in an error

Please communicate with your team if you see a sample registration in progress.

  • What does this entail?

  • Also, how will the program team members know what entities have been registered in addition to an in-progress registration (I imagine the latter would be brief in existence)?

Only TSV file types are supported

  • Only the TSV file type is supported
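
To illustrate the TSV-only point, here is a minimal sketch of reading a registration file as tab-separated text. The column names are hypothetical placeholders chosen for illustration; the actual registration template defines the real headers.

```python
import csv
import io

# Hypothetical column names; the real template defines the actual headers.
REQUIRED_COLUMNS = {
    "submitter_donor_id",
    "submitter_specimen_id",
    "submitter_sample_id",
}


def read_registration_tsv(text):
    """Parse tab-separated text into a list of row dicts.

    Raises ValueError if any of the assumed required columns is missing,
    which is also what happens when the file is comma-separated.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


example = (
    "submitter_donor_id\tsubmitter_specimen_id\tsubmitter_sample_id\n"
    "DO-1\tSP-1\tSA-1\n"
)
rows = read_registration_tsv(example)
print(rows[0]["submitter_sample_id"])
```

A comma-separated file fails the header check here because the whole header line is read as a single column name.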

Page: https://content--argo-docs.netlify.app/docs/submission/submitting-clinical-data

Step 1: Download and Format Clinical Files

  • Consider changing "Download and Format Clinical Files" to "Download Templates and Format Clinical Files".

If you have made any updates to already submitted data

  • Deletion of clinical data is not explicitly mentioned. Maybe another sentence could be added to inform submitters to contact DCC if they'd like to delete any submitted clinical data. I imagine there would be such use cases, although maybe not at the beginning.

Page: https://content--argo-docs.netlify.app/docs/submission/submitting-molecular-data

Molecular data will be submitted to your local Regional Data Processing Centre (RDPC). 

This guide will describe how to submit molecular data to the ARGO Data Platform.

  • It could be confusing to read the above two sentences. What's the relationship between RDPC and Data Platform? It would be helpful to have some sort of diagram illustrating how they are related.
    RB - removed extraneous information for clarity.

Score securely and quickly manages upload and download of files to cloud repositories.

  • consider: ... files to cloud repositories managed by RDPCs.

Once you have unzipped the tarball, update the /conf/application.yaml configuration file

  • change /conf/application.yaml  to conf/application.yaml

  • The leading '/' denotes the root directory, and 'conf' is not directly under root.

update the /conf/application.properties

  • change /conf/application.properties to conf/application.properties
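
The difference between the two spellings can be seen with a small path sketch. The install directory below is a hypothetical location for the unpacked tarball, used only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical directory where the tarball was unpacked.
install_dir = PurePosixPath("/home/user/score-client")

absolute = PurePosixPath("/conf/application.yaml")
relative = PurePosixPath("conf/application.yaml")

# Joining with an absolute path discards the install directory entirely,
# so the result points at a 'conf' directory directly under root.
resolved_absolute = install_dir / absolute

# Joining with a relative path stays inside the unpacked directory.
resolved_relative = install_dir / relative

print(absolute.is_absolute(), resolved_absolute)
print(relative.is_absolute(), resolved_relative)
```

Written without the leading slash, the path reads naturally as relative to wherever the reader unpacked the archive.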

Song accepts data in JSON format, which is validated against

  • Consider: Song accepts metadata (also referred to as the Song payload) in JSON format

**experimental_strategy: Descriptor of the read domain experiment method

  • Not sure what this means. Maybe: Descriptor of the primary experimental method. For sequencing data it refers to how the sequencing library was made. Permissible values: WGS, WXS, RNA-Seq, Bisulfite-Seq, etc.

**platform_unit:

According to SAM format specification: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier. (not really a good definition)

  • Consider: **platform_unit: a sequencing unit in a sequencing machine, typically a lane in a flowcell. Each lane has a unique identifier. Platform unit and read group have a one-to-one relationship.

**file_r1: Name of the read group file to be submitted.

  • Consider: Name of the sequencing file containing reads from the first end of a sequencing run.

file_r2: Name of the read group file to be submitted.

  • Consider: Name of the sequencing file containing reads from the second end of a paired end sequencing run.

read_length_r1: Average length of reads in file_r1.

  • Consider: Length of sequencing reads in file_r1; this corresponds to the number of sequencing cycles of the first end.

note: all reads for the same end have the same length

read_length_r2: Average length of reads in file_r2.

  • Similar to read_length_r1

insert_size:

  • Consider: the average size of the sequence between the two sequencing ends; required for paired-end sequencing.

sample_barcode:

  • Consider: This value is the expected barcode bases as read by the sequencing machine in the absence of errors.

  • Note: according to SAM specification

**library_name:

  • Consider: name of a sequencing library made from a molecular sample or a sample pool (multiplex sequencing).
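
As a rough illustration of how the read group fields discussed above fit together, here is a sketch with two paired-end read groups and two checks taken from the suggestions: each platform_unit belongs to exactly one read group, and paired-end data carries an insert_size. The values are made up, and the checks are assumptions for illustration, not the actual schema validation.

```python
from collections import Counter

# Field names come from the descriptions above; all values are invented.
read_groups = [
    {
        "platform_unit": "FLOWCELL1.1",
        "library_name": "LIB-A",
        "file_r1": "sample_L1_R1.fastq.gz",
        "file_r2": "sample_L1_R2.fastq.gz",
        "read_length_r1": 150,
        "read_length_r2": 150,
        "insert_size": 300,
    },
    {
        "platform_unit": "FLOWCELL1.2",
        "library_name": "LIB-A",
        "file_r1": "sample_L2_R1.fastq.gz",
        "file_r2": "sample_L2_R2.fastq.gz",
        "read_length_r1": 150,
        "read_length_r2": 150,
        "insert_size": 300,
    },
]


def check_read_groups(groups):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []

    # One-to-one relationship: a platform_unit may appear only once.
    counts = Counter(g.get("platform_unit") for g in groups)
    for unit, n in counts.items():
        if n > 1:
            problems.append(f"platform_unit {unit!r} used by {n} read groups")

    # Paired-end data (file_r2 present) needs an insert_size.
    for i, g in enumerate(groups):
        if g.get("file_r2") is not None and g.get("insert_size") is None:
            problems.append(f"read group {i}: paired-end data requires insert_size")

    return problems


print(check_read_groups(read_groups))
```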

exception of - , . , and _ 

  • exception of - , . , and _ 

If multiple read groups were sequenced, then multiple files should be listed as objects in the payload.

  • Consider removing this sentence. One single BAM file can contain multiple read groups, submitting one BAM is valid and likely a typical scenario for many submitters.

Compression of FASTQ files is encouraged

  • Compression of FASTQ files is required.
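
If compression is required, a short example of how to compress a FASTQ file may help submitters. The sketch below assumes gzip is an accepted format, which the page does not state explicitly; the file name is a throwaway example.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(src):
    """Write a gzip-compressed copy of src next to it and return the new path."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Demonstration on a tiny throwaway FASTQ record.
with tempfile.TemporaryDirectory() as tmp:
    fastq = Path(tmp) / "example_R1.fastq"
    fastq.write_text("@read1\nACGT\n+\nIIII\n")
    compressed = gzip_file(fastq)
    # Read the compressed copy back to confirm the content survived.
    with gzip.open(compressed, "rt") as fh:
        round_trip = fh.read()

print(compressed.name)
```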

**dataType: Set to submitted_reads

  • submitted_reads => Submitted Reads

"dataType": "submitted_reads"

  • "dataType": "Submitted Reads"
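
For reference, a minimal sketch of a file entry using the corrected value. Only "dataType" comes from the page under review; the "fileName" key and its value are hypothetical and included just to make the entry look realistic.

```python
import json

EXPECTED_DATA_TYPE = "Submitted Reads"

# Hypothetical file entry; only "dataType" is taken from the text above.
file_entry = {
    "fileName": "example_R1.fastq.gz",
    "dataType": "Submitted Reads",
}


def data_type_ok(entry):
    """Return True if the entry uses the expected dataType spelling."""
    return entry.get("dataType") == EXPECTED_DATA_TYPE


print(json.dumps(file_entry, indent=2))
print(data_type_ok(file_entry))
```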

Wrote manifest file 'tumor_manifest.txt' for analysisId 'a4142a01-1274-45b4-942a-01127465b422'

  • change tumor_manifest.txt to manifest.txt

Once your sequencing_experiment analysis has been successfully submitted

  • Consider: Once your sequencing_experiment analysis has been successfully submitted and published.

rosibaj commented May 29, 2020

Moved Image changes to this ticket:
#252

rosibaj commented May 29, 2020

Deletion of clinical data is not explicitly mentioned. Maybe another sentence could be added to inform submitters to contact DCC if they'd like to delete any submitted clinical data. I imagine there would be such use cases, although maybe not at the beginning.

Is this something that we want to encourage/write about, or should we handle that as one-off scenarios?

rosibaj commented May 29, 2020

exception of - , . , and _

exception of - , . , and _

@junjun-zhang not sure what you are saying here?

rosibaj commented May 29, 2020

#253

Moved note about seq-tools to its own ticket

@junjun-zhang

exception of - , . , and \_

There is a backslash in front of _ that needs to be removed.

rosibaj closed this as completed Jun 1, 2020