To support the gathering of high-quality data, a clinical data dictionary has been developed that performs rigorous validation on submitted data at the time of submission.
Please consider revising this sentence. The dictionary's role is to define the data model, referential integrity, and other rules.
Registering samples with specimen and donor identifiers upfront maintains data integrity
Consider: Registering samples, and their associated specimens and donors, with identifiers upfront
Once raw molecular data has been submitted, analytic workflows will be automatically kicked off for uniform analysis of all donor samples
Should the hyperlink for analytic workflows be: docs/analysis-workflows/analysis-overview?
In the ARGO Data Platform, clinical and molecular data objects are assigned ARGO Identifiers (ARGO IDs) used to track the data through the Platform.
Consider rewriting this sentence.
Each Donor, Specimen, and Sample entity will be assigned an ARGO ID that maps to your program's internal identifier.
... that maps to your program's internal identifier (also referred to as the submitter ID).
Any attempts to submit data that does not refer to a registered donor or sample will result in an error
... not refer to a registered donor, specimen or sample will result in an error
Please communicate with your team if you see a sample registration in progress.
What does this entail?
Also, how will the program team members know what entities have been registered in addition to an in-progress registration (I imagine the latter would be brief in existence)?
Consider: Change "Download and Format Clinical Files" to "Download Templates and Format Clinical Files"
If you have made any updates to already submitted data
Deletion of clinical data is not explicitly mentioned. Perhaps a sentence could be added telling submitters to contact the DCC if they would like to delete any submitted clinical data. I imagine there will be such use cases, although perhaps not at the beginning.
Molecular data will be submitted to your local Regional Data Processing Centre (RDPC).
This guide will describe how to submit molecular data to the ARGO Data Platform.
It could be confusing to read the above two sentences. What's the relationship between RDPC and Data Platform? It would be helpful to have some sort of diagram illustrating how they are related.
RB - removed extraneous information for clarity.
Score securely and quickly manages upload and download of files to cloud repositories.
consider: ... files to cloud repositories managed by RDPCs.
Once you have unzipped the tarball, update the /conf/application.yaml configuration file
Change /conf/application.yaml to conf/application.yaml
A leading '/' denotes the root directory, and 'conf' is clearly not directly under root.
update the /conf/application.properties
Change /conf/application.properties to conf/application.properties
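To illustrate why the leading '/' matters in both paths, here is a minimal Python sketch. The install directory `/home/user/score-client` is a made-up location used only for illustration; the point is that joining an absolute path onto the unzip location discards the unzip location entirely.

```python
from pathlib import PurePosixPath

# Hypothetical directory where the tarball was unzipped (an assumption for illustration).
install_dir = PurePosixPath("/home/user/score-client")

# Without a leading '/', the path is resolved relative to the unzip location.
relative = install_dir / "conf/application.yaml"

# With a leading '/', the path is absolute, so the unzip location is ignored.
absolute = install_dir / "/conf/application.yaml"

print(relative)  # /home/user/score-client/conf/application.yaml
print(absolute)  # /conf/application.yaml
```

A reader following the docs literally would look for the file under the filesystem root, which is what the second result shows.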
Song accepts data in JSON format, which is validated against
Consider: Song accepts metadata (also referred to as the Song payload) in JSON format
**experimental_strategy: Descriptor of the read domain experiment method
Not sure what this means. Maybe: Descriptor of the primary experimental method. For sequencing data, it refers to how the sequencing library was made. Permissible values: WGS, WXS, RNA-Seq, Bisulfite-Seq, etc.
**platform_unit:
According to SAM format specification: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier. (not really a good definition)
Consider: **platform_unit: a sequencing unit in a sequencing machine, typically a lane in a flowcell. Each lane has a unique identifier. Platform unit and read group have a one-to-one relationship.
**file_r1: Name of the read group file to be submitted.
Consider: Name of the sequencing file containing reads from the first end of a sequencing run.
file_r2: Name of the read group file to be submitted.
Consider: Name of the sequencing file containing reads from the second end of a paired-end sequencing run.
read_length_r1: Average length of reads in file_r1.
Consider: Length of sequencing reads in file_r1; this corresponds to the number of sequencing cycles of the first end.
note: all reads for the same end have the same length
read_length_r2: Average length of reads in file_r2.
Similar to read_length_r1
insert_size:
Consider: For paired-end sequencing, the average size of the sequence between the two sequencing ends. Required for paired-end sequencing.
sample_barcode:
Consider: This value is the expected barcode bases as read by the sequencing machine in the absence of errors.
Note: according to SAM specification
**library_name:
Consider: name of a sequencing library made from a molecular sample or a sample pool (multiplex sequencing).
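The field suggestions above imply a couple of checks that could be sketched in code. The following is a minimal illustration only, not the actual Song schema validation: `check_read_groups` is a hypothetical helper, the read groups are plain dicts using the field names discussed above, and the two rules shown are the ones proposed in this review (platform_unit is one-to-one with a read group, and insert_size is required for paired-end data).

```python
def check_read_groups(read_groups):
    """Return a list of problems found in a list of read-group dicts.

    Hypothetical helper for illustration; not part of Song.
    """
    problems = []
    seen_units = set()
    for index, rg in enumerate(read_groups):
        # platform_unit should identify exactly one read group.
        unit = rg.get("platform_unit")
        if not unit:
            problems.append(f"read group {index}: platform_unit is missing")
        elif unit in seen_units:
            problems.append(f"read group {index}: platform_unit {unit!r} is already used")
        else:
            seen_units.add(unit)

        # A read group with a second-end file is treated as paired-end.
        paired = rg.get("file_r2") is not None
        if paired and rg.get("insert_size") is None:
            problems.append(f"read group {index}: insert_size is required for paired-end data")
    return problems


# Made-up example values, for illustration only.
example = [
    {"platform_unit": "FLOWCELL1.1", "file_r1": "a_r1.fq.gz",
     "file_r2": "a_r2.fq.gz", "insert_size": 300},
    {"platform_unit": "FLOWCELL1.1", "file_r1": "b_r1.fq.gz",
     "file_r2": "b_r2.fq.gz"},
]
for problem in check_read_groups(example):
    print(problem)
```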
exception of - , . , and _
exception of - , . , and _
If multiple read groups were sequenced, then multiple files should be listed as objects in the payload.
Consider removing this sentence. A single BAM file can contain multiple read groups, so submitting one BAM is valid and likely a typical scenario for many submitters.
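To make the point about one BAM holding several read groups concrete, here is a small sketch that lists the `@RG` records in a SAM/BAM header supplied as text. The header content is invented for illustration, and `read_groups_from_header` is a hypothetical helper; only the tab-separated `@RG` line layout with `TAG:VALUE` fields (ID, PU, LB, SM) comes from the SAM specification.

```python
def read_groups_from_header(header_text):
    """Return one dict of tags per @RG line in a SAM header string."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are TAG:VALUE pairs separated by tabs.
        fields = line.split("\t")[1:]
        groups.append(dict(field.split(":", 1) for field in fields))
    return groups


# Invented header for illustration: one file, two read groups (two lanes).
header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tPU:FLOWCELL1.1\tLB:lib1\tSM:sample1",
    "@RG\tID:rg2\tPU:FLOWCELL1.2\tLB:lib1\tSM:sample1",
])

for group in read_groups_from_header(header):
    print(group["ID"], group["PU"], group["LB"])
```

Both read groups here would be described in the payload while referring to the same single file.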
Compression of FASTQ files is encouraged
Compression of FASTQ files is required.
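If compression becomes a requirement, the docs could show submitters one way to do it. The sketch below assumes gzip is an accepted compression format, which should be confirmed against the docs; `gzip_file` is a hypothetical helper that uses only the Python standard library.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source):
    """Write a gzip-compressed copy of `source` next to it and return its path."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    # Stream the file so that large FASTQ files are not loaded into memory.
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `gzip_file("sample_r1.fastq")` would produce `sample_r1.fastq.gz` in the same directory and leave the original file in place.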
**dataType: Set to submitted_reads
submitted_reads => Submitted Reads
"dataType": "submitted_reads"
"dataType": "Submitted Reads"
Wrote manifest file 'tumor_manifest.txt' for analysisId 'a4142a01-1274-45b4-942a-01127465b422'
Change tumor_manifest.txt to manifest.txt
Once your sequencing_experiment analysis has been successfully submitted
Consider: Once your sequencing_experiment analysis has been successfully submitted and published.
Deletion of clinical data is not explicitly mentioned. Perhaps a sentence could be added telling submitters to contact the DCC if they would like to delete any submitted clinical data. I imagine there will be such use cases, although perhaps not at the beginning.
Is this something that we want to encourage and write about, or should we handle these as one-off scenarios?
Page: https://content--argo-docs.netlify.app/docs/submission/submission-overview
Page: https://content--argo-docs.netlify.app/docs/submission/dictionary-overview
/dictionary
) would be helpful.
Page: https://content--argo-docs.netlify.app/docs/submission/registering-samples
Page: https://content--argo-docs.netlify.app/docs/submission/submitting-clinical-data
Page: https://content--argo-docs.netlify.app/docs/submission/submitting-molecular-data