Beginner's Guide to Data Import
This guide describes how to import instrument data into GMS using the downsampled TST1 dataset (TST1ds) as an example. You can follow these steps to import TST1ds yourself. Before continuing, you must have already run the ./setup/prime-system.pl command as documented in the installation instructions.
Instead of completing the following steps by hand, you may run the scripted version of these steps: ./downsampled-demo-data/example-import-data.sh. Alternatively, follow the rest of this guide, which describes the commands that example-import-data.sh executes. Even if you choose not to run example-import-data.sh as-is, it provides a complete example of importing data into GMS.
Before an instrument data file can be imported, GMS must hold a record of the library that was sequenced, the sample from which the library was derived, and the individual from which the sample was collected. So, to import a set of instrument data files, an individual, one or more samples, and one or more libraries must first exist or be created in GMS.
genome individual create \
--name="H_NJ-HCC1395ds" \
--upn='HCC1395ds' \
--common-name="TST1ds" \
--gender=female \
--taxon="name=human"
The value of the --name argument is arbitrary, but it should be something representative of the individual because you'll reference this name later when adding samples for this individual to GMS.
genome sample create \
--name='H_NJ-HCC1395-HCC1395_BL_RNAds' \
--source="name=H_NJ-HCC1395ds" \
--common-name='normal' \
--tissue-desc='b lymphoblast' \
--extraction-type='rna' \
--extraction-label='HCC1395 BL_RNA' \
--cell-type='primary'
The --source parameter references the individual created in the previous step. Notice how the value of the --source parameter here matches the --name parameter given to genome individual create.
The argument to the --extraction-type parameter can be either rna or genomic dna.
Repeat the genome sample create command for each sample in your dataset. The TST1ds dataset has four samples.
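Repeating the command for several samples can be scripted. The following is a minimal sketch, not taken from example-import-data.sh: it reads tab-separated sample descriptions from standard input and, by default, only prints each genome sample create command so that it can be reviewed. The function name create_samples, the column layout, and the DRY_RUN variable are assumptions made for illustration. Only the flags already shown in this guide are used.

```shell
# Sketch: create one GMS sample record per line of tab-separated input on stdin.
# Assumed (hypothetical) columns, all required and non-empty:
#   name, common name, tissue description, extraction type,
#   extraction label, cell type
# DRY_RUN is an illustrative variable, not a GMS option: unless it is set
# to 0, each command is printed for review instead of being executed.
create_samples() {
    local individual="$1"   # the --name given to `genome individual create`
    local tab name common tissue etype label cell arg
    tab=$(printf '\t')
    while IFS="$tab" read -r name common tissue etype label cell; do
        [ -n "$name" ] || continue    # skip blank lines
        # Build the command as positional parameters so quoting is preserved.
        set -- genome sample create \
            --name="$name" \
            --source="name=$individual" \
            --common-name="$common" \
            --tissue-desc="$tissue" \
            --extraction-type="$etype" \
            --extraction-label="$label" \
            --cell-type="$cell"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Redirect stdin so the command cannot consume the remaining rows.
            "$@" </dev/null
        else
            # Preview only: show each argument wrapped in single quotes.
            for arg in "$@"; do printf "'%s' " "$arg"; done
            printf '\n'
        fi
    done
}
```

With a hypothetical file samples.tsv in that layout, create_samples "H_NJ-HCC1395ds" < samples.tsv previews the commands, and prefixing the call with DRY_RUN=0 executes them.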
genome library create \
--name="Pooled_RNA_2891006726-mR1-cD1-lg1-lib1ds" \
--sample="H_NJ-HCC1395-HCC1395_BL_RNAds" \
--protocol='Illumina Library Construction' \
--original-insert-size='364' \
--library-insert-size='483' \
--transcript-strand='unstranded'
The genome library create command creates a record of a library in GMS. The library links to the sample from which it was created, so the argument to --sample here must match the argument to --name given to the genome sample create command above. This tells GMS which sample record the library record should link to.
The --transcript-strand parameter is specified because the extraction type of this library's sample is rna. It accepts one of three values: 'unstranded', 'firststrand', or 'secondstrand'. This parameter should not be used for samples with extraction type genomic dna.
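The strand rule above can be checked before the library is created. The following is a minimal sketch of such a check, not part of GMS: strand_option is a hypothetical helper that prints the --transcript-strand option for an rna sample, prints nothing for genomic dna, and fails for any other combination. Treating a missing strand value for rna as an error is an assumption based on the example in this guide, which always passes one.

```shell
# Sketch: decide which strand option to pass to `genome library create`.
# strand_option is a hypothetical helper, not a GMS command.
#   $1 = extraction type of the sample ("rna" or "genomic dna")
#   $2 = desired transcript strand (optional)
strand_option() {
    local extraction_type="$1" strand="${2:-}"
    case "$extraction_type" in
        rna)
            case "$strand" in
                unstranded|firststrand|secondstrand)
                    printf '%s\n' "--transcript-strand=$strand" ;;
                *)
                    # Assumption: rna libraries always state a strand.
                    echo "invalid transcript strand: '$strand'" >&2
                    return 1 ;;
            esac ;;
        "genomic dna")
            if [ -n "$strand" ]; then
                echo "--transcript-strand should not be used with genomic dna" >&2
                return 1
            fi ;;
        *)
            echo "unknown extraction type: '$extraction_type'" >&2
            return 1 ;;
    esac
}
```

The printed option, if any, can then be appended to the genome library create command shown above.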
If your library is a capture library, create it just as you would a library for genomic dna. When you import the instrument data, you will have the opportunity to specify a capture set during import.
genome instrument-data import basic \
--import-source-name='GMS' \
--instrument-data-properties='read_length=100,clusters=170049877' \
--source-files="gerald_C2DBEACXX_3.bam" \
--library="Pooled_RNA_2891006726-mR1-cD1-lg1-lib1ds"
The genome instrument-data import basic command imports sequence reads. This step depends on a library already having been created, as in the previous step, and it requires that the argument to --library match the argument to --name given to genome library create. This step imports reads from a BAM file, which in this example is named gerald_C2DBEACXX_3.bam and is located in the current working directory.
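Because the import reads the source file from the current working directory, it can help to confirm the file is there before starting. The following is a minimal sketch of such a pre-flight check. check_source_files is a hypothetical helper, not a GMS command, and it only tests that each file exists and is non-empty; it does not validate the BAM format.

```shell
# Sketch: confirm every source file exists and is non-empty before importing.
# check_source_files is a hypothetical helper, not part of GMS.
check_source_files() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty source file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use with the import command from this guide (left commented out):
# check_source_files "gerald_C2DBEACXX_3.bam" &&
#     genome instrument-data import basic \
#         --import-source-name='GMS' \
#         --instrument-data-properties='read_length=100,clusters=170049877' \
#         --source-files="gerald_C2DBEACXX_3.bam" \
#         --library="Pooled_RNA_2891006726-mR1-cD1-lg1-lib1ds"
```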
To make use of the instrument data, you must define genome models that use it and then run builds for those models. Once data has been imported, the genome clin-seq advise --allow-imported --individual='TST1ds' command can guide you through the process of defining models and running builds on them.