docs, single end
sanjaynagi committed Dec 1, 2022
1 parent b89f335 commit b7e0d32
Showing 1 changed file with 7 additions and 9 deletions.
16 changes: 7 additions & 9 deletions docs/rna-seq-pop-book/notebooks/input_data.ipynb
@@ -32,22 +32,21 @@
"\n",
"In the `config.yaml`, we will use the treatment column to specify our comparative groups for analysis.\n",
"\n",
"\n",
"If the strain information is not relevant to your study organism, please use the same values as for species. The strain column is used to define smaller groups within the data for principal components analysis (PCA), and is useful when analysing datasets with multiple strains.\n",
"\n",
"--- \n",
"\n",
"\n",
"**Paired-end RNA-Sequencing fastq reads**\n",
"**Single or Paired-end RNA-Sequencing fastq reads**\n",
"\n",
"Two gzipped fastq files for each sample are required, corresponding to the forward and reverse read. Reads can be already trimmed or RNA-Seq-Pop can trim them, using the cutadapt module.\n",
"One or two gzipped fastq files for each sample are required, depending on whether the user is using single-end or paired-end fastq files. Reads can be already trimmed or RNA-Seq-Pop can trim them, using the cutadapt module.\n",
"\n",
"The read location may be specified in two ways: \n",
"\n",
"1. Reads can be named as `{sampleID}_1.fastq.gz , {sampleID}_2.fastq.gz` and stored in `resources/reads/`. In the config.yaml, `fastq['auto'] == True`.\n",
"1. Reads can be named as `{sampleID}_1.fastq.gz`, `{sampleID}_2.fastq.gz` and stored in `resources/reads/`. In the config.yaml, `fastq['auto'] == True`, meaning snakemake will look for files in this folder which follow this naming pattern. For single-end reads, only the first `_1.fastq.gz` file is required.\n",
"\n",
" \n",
"2. The user can add \"fq1\" and \"fq2\" columns to the `samples.tsv` metadata file, containing the path to each fastq file from the root rna-seq-pop directory. This means the fastq files can be stored anywhere accessible and have arbitrary naming. In the config.yaml, this option is `fastq['auto'] == False`.\n",
"2. The user can add \"fq1\" and \"fq2\" columns to the `samples.tsv` metadata file, containing the path to each fastq file from the root rna-seq-pop directory. This allows the fastq files to be stored anywhere that is accessible and have arbitrary naming. In the config.yaml, this option is `fastq['auto'] == False`. For single-end reads, only the \"fq1\" column is required.\n",
"\n",
"---\n",
"\n",
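The two ways of locating reads described in this hunk can be sketched in Python. This is only an illustration of the documented behaviour, not RNA-Seq-Pop's own code: the function `expected_fastqs` and its parameters (`auto`, `paired`, `row`, `reads_dir`) are hypothetical names, with `auto` standing in for `fastq['auto']` and `row` for one sample's row of `samples.tsv`.

```python
def expected_fastqs(sample_id, auto=True, paired=True, row=None,
                    reads_dir="resources/reads"):
    """Return the fastq paths expected for one sample (hypothetical helper)."""
    n_files = 2 if paired else 1  # single-end reads need only the first file
    if auto:
        # Option 1: fixed naming pattern inside the reads folder
        return [f"{reads_dir}/{sample_id}_{i}.fastq.gz"
                for i in range(1, n_files + 1)]
    # Option 2: explicit paths taken from the "fq1"/"fq2" metadata columns
    columns = ["fq1", "fq2"][:n_files]
    row = row or {}
    missing = [c for c in columns if not row.get(c)]
    if missing:
        raise ValueError(f"{sample_id}: missing column(s) {missing}")
    return [row[c] for c in columns]


# Sample ID and paths below are made up for illustration
print(expected_fastqs("sampleA"))
print(expected_fastqs("sampleA", auto=False, paired=False,
                      row={"fq1": "/data/run1/a.fq.gz"}))
```

With `auto=False`, the returned paths are whatever the metadata row contains, which is why arbitrary file names and locations work under the second option.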
@@ -57,11 +56,10 @@
"\n",
"The user provides the path to the reference files in the configuration file (`config.yaml`).\n",
"\n",
"1. **Genome chromosomes reference file (.fa/.fa.gz)**. Contains the DNA sequence for the genome.\n",
"2. **Transcriptome reference file (.fa/fa.gz)**. Contains the DNA sequence for each transcript.\n",
"1. **Genome chromosomes reference file (.fa/.fa.gz)**. Contains the DNA sequence for the genome in fasta format. \n",
"2. **Transcriptome reference file (.fa/fa.gz)**. Contains the DNA sequence for each transcript in fasta format.\n",
"3. **Genome feature file (.gff3 format)**. \n",
"4. **Genes to Transcript mapping file (.tsv)**. An example is provided in the github repo (`resources/exampleGene2TranscriptMap.tsv`). This should contain four columns, GeneID, TranscriptID, GeneName, and GeneDescription, and is necessary for connecting transcripts to their parent genes, as well as adding gene annotations to results.\n",
" \n",
"4. **Genes to Transcript mapping file (.tsv)**. An example is provided in the github repo (`resources/exampleGene2TranscriptMap.tsv`). This should contain four columns, GeneID, TranscriptID, GeneName, and GeneDescription, and is necessary for connecting transcripts to their parent genes, as well as adding gene annotations to results. Files for *Anopheles gambiae, funestus* and *Aedes aegypti* are provided in the github repo.\n",
"5. **SnpEff database name** (if performing variant calling).\n",
"\n",
"---\n",
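The four-column requirement for the genes-to-transcript mapping file can be checked before running the workflow. The sketch below assumes the file is tab-separated and starts with a header row using exactly the column names listed above; that header layout is an assumption, and `missing_columns` is a hypothetical helper, not part of RNA-Seq-Pop.

```python
import csv
import io

EXPECTED_COLUMNS = ["GeneID", "TranscriptID", "GeneName", "GeneDescription"]


def missing_columns(handle):
    """Return the expected column names absent from the header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [c for c in EXPECTED_COLUMNS if c not in header]


# In-memory stand-in for a mapping file; the data row is made up
example = io.StringIO(
    "GeneID\tTranscriptID\tGeneName\tGeneDescription\n"
    "gene1\tgene1-RA\tnameA\tplaceholder description\n"
)
print(missing_columns(example))
```

For a real file, the same function can be called on `open(path, newline="")`, where `path` points to the mapping file given in `config.yaml`.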