This repository archives the data analysis scripts and input/output files associated with the manuscript "Zhang Y., Arora-Williams K., Turnham A.E., Preheim S.P., 2022. Influence of artificial aeration on the composition, diversity and potential function of microbial communities in a hypoxic estuarine system. Applied and Environmental Microbiology, in preparation".
The raw data (demultiplexed .fastq files) for this manuscript are available from NCBI under BioProject ID PRJNA869137.
- qiime2_and_picrust2: Shell scripts that call QIIME 2 and PICRUSt2 to analyze the 16S rRNA gene amplicon sequences. The configuration files for qiime2_merge_sequence_2_runs.csh and moving_picture_analysis.csh are in the sub-directory config_files/, and the input files for zymo.sh are in the sub-directory database/. The sub-directory scripts/ contains:
  - qiime2_merge_sequence_2_runs.csh: merges the QIIME 2 feature tables and representative sequences from the two runs.
  - moving_picture_analysis.csh: calls QIIME 2 to perform rarefaction, build a rooted phylogenetic tree, classify taxonomy, and run alpha and beta diversity analyses. Some of its output files are used as input to the R data analysis scripts.
  - picrust2_unstratified_new_group.csh: calls PICRUSt2 to infer functional gene abundances. Its output is used as input to unrarefied_asv_table_new_group_manually_saved.R.
  - zymo.sh: BLASTs the reads of the positive control samples against a custom database of the 16S rRNA gene sequences of the strains included in the control. Its output is used as input to zymo_16s_theoretical_bestblast_results_parsing.R.
- data_analysis_with_R: R scripts used to analyze the QIIME 2 output (.qza files).
  - rarefied_table_115_samples.R: R code for data analysis and visualization of the 115 samples selected to represent three aeration states ('on', 'altered' and 'off'). All input files for this script are in the directory R_analysis_input; the output files are in the directory R_analysis_output (sub-directory 85_samples).
  - unrarefied_asv_table_new_group_manually_saved.R: R code for data analysis and visualization of the 85 samples selected to represent two aeration phases ('on and high' and 'off or low' aeration). This script also contains the code used to analyze the negative control and technical replicate samples. All input files for this script are in the directory R_analysis_input; the output files are in the directory R_analysis_output (sub-directories 15_samples and positive_control).
  - zymo_16s_theoretical_bestblast_results_parsing.R: R code for parsing the best BLAST results (output of zymo.sh).
  - zymo_compare_blast_to_greengene.R: R code for comparing the community composition of the positive control samples (Zymo community) across (1) the theoretical values provided by Zymo, (2) values generated by BLASTing the reads against the 16S rRNA gene sequences of the included strains, and (3) values generated with the same classification procedure as the other samples (Greengenes).
- R_analysis_input: input files for the R scripts in the directory data_analysis_with_R.
- R_analysis_output: output files of the R scripts in the directory data_analysis_with_R.
- RockCreek_16S_scripts_archive.Rproj: RStudio project file.
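For readers unfamiliar with the steps that moving_picture_analysis.csh delegates to QIIME 2, the Python sketch below illustrates rarefaction (subsampling each sample without replacement to a common depth and dropping samples that fall short), Shannon diversity and Bray-Curtis dissimilarity on toy count dictionaries. It is a minimal, hypothetical illustration, not the repository's code or QIIME 2's implementation; the function names rarefy, shannon and bray_curtis, the {feature: count} input format and the base-2 logarithm are all assumptions.

```python
# Hypothetical sketch of rarefaction and two diversity metrics on toy data.
# Not the repository's code and not QIIME 2's implementation.
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample a {feature: count} dict to `depth` reads without replacement.

    Returns None if the sample has fewer than `depth` reads, mirroring the
    convention of dropping samples below the rarefaction depth.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def shannon(counts, base=2):
    """Shannon entropy of a {feature: count} dict (log base is an assumption)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {feature: count} dicts."""
    features = set(a) | set(b)
    num = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    den = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return num / den if den else 0.0


if __name__ == "__main__":
    # Toy ASV counts for two hypothetical samples.
    s1 = {"ASV1": 60, "ASV2": 30, "ASV3": 10}
    s2 = {"ASV1": 20, "ASV2": 20, "ASV4": 40}
    depth = 50
    r1, r2 = rarefy(s1, depth, seed=1), rarefy(s2, depth, seed=1)
    print("Shannon:", round(shannon(r1), 3), round(shannon(r2), 3))
    print("Bray-Curtis:", round(bray_curtis(r1, r2), 3))
```

Fixing the random seed makes the subsampling reproducible, which matters because rarefied diversity values change from draw to draw.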
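The best-hit step applied to the positive control (zymo.sh followed by zymo_16s_theoretical_bestblast_results_parsing.R) can be sketched in Python as follows. This assumes BLAST tabular output (-outfmt 6, whose twelfth default column is the bit score) and assumes that each subject ID already names a strain; best_hits and relative_abundance are hypothetical helpers, not the repository's R code, and ties in bit score keep the first-listed hit.

```python
# Hypothetical sketch: best BLAST hit per read, then per-strain relative abundance.
# Assumes BLAST tabular output (-outfmt 6, 12 default columns) where the
# subject ID (column 2) already names the strain. Not the repository's R code.
import csv
import io
from collections import Counter


def best_hits(blast_tsv):
    """Return {read_id: subject_id}, keeping the hit with the highest bit score."""
    best = {}  # read_id -> (bitscore, subject_id)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, subject_id, bitscore = row[0], row[1], float(row[11])
        # Strictly greater: on ties the first-listed hit is kept.
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id)
    return {read: subject for read, (_, subject) in best.items()}


def relative_abundance(hits):
    """Fraction of reads assigned to each subject (strain)."""
    counts = Counter(hits.values())
    total = sum(counts.values())
    return {strain: n / total for strain, n in counts.items()}


if __name__ == "__main__":
    # Toy -outfmt 6 lines; read1 has two hits and the higher bit score wins.
    example = (
        "read1\tE_coli\t99.6\t250\t1\t0\t1\t250\t10\t259\t1e-120\t455\n"
        "read1\tS_enterica\t98.8\t250\t3\t0\t1\t250\t10\t259\t1e-115\t440\n"
        "read2\tB_subtilis\t100.0\t250\t0\t0\t1\t250\t5\t254\t1e-125\t462\n"
        "read3\tE_coli\t99.2\t250\t2\t0\t1\t250\t12\t261\t1e-118\t449\n"
    )
    print(relative_abundance(best_hits(example)))
```

To process a real BLAST output file, its contents can be read into a string and passed to best_hits; the resulting fractions are what would then be compared with the theoretical composition and the Greengenes-based classification.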