Merge branch 'main' of https://github.com/fhdsl/AnVIL_BioDIGS_Book into main
jhudsl-robot committed Jan 29, 2024
2 parents 267ea77 + 225e5e0 commit b81dca5
Showing 169 changed files with 4,056 additions and 6,426 deletions.
66 changes: 66 additions & 0 deletions docs/no_toc/01-BioDIGS_project_overview.md
@@ -0,0 +1,66 @@

# (PART\*) BioDIGS Overview {-}

# Background

One critical aspect of an undergraduate STEM education is hands-on research. Undergraduate research experiences enhance what students learn in the classroom and increase their interest in pursuing STEM careers ^1. They can also lead to improved scientific reasoning and better academic performance overall ^2. However, many students at under-resourced institutions such as community colleges, Historically Black Colleges and Universities (HBCUs), tribal colleges and universities, and Hispanic-serving institutions have limited access to research opportunities compared to their peers at larger four-year colleges and R1 institutions. These students are also more likely to belong to groups that are already under-represented in STEM disciplines, particularly genomics and data science ^3 ^4.

The BioDIGS Project aims to be at the intersection of genomics, data science, cloud computing, and education.


## What is genomics?

Genomics broadly refers to the study of genomes. A genome is an organism's complete set of DNA, including both genes and non-coding regions. Traditional genomics involves sequencing and analyzing the genome of individual species.

Metagenomics expands genomics to look at the collective genomes of entire communities of organisms in an environmental sample, like soil. It allows researchers to study not just the genes of culturable or isolated organisms, but the entirety of genetic material present in a given environment. By using genomic techniques to survey the soil microbes, we can identify many of the organisms in the soil, including microbes that no one has identified before.

We are doing both traditional genomics and metagenomics as part of BioDIGS.

## What is data science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It includes collecting, cleaning, and combining data from multiple databases, exploring data and developing statistical and machine learning models to identify patterns in complex datasets, and creating tools to efficiently store, process, and access large amounts of data.

## What is cloud computing?

Cloud computing just means using the internet to get access to powerful computer resources like storage, servers, databases, networking tools, and specialized software programs. Instead of having to buy and maintain their own powerful computers, storage servers, and other systems, users can pay to use them through an internet connection as needed. Users only pay for what they need, when they actually use it, and professionals update and maintain the systems in large data centers. It is a particularly useful tool for researchers and students at smaller institutions with limited computational services, especially when working with complex databases.

The genome assembly and analyses for BioDIGS have been done using the NHGRI [AnVIL](https://anvilproject.org/) cloud computing platform, as well as [Galaxy](https://usegalaxy.org).

## Why soil microbes?

It can be challenging to include undergraduates in human genomic and health research, especially in a classroom context. Both human genetic data and human health data are protected data, which limits the sort of information students can access without undergoing specialized ethics training. However, the same sorts of data cleaning and analysis methods used for human genomic data are also used for microbial genomic data, which does not have the same sort of legal protections as human genetic data. This makes it ideal for training undergraduate students at the beginning of their careers and can be used to prepare students for future research in human genomics and health ^5. Additionally, the microbes in the soil can have big impacts on our health ^6.

## Heavy metals and human health

Human activities that change the landscape can also change what sorts of inorganic and abiotic compounds we find in the soil, particularly increasing the amount of heavy metals ^7. When cars drive on roads, compounds from the exhaust, oil, and other fluids might settle onto the roads and be washed into the soil. When we put salt on roads, parking lots, and sidewalks, the salts themselves will eventually be washed away and enter the ecosystem through both water and soil. Chemicals from factories and other businesses also leach into our environment. Previous research has demonstrated that in areas with more human activity, like cities, soils include greater concentrations of heavy metals than those found in rural areas with limited human populations ^8 ^9. Increased heavy metal concentrations also disproportionately affect lower-income and predominantly minority areas ^10.

Research suggests that increased heavy metal concentration in soils has major impacts on the soil microbial community. In particular, increased heavy metal concentration is associated with an increase in soil bacteria that have antibiotic resistance markers ^11 ^12 ^13.

## References

1: Russell et al. 2007: [https://doi.org/10.1126/science.1140384](https://doi.org/10.1126/science.1140384)

2: Buffalari et al. 2020: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040836/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040836/)

3: Canner et al. 2017: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5398168/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5398168/)

4: GDSCN 2022: [https://doi.org/10.1101/gr.276496.121](https://doi.org/10.1101/gr.276496.121)

5: Jurkowski et al. 2007: [https://doi.org/10.1187/cbe.07-09-0075](https://doi.org/10.1187/cbe.07-09-0075)

6: Brevik and Burgess 2004: [https://www.nature.com/scitable/knowledge/library/the-influence-of-soils-on-human-health-127878980/](https://www.nature.com/scitable/knowledge/library/the-influence-of-soils-on-human-health-127878980/)

7: Yan et al. 2020: [https://doi.org/10.1016/j.scitotenv.2019.136116](https://doi.org/10.1016/j.scitotenv.2019.136116)

8: Khan et al. 2023: [https://pubmed.ncbi.nlm.nih.gov/36907936/](https://pubmed.ncbi.nlm.nih.gov/36907936/)

9: Wang et al. 2022: [https://pubmed.ncbi.nlm.nih.gov/35240153/](https://pubmed.ncbi.nlm.nih.gov/35240153/)

10: Jones et al. 2022: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8834334/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8834334/)

11: Gorovtsov et al. 2018: [https://doi.org/10.1007/s11356-018-1465-9](https://doi.org/10.1007/s11356-018-1465-9)

12: Nguyen et al. 2019: [https://doi.org/10.1007/s11783-019-1129-0](https://doi.org/10.1007/s11783-019-1129-0)

13: Sun et al. 2021: [https://doi.org/10.1016/j.jenvman.2021.113754](https://doi.org/10.1016/j.jenvman.2021.113754)

36 changes: 36 additions & 0 deletions docs/no_toc/02-research_team.md
@@ -0,0 +1,36 @@
# Research Team

This project is coordinated by the Genomics Data Science Community Network (GDSCN). You can read more about the GDSCN and its mission at the network [website](https://www.gdscn.org/home).

## Soil sampling

Soil sampling for this project was done by both faculty and student volunteers from schools that aren't traditional R1 research institutions. Many of the faculty are also members of the GDSCN.

- **Annandale, VA**: Northern Virginia Community College
- **Atlanta, GA**: Spelman College
- **Baltimore, MD**: College of Southern Maryland, Notre Dame College of Maryland, Towson University
- **Bismarck, ND**: United Tribes Technical College
- **El Paso, TX**: El Paso Community College, The University of Texas at El Paso
- **Fresno, CA**: Clovis Community College
- **Greensboro, NC**: North Carolina A&T State University
- **Harrisonburg, VA**: James Madison University
- **Honolulu, Hawai'i**: University of Hawai'i at Mānoa
- **Las Cruces, NM**: Doña Ana Community College
- **Montgomery County, MD**: Montgomery College, Towson University
- **Nashville, TN**: Meharry Medical College
- **New York, NY**: Guttman Community College, CUNY
- **Petersburg, VA**: Virginia State University
- **Seattle, WA**: North Seattle College, Pierce College
- **Tsaile, AZ**: Diné College

## Funding

Funding for this project has been provided by the [National Human Genome Research Institute](https://www.genome.gov/) (Contract # 75N92022P00232 awarded to Johns Hopkins University), as well as by donations from [PacBio](https://www.pacb.com/) and [CosmosID](https://www.cosmosid.com/).

[Advances in Genome Biology and Technology](https://www.agbt.org/) provided funding support for several team members to attend AGBT 2024.

## Analytical and Computational Support

Computational support has been provided by NHGRI's [AnVIL](https://anvilproject.org/) cloud computing platform and [Galaxy](https://usegalaxy.org).


23 changes: 23 additions & 0 deletions docs/no_toc/03-data_tour.md
@@ -0,0 +1,23 @@
# Data

There are currently three major kinds of data available from BioDIGS: sample metadata, soil testing data, and genomics and metagenomics data. All of these are available for use in your classroom.

## Sample Metadata

This dataset contains information about the samples themselves, including GPS coordinates for the sample location, the date the sample was taken, and the site name. This dataset is also available from the [BioDIGS website](https://biodigs.org/#site_data).

You can also see images of each sampling site and soil characteristics at the [sample map](https://biodigs.org/#sample_map).

## Soil Testing Data

This dataset includes basic information about the soil itself, such as pH, percentage of organic matter, and a variety of soil metal concentrations. The complete data dictionary is available [here](https://docs.google.com/spreadsheets/d/109xYUM48rjj33B76hZ3bNlrm8u-_S6uyoE_3wSCp0r0/edit#gid=188448677). The dataset is available at the [BioDIGS website](https://biodigs.org/#soil_data).

This dataset was generated by the [Delaware Soil Testing Program](https://www.udel.edu/canr/cooperative-extension/environmental-stewardship/soil-testing/) at the University of Delaware.
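
For classroom use, one natural first step is to combine the sample metadata with the soil testing results so that each site's location and soil measurements sit in a single record. The sketch below shows one way this might look using only the Python standard library. The column names (`site_id`, `site_name`, `pH`, `organic_matter_pct`) and every value in the two tables are invented placeholders, not the actual BioDIGS file layout or real measurements; consult the data dictionary for the real field names before adapting it.

```python
import csv
import io

# Hypothetical stand-ins for the two downloadable tables. The column names
# and values are invented for illustration and are NOT real BioDIGS data.
SAMPLE_METADATA_CSV = """site_id,site_name,latitude,longitude,collection_date
S01,Example Park,39.29,-76.61,2023-09-15
S02,Example Campus,47.61,-122.33,2023-10-02
"""

SOIL_TESTING_CSV = """site_id,pH,organic_matter_pct
S01,6.8,4.2
S02,5.9,7.5
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def join_on_site(metadata_rows, soil_rows, key="site_id"):
    """Attach soil measurements to the matching sample metadata row."""
    soil_by_site = {row[key]: row for row in soil_rows}
    joined = []
    for meta in metadata_rows:
        soil = soil_by_site.get(meta[key])
        if soil is None:
            continue  # skip sites that have no soil test result
        joined.append({**meta, **soil})
    return joined


combined = join_on_site(read_rows(SAMPLE_METADATA_CSV), read_rows(SOIL_TESTING_CSV))
for row in combined:
    print(row["site_name"], row["pH"], row["organic_matter_pct"])
```

Note that `csv.DictReader` returns every field as a string, so numeric columns such as pH would need to be converted with `float()` before plotting or computing statistics.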

## Genomics and Metagenomics Data

You can access this data in both raw and processed forms.

The Illumina and Nanopore sequences were generated at the [Johns Hopkins University Genetic Resources Core Facility](https://grcf.jhmi.edu/). PacBio sequencing was done by [PacBio](https://www.pacb.com/) directly.

More information coming soon!