```{r, include = FALSE}
ottrpal::set_knitr_image_path()
```
# General Data Analysis Tools
## Learning Objectives
```{r, fig.alt = "This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python", out.width = "100%", echo = FALSE}
ottrpal::include_slide("https://docs.google.com/presentation/d/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY/edit#slide=id.g20fbd76736e_0_0")
```
## Command Line vs GUI
When using computers, there are two different ways you can tell a program what you want it to do. You can use a Graphical User Interface (abbreviated as GUI), where you point and click on buttons, or you can use a Command Line Interface (abbreviated as CLI), where you type in commands and write scripts that tell the program what you want it to do.
Command Line Interfaces take more time to learn and get used to, but analyses run with them are generally easier to reproduce, because every step of the analysis can be written in a script. Graphical User Interfaces can be more intuitive and quicker to pick up, but it can be difficult to repeat an analysis in exactly the same way. If you know you will be doing the same analysis many times (with either different or the same samples), it is a good use of your time to learn how to use command line tools. We will discuss some of the most commonly used command line tools here.
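
To illustrate why a script makes an analysis repeatable, here is a minimal sketch written in bash (introduced in the next section). The file name `counts.csv`, its contents, and the three steps are made up for illustration and are not part of this course's data. Because every step is written down, running the script again repeats the analysis in exactly the same way.

```bash
#!/usr/bin/env bash
# Stop at the first failed step so errors are not silently skipped.
set -e

# Hypothetical input: a tiny table of sample names and read counts,
# written to a throwaway directory so nothing else is changed.
workdir="$(mktemp -d)"
printf 'sample,reads\nA,120\nB,80\nC,200\n' > "$workdir/counts.csv"

# Step 1: drop the header row.
tail -n +2 "$workdir/counts.csv" > "$workdir/no_header.csv"

# Step 2: add up the second column (the read counts).
total="$(awk -F',' '{ sum += $2 } END { print sum }' "$workdir/no_header.csv")"

# Step 3: report the result.
echo "Total reads: $total"   # prints: Total reads: 400
```

In a GUI, the same three steps would be a series of clicks that you would have to remember and redo by hand each time.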
### Bash
Bash is a command language used by a lot of computers and programs. Many of the things you might do every day on your computer by clicking on items on your desktop and in menus can also be done using bash.
On a Mac computer, you can use bash commands in the `Terminal` application. Go to your search bar and search for `Terminal`. You may want to keep this application handy.
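In Windows, the built-in command line is the `Command Prompt` application. Go to your search bar and search for `Command Prompt`. Note that `Command Prompt` uses its own set of commands rather than bash; to run bash commands on Windows, you will need a tool such as Windows Subsystem for Linux (WSL) or Git Bash. You may want to keep this application handy.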
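
As a rough sketch of how everyday point-and-click actions map onto bash commands, the example below makes a folder, creates a file, copies it, renames the copy, and lists the folder. The folder and file names (`my_project`, `samples.txt`, and so on) are hypothetical, and everything happens inside a temporary directory so no existing files are touched.

```bash
# Work in a throwaway directory so nothing on your computer is changed.
workdir="$(mktemp -d)"
cd "$workdir"

# Make a new folder (like choosing "New Folder" in a file browser).
mkdir my_project

# Create a small text file inside the folder.
echo "sample_1" > my_project/samples.txt

# Copy the file (like copy and paste).
cp my_project/samples.txt my_project/samples_backup.txt

# Rename the copy (like right-clicking and choosing "Rename").
mv my_project/samples_backup.txt my_project/samples_old.txt

# List what is in the folder (like opening it to look inside).
ls my_project
```

After these commands, `my_project` should contain two files, `samples.txt` and `samples_old.txt`, both holding the same single line of text.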
### R
R is a programming language commonly used for statistics and data analysis. It is free and has many packages built for genomics analysis. Many of these packages are highlighted in this course or listed in our [tool glossary](http://hutchdatascience.org/Choosing_Genomics_Tools/genomic-tool-glossary.html).
#### Resources for learning R
##### R and Tidyverse
+ [Swirl, an interactive tutorial](https://swirlstats.com/)
+ [R for Data Science](https://r4ds.had.co.nz/)
+ [Tidyverse skills for Data Science](http://jhudatascience.org/tidyversecourse/) by Carrie Wright.
+ [Handy R cheatsheets](https://www.rstudio.com/resources/cheatsheets/)
+ [R Cookbook Second Edition](https://rc2e.com/)
+ [Advanced R](https://adv-r.hadley.nz/)
+ [R for Epidemiology](https://www.r4epi.com/) - has generally good R advice
+ [O'Reilly books](https://www.spl.org/books-and-media/books-and-ebooks/safari-books-online) available through Seattle Public Library
##### R notebooks
+ [R Markdown](http://rmarkdown.rstudio.com)
+ [Tutorial on R, RStudio and R Markdown](https://ismayc.github.io/rbasics-book/)
+ [Handy R cheatsheets](https://www.rstudio.com/resources/cheatsheets/)
+ [R Notebooks tutorial](https://bookdown.org/yihui/rmarkdown/)
##### R and Genomics
+ [Intro to R and Tidyverse course and exercises](https://github.com/AlexsLemonade/training-modules/tree/master/intro-to-R-tidyverse) from the Childhood Cancer Data Lab.
+ [Refine.bio examples](https://alexslemonade.github.io/refinebio-examples/index.html) from the Childhood Cancer Data Lab.
+ [Biostar Handbook: A Beginner's Guide to Bioinformatics](https://www.biostarhandbook.com)
### Python
Python is a programming language that is also used for data analysis, among many other purposes. It can be a very powerful development tool. Some of its packages are highlighted in this course or listed in our [tool glossary](http://hutchdatascience.org/Choosing_Genomics_Tools/genomic-tool-glossary.html).
#### Resources for learning Python
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)
- [Python for Biologists](https://www.pythonforbiologists.org/)
## More resources
- [A longer list of tools and resources can be found here](https://hutchdatascience.org/code_review/more_resources.html)
- [DataTrail curriculum](https://datatrail-jhu.github.io/DataTrail/index.html)
- [Introduction to Reproducibility](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/introduction.html)
- [Advanced Reproducibility in Cancer Informatics](https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/introduction.html)
- [Computing in Cancer Informatics](https://jhudatascience.org/Computing_for_Cancer_Informatics/)