forked from hadley/r-pkgs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathintro.rmd
164 lines (105 loc) · 10.1 KB
/
intro.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
---
layout: default
title: Introduction
output: bookdown::html_chapter
---
# Introduction
## Why build a package?
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation and tests, and it's easy to share with others. A package is just a set of conventions. Each convention is an instruction for organising your code, and each chapter of this book describes one convention. For example, in the first two chapters you'll learn to:
* Put your R code in a directory called `R/`.
* Describe what your package needs in order to work in a `DESCRIPTION` file.
If the package will be shared with others, you'll also describe what it does,
who can use it (the license) and who to contact if things go wrong.
In later chapters, you'll learn about:
* Function documentation, which lives in `man/`.
* Vignettes, long-form tutorials, which live in `vignettes/`.
* The `NAMESPACE` file which describes how your package connects to other
packages.
* Testing your code (`tests/`), including C and C++ code (in `src/`),
packaging data (in `data/`), and much much more.
These conventions are helpful because:
* They save you time --- you don't need to think about the best way to organise
a project, you can just follow a template.
* Standardised conventions lead to standardised tools --- if you buy into
R's package conventions, you get many tools for free.
As Hilary Parker puts it in her [introduction to packages](http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/): "Seriously, it doesn't have to be about sharing your code (although that is an added benefit!). It is about saving yourself time."
A package doesn't need to be complicated. You can start with a minimal subset of useful features and slowly build up over time. While there are strict requirements if you want to publish a package to the world on CRAN (and many of those requirements are useful even for your own packages), most packages won't end up on CRAN. You can even use packages to structure your data analyses, as Robert M Flight discusses in a [series of blog posts](http://rmflight.github.io/posts/2014/07/analyses_as_packages.html). Packages are really easy to create and use once you have the right set of tools.
Anytime you create some reusable set of functions you should put it in a package. It's the easiest path because packages come with conventions: you don't need to figure them out for yourself. You'll start with just your R code in the `R/` directory, and over time you can flesh it out with documentation (in `man/`), vignettes (in `vignettes/`), compiled code (in `src/`), data sets (in `data/`), and tests (in `inst/tests`).
The most accurate resource for up-to-date details on package development is always the official [writing R extensions][r-ext] guide. However, it's rather hard to understand if you're not already familiar with the basics of packages. It's also exhaustive, covering every possible package component, rather than focussing on the most common and useful components, as this book does. Writing R extensions is a useful resource once you've mastered the basics and need to check on more esoteric facts.
## Philosophy
This book espouses my philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.
This philosophy is realised primarily through the devtools package, a suite of R functions that I wrote to automate common development tasks. The goal of devtools is to make package development as painless as possible. It does this by encapsulating all of the best practices of package development that I've learned over the years. Devtools protects you from many potential mistakes, allowing you to focus on the problem at hand, not the mechanics of package creation. And, as I discover and incorporate even better ways of doing things, all you'll need to do to reap the benefits is to simply upgrade devtools.
Developing packages with devtools may take some getting used to. However, I strongly believe that the time invested in mastering it will, in the long run, save you much, much more time. Devtools is a key component of my productivity in package development.
Devtools works hand-in-hand with RStudio, which I believe is the best development environment for most R users. The only real competitor is [ESS](http://ess.r-project.org/), emacs speaks statistics, which is a rewarding environment if you're willing to put in the time to learn emacs and customise it to your needs.
## Getting started
To get started, make sure you have the latest version of R (at least 3.1.0, which was current when I wrote this book), then run the following code to get the packages you'll need:
```{r, eval = FALSE}
install.packages(c("devtools", "roxygen2", "testthat", "knitr"))
```
Make sure you have a recent version of RStudio. Since you're making the jump from an R user to a package developer, I recommend that you download the preview version from <http://www.rstudio.com/products/rstudio/download/preview/>. This gives you access to the latest and greatest features, and only slightly increases your chances of finding a bug.
Use the following code to access new devtools functions as I develop them. This is particularly important during the development of the book.
```{r, eval = FALSE}
devtools::install_github("hadley/devtools")
```
You'll also need a C compiler and a few other command line tools. If you're on Windows or Mac and you don't already have them, RStudio will install them for you. Otherwise:
* On Windows, download and install [Rtools](http://cran.r-project.org/bin/windows/Rtools/).
NB: this is not an R package!
* On Mac, make sure you have either XCode (available for free in the App Store)
or the ["Command Line Tools for Xcode"](http://developer.apple.com/downloads).
NB: you'll need to have an Apple ID (available for free).
* On Linux, make sure you've installed not only R, but also the R development
tools. For example, on ubuntu (and debian) you need to install the
`r-base-dev` package.
You can check that you have everything installed and working by running the following code:
```{r, eval = FALSE}
library(devtools)
has_devel()
#> '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD SHLIB foo.c
#>
#> clang -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG
#> -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include
#> -fPIC -Wall -mtune=core2 -g -O2 -c foo.c -o foo.o
#> clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup
#> -single_module -multiply_defined suppress -L/usr/local/lib -o foo.so foo.o
#> -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework
#> -Wl,CoreFoundation
[1] TRUE
```
This will print out some compilation code (this is needed to help diagnose problems). But you're really only interested in whether it returns `TRUE` (everything's OK) or an error (further investigation needed).
## Package basics
This book is arranged according to the structure of an R package. It starts with the most commonly used components, and then gets more esoteric.
* `R/`: where your R code lives in `.R` files
* `DESCRIPTION`: metadata about the package
* `man/`: function documentation
* `vignettes/`: long-form documentation which show how to combine multiple
parts of your package to solve real problems.
* `NAMESPACE`: ensures that your package plays nicely with other packages.
* `tests/`: stores unit tests that ensure that your package is operating as
designed.
* `data/`: sample datasets (or other R objects)
* `src/`: compiled C, C++ and Fortran source code
## Acknowledgments {#intro-ack}
I would like to thank the tireless contributors to R-devel and, more recently, [stackoverflow](http://stackoverflow.com/questions/tagged/r). There are too many to name individually, but I'd particularly like to thank Dirk Eddelbuettel, Martin Morgan, Luke Tierney, and Brian Ripley for generously giving their time and correcting my countless misunderstandings.
This book was [written in the open](https://github.com/hadley/r-pkgs/), and chapters were advertised on [twitter](https://twitter.com/hadleywickham) when complete. It is truly a community effort: many people read drafts, fixed typos, suggested improvements, and contributed content. Without those contributors, the book wouldn't be nearly as good as it is, and I'm deeply grateful for their help.
```{r, results = "asis", echo = FALSE, eval = FALSE}
#contribs <- system("git --no-pager shortlog -ns > contribs.txt", intern = T)
contribs <- read.delim("contribs.txt", header = FALSE,
stringsAsFactors = FALSE)[-1, ]
names(contribs) <- c("n", "name")
contribs <- contribs[order(contribs$name), ]
contribs$uname <- ifelse(!grepl(" ", contribs$name),
paste0("@", contribs$name), contribs$name)
cat("Thanks go to all contributers in alphabetical order: ")
cat(paste0(contribs$uname, collapse = ", "))
cat(".\n")
```
## Conventions
Throughout this book I use `f()` to refer to functions, `g` to refer to variables and function parameters, and `h/` to paths.
Larger code blocks intermingle input and output. Output is commented so that if you have an electronic version of the book, e.g., <http://r-pkgs.had.co.nz>, you can easily copy and paste examples into R. Output comments look like `#>` to distinguish them from regular comments.
## Colophon
This book was written in [Rmarkdown](http://rmarkdown.rstudio.com/) inside [RStudio](http://www.rstudio.com/ide/). [knitr](http://yihui.name/knitr/) and [pandoc](http://johnmacfarlane.net/pandoc/) converted the raw Rmarkdown to html and pdf. The [website](http://adv-r.had.co.nz) was made with [jekyll](http://jekyllrb.com/), styled with [bootstrap](http://getbootstrap.com/), and automatically published to Amazon's [S3](http://aws.amazon.com/s3/) by [travis-ci](https://travis-ci.org/). The complete source is available from [github](https://github.com/hadley/r-pkgs).
This version of the book was built with:
```{r}
devtools::session_info()
```
[r-ext]:http://cran.r-project.org/doc/manuals/R-exts.html#Creating-R-packages