# Querying Web Resources
Packages that rely on access to web resources need to be written
carefully. Web resources can change location, can be temporarily
unavailable, or can be very slow to access and retrieve. Functions
that query web resources should anticipate and handle such situations
gracefully, failing quickly and clearly when the resource is not
available in a reasonable time frame. Some avoidable problems seen in
_Bioconductor_ package code include infinite loops, use of all
available _R_ connections, and unclear error messages.

## Guiding Principles

Remember that _Bioconductor_ packages are built nightly across multiple
operating systems, and that users benefit from easy-to-run vignettes
and examples.

1. Download files of reasonable size. Use `system.time()` to estimate the
   download time. Remember that the package should require less than 10
   minutes to run `R CMD check --no-build-vignettes`, with an upper limit of
   15 minutes.
2. Set a limit on the number of times the function tries a URL. Avoid
   `while()` statements that have no guaranteed termination; these can
   become infinite loops that eventually result in build-system `TIMEOUT`s.
3. Supply informative error messages, e.g., ones that report the URL
   queried and the underlying error.
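
Principle 1 can be checked interactively before settling on a resource. The
sketch below is illustrative only: so that it runs offline, a local
`file://` URL stands in for a real web resource, and the object names are
arbitrary. In practice, substitute the `http(s)://` URL the package queries.

```r
## A local file stands in for the web resource (assumption for illustration)
src <- tempfile(fileext = ".txt")
writeLines(sprintf("record %d", 1:10000), src)
src_url <- paste0("file://", normalizePath(src, winslash = "/"))

## Time the retrieval; 'status' is 0 when download.file() succeeds
dest <- tempfile(fileext = ".txt")
timing <- system.time(
    status <- download.file(src_url, dest, quiet = TRUE)
)

timing[["elapsed"]]    # wall-clock seconds spent downloading
file.size(dest)        # size of the retrieved file, in bytes
```

Repeating the measurement a few times, and on a slower network, gives a
better sense of whether the download fits within the check-time budget.
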

## Template for Resource Queries

This function can serve as a template for appropriate resource
retrieval. It tries to retrieve the resource one or several times before
failing, and takes as arguments:

- `URL`: the resource to be queried, typically a `character(1)` URL or a
  `url()` connection.
- `FUN`: the function used to query the resource. Examples might
  include `readLines()`, `download.file()`, `httr::GET()`, or
  `RCurl::getURL()`.
- `...`: additional arguments passed to `FUN`.
- `N.TRIES`: the number of times the URL will be attempted (a positive
  integer); only under exceptional circumstances might this differ from its
  default value of 1.

The return value is the retrieved resource. If every attempt fails, the
function signals an error that includes the URL and the condition (error)
message from the last attempt. Warnings propagate to the user in the
normal way.
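
```r
getURL <- function(URL, FUN, ..., N.TRIES=1L) {
    N.TRIES <- as.integer(N.TRIES)
    ## require a single, positive number of attempts; otherwise the loop
    ## never runs and 'result' is undefined
    stopifnot(length(N.TRIES) == 1L, !is.na(N.TRIES), N.TRIES > 0L)

    while (N.TRIES > 0L) {
        ## on error, return the condition object instead of signalling it
        result <- tryCatch(FUN(URL, ...), error=identity)
        if (!inherits(result, "error"))
            break                      # success; stop trying
        N.TRIES <- N.TRIES - 1L        # guarantees the loop terminates
    }

    if (N.TRIES == 0L) {
        stop("'getURL()' failed:",
             "\n  URL: ", URL,
             "\n  error: ", conditionMessage(result))
    }

    result
}
```
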
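
The template relies on two properties of `tryCatch()` with
`error = identity`: an error is returned as a condition object rather than
signalled, and warnings are not intercepted. A small, self-contained
illustration (the messages are made up):

```r
## An error becomes a value: the condition object itself
failed <- tryCatch(stop("could not resolve host"), error = identity)
inherits(failed, "error")      # TRUE
conditionMessage(failed)       # "could not resolve host"

## Warnings are not handled by 'error = identity': they reach the user as
## usual, and the expression still returns its value
succeeded <- tryCatch({
    warning("response was slow")
    "resource contents"
}, error = identity)
succeeded                      # "resource contents" (warning also reported)
```
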
Base _R_ functions using `url()` connections respect
`getOption("timeout")`; see `?url` for details.
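
One way to apply a different timeout to a single query, without changing
the user's session, is to set the option temporarily and restore it with
`on.exit()`. `with_timeout()` is a hypothetical helper for illustration,
not part of base _R_ or any package:

```r
## Hypothetical helper: evaluate 'expr' with a temporary value of
## getOption("timeout"), restoring the previous value on exit
with_timeout <- function(seconds, expr) {
    old <- options(timeout = seconds)   # options() returns the old values
    on.exit(options(old))               # restore even if 'expr' fails
    expr   # lazy evaluation: 'expr' is evaluated here, under the new timeout
}

before <- getOption("timeout")
inside <- with_timeout(5, getOption("timeout"))
after <- getOption("timeout")
```

For instance, `with_timeout(10, readLines(url(URL)))` would bound how long
a stalled `url()` connection can block, while leaving the user's own
setting intact afterwards.
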
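
`FUN` might be implemented to retrieve the resource and test the HTTP
status, e.g., using _httr_:

```r
FUN <- function(URL, ...) {
    ## request the resource, using the same timeout as base R connections
    response <- httr::GET(URL, httr::timeout(getOption("timeout")), ...)
    ## signal an R error if the HTTP status indicates failure
    httr::stop_for_status(response)
    response
}
```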
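
`download.file()` returns a status code rather than the resource, so using
it as `FUN` may call for a small wrapper. `download_FUN()` below is a
hypothetical sketch, assuming the caller wants the path of the downloaded
file back; a local `file://` URL again stands in for a web resource.

```r
## Hypothetical base-R FUN: download to a file and return its path
download_FUN <- function(URL, ..., destfile = tempfile()) {
    status <- download.file(URL, destfile, quiet = TRUE, ...)
    ## most methods signal an error on failure; some return non-zero instead
    if (status != 0L)
        stop("download.file() returned non-zero status ", status)
    destfile
}

## Example: a local file:// URL standing in for a web resource
src <- tempfile(fileext = ".txt")
writeLines(c("id", "1", "2"), src)
path <- download_FUN(paste0("file://", normalizePath(src, winslash = "/")))
readLines(path)    # "id" "1" "2"
```

It could then be passed to the template as `getURL(URL, download_FUN)`.
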