forked from hadley/adv-r
# Connections
In R, every time you read data in or write data out, you are using a connection behind the scenes. Connections abstract away the underlying implementation so that you can read and write data the same way, regardless of whether you're writing to a file, an HTTP connection, a pipe, or something more exotic.
* http://biostatmatt.com/R/R-conn-ints/index.html#Top
* `?file`
* https://cran.r-project.org/doc/Rnews/Rnews_2001-1.pdf
* https://cran.r-project.org/doc/manuals/r-release/R-data.html#Connections
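
As a minimal sketch of that abstraction, the same `readLines()` call can read from a file connection and from an in-memory text connection. The temporary file and the variable names are assumptions made for the example.

```r
# Write a small temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta"), path)

# A connection backed by a file on disk.
file_con <- file(path, open = "r")
from_file <- readLines(file_con)
close(file_con)

# A connection backed by a character vector in memory.
text_con <- textConnection(c("alpha", "beta"))
from_text <- readLines(text_con)
close(text_con)

# The reading code did not need to know which kind of connection it had.
same <- identical(from_file, from_text)
unlink(path)
```
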
## Basics
* default connections: `stdin()`, `stdout()`, `stderr()`
* `cat()` + `cat_line()`
* survey of base connections: file, compressed file, url, pipe, socket, text
* important packages: curl
* blocking vs non-blocking
* pattern: if you opened the connection, `close()` it with `on.exit()`
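
One possible shape for the open-then-close pattern is sketched below. `read_head()` is a hypothetical helper invented for this example, not a base R function. It opens a file connection, registers `close()` with `on.exit()`, and reads a few lines. The calls to `cat()` show output going to a file, to `stdout()` (the default), and to `stderr()`.

```r
# Hypothetical helper: read the first `n` lines of the file at `path`.
read_head <- function(path, n = 2L) {
  con <- file(path, open = "r")
  # We opened the connection, so we close it, even if readLines() fails.
  on.exit(close(con), add = TRUE)
  readLines(con, n = n)
}

path <- tempfile()
# `file =` redirects cat() away from its default destination, stdout().
cat("one\ntwo\nthree\n", file = path)

first_two <- read_head(path)
cat(first_two, sep = "\n")                  # written to stdout()
cat("finished reading\n", file = stderr())  # written to stderr()
unlink(path)
```

Registering the cleanup immediately after opening keeps the two steps together, so an error later in the function cannot leave the connection open.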
## Reading and writing binary data
* `raw()`
* `readBin()` vs `writeBin()`
* text vs binary (newlines and nulls)
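
A small round trip through `writeBin()` and `readBin()` might look like the following sketch. The byte values, the 4-byte integer size, and the little-endian byte order are arbitrary choices for illustration.

```r
path <- tempfile(fileext = ".bin")

# Open in binary mode ("wb") so no newline translation is applied.
con <- file(path, open = "wb")
writeBin(c(1L, 2L, 3L), con, size = 4L, endian = "little")
# A raw vector can hold any byte, including a newline (0x0a) and a nul (0x00).
writeBin(as.raw(c(0x0a, 0x00, 0xff)), con)
close(con)

# Read the values back in the same order and with the same sizes.
con <- file(path, open = "rb")
ints <- readBin(con, what = "integer", n = 3L, size = 4L, endian = "little")
bytes <- readBin(con, what = "raw", n = 3L)
close(con)

n_bytes <- file.size(path)  # 3 integers * 4 bytes + 3 raw bytes
unlink(path)
```

The reader has to know the layout in advance: nothing in the file records that the first twelve bytes are integers and the last three are raw bytes.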
## Reading and writing text data
Reading and writing text is more complicated than reading and writing binary data because, as soon as you move beyond plain ASCII characters (e.g. a-z, 0-9), there are many different ways of representing the same text as bytes. The mapping between text and the bytes used to store it is known as the __encoding__.
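
To make this concrete, the sketch below converts a single character, "é", to bytes under two encodings. The choice of character and of Latin-1 as the second encoding is just for illustration.

```r
# "\u00e9" is "é", written as an escape so the script itself stays ASCII.
x <- enc2utf8("\u00e9")

# The same character as bytes in two different encodings.
utf8_bytes <- charToRaw(x)
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))

print(utf8_bytes)
print(latin1_bytes)
```

In UTF-8 the character takes two bytes (`c3 a9`), while in Latin-1 it takes one (`e9`), so a reader that assumes the wrong encoding will decode the wrong text.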
* Encodings
* https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
* in general vs `Encoding`
* `encoding` vs `fileEncoding`
* converting with `iconv()`
* UTF-8 everywhere
* Reliably reading and writing UTF-8
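
One approach to a UTF-8 round trip that does not depend on the current locale is sketched here. `write_utf8()` and `read_utf8()` are hypothetical helpers written for this example. The idea, which is an assumption about what "reliably" should mean, is to convert to UTF-8 explicitly, write the bytes untouched, and mark the strings as UTF-8 when reading them back.

```r
# Hypothetical helper: write `text` to `path` as UTF-8 bytes.
write_utf8 <- function(text, path) {
  # "native.enc" means the connection itself does no re-encoding.
  con <- file(path, open = "w", encoding = "native.enc")
  on.exit(close(con), add = TRUE)
  # useBytes = TRUE writes the UTF-8 bytes as-is instead of translating
  # them to the native encoding first.
  writeLines(enc2utf8(text), con, useBytes = TRUE)
}

# Hypothetical helper: read lines from `path`, assuming it is UTF-8.
read_utf8 <- function(path) {
  con <- file(path, open = "r", encoding = "native.enc")
  on.exit(close(con), add = TRUE)
  # `encoding` here only marks the strings as UTF-8; it does not convert.
  readLines(con, encoding = "UTF-8")
}

text <- enc2utf8(c("na\u00efve", "caf\u00e9"))
path <- tempfile(fileext = ".txt")
write_utf8(text, path)
back <- read_utf8(path)
unlink(path)
```

Note that `read_utf8()` trusts the caller: if the file is not actually UTF-8, the strings will be marked incorrectly and nothing will flag the mismatch.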