---
title: "DATA I/O"
author: "`r Sys.Date()`<br><p style='color:royalblue'>[Mansun Kuo](https://tw.linkedin.com/pub/mansun-kuo/82/3b4/344)</p>"
date: '`r Sys.Date()`'
#ratio: 4x3
ratio: 16x10
output:
  rmdshower::shower:
    self_contained: true
    katex: false
    theme: material
    css: css/shower.css
params:
  refresh: no
---
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Data I/O {.white .Section}
<img src="img/bali.jpg" class="cover">
# Useful data I/O in R
- Text:
    - Plain text: **writeLines**, **readLines**, scan, ...
    - JSON: **jsonlite::fromJSON**, **jsonlite::toJSON**
- Table:
    - **data.table::fwrite**, **data.table::fread**, write.csv, read.csv, ...
- Binary file: writeBin, readBin, saveRDS, readRDS, ...
- Database:
    - **[RSQLite](https://cran.r-project.org/web/packages/RSQLite/index.html)**,
      [RODBC](https://cran.r-project.org/web/packages/RODBC/index.html),
      [RJDBC](https://cran.r-project.org/web/packages/RJDBC/index.html),
      [RMySQL](https://cran.r-project.org/web/packages/RMySQL/index.html),
      ...
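
Binary I/O is not covered in the later slides, so here is a minimal sketch of a round trip with the base functions listed above. The object and the temporary file names are made up for illustration.

```r
# Hypothetical object used only for this illustration.
obj = list(id = 1:3, label = c("a", "b", "c"))

# saveRDS / readRDS serialize a single R object, keeping its types.
rds_path = tempfile(fileext = ".rds")
saveRDS(obj, rds_path)
restored = readRDS(rds_path)
identical(obj, restored)

# writeBin / readBin work on raw values; the reader must state the type
# and how many values to read.
bin_path = tempfile(fileext = ".bin")
writeBin(1:5, bin_path)
ints = readBin(bin_path, what = "integer", n = 5)
ints

# Remove the temporary files.
unlink(c(rds_path, bin_path))
```

saveRDS is usually the more convenient choice for R objects, while writeBin is useful when another program defines the binary layout.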
# Quoting in R
Besides ordinary characters, several escape sequences have special meaning
inside a quoted string. For example:

- \\n: newline
- \\t: tab

A character string containing newlines and tabs:
```{r}
greeting = "Hi!\n\tHow are you? \n\tI'm fine, thank you!"
cat(greeting)
```
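
A few more quoting cases, sketched with made-up strings: a literal backslash needs escaping too, each escape sequence counts as a single character, and single and double quotes are interchangeable.

```r
# A literal backslash must itself be escaped inside a string.
win_path = "C:\\data\\greeting.txt"   # hypothetical path
cat(win_path, "\n")                   # printed with single backslashes

# Each escape sequence is one character.
nchar("\n")
nchar("a\tb")

# Single and double quotes are interchangeable; alternate or escape to nest.
quoted = 'She said "hi"'
identical(quoted, "She said \"hi\"")

# print() shows the escapes, cat() interprets them.
print("a\tb")
cat("a\tb\n")
```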
# About Encoding
- Often the most frustrating part of using R, especially on Windows
- Using Linux or macOS avoids many of R's encoding issues
- You may need to specify the appropriate encoding when reading data
- Useful functions for dealing with encoding issues:
    - **iconv**: Convert a character vector between encodings
    - **iconvlist**: List all available encodings
    - **Sys.getlocale**: Get your system's locale
    - **Encoding**: Read or set the declared encoding of a character vector
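
The functions above can be tried on a small example. This is a sketch using a made-up string and Latin-1 as the second encoding. The locale and the list of available encodings depend on your system.

```r
x = "caf\u00e9"          # \u escapes produce a string marked as UTF-8
Encoding(x)
charToRaw(x)             # the accented letter takes two bytes in UTF-8

# Convert to Latin-1: same text, different bytes.
latin1 = iconv(x, from = "UTF-8", to = "latin1")
Encoding(latin1)
charToRaw(latin1)        # the accented letter is a single byte in Latin-1

# Convert back and compare with the original.
back = iconv(latin1, from = "latin1", to = "UTF-8")
identical(back, x)

Sys.getlocale("LC_CTYPE")   # system dependent
head(iconvlist())           # system dependent
```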
# High-level I/O for a text file
- **writeLines**: Write text lines to a connection
- **readLines**: Read text lines from a connection
```{r}
writeLines(greeting, con = "dataio/greeting.txt")
readLines(con = "dataio/greeting.txt")
```
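
Both functions also accept an explicit connection, which is handy for appending. A minimal sketch, assuming a temporary file instead of the `dataio/` folder:

```r
tmp = tempfile(fileext = ".txt")   # temporary file for illustration

# Each element of the character vector becomes one line.
writeLines(c("first line", "second line"), con = tmp)

# Open a connection in append mode to add a line without overwriting.
con = file(tmp, open = "a")
writeLines("third line", con)
close(con)

lines = readLines(tmp)
length(lines)
readLines(tmp, n = 1)              # read only the first line

unlink(tmp)
```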
# JSON
A popular format for exchanging semi-structured data. For example:
```{r results='asis', echo=FALSE}
cat("```json",
readLines("dataio/intro.json"),
"```", sep = "\n")
```
# jsonlite
Provides JSON parsing and generating utilities for R.

- **fromJSON**: Parse a JSON string into an R object
    - JSON object (name-value pairs) -> named list
    - JSON array -> vector, matrix or data.frame
- **toJSON**: Convert an R object into a JSON string
```{r}
library(jsonlite)
intro = fromJSON("dataio/intro.json", simplifyVector = TRUE)
str(intro)
```
# jsonlite - 2
```{r}
intro$license = NULL            # drop the license element
toJSON(intro, pretty = TRUE)    # convert the list to pretty-printed JSON
```
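
The mapping between JSON and R types can also be checked on an inline string. This sketch uses a made-up JSON document rather than `dataio/intro.json`. `auto_unbox = TRUE` is an optional toJSON argument that writes length-one vectors as scalars instead of arrays.

```r
library(jsonlite)

# Hypothetical JSON document, for illustration only.
json = '{"name": "Ann", "langs": ["R", "SQL"],
         "scores": [{"id": 1, "value": 3.5}, {"id": 2, "value": 4.0}]}'

obj = fromJSON(json)
class(obj)           # a JSON object becomes a list
obj$langs            # an array of strings becomes a character vector
class(obj$scores)    # an array of objects becomes a data.frame

# Convert back to JSON.
toJSON(obj, auto_unbox = TRUE, pretty = TRUE)
```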
# Table
In most situations, **data.table::fwrite** and **data.table::fread**
make reading and writing tables easier and faster than the base functions.

Here is an example that reads a file generated by Excel on Windows (Traditional Chinese):
```{r}
library(data.table)
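# fread's encoding argument only marks the strings; it does not re-encode them
from_cp950 = fread("dataio/cp950.csv", encoding = "UTF-8")
# convert the CP950 bytes of the name column to real UTF-8
from_cp950$name = iconv(from_cp950$name, from = "CP950", to = "UTF-8")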
from_cp950
```
# Table - 2
Write the data read from the CP950 file to a new UTF-8 file,
then read the UTF-8 file back.
```{r}
fwrite(from_cp950, "dataio/utf8.csv", sep = "\t") # strings are already UTF-8, so the file is UTF-8
from_utf8 = fread("dataio/utf8.csv", encoding = "UTF-8")
from_utf8
identical(from_cp950, from_utf8)
```
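
The same round trip can be tried without the `dataio/` files. A minimal sketch with a made-up table and a temporary file:

```r
library(data.table)

# Hypothetical table, for illustration only.
dt = data.table(id = 1:3,
                name = c("a", "b", "c"),
                score = c(1.5, 2.5, 3.5))

tmp = tempfile(fileext = ".csv")
fwrite(dt, tmp)          # comma-separated by default
back = fread(tmp)        # separator and column types are detected automatically

sapply(back, class)
all.equal(dt, back)

unlink(tmp)
```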
# Database
- In most production environments, you won't load all of your company's data
  into R
- R works well with most popular databases
    - [Database management](https://awesome-r.com/#awesome-r-database-management)
- Relational databases are a great choice for analysts

A typical SQL query string:
```sql
SELECT <columns you want to retrieve>
FROM <table>
WHERE <conditions to filter rows>
GROUP BY <columns to group by when aggregating>
HAVING <conditions to filter the result of GROUP BY>
ORDER BY <columns you want to order by>
```
# RSQLite
An R interface to SQLite, a lightweight database engine.

- [PChome Example](dataio/pchome.html)
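
A minimal sketch of the workflow, assuming the DBI and RSQLite packages are installed. The `orders` table and its values are made up, and the database lives in memory so no file is created. The query follows the clause order from the Database slide.

```r
library(DBI)

# Connect to a temporary in-memory SQLite database.
con = dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical data, for illustration only.
orders = data.frame(customer = c("a", "a", "b", "c"),
                    amount   = c(5, 10, 20, 3))
dbWriteTable(con, "orders", orders)

# Total amount per customer, keeping totals above 10, largest first.
res = dbGetQuery(con, "
  SELECT customer, SUM(amount) AS total
  FROM orders
  GROUP BY customer
  HAVING SUM(amount) > 10
  ORDER BY total DESC
")
res

dbDisconnect(con)
```

Only the aggregated rows are returned to R, which is the point of leaving the heavy lifting to the database.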
# References
- [Quotes](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html)
- [SQL](https://en.wikipedia.org/wiki/SQL)
- [jsonlite](https://cran.r-project.org/web/packages/jsonlite/index.html)
- [data.table](https://cran.r-project.org/web/packages/data.table/index.html)