forked from farach/huggingfaceR
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
119 lines (91 loc) · 3.48 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# huggingfaceR
<!-- badges: start -->
<!-- badges: end -->
The goal of `huggingfaceR` is to to bring state-of-the-art NLP models to R. `huggingfaceR` is built on top of Hugging Face's [transformers](https://huggingface.co/docs/transformers/index) library; and has support for navigating the Hugging Face Hub [The Hub](https://huggingface.co/models).
## Installation
Prior to installing `huggingfaceR` please be sure to have your python environment set up correctly.
```{r eval = FALSE}
install.packages("reticulate")
library(reticulate)
install_miniconda()
```
If you are having issues, more detailed instructions on how to install and configure python can be found [here](https://support.rstudio.com/hc/en-us/articles/360023654474-Installing-and-Configuring-Python-with-RStudio).
After that you can install the development version of huggingfaceR from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("farach/huggingfaceR")
```
## Example
`huggingfaceR` makes use of the `transformers` `pipline()` abstraction to quickly make pre-trained language models available for use in R. In this example we will load the `distilbert-base-uncased-finetuned-sst-2-english` model and its tokenizer into a pipeline object to obtain sentiment scores.
```{r example}
library(huggingfaceR)
distilBERT <- hf_load_pipeline(
model_id = "distilbert-base-uncased-finetuned-sst-2-english",
task = "text-classification"
)
distilBERT
```
With the pipeline now loaded, we can begin using the model.
```{r}
distilBERT("I like you. I love you")
```
We can use this pipeline in a typical tidyverse processing chunk. First we load the `tidyverse`.
```{r}
library(tidyverse)
```
We can use the `huggingfaceR` `hf_load_dataset()` function to pull in the [emotion](https://huggingface.co/datasets/emotion) Hugging Face dataset. This dataset contains English Twitter messages with six basic emotions: anger, fear, love, sadness, and surprise. We are interested in how well the Distilbert model classifies these emotions as either a positive or a negative sentiment.
```{r}
emo <- hf_load_dataset(
dataset = "emo",
split = "train",
as_tibble = TRUE,
label_name = "int2str"
)
emo_model <- emo %>%
sample_n(100) %>%
transmute(
text,
emotion_id = label,
emotion_name = label_name,
distilBERT_sent = distilBERT(text)
) %>%
unnest_wider(distilBERT_sent)
glimpse(emo_model)
```
We can use `ggplot2` to visualize the results.
```{r}
emo_model |>
mutate(
label = paste0("Distilbert class:\n", label),
emotion_name = str_to_title(emotion_name)
) |>
ggplot(aes(x = emotion_name, y = score, color = label)) +
geom_boxplot(show.legend = FALSE, outlier.alpha = 0.4, ) +
scale_color_manual(values = c("#D55E00", "#6699CC")) +
facet_wrap(~ label) +
labs(
title = "Reviewing Distilbert classification predictions",
x = "Original label",
y = "Model score",
caption = "source:\nhttps://huggingface.co/datasets/emo"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45),
axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0))
)
```