---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit this file. -->
```{r count_table_prep, echo = FALSE, message = FALSE}
library(dplyr)
library(glue)
lang_counts <- load_langs()
```
[![Join the chat at https://gitter.im/pdrhlik/sweary](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/swearyr)
[![Build Status](https://travis-ci.org/pdrhlik/sweary.svg?branch=master)](https://travis-ci.org/pdrhlik/sweary)
# sweary <img src="sticker/sweary-sticker.png" align="right" width="150" />
Sweary is an R package that contains a database of swear words from different languages, cherry-picked by native speakers.
## Installation
The development version of this package can be installed using [devtools](https://github.com/r-lib/devtools):
```r
devtools::install_github("pdrhlik/sweary")
```
## Current swear word lists
| Language | Language code | Number of swear words |
| ------------- | ------------- | --------------------- |
`r glue_collapse(lang_counts$label_row, sep = "\n")`
| **Total** | **`r nrow(lang_counts)` langs** | **`r sum(lang_counts$n)`** |
## Examples
All languages are stored in a `swear_words` data frame.
```{r ex_all_langs}
library(sweary)
head(swear_words)
```
You can also extract only the language you are interested in.
```{r ex_one_lang}
en_swear_words <- get_swearwords("en")
head(en_swear_words)
```
## Add (modify) a language
If you are not comfortable with `git` and pull requests, you can follow just steps **1-3**. After you create the file, send it to me via [email](mailto:[email protected]) with the subject **New sweary language: {LANG_CODE}**. We will acknowledge you in the README once we approve the changes.
1. **Choose a new language.**\
    Find its two-letter [ISO 639-1 code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes).\
    If you are adding a specific dialect (e.g. Canadian French), find its [IETF language tag](https://en.wikipedia.org/wiki/IETF_language_tag) in this [language code table](http://www.lingoes.net/en/translator/langcode.htm).
2. **Create a language file.**\
    Place the file in `data-raw/swear-word-lists/{LANG_CODE}_{LANG_NAME}`.\
    Examples:
    + English: `data-raw/swear-word-lists/en_English`
    + Canadian French: `data-raw/swear-word-lists/fr-CA_French (Canada)`

    Note that spaces and parentheses in file names are allowed.
3. **Fill in the file with swear words.** The following rules must apply:
    + **One** swear word per line with no trailing whitespace.
    + All words must be **lowercase**.
    + The list must only contain **unique** words.
    + The list must be **sorted** alphabetically.
4. **Make sure all the tests pass.**\
    You can do that using a development function called `build_sweary()`. It becomes available when you `git clone` the repository and call `devtools::load_all()` (or press `Ctrl+Shift+L` in RStudio). Learn more about this function with `?build_sweary`.
5. **Create a pull request.**
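
If you would like to check a word list against the rules from step 3 before sending it, the following is a minimal sketch of such a check. The functions `check_swear_words()` and `check_swear_word_file()` are hypothetical helpers written for this illustration. They are not part of sweary and do not replace `build_sweary()`, whose tests may be stricter or use a different sort order.

```r
# Hypothetical helper, NOT part of the sweary package.
# Takes a character vector with one entry per line of a language file and
# returns the names of the step 3 rules that the list appears to break.
check_swear_words <- function(words) {
  problems <- character(0)

  # Rule: no trailing whitespace on any line.
  if (any(grepl("\\s$", words))) {
    problems <- c(problems, "trailing whitespace")
  }

  # Rule: all words must be lowercase.
  if (any(words != tolower(words))) {
    problems <- c(problems, "not lowercase")
  }

  # Rule: the list must only contain unique words.
  if (anyDuplicated(words) > 0) {
    problems <- c(problems, "duplicate words")
  }

  # Rule: the list must be sorted alphabetically.
  # Assumption: the current locale's collation is acceptable here; the
  # package's own tests may compare words differently.
  if (is.unsorted(words)) {
    problems <- c(problems, "not sorted")
  }

  problems
}

# Hypothetical wrapper that reads a language file (assumed to be UTF-8)
# and runs the same checks on its lines.
check_swear_word_file <- function(path) {
  check_swear_words(readLines(path, encoding = "UTF-8"))
}

# Usage with placeholder words; an empty result means no rule was violated.
check_swear_words(c("apple", "banana", "cherry"))
```

An empty character vector means none of the four rules was violated. Running `build_sweary()` as described in step 4 remains the authoritative check.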
## Origin
The idea first appeared after the [South Park text analysis lightning talk](https://github.com/pdrhlik/southparktalk-whyr2018) at the [Why R? 2018 conference](http://whyr2018.pl/) in Wrocław. All the contributors will be acknowledged as the work progresses.
## Acknowledgments
Here we would like to say **BIG THANKS** to the native speakers who help us with the swear word dictionaries:
* Czech - [Patrik Drhlík](https://github.com/pdrhlik)
* English - [Patrik Drhlík](https://github.com/pdrhlik)
* French (Canada) - [Marc-André Désautels](https://github.com/desautm)
* German - [Peter Meißner](https://github.com/petermeissner)
* Greek - Anonymous
* Macedonian - [novica](https://github.com/novica)
* Polish - [Michal Czyz](https://github.com/mczyzj)
* Romanian - Alexandru Supeanu
* Slovak - Šimon Žďárský