```{r, echo = FALSE, warning = TRUE}
knitr::opts_knit$set(base.url = "/")
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
warning = FALSE,
message = FALSE,
echo = FALSE,
fig.width = 9,
fig.height = 6
)
options(scipen = 999, digits = 2, tibble.width = Inf, tibble.print_max = Inf)
knitr::knit_hooks$set(inline = function(x) {
prettyNum(x, big.mark = ",")
})
```
```{r}
# library() fails loudly if a package is missing, instead of returning FALSE
library(dplyr)
library(stringr)
library(ggplot2)
```
```{r, echo=FALSE, cache = FALSE}
my_apc <- readr::read_csv("data/apc_de.csv")
my_bpc <- readr::read_csv("data/bpc.csv")
```
## About
The aim of this repository is:
- to release data sets on fees paid for Open Access journal articles and monographs by Universities and Research Society Funds under an Open Database License
- to demonstrate how reporting on fee-based Open Access publishing can be made more transparent and reproducible across institutions.
At the moment this project provides the following cost data:
| Publication Type | Count | Aggregated Sum (€) | Contributing Institutions |
|------------------|-----------------|-------------------------|----------------------------------------|
| Articles |`r nrow(my_apc)` | `r sum(my_apc$euro)` | `r length(unique(my_apc$institution))` |
| Monographs |`r nrow(my_bpc)` | `r sum(my_bpc$euro)` | `r length(unique(my_bpc$institution))` |
## How to access the data?
There are several options. You may simply download the raw data sets in CSV format, query our [OLAP server](https://github.com/OpenAPC/openapc-olap/blob/master/HOWTO.md) or use our [Treemap site](https://treemaps.openapc.net/) for visual data exploration.
| Dataset | CSV File | OLAP Cube | Treemap |
|-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|-----------------------------------------------------------------------|
| articles | [APC file](https://github.com/OpenAPC/openapc-de/blob/master/data/apc_de.csv), [data schema](https://github.com/OpenAPC/openapc-de/wiki/schema#openapc-data-set) | [APC cube](https://olap.openapc.net/cube/openapc/aggregate) | [APC treemap](https://treemaps.openapc.net/apcdata/openapc/) |
| monographs | [BPC file](https://github.com/OpenAPC/openapc-de/blob/master/data/bpc.csv), [data schema](https://github.com/OpenAPC/openapc-de/wiki/schema#bpc-data-set) | [BPC cube](https://olap.openapc.net/cube/bpc/aggregate) | [BPC treemap](https://treemaps.openapc.net/apcdata/bpc/) |
Our latest data release can always be accessed via the following DOI:
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6883472.svg)](https://doi.org/10.5281/zenodo.6883472)
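
As a minimal sketch of working with the CSV option, the following R snippet filters fee records by institution and period. It uses a small in-memory sample so that it runs offline; the sample values and the helper name `fees_for` are hypothetical and not part of OpenAPC. The column names `institution`, `period` and `euro` are the ones used elsewhere in this document.

```r
# Hypothetical sample mimicking a few columns of data/apc_de.csv.
# With a local clone of the repository, the real file could be read instead:
# apc <- read.csv("data/apc_de.csv")
apc <- data.frame(
  institution = c("Uni A", "Uni A", "Uni B", "Uni B"),
  period      = c(2019, 2020, 2020, 2020),
  euro        = c(1200, 1500, 900, 2100),
  stringsAsFactors = FALSE
)

# Return the rows for one institution, optionally restricted to one period.
fees_for <- function(df, inst, year = NULL) {
  keep <- df$institution == inst
  if (!is.null(year)) {
    keep <- keep & df$period == year
  }
  df[keep, , drop = FALSE]
}

uni_b_2020 <- fees_for(apc, "Uni B", 2020)
total_uni_b_2020 <- sum(uni_b_2020$euro)
print(total_uni_b_2020)
```

The sketch sticks to base R so that it has no package dependencies; the same filter could equally be written with `dplyr::filter()`, which this document uses for its own tables.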
## How to contribute?
Any academic institution or research funder paying Article Processing Charges (APCs) or Book Processing Charges (BPCs) can contribute to OpenAPC; no formal registration is required.
This [page](https://github.com/OpenAPC/openapc-de/wiki/Data-Submission-Handout) ([German version](https://github.com/OpenAPC/openapc-de/wiki/Handreichung-Dateneingabe)) explains the details. The following institutions have contributed to OpenAPC so far:
```{r, echo=FALSE, results = 'asis'}
ins <- readr::read_csv("data/institutions.csv")
countries <- readr::read_csv("data/translation_countries.csv")
ins_types <- readr::read_csv("data/translation_institution_types.csv")
ins_groups <- readr::read_csv("data/translation_institution_groups.csv")
for (i in seq_along(countries$country)) {
ctry_full_name <- countries[[i, "country_full_name"]]
ctry <- countries[[i, "country"]]
cat(paste("## Institutions from", ctry_full_name, "\n\n"))
for (j in seq_along(ins_types$institution_type)) {
ins_type <- ins_types[[j, "institution_type"]]
if (!is.na(ins_type)) {
institutions_with_type <- filter(ins, country == ctry, institution_type == ins_type)
}
else {
institutions_with_type <- filter(ins, country == ctry, is.na(institution_type))
}
if (nrow(institutions_with_type) > 0) {
if (!is.na(ins_type)) {
cat(paste("### ", ins_types[[j, "institution_type_full_name"]], "\n\n"))
}
for (k in seq_along(ins_groups$institution_group)) {
ins_group <- ins_groups[[k, "institution_group"]]
if (!is.na(ins_group)) {
institutions_with_group <- filter(institutions_with_type, institution_group == ins_group)
}
else {
institutions_with_group <- filter(institutions_with_type, is.na(institution_group))
}
institutions_with_group <- arrange(institutions_with_group, institution_full_name)
if (nrow(institutions_with_group) > 0) {
if (!is.na(ins_group)) {
cat(paste("#### ", ins_groups[[k, "institution_group_full_name"]], "\n\n"))
}
for (l in seq_along(institutions_with_group$institution)) {
comment <- ""
if (!is.na(institutions_with_group[[l, "comment"]])) {
comment <- paste(" (", institutions_with_group[[l, "comment"]], ")", sep="")
}
name <- institutions_with_group[[l, "institution_full_name"]]
if (!is.na(institutions_with_group[[l, "info_url"]])) {
name <- paste("[", name, "](", institutions_with_group[[l, "info_url"]], ")", sep="")
}
cat(paste("- ", name, comment, "\n", sep=""))
}
cat("\n")
}
}
}
}
}
```
## Data sets
*Note: The following numbers and plots are always based on the [latest revision](https://github.com/OpenAPC/openapc-de/releases/latest) of the OpenAPC data set. The underlying code can be found in the associated [R Markdown template](README.Rmd).*
### Articles (APCs)
```{r, echo=FALSE}
fully_oa <- my_apc %>% filter(is_hybrid == FALSE)
hybrid <- my_apc %>% filter(is_hybrid == TRUE)
```
The article data set contains information on `r nrow(my_apc)` open access journal articles published in fully open access and hybrid open access journals. Publication fees for these articles were supported financially by `r length(unique(my_apc$institution))` research performing institutions and research funders.
In total, article publication fee spending covered by the OpenAPC initiative amounted to € `r sum(my_apc$euro)`. The average payment was € `r mean(my_apc$euro)` and the median was € `r median(my_apc$euro)`.
`r nrow(fully_oa)` articles in the data set were published in fully open access journals. Total spending on publication fees for these articles amounts to € `r sum(fully_oa$euro)`, including value-added tax; the average payment was € `r mean(fully_oa$euro)` (median = € `r median(fully_oa$euro)`, SD = € `r sd(fully_oa$euro)`).
Hybrid open access journals rely on both publication fees and subscriptions as revenue sources. `r nrow(hybrid)` articles in the data set were published in hybrid journals. Total expenditure amounts to € `r sum(hybrid$euro)`; the average fee was € `r mean(hybrid$euro)` (median = € `r median(hybrid$euro)`, SD = € `r sd(hybrid$euro)`).
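
The figures in the paragraphs above are plain descriptive statistics over the `euro` column, split by the `is_hybrid` flag. A minimal base R sketch of that computation, using made-up fee values and a hypothetical helper `fee_summary`, might look like this:

```r
# Hypothetical fee records; only the two columns needed here are included.
apc <- data.frame(
  euro      = c(1000, 2000, 3000, 1500, 2500),
  is_hybrid = c(FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Summarise one group of fees: count, total, mean, median, standard deviation.
fee_summary <- function(euro) {
  c(n = length(euro), total = sum(euro), mean = mean(euro),
    median = median(euro), sd = sd(euro))
}

# One row per journal type: "FALSE" = fully OA, "TRUE" = hybrid.
by_type <- t(sapply(split(apc$euro, apc$is_hybrid), fee_summary))
print(by_type)
```

Here `split()` groups the fees by journal type and `sapply()` applies the same summary to each group, so both subsets are guaranteed to be described by identical statistics.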
#### Spending distribution over fully and hybrid open access journals
```{r, echo=FALSE}
p <- my_apc %>%
mutate(is_hybrid = ifelse(.$is_hybrid, "Hybrid OA journals", "Fully OA journals")) %>%
mutate(short_period = str_sub(period, 3 ,4)) %>%
ggplot(aes(factor(short_period), euro)) +
geom_boxplot(outlier.size = 0.5) +
xlab("Year") +
ylab("Publication fee paid in Euro") +
scale_y_continuous(limits = c(0,8000)) +
facet_grid(~is_hybrid) +
theme_bw()
ggsave(p, path = "figure/", filename = "boxplot_oa.png", width=9, height=4.5, units="in", device = "png")
```
![](figure/boxplot_oa.png)
#### Spending distribution details
```{r, echo=FALSE, results='asis', message = FALSE}
oa_grouped <- my_apc %>%
filter(is_hybrid == FALSE) %>%
select(period, euro) %>%
group_by(period) %>%
summarise(
n = n(),
mean = mean(euro),
median = median(euro),
min = min(euro),
max = max(euro)
) %>%
mutate(n = format(n, big.mark=","),
mean = format(mean, big.mark=","),
median = format(median, big.mark=","),
min = format(min, big.mark=","),
max = format(max, big.mark=","),
minmax = str_c(min, " - ", max)
) %>%
select(period, oa_n = n, oa_mean = mean, oa_median = median, oa_min_max = 'minmax')
hyb_grouped <- my_apc %>%
filter(is_hybrid == TRUE) %>%
select(period, euro) %>%
group_by(period) %>%
summarise(
n = n(),
mean = mean(euro),
median = median(euro),
min = min(euro),
max = max(euro)
) %>%
mutate(n = format(n, big.mark=","),
mean = format(mean, big.mark=","),
median = format(median, big.mark=","),
min = format(min, big.mark=","),
max = format(max, big.mark=","),
minmax = str_c(min, " - ", max)
) %>%
select(period, hyb_n = n, hyb_mean = mean, hyb_median = median, hyb_min_max = 'minmax')
full_join(oa_grouped, hyb_grouped, by = "period") %>%
knitr::kable(col.names = c("period", "OA articles", "OA mean", "OA median", "OA min - max", "Hybrid Articles", "Hybrid mean", "Hybrid median", "Hybrid min - max"), align = c("l","r", "r", "r", "r", "r", "r", "r", "r"))
```
### Books (BPCs)
The book data set contains information on `r nrow(my_bpc)` open access books. Publication fees were supported financially by `r length(unique(my_bpc$institution))` research performing institutions and funders.
In total, book processing charges covered by the OpenAPC initiative amounted to € `r sum(my_bpc$euro)`. The average payment was € `r mean(my_bpc$euro)` and the median was € `r median(my_bpc$euro)`.
Books can be made Open Access right from the beginning ("frontlist") or only retroactively, after first having been published traditionally ("backlist"). This distinction can strongly affect the BPC paid.
#### Spending distribution over frontlist and backlist OA books
```{r, echo=FALSE}
p <- my_bpc %>%
mutate(backlist_oa = ifelse(.$backlist_oa, "Backlist OA", "Frontlist OA")) %>%
mutate(short_period = str_sub(period, 3 ,4)) %>%
ggplot(aes(factor(short_period), euro)) +
geom_boxplot(outlier.size = 0.5) +
xlab("Year") +
ylab("Publication fee paid in Euro") +
scale_y_continuous(limits = c(0,12000)) +
facet_grid(~backlist_oa) +
theme_bw()
ggsave(p, path = "figure/", filename = "boxplot_bpcs.png", width=9, height=4.5, units="in", device = "png")
```
![](figure/boxplot_bpcs.png)
#### Spending distribution details
```{r, echo=FALSE, results='asis', message = FALSE}
frontlist_grouped <- my_bpc %>%
filter(backlist_oa == FALSE) %>%
select(period, euro) %>%
group_by(period) %>%
summarise(
n = n(),
mean = mean(euro),
median = median(euro),
min = min(euro),
max = max(euro)
) %>%
mutate(n = format(n, big.mark=","),
mean = format(mean, big.mark=","),
median = format(median, big.mark=","),
min = format(min, big.mark=","),
max = format(max, big.mark=","),
minmax = str_c(min, " - ", max)
) %>%
select(period,frontlist_n = n,frontlist_mean = mean,frontlist_median = median,frontlist_min_max = minmax)
backlist_grouped <- my_bpc %>%
filter(backlist_oa == TRUE) %>%
select(period, euro) %>%
group_by(period) %>%
summarise(
n = n(),
mean = mean(euro),
median = median(euro),
min = min(euro),
max = max(euro)
) %>%
mutate(n = format(n, big.mark=","),
mean = format(mean, big.mark=","),
median = format(median, big.mark=","),
min = format(min, big.mark=","),
max = format(max, big.mark=","),
minmax = str_c(min, " - ", max)
) %>%
select(period, backlist_n = n, backlist_mean = mean, backlist_median = median, backlist_min_max = minmax)
full_join(frontlist_grouped, backlist_grouped, by = "period") %>%
knitr::kable(col.names = c("period", "Frontlist books", "mean BPC", "median BPC", "BPC min - max", "Backlist books", "mean BPC", "median BPC", "BPC min - max"), align = c("l","r", "r", "r", "r", "r", "r", "r", "r"))
```
## Use of external sources
Metadata representing publication titles or publisher names is obtained from Crossref in order to avoid extensive validation of records. Cases where we don't re-use information from Crossref to disambiguate the spending metadata are documented [here](python/test/test_apc_csv.py). Moreover, indexing coverage in Europe PMC and the Web of Science is checked automatically.
### Articles
|Source |Variable |Description |
|:--------------|:---------|:-----------------------------------------------|
|CrossRef |`publisher` |Title of Publisher |
|CrossRef |`journal_full_title` |Full title of the journal |
|CrossRef |`issn` |International Standard Serial Numbers (collapsed) |
|CrossRef |`issn_print` |ISSN print |
|CrossRef |`issn_electronic` |ISSN electronic |
|CrossRef |`license_ref` |License of the article |
|CrossRef |`indexed_in_crossref` |Is the article metadata registered with CrossRef? (logical) |
|EuropePMC |`pmid` |PubMed ID |
|EuropePMC |`pmcid` |PubMed Central ID |
|Web of Science |`ut` |Web of Science record ID |
|DOAJ |`doaj` |Is the journal indexed in the DOAJ? (logical) |
### Books
|Source |Variable |Description |
|:--------------|:---------|:-----------------------------------------------|
|CrossRef |`publisher` |Title of Publisher |
|CrossRef |`book_title` |Full Title of a Book |
|CrossRef |`isbn` |International Standard Book Number |
|CrossRef |`isbn_print` |ISBN print |
|CrossRef |`isbn_electronic` |ISBN electronic |
|CrossRef       |`license_ref`   |License of the book    |
|CrossRef       |`indexed_in_crossref`   |Is the book metadata registered with CrossRef? (logical)    |
|DOAB |`doab` |Is the book indexed in the DOAB? (logical) |
```{r, echo=FALSE, cache = FALSE}
my.apc <- readr::read_csv("data/apc_de.csv")
my.apc_doi <- my.apc[!is.na(my.apc$doi), ]
my.apc_pmid <- my.apc[!is.na(my.apc$pmid), ]
my.apc_pmcid <- my.apc[!is.na(my.apc$pmcid), ]
my.apc_ut <- my.apc[!is.na(my.apc$ut), ]
my.bpc <- readr::read_csv("data/bpc.csv")
my.bpc_doi <- my.bpc[!is.na(my.bpc$doi), ]
```
### Indexing coverage
|Identifier | Coverage (articles) | Coverage (Books) |
|:--------------------------|:------------------------------------------------------------------|-------------------------------------------------------------|
|DOI | `r format(nrow(my.apc_doi)*100/nrow(my.apc), digits = 4)`% |`r format(nrow(my.bpc_doi)*100/nrow(my.bpc), digits = 4)`% |
|PubMed ID | `r format(nrow(my.apc_pmid)*100/nrow(my.apc), digits = 4)`% | NA |
|PubMed Central ID | `r format(nrow(my.apc_pmcid)*100/nrow(my.apc), digits = 4)`% | NA |
|Web of Science record ID | `r format(nrow(my.apc_ut)*100/nrow(my.apc), digits = 4)`% | NA |
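
Each coverage value in the table above is the share of records whose identifier is not missing. As a minimal sketch with made-up records (the helper name `coverage_pct` and the sample identifiers are hypothetical), the calculation can be written as:

```r
# Hypothetical records: NA marks a missing identifier.
apc <- data.frame(
  doi  = c("10.1000/a", "10.1000/b", NA, "10.1000/d"),
  pmid = c("111", NA, NA, "444"),
  stringsAsFactors = FALSE
)

# Percentage of rows with a non-missing value in the given column.
coverage_pct <- function(df, column) {
  100 * mean(!is.na(df[[column]]))
}

doi_cov  <- coverage_pct(apc, "doi")   # 3 of 4 rows have a DOI
pmid_cov <- coverage_pct(apc, "pmid")  # 2 of 4 rows have a PubMed ID
cat(format(doi_cov, digits = 4), "%\n", sep = "")
cat(format(pmid_cov, digits = 4), "%\n", sep = "")
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, which is equivalent to dividing the number of rows with an identifier by the total number of rows, as the table does.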
## License
The data sets are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
This work is licensed under the Creative Commons Attribution 4.0 International License.
## How to cite?
When citing this data set, please indicate the [release](https://github.com/OpenAPC/openapc-de/releases/) you are referring to. The releases also contain information on contributors relating to the respective release.
Please do not cite the master branch of the GitHub repository (https://github.com/OpenAPC/openapc-de/tree/master/), but use the release numbers/tags.
Bielefeld University Library archives a copy (including commit history). To cite:
{Contributors:} *Datasets on fee-based Open Access publishing across German Institutions*. Bielefeld University. [10.4119/UNIBI/UB.2014.18](http://dx.doi.org/10.4119/UNIBI/UB.2014.18)
## Acknowledgement
This project was set up in collaboration with the [DINI working group Electronic Publishing](http://dini.de/ag/e-pub1/). It follows the [Wellcome Trust's example of sharing data on paid APCs](http://blog.wellcome.ac.uk/2014/03/28/the-cost-of-open-access-publishing-a-progress-report/) and recognises efforts from [JISC](https://www.jisc-collections.ac.uk/News/Releasing-open-data-about-Total-Cost-of-Ownership/) and the [ESAC initiative](http://esac-initiative.org/) to standardise APC reporting.
## Contributors
Jens Harald Aasheim, Sarah Abusaada, Benjamin Ahlborn, Chelsea Ambler, Magdalena Andrae, Jochen Apel, Karina Barros Ferradás, Myriam Bastin, Hans-Georg Becker, Roland Bertelmann, Daniel Beucke, Manuela Bielow, Peter Blume, Ute Blumtritt, Sabine Boccalini, Stefanie Bollin, Katrin Bosselmann, Valentina Bozzato, Kim Braun, Christoph Broschinski, Paolo Buoso, Cliff Buschhart, Dorothea Busjahn, Pablo de Castro, Ann-Kathrin Christann, Roberto Cozatl, Micaela Crespo Quesada, Amanda Cullin, Patrick Danowski, Gernot Deinzer, Julia Dickel, Andrea Dorner, Stefan Drößler, Karin Eckert, Carsten Elsner, Clemens Engelhardt, Olli Eskola, Katrin Falkenstein-Feldhoff, Ashley Farley, Inken Feldsien-Sudhaus, Silke Frank, Fabian Franke, Claudia Frick, Marléne Friedrich, Paola Galimberti, Agnes Geißelmann, Kai Karin Geschuhn, Silvia Giannini, Marianna Gnoato, Steffi Grimm, Birgit Hablizel, Florian Hagen, Christina Hemme, Ulrich Herb, Elfi Hesse, Dana Horch, Larissa Gordon, Ute Grimmel-Holzwarth, Evgenia Grishina, Christian Gutknecht, Uli Hahn, Kristina Hanig, Margit L. 
Hartung, Dominik Hell, Eike Hentschel, Ulrich Herb, Stephanie Herzog, Kathrin Höhner, Conrad Hübler, Christie Hurrell, Arto Ikonen, Doris Jaeger, Najko Jahn, Alexandra Jobmann, Daniela Jost, Tiina Jounio, Juho Jussila, Nadja Kalinna, Mirjam Kant, Andreas Kennecke, Robert Kiley, Ilka Kleinod, Lydia Koglin, Nives Korrodi, Biljana Kosanovic, Stephanie Kroiß, Gerrit Kuehle, Stefanie Kutz, Marjo Kuusela, Anna Laakkonen, Ignasi Labastida i Juan, Gerald Langhanke, Inga Larres, Stuart Lawson, Anne Lehto, Sari Leppänen, Camilla Lindelöw, Maria Löffler, Jutta Lotz, Kathrin Lucht-Roussel, Susanne Luger, Jan Lüth, Frank Lützenkirchen, Steffen Malo, Anna Marini, Manuel Moch, Vlatko Momirovski, Andrea Moritz, Max Mosterd, Marcel Nieme, Anja Oberländer, Martina Obst, Jere Odell, Linda Ohrtmann, Vitali Peil, Gabriele Pendorf, Mikko Pennanen, Dirk Pieper, Tobias Pohlmann, Thomas Porquet, Markus Putnings, Andrée Rathemacher, Rainer Rees-Mertins, Edith Reschke, Ulrike Richter, Katharina Rieck, Friedrich Riedel, Simone Rosenkranz, Florian Ruckelshausen, Steffen Rudolph, Ilka Rudolf, Pavla Rygelová, Lea Satzinger, Annette Scheiner, Isabo Schick, Michael Schlachter, Birgit Schlegel, Andreas Schmid, Barbara Schmidt, Katharina Schulz, Stefanie Seeh, Barbara Senkbeil-Stoffels, Adriana Sikora, Tereza Simandlová, Stefanie Söhnitz, Jana Sonnenstuhl, Lisa Spindler, Susanne Stemmler, Matti Stöhr, Eva Stopková, Marius Stricker, Andrea Stühn, Kálmán Szőke, Linda Thomas, Anne Timm, Laura Tobler, Johanna Tönsing, Marco Tullney, Milan Vasiljevic, Astrid Vieler, Lena Vinnemann, Viola Voß, Christin Wacke, Roland Wagner, Agnieszka Wenninger, Kerstin Werth, Martin Wimmer, Marco Winkler, Sabine Witt, Michael Wohlgemuth, Verena Wohlleben, Qingbo Xu, Philip Young, Esther Zaugg, Miriam Zeunert, Philipp Zumstein
## Contact
For bugs, feature requests and other issues, please submit an issue via [GitHub](https://github.com/OpenAPC/openapc-de/issues/new).
For general comments, email openapc at uni-bielefeld.de
## Disclaimer
If you are looking for the "Open Advanced Process Control Software" for automation, visualization and process control tasks, from home control up to industrial automation, please visit <http://www.openapc.com> (2015-09-30).