Merge pull request #44 from panukatan/dev
add 2021 typhoon data; fix #42
ernestguevarra authored Jul 27, 2024
2 parents d46981a + 118e80a commit 0b04d9e
Showing 14 changed files with 130 additions and 26 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@
.Ruserdata
docs
inst/doc
+data-raw/*.pdf
3 changes: 3 additions & 0 deletions NEWS.md
@@ -1,5 +1,8 @@
# bagyo v0.1.1.9000

* added CRAN DOI badge
* added 2021 typhoon data


# bagyo 0.1.1

4 changes: 2 additions & 2 deletions README.Rmd
@@ -43,7 +43,7 @@ Oceans and seas significantly impact continental weather, with evaporation from

The Philippines frequently experiences tropical cyclones (called ***bagyo*** - pronounced /baɡˈjo/, [bɐɡˈjo] - in the Filipino language) because of its geographical position. These cyclones typically bring heavy rainfall, leading to widespread flooding, as well as strong winds that cause significant damage to human life, crops, and property. Data on cyclones are collected and curated by the [Philippine Atmospheric, Geophysical, and Astronomical Services Administration (PAGASA)](https://www.pagasa.dost.gov.ph/).

-This package contains Philippine tropical cyclones data from 2017 to 2020 in a machine-readable format. It is hoped that this data package provides an interesting and unique dataset for data exploration and visualisation as an adjunct to the traditional [`iris`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/iris.html) dataset and to the current [`palmerpenguins`](https://allisonhorst.github.io/palmerpenguins/) dataset.
+This package contains Philippine tropical cyclones data from 2017 to 2021 in a machine-readable format. It is hoped that this data package provides an interesting and unique dataset for data exploration and visualisation as an adjunct to the traditional [`iris`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/iris.html) dataset and to the current [`palmerpenguins`](https://allisonhorst.github.io/palmerpenguins/) dataset.

## Installation

@@ -140,7 +140,7 @@ bagyo |>
x = "wind speed (km/h)",
y = "central pressure (hPa)"
) +
-facet_wrap(. ~ year, ncol = 4) +
+facet_wrap(. ~ year, ncol = 5) +
theme_bw() +
theme(
legend.position = "top",
24 changes: 12 additions & 12 deletions README.md
@@ -82,7 +82,7 @@ library(bagyo)
data(package = "bagyo")

bagyo
-#> # A tibble: 86 × 9
+#> # A tibble: 101 × 9
#> year category_code category_name name rsmc_name start
#> <dbl> <fct> <fct> <chr> <chr> <dttm>
#> 1 2017 TD Tropical Depression Auri… <NA> 2017-01-07 08:00:00
@@ -95,7 +95,7 @@ bagyo
#> 8 2017 TS Tropical Storm Huan… Haitang 2017-07-30 02:00:00
#> 9 2017 STS Severe Tropical Storm Isang Hato 2017-08-20 08:00:00
#> 10 2017 TS Tropical Storm Joli… Pakhar 2017-08-24 14:00:00
-#> # ℹ 76 more rows
+#> # ℹ 91 more rows
#> # ℹ 3 more variables: end <dttm>, pressure <int>, speed <int>
```

@@ -117,11 +117,11 @@ bagyo |>
#> # A tibble: 5 × 4
#> category_name n mean_pressure mean_speed
#> <fct> <int> <dbl> <dbl>
-#> 1 Tropical Depression 23 996. 39.8
-#> 2 Tropical Storm 25 986. 61.6
-#> 3 Severe Tropical Storm 15 978. 75
-#> 4 Typhoon 21 941. 102.
-#> 5 Super Typhoon 2 908. 112.
+#> 1 Tropical Depression 27 995. 39.3
+#> 2 Tropical Storm 29 987. 58.8
+#> 3 Severe Tropical Storm 17 979. 72.6
+#> 4 Typhoon 23 943. 99.1
+#> 5 Super Typhoon 5 907 113
```

### `bagyo` is useful in learning how to work with dates
@@ -135,11 +135,11 @@ bagyo |>
#> # A tibble: 5 × 2
#> category_name mean_duration
#> <fct> <drtn>
-#> 1 Tropical Depression 46.69565 hours
-#> 2 Tropical Storm 57.48000 hours
-#> 3 Severe Tropical Storm 79.13333 hours
-#> 4 Typhoon 106.66667 hours
-#> 5 Super Typhoon 77.50000 hours
+#> 1 Tropical Depression 45.29630 hours
+#> 2 Tropical Storm 61.03448 hours
+#> 3 Severe Tropical Storm 81.29412 hours
+#> 4 Typhoon 110.04348 hours
+#> 5 Super Typhoon 115.60000 hours
```

### `bagyo` is great to visualise
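As a quick consistency check on the regenerated README output, the per-category counts in the summary table should add up to the row count printed for `bagyo`, and the before/after differences give the number of cyclones contributed by the 2021 season. This is a minimal base R sketch; it assumes the 2017 to 2020 rows are unchanged by this commit (the diff only appends `df_2021`), and the variable names are illustrative.

```r
# Per-category counts copied from the README summary table,
# before (86 rows) and after (101 rows) this commit
old_n <- c(TD = 23, TS = 25, STS = 15, TY = 21, STY = 2)
new_n <- c(TD = 27, TS = 29, STS = 17, TY = 23, STY = 5)

# The counts should sum to the row counts printed by `bagyo`
stopifnot(sum(old_n) == 86, sum(new_n) == 101)

# Assuming the 2017-2020 rows did not change, the difference is the 2021 season
added_2021 <- new_n - old_n
print(added_2021)
print(sum(added_2021))
```

This gives 4, 4, 2, 2 and 3 cyclones per category, 15 in total, which matches the 101 - 86 = 15 extra rows in the tibble.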
Binary file removed data-raw/2017.pdf
Binary file removed data-raw/2018.pdf
Binary file removed data-raw/2019.pdf
Binary file removed data-raw/2020.pdf
106 changes: 103 additions & 3 deletions data-raw/process_data.R
@@ -10,14 +10,14 @@ library(tibble)
## Download PDFs ----
urls <- paste0(
  "https://pubfiles.pagasa.dost.gov.ph/pagasaweb/files/tamss/weather/tcsummary/PAGASA_ARTC_",
-  2017:2020, ".pdf"
+  2017:2021, ".pdf"
)

Map(
  f = download.file,
  url = as.list(urls),
  destfile = as.list(
-    paste0("data-raw/", 2017:2020, ".pdf")
+    paste0("data-raw/", 2017:2021, ".pdf")
  )
)

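The year range is now written out twice (once for the URLs and once for the destination paths), so the two could drift apart in a future update. A minimal sketch, using only base R, that derives both from a single vector; `years`, `base_url` and `destfiles` are hypothetical names, and the download call is left commented out so the snippet runs offline. The `mode = "wb"` argument is an assumption on my part rather than something this commit uses; it requests a binary transfer, which matters for PDFs on Windows.

```r
# Hypothetical refactor: define the covered seasons once
years <- 2017:2021

base_url <- paste0(
  "https://pubfiles.pagasa.dost.gov.ph/pagasaweb/files/tamss/weather/",
  "tcsummary/PAGASA_ARTC_"
)

# Both vectors are built from the same `years`, so they always line up
urls <- paste0(base_url, years, ".pdf")
destfiles <- file.path("data-raw", paste0(years, ".pdf"))

# Uncomment to download; mode = "wb" asks for a binary transfer
# Map(f = download.file, url = urls, destfile = destfiles, mode = "wb")
```

Adding a 2022 season would then be a one-line change to `years`.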
@@ -372,8 +372,108 @@ df_2020 <- set1_2020 |>
)


+## 2021 data ----
+
+#df_2021 <- pdftools::pdf_data("data-raw/2021.pdf")[[42]]
+
+set1_2021 <- pdftools::pdf_data("data-raw/2021.pdf")[[41]] |>
+  dplyr::filter(y %in% 409:616) |>
+  dplyr::pull(text) |>
+  (\(x) x[x != "to"])() |>
+  (\(x) c(
+    x[1:24], NA_character_, x[25:43], x[46:47], NA_character_,
+    x[48:88], x[91:112], x[115:149], NA_character_, x[150:168]
+  ))() |>
+  matrix(ncol = 11, byrow = TRUE) |>
+  data.frame() |>
+  tibble::tibble() |>
+  dplyr::rename_with(
+    .fn = function(x)
+      c("domestic_name", "international_name", "international_code",
+        "warning_start_date", "warning_start_time", "warning_end_date",
+        "warning_end_time", "peak_speed", "peak_pressure", "peak_date",
+        "peak_time")
+  ) |>
+  dplyr::mutate(
+    international_name = ifelse(
+      international_name == "Unnamed", NA_character_, international_name
+    ),
+    international_code = stringr::str_remove_all(
+      string = international_code, pattern = "\\(|\\)"
+    ),
+    warning_start = paste0(warning_start_date, "/2021 ", warning_start_time) |>
+      strptime(format = "%m/%d/%Y %H", tz = "UTC"),
+    warning_end = paste0(warning_end_date, "/2021 ", warning_end_time) |>
+      strptime(format = "%m/%d/%Y %H", tz = "UTC"),
+    dplyr::across(.cols = peak_pressure:peak_speed, .fns = ~as.integer(.x)),
+    peak_time = paste0(peak_date, "/2021 ", peak_time) |>
+      strptime(format = "%m/%d/%Y %H", tz = "UTC")
+  )
+
+set2_2021 <- pdftools::pdf_data("data-raw/2021.pdf")[[42]] |>
+  dplyr::filter(y %in% 168:328) |>
+  dplyr::pull(text) |>
+  (\(x) x[x != "to"])() |>
+  (\(x) c(
+    x[1:19], NA_character_, x[20:23], NA_character_,
+    x[24:28], NA_character_, x[29:43], NA_character_,
+    x[44:69], NA_character_, x[70:79], NA_character_,
+    x[80:112], paste(x[113:114], collapse = ""), x[115:140], NA_character_,
+    x[141:length(x)]
+  ))() |>
+  matrix(ncol = 11, byrow = TRUE) |>
+  data.frame() |>
+  tibble::tibble() |>
+  dplyr::rename_with(
+    .fn = function(x)
+      c("domestic_name", "international_name", "international_code",
+        "warning_start_date", "warning_start_time", "warning_end_date",
+        "warning_end_time", "duration_days", "duration_hours", "category_code",
+        "landfall")
+  ) |>
+  dplyr::mutate(
+    international_name = ifelse(
+      international_name == "Unnamed", NA_character_, international_name
+    ),
+    international_code = stringr::str_remove_all(
+      string = international_code, pattern = "\\(|\\)"
+    ),
+    warning_start = paste0(warning_start_date, "/2021 ", warning_start_time) |>
+      strptime(format = "%m/%d/%Y %H", tz = "UTC"),
+    warning_end = paste0(warning_end_date, "/2021 ", warning_end_time) |>
+      strptime(format = "%m/%d/%Y %H", tz = "UTC"),
+    category_code = factor(
+      category_code,
+      levels = c("TD", "TS", "STS", "TY", "STY")
+    ),
+    category_name = factor(
+      category_code,
+      levels = c("TD", "TS", "STS", "TY", "STY"),
+      labels = c(
+        "Tropical Depression", "Tropical Storm", "Severe Tropical Storm",
+        "Typhoon", "Super Typhoon"
+      )
+    )
+  )
+
+df_2021 <- set1_2021 |>
+  dplyr::mutate(
+    warning_start = set2_2021$warning_start,
+    warning_end = set2_2021$warning_end,
+    category_code = set2_2021$category_code,
+    category_name = set2_2021$category_name
+  ) |>
+  dplyr::select(
+    category_code, category_name, domestic_name, international_name,
+    warning_start, warning_end, peak_pressure, peak_speed
+  ) |>
+  dplyr::mutate(
+    domestic_name = stringr::str_to_title(domestic_name),
+    international_name = stringr::str_to_title(international_name)
+  )
+
## Concatenate ----
-bagyo <- rbind(df_2017, df_2018, df_2019, df_2020) |>
+bagyo <- rbind(df_2017, df_2018, df_2019, df_2020, df_2021) |>
  dplyr::mutate(
    year = lubridate::year(warning_start), .before = category_code
  ) |>
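The 2021 parsing relies on one technique: `pdftools::pdf_data()` returns the words on a PDF page in reading order, so after keeping a band of `y` positions the table arrives as one flat character vector. Empty cells produce no word at all, so `NA_character_` placeholders are spliced in at fixed indices before `matrix(ncol = 11, byrow = TRUE)` folds the vector back into rows. A toy sketch of the same idea in base R, with a hypothetical 3-column table and made-up values:

```r
# Toy token stream for a 3-column table (name, code, speed), read row by row.
# The second row has an empty "code" cell, so no word is extracted for it.
tokens <- c("Alpha", "C01", "55", "Bravo", "95", "Charlie", "C03", "65")

# Splice a placeholder where the empty cell belongs
aligned <- c(tokens[1:4], NA_character_, tokens[5:8])

# matrix() only warns and recycles when the length is not a multiple of ncol,
# so check the alignment explicitly before reshaping
stopifnot(length(aligned) %% 3 == 0)

# Fill row by row, as matrix(ncol = 11, byrow = TRUE) does in process_data.R
tbl <- as.data.frame(
  matrix(aligned, ncol = 3, byrow = TRUE),
  stringsAsFactors = FALSE
)
names(tbl) <- c("name", "code", "speed")
tbl$speed <- as.integer(tbl$speed)
print(tbl)
```

A missing or misplaced placeholder shifts every later value into the wrong column, which is why index lists such as `x[1:24], NA_character_, x[25:43]` are tied to the exact layout of a given year's PDF.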
Binary file modified data/bagyo.rda
Binary file modified man/figures/README-barplot-1.png
Binary file modified man/figures/README-scatterplot-1.png
4 changes: 2 additions & 2 deletions vignettes/bagyo.Rmd
@@ -49,9 +49,9 @@ The following information is available from the dataset:

This metadata can be viewed in R through a call to `?bagyo` in the R console.

-Whilst tropical cyclones have affected the Philippines far earlier than 2017 and more currently than 2020, official and publicly available data for the information described above is only available in the reports for years 2017 to 2020. Earlier documents of this annual reporting pre-2017 have been produced but are not available on the [PAGASA](https://www.pagasa.dost.gov.ph/) website. These reports of the tropical cyclone season (re-started in 2019) are published within two years after the termination of the season. Hence, the most recent report is only up to 2020 for now.
+Whilst tropical cyclones have affected the Philippines far earlier than 2017 and more recently than 2021, official and publicly available data for the information described above is only available in the reports for years 2017 to 2021. Annual reports predating 2017 have been produced but are not available on the [PAGASA](https://www.pagasa.dost.gov.ph/) website. These reports of the tropical cyclone season (re-started in 2019) are published within two years after the end of the season. Hence, the most recent report currently available covers 2021.

-It is expected that reports for 2021 onwards will continue to be published and made available by PAGASA. As such, the `bagyo` package and the `bagyo` dataset within it will be updated accordingly. Continued efforts are also being taken to find sources of information for years preceding 2017.
+It is expected that reports for 2022 onwards will continue to be published and made available by PAGASA. As such, the `bagyo` package and the `bagyo` dataset within it will be updated accordingly. Continued efforts are also being taken to find sources of information for years preceding 2017.

<br>
<br>
14 changes: 7 additions & 7 deletions vignettes/visualisation.Rmd
@@ -151,12 +151,12 @@ bagyo |>
mapping = aes(colour = year), method = "lm", se = FALSE, linewidth = 0.75
) +
scale_colour_manual(
-values = c("#9c5e60", "#4b876e", "#465b92", "#e5be72")
+values = c("#9c5e60", "#4b876e", "#465b92", "#e5be72", "#5d0505")
) +
-scale_shape_manual(values = 15:18) +
+scale_shape_manual(values = 15:19) +
labs(
title = "Maximum sustained wind speed by duration of cyclones",
-subtitle = "2017-2020",
+subtitle = "2017-2021",
x = "speed (km/h)", y = "duration (hours)",
colour = "Year", shape = "Year"
) +
@@ -180,7 +180,7 @@ bagyo |>
scale_y_continuous(breaks = seq(from = 0, to = 6, by = 1)) +
labs(
title = "Number of cyclones over time",
-subtitle = "2017-2020",
+subtitle = "2017-2021",
x = NULL,
y = "n"
) +
@@ -206,7 +206,7 @@ bagyo |>
geom_boxplot(colour = "#4b876e", fill = "#4b876e", alpha = 0.5) +
labs(
title = "Distribution of tropical cyclone maximum sustained wind speed",
-subtitle = "2017-2022",
+subtitle = "2017-2021",
x = NULL, y = "speed (km/h)"
) +
theme_minimal() +
@@ -225,7 +225,7 @@ bagyo |>
) +
labs(
title = "Distribution of tropical cyclone maximum sustained wind speed",
-subtitle = "2017-2022",
+subtitle = "2017-2021",
x = NULL, y = "speed (km/h)"
) +
theme_minimal() +
@@ -240,7 +240,7 @@ bagyo |>
geom_jitter(colour = "#4b876e", size = 3, width = 0.2) +
labs(
title = "Distribution of tropical cyclone maximum sustained wind speed",
-subtitle = "2017-2022",
+subtitle = "2017-2021",
x = NULL, y = "speed (km/h)"
) +
theme_minimal() +
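Adding a season means extending the manual colour and shape scales by hand, as this commit does (four to five colours, `15:18` to `15:19`). A minimal base R sketch of a guard that fails early when a palette is shorter than the number of seasons, and that builds named vectors so each year keeps the same colour and shape across plots. ggplot2's `scale_colour_manual()` and `scale_shape_manual()` match a named `values` vector against the data values by name; this assumes `year` is mapped as a discrete variable (factor or character), and `colour_map` and `shape_map` are hypothetical names.

```r
# Seasons plotted in visualisation.Rmd after this commit
years <- 2017:2021

# Colours and shapes from the updated scale_*_manual() calls
colours <- c("#9c5e60", "#4b876e", "#465b92", "#e5be72", "#5d0505")
shapes <- 15:19

# Fail early if a new season is added without extending the palettes
stopifnot(length(colours) >= length(years), length(shapes) >= length(years))

# Named vectors pin each year to a fixed colour and shape
colour_map <- setNames(colours[seq_along(years)], years)
shape_map <- setNames(shapes[seq_along(years)], years)
print(colour_map)
```

The named vectors could then be passed as `values = colour_map` and `values = shape_map` in place of the positional vectors.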
