Extreme volatility in USA: Cumulative Number Of Performed CV Tests #189
Hello @DataOps-epam, thank you for your feedback. We will also check on our side whether there is an issue with the cumulative sum of the source data.
library(dplyr)
data_US <- readr::read_csv(
"https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/testing_data/time_series_covid19_US.csv",
col_names = TRUE,
show_col_types = FALSE) |>
mutate(date = as.Date(date, format = "%m/%d/%Y")) |>
select(date, tests_combined_total) |>
dplyr::group_by(date) |>
dplyr::summarise(cum_tests = sum(tests_combined_total, na.rm = TRUE))
tail(data_US, 20)
# # A tibble: 20 × 2
# date cum_tests
# <date> <dbl>
# 1 2022-03-29 852124868
# 2 2022-03-30 853232274
# 3 2022-03-31 854242710
# 4 2022-04-01 845136012
# 5 2022-04-02 845397095
# 6 2022-04-03 845637721
# 7 2022-04-04 846394199
# 8 2022-04-05 847525133
# 9 2022-04-06 848362052
# 10 2022-04-07 849031771
# 11 2022-04-08 849609426
# 12 2022-04-09 850821150
# 13 2022-04-10 851036553
# 14 2022-04-11 851595486
# 15 2022-04-12 854936478
# 16 2022-04-13 852836553
# 17 2022-04-14 854146287
# 18 2022-04-15 836263614
# 19 2022-04-16 778594875
# 20 2022-04-17 778666699
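A cumulative series should never decrease, so one quick way to locate the problem dates is to flag every day on which the total drops. The following is only a minimal sketch: the values are copied from the last seven rows of the output above, and the names `cum` and `drops` are illustrative.

```r
library(dplyr)

# Last seven rows of the tail() output, typed in by hand for illustration.
cum <- tibble::tibble(
  date = as.Date("2022-04-11") + 0:6,
  cum_tests = c(851595486, 854936478, 852836553, 854146287,
                836263614, 778594875, 778666699)
)

# A cumulative total that falls from one day to the next points to a
# revision or to missing contributions on that date.
drops <- cum |>
  arrange(date) |>
  mutate(daily_change = cum_tests - lag(cum_tests)) |>
  filter(!is.na(daily_change), daily_change < 0)

print(drops)
```

On this sample the check flags 2022-04-13, 2022-04-15 and 2022-04-16.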
Yes, we've been working on this for the past few days. There were some revisions at the source, but some states are also missing. We can't reproduce this with R because the Python scraping tool relies on pandas.
library(dplyr)
usa <- readr::read_csv("https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/testing_data/time_series_covid19_US.csv", show_col_types = FALSE) |>
select(date, state, tests_combined_total)
data_sum <-
usa |>
mutate(date = as.Date(date, "%m/%d/%Y")) |>
#na.locf does something similar to pandas fillna(method = "ffill")
#mutate(tests_combined_total = zoo::na.locf(tests_combined_total)) |>
group_by(date) |>
summarize(tests_total = sum(tests_combined_total))
missing <-
usa |>
filter(state %in% c("MN", "KS"),
date %in% c("4/11/2022","4/12/2022","4/13/2022", "4/14/2022", "4/15/2022", "4/16/2022", "4/17/2022"))
print(missing)
#> # A tibble: 14 x 3
#> date state tests_combined_total
#> <chr> <chr> <dbl>
#> 1 4/11/2022 KS 2269976
#> 2 4/11/2022 MN 16379616
#> 3 4/12/2022 KS 2269976
#> 4 4/12/2022 MN 16402859
#> 5 4/13/2022 KS 2269976
#> 6 4/13/2022 MN 16402859
#> 7 4/14/2022 KS 2269976
#> 8 4/14/2022 MN 16402859
#> 9 4/15/2022 KS NA
#> 10 4/15/2022 MN NA
#> 11 4/16/2022 KS NA
#> 12 4/16/2022 MN NA
#> 13 4/17/2022 KS NA
#> 14 4/17/2022 MN NA
We are working on a fix now at our end.
Thank you @benubah, looking forward to the updates!
Hi @findanna,
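The commented-out `na.locf` line hints at one possible workaround: carry each state's last reported cumulative value forward before summing by date. The sketch below illustrates that idea on a few of the MN/KS rows printed above. It uses `tidyr::fill` instead of `zoo::na.locf`, which is an assumption made for illustration and not necessarily the fix being prepared. The names `usa_toy` and `filled_sum` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data: MN and KS rows for 4/14 to 4/16, copied from the output above.
usa_toy <- tibble::tribble(
  ~date,       ~state, ~tests_combined_total,
  "4/14/2022", "KS",   2269976,
  "4/14/2022", "MN",   16402859,
  "4/15/2022", "KS",   NA,
  "4/15/2022", "MN",   NA,
  "4/16/2022", "KS",   NA,
  "4/16/2022", "MN",   NA
)

filled_sum <- usa_toy |>
  mutate(date = as.Date(date, format = "%m/%d/%Y")) |>
  arrange(state, date) |>
  group_by(state) |>
  # Forward fill within each state, similar to pandas ffill per group.
  fill(tests_combined_total, .direction = "down") |>
  ungroup() |>
  group_by(date) |>
  summarise(tests_total = sum(tests_combined_total), .groups = "drop")

print(filled_sum)
```

Filling within each state matters: a fill across the whole column would copy one state's value into another state's missing rows.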
Dear FIND,
Could you please check your time series "USA: Cumulative Number Of Performed CV Tests" here: https://raw.githubusercontent.com/dsbbfinddx/FINDCov19TrackerData/master/processed/coronavirus_tests.csv
We can see extreme volatility around 04/15/2022 in the latest release.
Could you please advise on this?
Thanks.
#USA.COVID19.B@FIND