Remove false positives in pass variable #479

Merged on Aug 12, 2024 (5 commits)

4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: nflfastR
Title: Functions to Efficiently Access NFL Play by Play Data
-Version: 4.6.1.9011
+Version: 4.6.1.9012
Authors@R:
c(person(given = "Sebastian",
family = "Carl",
@@ -71,6 +71,6 @@ Suggests:
testthat (>= 3.0.0)
Encoding: UTF-8
LazyData: true
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
Config/testthat/edition: 3
1 change: 1 addition & 0 deletions NAMESPACE
@@ -27,6 +27,7 @@ import(dplyr)
import(fastrmodels)
importFrom(cli,rule)
importFrom(curl,curl_fetch_memory)
+importFrom(data.table,"%chin%")
importFrom(data.table,setDT)
importFrom(furrr,future_map)
importFrom(furrr,future_map_chr)
3 changes: 2 additions & 1 deletion NEWS.md
@@ -12,7 +12,8 @@
- `punter_player_id`, and `punter_player_name` are filled for blocked punt attempts. (#463)
- Fixed an issue affecting scores of 2022 games involving a return touchdown (#466)
- Added identification of scrambles from 1999 through 2004 with thank to Aaron Schatz (#468)
-- nflfastR tried to fix bugs in the underlying pbp data of JAX home games prior to the 2016 season. An update of the raw pbp data resolved those bugs so nflfastR needs to remove the hard coded adjustments. This means that nflfastR <= v4.6.1 will return incorrect pbp data for all Jacksonville home games prior to the 2016 season!
+- nflfastR tried to fix bugs in the underlying pbp data of JAX home games prior to the 2016 season. An update of the raw pbp data resolved those bugs so nflfastR needs to remove the hard coded adjustments. This means that nflfastR <= v4.6.1 will return incorrect pbp data for all Jacksonville home games prior to the 2016 season! (#478)
+- Fixed a problem where `clean_pbp()` returned `pass = 1` in actual rush plays in very rare cases. (#479)

# nflfastR 4.6.1

32 changes: 29 additions & 3 deletions R/helper_additional_functions.R
@@ -104,16 +104,19 @@ clean_pbp <- function(pbp, ...) {
),
# if there's a pass, sack, or scramble, it's a pass play...
pass = dplyr::if_else(stringr::str_detect(.data$desc, "( pass )|(sacked)|(scramble)") | .data$qb_scramble == 1, 1, 0),
-# ...unless it says "backwards pass" and there's a rusher
+# ...unless it says "backward(s) pass" or "lateral pass" and there's a rusher
pass = dplyr::if_else(
-stringr::str_detect(.data$desc, "(backward pass)|(Backward pass)") & !is.na(.data$rusher),
+stringr::str_detect(stringr::str_to_lower(.data$desc), "(backward pass)|(backwards pass)|(lateral pass)") & !is.na(.data$rusher),
0, .data$pass
),
# and make sure there's no pass on a kickoff (sometimes there's forward pass on kickoff but that's not a pass play)
pass = dplyr::case_when(
.data$kickoff_attempt == 1 ~ 0,
TRUE ~ .data$pass
),
+# in very rare cases, the pass logic can fail. We do a hard coded overwrite here because it's not worth the time
+# to overthink the logic to catch weird play descriptions.
+pass = fix_werid_pass_plays(.data$pass, .data$game_id, .data$play_id),
#if there's a rusher and it wasn't a QB kneel or pass play, it's a run play
rush = dplyr::if_else(!is.na(.data$rusher) & .data$qb_kneel == 0 & .data$pass == 0, 1, 0),
#fix some common QBs with inconsistent names
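Reviewer note on the hunk above: the effect of lower-casing the description before matching can be sketched as follows. This is a minimal illustration, not code from the PR. It uses base R `grepl()` and `tolower()` as stand-ins for `stringr::str_detect()` and `stringr::str_to_lower()`, and the play descriptions are invented for the example.

```r
# Hypothetical play descriptions, invented for illustration only
desc <- c(
  "J.Smith backward pass to A.Jones",
  "J.Smith Backward pass to A.Jones",
  "J.Smith backwards pass to A.Jones",
  "J.Smith Lateral pass to A.Jones",
  "J.Smith pass short left to A.Jones"
)

# Patterns copied from the removed and the added line of the diff
old_pattern <- "(backward pass)|(Backward pass)"
new_pattern <- "(backward pass)|(backwards pass)|(lateral pass)"

# Old logic: case-sensitive match on the raw description
old_hit <- grepl(old_pattern, desc)

# New logic: match on the lower-cased description
new_hit <- grepl(new_pattern, tolower(desc))

print(data.frame(desc, old_hit, new_hit))
```

Under these assumptions the old pattern misses the "backwards pass" and "Lateral pass" rows, while the new one catches both and still leaves the ordinary forward pass untouched.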
@@ -281,7 +284,7 @@ clean_pbp <- function(pbp, ...) {
big_parser <- "(?<=)[A-Z][A-z]*+(\\.|\\s)+[A-Z][A-z]*+\\'*\\-*[A-Z]*+[a-z]*+(\\s((Jr.)|(Sr.)|I{2,3})|(IV))?"
# maybe some spaces and letters, and then a rush direction unless they fumbled
rush_finder <- "(?=\\s*[a-z]*+\\s*((FUMBLES) | (left end)|(left tackle)|(left guard)|(up the middle)|(right guard)|(right tackle)|(right end)))"
-# maybe some spaces and leters, and then pass / sack / scramble
+# maybe some spaces and letters, and then pass / sack / scramble
pass_finder <- "(?=\\s*[a-z]*+\\s*(( pass)|(sack)|(scramble)))"
# to or for, maybe a jersey number and a dash
receiver_finder <- "(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})"
@@ -401,3 +404,26 @@ add_qb_epa <- function(pbp, ...) {
return(pbp)
}

+# Function that fixes false "pass" positives in some hard coded plays where
+# the parser logic reached its limit
+fix_werid_pass_plays <- function(pass, game_id, play_id){
+combined_id <- paste(game_id, play_id, sep = "_")
+false_positives <- c(
+"1999_01_ARI_PHI_1611",
+"1999_01_SF_JAX_1788",
+"1999_01_SF_JAX_2081",
+"1999_11_ATL_TB_1740",
+"2001_09_MIN_PHI_1307",
+"2001_14_NE_BUF_452",
+"2002_16_PIT_TB_527",
+"2003_02_HOU_NO_3924",
+"2003_15_PIT_NYJ_873",
+"2004_05_BUF_NYJ_2555",
+"2005_07_SD_PHI_321",
+"2011_02_STL_NYG_1369",
+"2016_05_NE_CLE_912",
+"2016_06_CAR_NO_2690",
+"2020_10_BAL_NE_2013"
+)
+data.table::fifelse(combined_id %chin% false_positives, 0, pass, pass)
+}
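
Reviewer note on the helper above: the hard-coded override can be tried in isolation with a small sketch. This is an assumption-laden stand-in, not the package code. `fix_pass_sketch` is a hypothetical name, base R `ifelse()` and `%in%` replace `data.table::fifelse()` and `%chin%`, only two ids are copied from the list in the diff, and the play id 9999 is made up.

```r
# Base R stand-in for the helper in the diff (hypothetical name)
fix_pass_sketch <- function(pass, game_id, play_id) {
  # Build the same "<game_id>_<play_id>" key the diff uses
  combined_id <- paste(game_id, play_id, sep = "_")
  # Two ids copied from the PR's list; the real list is longer
  false_positives <- c("1999_01_ARI_PHI_1611", "2020_10_BAL_NE_2013")
  # Force pass to 0 for listed plays, keep the existing value otherwise
  ifelse(combined_id %in% false_positives, 0, pass)
}

plays <- data.frame(
  game_id = c("1999_01_ARI_PHI", "1999_01_ARI_PHI", "2020_10_BAL_NE"),
  play_id = c(1611, 9999, 2013),  # 9999 is a made-up play id
  pass    = c(1, 1, 1)
)

plays$pass_fixed <- fix_pass_sketch(plays$pass, plays$game_id, plays$play_id)
print(plays)
```

Only rows whose combined id appears in the list are overwritten, so the override cannot turn a non-pass play into a pass play.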
2 changes: 1 addition & 1 deletion R/nflfastR-package.R
@@ -106,7 +106,7 @@
#' @import dplyr
#' @importFrom cli rule
#' @importFrom curl curl_fetch_memory
-#' @importFrom data.table setDT
+#' @importFrom data.table setDT %chin%
#' @import fastrmodels
#' @importFrom furrr future_map_chr future_map_dfr future_map
#' @importFrom future plan