library(hockeyR)
load_pbp()
As mentioned on the home page, the main function of the hockeyR package is to load raw NHL play-by-play data without having to scrape it and clean it yourself. The load_pbp() function will do that for you. The season argument in load_pbp() is very accepting: several different syntaxes can be used to load play-by-play data for the same season, such as the 2020-21 NHL season.
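As a rough illustration of what "accepting" can mean here, the sketch below maps several spellings of the 2020-21 season onto a single end-year value. The helper normalize_season() is hypothetical and is not part of hockeyR, and the specific spellings shown ("2020-21", "2020-2021", "2020_21", 2021) are assumptions about what load_pbp() accepts; check the function's help page before relying on them.

```r
# Hypothetical helper, NOT part of hockeyR: reduce several season spellings
# to one end-year integer (2021 for the 2020-21 season).
normalize_season <- function(season) {
  s <- as.character(season)
  # Pull out every run of digits, e.g. "2020-21" -> c("2020", "21")
  parts <- regmatches(s, gregexpr("[0-9]+", s))[[1]]
  if (length(parts) == 1L && nchar(parts) == 4L) {
    # Assumption: a bare four-digit year is already the season's end year
    return(as.integer(parts))
  }
  if (length(parts) == 2L && nchar(parts[1]) == 4L) {
    # A season ends one calendar year after it starts
    return(as.integer(parts[1]) + 1L)
  }
  stop("Unrecognised season format: ", s)
}

# Assumed spellings of the same season
seasons <- c("2020-21", "2020-2021", "2020_21", "2021")
end_years <- vapply(seasons, normalize_season, integer(1), USE.NAMES = FALSE)
print(end_years)
```

Under these assumptions every spelling resolves to the same season, which is the behaviour the paragraph above describes.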
To load more than one season, wrap your desired years in c(). For example, to get data for two consecutive seasons, one could enter load_pbp(c(2020, 2021)).
get_game_ids()
If you want to load play-by-play data for a game that isn’t in the data repository, or perhaps you just want a single game and don’t need to load a full season, you’ll first need to find the numeric game ID. The get_game_ids() function can find it for you as long as you supply it with the date of the game in YYYY-MM-DD format. The function defaults to the current date as defined by your operating system.
# get single day ids
get_game_ids(day = "2017-10-17")
#> # A tibble: 11 x 8
#> game_id season_full date home_name away_name home_final_score
#> <int> <chr> <chr> <chr> <chr> <int>
#> 1 2017020082 20172018 2017-10-17 New York Rangers Pittsbur~ 4
#> 2 2017020083 20172018 2017-10-17 Philadelphia Flyers Florida ~ 5
#> 3 2017020084 20172018 2017-10-17 Washington Capitals Toronto ~ 0
#> 4 2017020081 20172018 2017-10-17 New Jersey Devils Tampa Ba~ 5
#> 5 2017020085 20172018 2017-10-17 Ottawa Senators Vancouve~ 0
#> 6 2017020086 20172018 2017-10-17 Nashville Predators Colorado~ 4
#> 7 2017020087 20172018 2017-10-17 Winnipeg Jets Columbus~ 2
#> 8 2017020088 20172018 2017-10-17 Dallas Stars Arizona ~ 3
#> 9 2017020089 20172018 2017-10-17 Edmonton Oilers Carolina~ 3
#> 10 2017020090 20172018 2017-10-17 Vegas Golden Knights Buffalo ~ 5
#> 11 2017020091 20172018 2017-10-17 San Jose Sharks Montréal~ 5
#> # ... with 2 more variables: away_final_score <int>, game_type <chr>
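The game_id values above are not arbitrary. By the commonly documented NHL convention (an assumption here, since this page does not spell it out), the first four digits are the season's starting year, the next two are the game type, and the last four are the game number. A minimal sketch of decoding an ID follows; decode_game_id() is a hypothetical helper and not part of hockeyR.

```r
# Hypothetical helper, NOT part of hockeyR: split a 10-digit NHL game ID
# into season start year, game type, and game number.
decode_game_id <- function(game_id) {
  # Format without scientific notation so all ten digits survive
  id <- sprintf("%.0f", as.numeric(game_id))
  stopifnot(nchar(id) == 10L)
  # Assumed game-type codes from the commonly documented convention
  type_codes <- c("01" = "preseason", "02" = "regular season", "03" = "playoffs")
  list(
    season_start = as.integer(substr(id, 1, 4)),
    game_type    = unname(type_codes[substr(id, 5, 6)]),
    game_number  = as.integer(substr(id, 7, 10))
  )
}

# First ID from the table above
str(decode_game_id(2017020082))
```

Reading the ID this way is a quick sanity check that a game belongs to the season and game type you expect before scraping it.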
You can instead supply a season to get_game_ids() to grab a full year’s worth of IDs as well as final scores, home and road teams, and game dates for each game in the given season.
scrape_game()
This function scrapes a single game with a supplied game ID, which can be retrieved with get_game_ids(). Live game scraping has yet to undergo testing.
scrape_game(game_id = 2020030175)
#> # A tibble: 718 x 104
#> event_type event secondary_type event_team event_team_type description period
#> <chr> <chr> <chr> <chr> <chr> <chr> <int>
#> 1 GAME_SCHE~ Game~ <NA> <NA> <NA> Game Sched~ 1
#> 2 CHANGE Chan~ <NA> Montréal ~ away ON: Shea W~ 1
#> 3 CHANGE Chan~ Line change Toronto M~ home ON: Wayne ~ 1
#> 4 FACEOFF Face~ <NA> Toronto M~ home Auston Mat~ 1
#> 5 HIT Hit <NA> Toronto M~ home Zach Hyman~ 1
#> 6 CHANGE Chan~ On the fly Montréal ~ away ON: Jeff P~ 1
#> 7 CHANGE Chan~ On the fly Toronto M~ home ON: Alex G~ 1
#> 8 CHANGE Chan~ On the fly Montréal ~ away ON: Cole C~ 1
#> 9 SHOT Shot Wrist Shot Toronto M~ home Alex Galch~ 1
#> 10 CHANGE Chan~ On the fly Toronto M~ home ON: Jake M~ 1
#> # ... with 708 more rows, and 97 more variables: period_seconds <dbl>,
#> # period_seconds_remaining <dbl>, game_seconds <dbl>,
#> # game_seconds_remaining <dbl>, home_score <dbl>, away_score <dbl>,
#> # event_player_1_name <chr>, event_player_1_type <chr>,
#> # event_player_2_name <chr>, event_player_2_type <chr>,
#> # event_player_3_name <chr>, event_player_3_type <chr>,
#> # event_goalie_name <chr>, strength_state <glue>, strength_code <chr>, ...
scrape_day()
This is the backbone function that keeps the hockeyR-data repository up to date during the season. Supply a date (YYYY-MM-DD) and it will scrape play-by-play data for all games on that day. Live game scraping is still awaiting testing.
scrape_day("2015-01-06")
#> # A tibble: 6,472 x 105
#> event_type event secondary_type event_team event_team_type description period
#> <chr> <chr> <chr> <chr> <chr> <chr> <int>
#> 1 GAME_SCHE~ Game~ <NA> <NA> <NA> Game Sched~ 1
#> 2 CHANGE Chan~ <NA> Buffalo S~ away ON: Josh G~ 1
#> 3 CHANGE Chan~ Line change New Jerse~ home ON: Patrik~ 1
#> 4 FACEOFF Face~ <NA> Buffalo S~ away Zemgus Gir~ 1
#> 5 BLOCKED_S~ Bloc~ <NA> Buffalo S~ away Andy Green~ 1
#> 6 CHANGE Chan~ On the fly Buffalo S~ away ON: Chris ~ 1
#> 7 GIVEAWAY Give~ <NA> New Jerse~ home Giveaway b~ 1
#> 8 TAKEAWAY Take~ <NA> New Jerse~ home Takeaway b~ 1
#> 9 CHANGE Chan~ On the fly New Jerse~ home ON: Mark F~ 1
#> 10 CHANGE Chan~ On the fly New Jerse~ home ON: Jaromi~ 1
#> # ... with 6,462 more rows, and 98 more variables: period_seconds <dbl>,
#> # period_seconds_remaining <dbl>, game_seconds <dbl>,
#> # game_seconds_remaining <dbl>, home_score <dbl>, away_score <dbl>,
#> # event_player_1_name <chr>, event_player_1_type <chr>,
#> # event_player_2_name <chr>, event_player_2_type <chr>,
#> # event_player_3_name <chr>, event_player_3_type <chr>,
#> # event_goalie_name <chr>, strength_state <glue>, strength_code <chr>, ...
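Because scrape_day() takes one YYYY-MM-DD date at a time, covering several days means building the date strings first. The base R sketch below does only that part and adds a strict format check; is_valid_day() is a hypothetical helper, and the commented-out scraping call at the end is an assumption about how one might loop over the result (it needs network access, and its behaviour on days without games is not covered here).

```r
# Build YYYY-MM-DD strings for a span of days
start <- as.Date("2015-01-06")
end   <- as.Date("2015-01-08")
days  <- format(seq(start, end, by = "day"), "%Y-%m-%d")
print(days)

# Hypothetical helper: TRUE only for strings that parse as a real date
# and are already written exactly in YYYY-MM-DD form
is_valid_day <- function(x) {
  parsed <- as.Date(x, format = "%Y-%m-%d")
  !is.na(parsed) & format(parsed, "%Y-%m-%d") == x
}

stopifnot(all(is_valid_day(days)))

# Assumed usage, not run here because it requires network access:
# pbp_list <- lapply(days, hockeyR::scrape_day)
```

Validating the strings up front keeps a malformed date from surfacing as a harder-to-read error partway through a long scrape.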
If you can wait until the day after a game, the load_pbp() function is the only one you’ll need. If you’d like to scrape the data yourself immediately following a game, the other functions discussed here will do the job for you.