hockeyR

Getting started

library(hockeyR)

load_pbp()

As mentioned on the home page, the main function of the hockeyR package is to load raw NHL play-by-play data without having to scrape and clean it yourself. The load_pbp() function does that for you. Its season argument is flexible: a single season, such as the 2020-21 NHL season, can be written in several equivalent formats.
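As a sketch of what that flexibility might look like, the calls below show four spellings of the 2020-21 season. The exact strings are assumptions based on common season notations, not a confirmed list, so check ?load_pbp before relying on any of them. The numeric form is assumed to be the year in which the season ends.

```r
library(hockeyR)

# Each call is assumed to return the same 2020-21 play-by-play data.
pbp <- load_pbp(2021)          # numeric: year the season ends (assumption)
pbp <- load_pbp("2020-21")     # hyphenated short form (assumption)
pbp <- load_pbp("2020-2021")   # hyphenated long form (assumption)
pbp <- load_pbp("2020_21")     # underscore form (assumption)
```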

To load more than one season, wrap the desired years in c(). For example, load_pbp(c(2020, 2021)) loads two seasons in a single call.
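When the seasons are consecutive, the vector can be generated rather than typed out. The following is a minimal sketch: recent_seasons() and load_recent_pbp() are hypothetical helpers, not part of hockeyR, and they assume seasons are identified by plain numeric years as in the example above.

```r
# Build a vector of `n` consecutive season years ending at `last_season`.
# Hypothetical helper, not part of hockeyR.
recent_seasons <- function(last_season, n = 2) {
  stopifnot(n >= 1)
  seq(last_season - n + 1, last_season)
}

# Load several seasons in one call. load_pbp() is only reached when this
# function is called, so defining it requires neither the package nor a
# network connection.
load_recent_pbp <- function(last_season, n = 2) {
  hockeyR::load_pbp(recent_seasons(last_season, n))
}
```

For instance, load_recent_pbp(2021, n = 2) passes the years 2020 and 2021 to load_pbp(), the same request as load_pbp(c(2020, 2021)).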

get_game_ids()

If you want to load play-by-play data for a game that isn’t in the data repository, or you just want a single game and don’t need to load a full season, you’ll first need to find the numeric game ID. The get_game_ids() function can find it for you as long as you supply the date of the game in YYYY-MM-DD format. The function defaults to the current date as defined by your operating system.

# get single day ids
get_game_ids(day = "2017-10-17")
#> # A tibble: 11 x 8
#>       game_id season_full date       home_name            away_name home_final_score
#>         <int> <chr>       <chr>      <chr>                <chr>                <int>
#>  1 2017020082 20172018    2017-10-17 New York Rangers     Pittsbur~                4
#>  2 2017020083 20172018    2017-10-17 Philadelphia Flyers  Florida ~                5
#>  3 2017020084 20172018    2017-10-17 Washington Capitals  Toronto ~                0
#>  4 2017020081 20172018    2017-10-17 New Jersey Devils    Tampa Ba~                5
#>  5 2017020085 20172018    2017-10-17 Ottawa Senators      Vancouve~                0
#>  6 2017020086 20172018    2017-10-17 Nashville Predators  Colorado~                4
#>  7 2017020087 20172018    2017-10-17 Winnipeg Jets        Columbus~                2
#>  8 2017020088 20172018    2017-10-17 Dallas Stars         Arizona ~                3
#>  9 2017020089 20172018    2017-10-17 Edmonton Oilers      Carolina~                3
#> 10 2017020090 20172018    2017-10-17 Vegas Golden Knights Buffalo ~                5
#> 11 2017020091 20172018    2017-10-17 San Jose Sharks      Montréal~                5
#> # ... with 2 more variables: away_final_score <int>, game_type <chr>

You can instead supply a season to get_game_ids() to grab a full year’s worth of IDs as well as final scores, home and road teams, and game dates for each game in the given season.
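A season's worth of IDs is often more than you need, so a common next step is to narrow it down to one team. The sketch below assumes the argument is named season and that the result has the game_id, home_name and away_name columns shown in the single-day output above. Both games_for_team() and team_game_ids() are hypothetical helpers, not part of hockeyR.

```r
# Keep only the rows in which `team` played, at home or on the road.
# Relies on the home_name and away_name columns of the get_game_ids() result.
games_for_team <- function(games, team) {
  games[games$home_name == team | games$away_name == team, ]
}

# Fetch a full season of game IDs, then filter to one team.
# The `season` argument name is an assumption; see ?get_game_ids.
team_game_ids <- function(season, team) {
  games <- hockeyR::get_game_ids(season = season)
  games_for_team(games, team)$game_id
}
```

A call such as team_game_ids(2021, "Toronto Maple Leafs") would then return the IDs of every game that team played in the requested season, provided the team name matches the spelling used in the data.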

scrape_game()

This function scrapes a single game with a supplied game ID, which can be retrieved with get_game_ids(). Live game scraping has yet to undergo testing.

scrape_game(game_id = 2020030175)
#> # A tibble: 718 x 104
#>    event_type event secondary_type event_team event_team_type description period
#>    <chr>      <chr> <chr>          <chr>      <chr>           <chr>        <int>
#>  1 GAME_SCHE~ Game~ <NA>           <NA>       <NA>            Game Sched~      1
#>  2 CHANGE     Chan~ <NA>           Montréal ~ away            ON: Shea W~      1
#>  3 CHANGE     Chan~ Line change    Toronto M~ home            ON: Wayne ~      1
#>  4 FACEOFF    Face~ <NA>           Toronto M~ home            Auston Mat~      1
#>  5 HIT        Hit   <NA>           Toronto M~ home            Zach Hyman~      1
#>  6 CHANGE     Chan~ On the fly     Montréal ~ away            ON: Jeff P~      1
#>  7 CHANGE     Chan~ On the fly     Toronto M~ home            ON: Alex G~      1
#>  8 CHANGE     Chan~ On the fly     Montréal ~ away            ON: Cole C~      1
#>  9 SHOT       Shot  Wrist Shot     Toronto M~ home            Alex Galch~      1
#> 10 CHANGE     Chan~ On the fly     Toronto M~ home            ON: Jake M~      1
#> # ... with 708 more rows, and 97 more variables: period_seconds <dbl>,
#> #   period_seconds_remaining <dbl>, game_seconds <dbl>,
#> #   game_seconds_remaining <dbl>, home_score <dbl>, away_score <dbl>,
#> #   event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>, ...
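Since the game ID normally comes from get_game_ids(), the two functions can be chained. The sketch below looks up a game by date and home team and then scrapes it. find_game_id() and scrape_home_game() are hypothetical helpers, not part of hockeyR, and the lookup assumes a team hosts at most one game per day.

```r
# Return the ID of the single game hosted by `home_team` in `games`,
# where `games` is a get_game_ids() result.
find_game_id <- function(games, home_team) {
  id <- games$game_id[games$home_name == home_team]
  if (length(id) != 1) {
    stop("expected exactly one home game for ", home_team,
         ", found ", length(id))
  }
  id
}

# Look up the game on `day` (YYYY-MM-DD) and scrape its play-by-play data.
scrape_home_game <- function(day, home_team) {
  games <- hockeyR::get_game_ids(day = day)
  hockeyR::scrape_game(game_id = find_game_id(games, home_team))
}
```

With the 2017-10-17 schedule shown earlier, scrape_home_game("2017-10-17", "New York Rangers") would resolve to game ID 2017020082 and pass it to scrape_game().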

scrape_day()

This is the backbone function that keeps the hockeyR-data repository up to date during the season. Supply a date (YYYY-MM-DD) and it will scrape play-by-play data for all games on that day. Live game scraping is still awaiting testing.

scrape_day("2015-01-06")
#> # A tibble: 6,472 x 105
#>    event_type event secondary_type event_team event_team_type description period
#>    <chr>      <chr> <chr>          <chr>      <chr>           <chr>        <int>
#>  1 GAME_SCHE~ Game~ <NA>           <NA>       <NA>            Game Sched~      1
#>  2 CHANGE     Chan~ <NA>           Buffalo S~ away            ON: Josh G~      1
#>  3 CHANGE     Chan~ Line change    New Jerse~ home            ON: Patrik~      1
#>  4 FACEOFF    Face~ <NA>           Buffalo S~ away            Zemgus Gir~      1
#>  5 BLOCKED_S~ Bloc~ <NA>           Buffalo S~ away            Andy Green~      1
#>  6 CHANGE     Chan~ On the fly     Buffalo S~ away            ON: Chris ~      1
#>  7 GIVEAWAY   Give~ <NA>           New Jerse~ home            Giveaway b~      1
#>  8 TAKEAWAY   Take~ <NA>           New Jerse~ home            Takeaway b~      1
#>  9 CHANGE     Chan~ On the fly     New Jerse~ home            ON: Mark F~      1
#> 10 CHANGE     Chan~ On the fly     New Jerse~ home            ON: Jaromi~      1
#> # ... with 6,462 more rows, and 98 more variables: period_seconds <dbl>,
#> #   period_seconds_remaining <dbl>, game_seconds <dbl>,
#> #   game_seconds_remaining <dbl>, home_score <dbl>, away_score <dbl>,
#> #   event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>, ...
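Because the date has to be a YYYY-MM-DD string, it can help to check it before starting a scrape. The following is a minimal sketch in base R. is_iso_date() and scrape_day_checked() are hypothetical helpers, not part of hockeyR, and how scrape_day() itself reacts to a malformed date is not covered here.

```r
# TRUE only for a single string in YYYY-MM-DD form that is a real date.
is_iso_date <- function(x) {
  is.character(x) && length(x) == 1 &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

# Validate the date first, then hand it to scrape_day().
scrape_day_checked <- function(day) {
  if (!is_iso_date(day)) {
    stop("`day` must be a YYYY-MM-DD string, got: ", day)
  }
  hockeyR::scrape_day(day)
}
```

Here scrape_day_checked("2015-01-06") behaves like the call above, while a value such as "06-01-2015" is rejected before any request is made.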

If you can wait until the day after a game, the load_pbp() function is the only one you’ll need. If you’d like to scrape the data yourself immediately following a game, the other functions discussed here will do the job for you.