Aaron Burns '22 presented research on National Hockey League event rate data at the Midwest Sports Analytics Conference. The research was conducted during the Summer Undergraduate Research Experience (SURE), a set of courses offered only during St. Lawrence's 2021 summer term.
In this project, we analyzed National Hockey League (NHL) event rate data in an attempt to classify teams as winners or losers. The goal was to produce accurate out-of-sample predictions of game winners. The data set comprised all games from the 2015-2020 NHL regular seasons and included every event the NHL recorded in each game.
To minimize the effects of game state, the data set was filtered to even-strength situations in which the game was “close,” meaning the teams were within one goal of each other (goal differential less than or equal to 1). All counts were rate-standardized to the number of events per minute of even-strength close time. We explored and compared the following classification methods: Linear Discriminant Analysis, Quadratic Discriminant Analysis, K-Nearest Neighbors, Logistic Regression, and Classification Trees.
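The filtering and rate-standardization steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual pipeline: the `Segment` record, its fields, and the event labels (`SHOT`, `HIT`) are hypothetical. It assumes each game has already been split into stretches of constant game state, and it takes “close” to mean an absolute goal differential of at most 1.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stretch of one game during which the game state is constant."""
    seconds: float       # length of the stretch in seconds
    even_strength: bool  # True when both teams have the same number of skaters
    goal_diff: int       # event team's goals minus the opponent's goals
    events: Counter = field(default_factory=Counter)  # event type -> count for the event team


def close_even_rates(segments):
    """Return events per minute of even-strength, close (|goal diff| <= 1) time."""
    kept = [s for s in segments if s.even_strength and abs(s.goal_diff) <= 1]
    minutes = sum(s.seconds for s in kept) / 60.0
    if minutes == 0:
        return {}  # no qualifying time, so no rate can be computed
    totals = Counter()
    for s in kept:
        totals.update(s.events)  # add this stretch's event counts
    return {event: count / minutes for event, count in totals.items()}


# Toy game with made-up numbers (not project data).
game = [
    Segment(600, True, 0, Counter(SHOT=5, HIT=4)),
    Segment(120, False, 0, Counter(SHOT=3)),      # power play: excluded
    Segment(300, True, 1, Counter(SHOT=2, HIT=1)),
    Segment(180, True, 2, Counter(SHOT=1)),       # two-goal lead: excluded
]
rates = close_even_rates(game)  # 15 qualifying minutes: SHOT 7/15, HIT 5/15
print(rates)
```

Dividing by qualifying minutes rather than by games puts teams that spent very different amounts of time at even strength and within one goal on a common scale.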
The most accurate model was K-Nearest Neighbors (KNN), with an out-of-sample prediction accuracy of 62.50%. Unsurprisingly, the least accurate was Logistic Regression, which predicted the winner correctly only about 50% of the time, roughly the rate of a coin flip. Switching to a season-based chronological split of the training and testing data improved the performance of Logistic Regression, but none of the other methods changed substantially. Based on these results, we conclude that K-Nearest Neighbors is the best of the methods tested for predicting whether the event team won the game.
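A season-based chronological split and a KNN classifier can be sketched together as below. Everything here is an assumption made for illustration: the game records, the two rate features, the toy values, and the choice k = 3 are hypothetical (the source does not report k or the exact split). The sketch trains on all seasons before a held-out test season, so that no later games inform a prediction.

```python
import math
from collections import Counter


def season_split(games, test_season):
    """Train on seasons before test_season; test on test_season only."""
    train = [g for g in games if g["season"] < test_season]
    test = [g for g in games if g["season"] == test_season]
    return train, test


def knn_predict(train, x, k=3):
    """Majority vote among the k training games nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda g: math.dist(g["rates"], x))[:k]
    votes = Counter(g["won"] for g in nearest)
    return votes.most_common(1)[0][0]  # odd k with two labels cannot tie


def accuracy(train, test, k=3):
    """Share of test games whose outcome KNN predicts correctly."""
    hits = sum(knn_predict(train, g["rates"], k) == g["won"] for g in test)
    return hits / len(test)


# Hypothetical records: rates = (shots per minute, hits per minute), won = 1 or 0.
games = [
    {"season": 2015, "rates": (0.60, 0.30), "won": 1},
    {"season": 2015, "rates": (0.40, 0.50), "won": 0},
    {"season": 2016, "rates": (0.65, 0.35), "won": 1},
    {"season": 2016, "rates": (0.35, 0.45), "won": 0},
    {"season": 2017, "rates": (0.55, 0.25), "won": 1},
    {"season": 2017, "rates": (0.45, 0.55), "won": 0},
    {"season": 2018, "rates": (0.62, 0.28), "won": 1},
    {"season": 2018, "rates": (0.38, 0.52), "won": 0},
]
train, test = season_split(games, test_season=2018)
print(accuracy(train, test, k=3))
```

A real pipeline would usually standardize each feature before computing distances, since rates of frequent events would otherwise dominate the Euclidean distance. The sketch omits that step, and its result on these toy records says nothing about real NHL games.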