Which Microstats Drive Production in the NHL?
Going beneath traditional metrics to unpack points
Microstats have always been my go-to for uncovering the hidden details that shape game outcomes. Unlike broader metrics such as Corsi or expected goals, microstats drill down into specific actions, like passes, zone exits, and shot types, that coaches can translate directly into practice plans. I've long believed that those broader metrics are simply the sum of these smaller building blocks.
Out of curiosity, I decided to run a regression model to identify which microstats correlate most strongly with points in the NHL. Here’s what I found, how I built the model, and what it means for teams looking to boost their scoring.
My Process
Using R and AI, I built a regression model to predict points based on a range of microstats, leveraging play-by-play data (similar to what’s available from providers like Instat). I narrowed the model to 75 predictors, down from 91 in a previous version, to simplify the analysis and reduce overfitting. Packages like dplyr handled data wrangling, scales helped with formatting, and car checked for multicollinearity using Variance Inflation Factors (VIF). The goal was to create a robust, interpretable model that still captured the complexity of NHL scoring.
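The model in this post was built in R, with car handling the VIF check. As a language-neutral illustration only, here is a minimal Python sketch (standard library only) of what a VIF measures: each predictor is regressed on all the others, and VIF_j = 1 / (1 - R^2_j). The predictor names and values are hypothetical, not the post's data, and this is not car's implementation.

```python
from statistics import mean


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: a predictor is a linear combination of others")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """Fit y on the given predictor columns plus an intercept by OLS and return R^2."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(design[i][p] * design[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(predictors):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others."""
    out = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        out[name] = 1.0 / (1.0 - r_squared(others, col))
    return out


# Hypothetical per-player values, for illustration only.
predictors = {
    "passes": [10, 12, 9, 15, 11, 14, 8, 13],
    "rush_shots": [2, 3, 1, 4, 2, 3, 1, 3],
    "exits": [5, 6, 4, 8, 6, 7, 3, 6],
}
for name, value in vif(predictors).items():
    print(f"{name}: VIF = {value:.2f}")
```

A VIF near 1 means a predictor shares little variance with the others; larger values mean its coefficient is harder to estimate precisely. An exact linear dependence makes 1 - R^2_j zero, which is the extreme case R reports as a singularity.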
The model performed strongly, with an R-squared of 0.9918, meaning it explained 99.18% of the variance in points. However, the residual standard error (0.2179) was slightly higher than in the previous version, suggesting predictions were a bit less precise with the trimmed set of predictors. I also found that four variables (controlled entries, scoring chances, shots, and zone exits per 60 minutes) caused singularities, meaning R could not estimate their coefficients, likely because each was an exact or near-exact linear combination of other predictors in the model.
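The fit statistics and the singularity warning can be reproduced by hand. The Python sketch below (standard library only, hypothetical data) computes R-squared as 1 - SS_res / SS_tot and the residual standard error as sqrt(SS_res / (n - k)), where k counts the estimated coefficients including the intercept, the degrees-of-freedom convention R's summary() uses for lm fits. It also checks the rank of a design matrix in which a hypothetical total_shots column equals the sum of two component columns; a rank below the column count is the situation R's lm reports as coefficients "not defined because of singularities".

```python
import math
from statistics import mean


def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break  # every row already has a pivot
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no pivot here: this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def fit_summary(y, fitted, n_coefs):
    """Return (R^2, residual standard error); n_coefs includes the intercept."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    rse = math.sqrt(ss_res / (len(y) - n_coefs))
    return r2, rse


# Hypothetical columns: total_shots is, by construction, rush_shots + cycle_shots.
rush_shots = [2, 3, 1, 4, 2]
cycle_shots = [1, 0, 2, 2, 3]
total_shots = [r + c for r, c in zip(rush_shots, cycle_shots)]
design = [[1, r, c, t] for r, c, t in zip(rush_shots, cycle_shots, total_shots)]  # leading 1 = intercept
rank = matrix_rank(design)
print(f"rank {rank} of {len(design[0])} columns -> {len(design[0]) - rank} aliased")

# Toy fit: y regressed on a 0/1 predictor, whose OLS fitted values are the group means.
r2, rse = fit_summary([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], n_coefs=2)
print(f"R^2 = {r2:.3f}, RSE = {rse:.3f}")
```

Dropping either the total or one of its components removes the dependence, which is why trimming predictors can resolve singularities.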
Key Findings
So which microstats most strongly correlate with points?
Passes (Estimate: 1.656, p < 0.001): Holding the other predictors constant, each additional pass is associated with more points. This isn't surprising, since puck movement usually creates scoring opportunities.
Shots off Rush (Estimate: 0.696, p < 0.001): Shots off the rush show a strong positive association with points, likely reflecting the high-danger nature of fast breaks.
Exits (Estimate: 4.295, p < 0.001): Successful zone exits have a large positive estimate, likely because they kickstart offensive transitions.
Shots off Forecheck or Cycle per 60 (Estimate: 0.161, p < 0.001): Sustained pressure through forechecking or cycling is associated with more points.
Rush Assists per 60 (Estimate: 0.180, p < 0.001): Assists on rush plays are a strong predictor, aligning with the importance of speed in today’s NHL.
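Several predictors are expressed per 60 minutes. A per-60 rate scales a raw count by ice time: count × 60 / time on ice in minutes. The post doesn't say which game state the ice time covers, so the minimal Python sketch below treats TOI as a plain input; the players and numbers are hypothetical. It shows why a rate and a raw count can rank players differently.

```python
def per_60(count, toi_minutes):
    """Convert a raw event count into a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return count * 60 / toi_minutes


# Hypothetical players: B has more raw shots off the forecheck but a lower rate.
players = {"Player A": (10, 600), "Player B": (12, 900)}  # (count, TOI in minutes)
for name, (count, toi) in players.items():
    print(f"{name}: {count} shots in {toi} min -> {per_60(count, toi):.2f} per 60")
```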
Some results were also interesting to interpret. For instance, Botched Retrievals (Estimate: 0.232, p < 0.001) and Failed Exits (Estimate: 0.238, p < 0.001) correlate positively with points. This likely reflects volume: players trusted to make on-puck decisions attempt more retrievals and exits, so they log more failures alongside more offense. On the flip side, Rushed Exits (Estimate: -0.172, p < 0.001) and Build-up Passes per 60 (Estimate: -0.138, p < 0.001) are negatively associated with points.
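Raw estimates are expressed in each predictor's own units, so 4.295 for Exits and 0.161 for Shots off Forecheck or Cycle per 60 can't be compared directly as effect sizes. One common way to put them on a shared scale is the standardized coefficient, estimate × sd(x) / sd(y): the change in points, in standard deviations, per one-standard-deviation change in the predictor. The Python sketch below is illustrative only; the estimates and columns are made up and are not the post's results.

```python
from statistics import stdev


def standardized_coef(estimate, x, y):
    """Rescale a raw OLS slope to SD units: estimate * sd(x) / sd(y)."""
    return estimate * stdev(x) / stdev(y)


# Made-up data and estimates, for illustration only.
points = [20, 35, 42, 15, 50, 28]
count_predictor = [3.0, 4.5, 5.0, 2.5, 6.0, 4.0]   # narrow spread, large raw estimate
rate_predictor = [1.0, 3.5, 2.0, 0.5, 4.0, 2.5]    # wider spread, small raw estimate
for label, est, col in [("count", 4.0, count_predictor), ("rate", 0.2, rate_predictor)]:
    print(f"{label}: raw {est}, standardized {standardized_coef(est, col, points):.3f}")
```

With strongly correlated predictors, standardized coefficients inherit the same instability as the raw ones, so they help compare scale rather than settle which action matters most.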
What Coaches Can Do
These findings are actionable. To maximize points, teams should:
Prioritize puck movement: Encourage more passes to create scoring opportunities.
Focus on rush plays: Practice quick transitions and shots off the rush, as they’re high-value actions.
Improve zone exits: Drill clean exits to spark offensive plays without losing possession.
Sustain pressure: Emphasize forechecking and cycle plays to generate shots in high-danger areas.
Final Thoughts
Microstats are a goldmine for understanding what drives scoring in the NHL. By focusing on high-impact actions like passes, rush shots, and zone exits, teams can fine-tune their strategies to generate offense. Of course, no single theory can explain everything. Hockey is simply too dynamic. But organizations with strong development teams capable of teaching and reinforcing these micro-details will set themselves apart in a league where every edge matters.