NFL Databowl
- Python
- TensorFlow
- Matplotlib
- Jupyter Notebook
- JavaScript
Predicts the outcomes of NFL rushing plays using a convolutional neural net trained on NFL Next Gen stats. Estimates likely yardage gained in a probabilistic distribution. Built an environment to train and test models remotely. Achieved scores in the top 5% on the Kaggle leaderboard
Problem
The NFL Databowl challenged my team to predict the outcomes of NFL rushing plays using an existing database of play outcomes from the NFL Next Gen Stats program. In competition with other teams, we analyzed the existing data, cleaned, prepared, and loaded it, and used statistical and ML models to estimate the outcomes.
Findings
Our team achieved our highest score with a convolutional neural net, a frankly surprising solution to the problem. We selected a subset of $f$ features that early analysis had shown were most significant, in particular the position, velocity, orientation, and acceleration of each of the 22 players on the field. We then built an $11 \times 11 \times f$ matrix of player interactions, comparing every offensive player with every defensive player. This formed our input matrix to a CNN with a window of 1, allowing us to use our model weights more efficiently and decrease training time while retaining accuracy. The model architecture is roughly described by this diagram.
Our final model placed in the top 5% of databowl submissions with a CRPS score of .012 on the published testing set. Below is the loss progression over 20 epochs of training.