nflvR (NFL Virtual Reality)

NFL Virtual Reality (nflvR, for short) is the beginning of an R and Python package that lets users assess the impact of individual players, Head Coaches and Play Callers by simulating player trades and free-agency acquisitions.

White Paper

Foreword:

"I firmly believe that the Big Data Bowl competition is an ideas competition, where the biggest and boldest ideas win or do well, as my project last year, PlayerTV, showed. This year, I took the next step in big ideas by attempting to build an NFL Simulator using all the data that the NFL has released. Unfortunately, the idea was too bold for the time available, but we (Tim Keller, Joseph Armstrong and I) are looking to build the project into an R package and beyond." - Zac Rogers.

Contents Section:

Foreword
Contents Section
Team Introductions
Brief Description
Why?
How?
Future Work
Appendix

Team Introductions:

Zac Rogers - Zac is a British BI Analyst living in Newcastle, United Kingdom. Last year, he received an Honourable Mention in the Big Data Bowl 2022 for his idea, PlayerTV.

Tim Keller - Tim, who is German, is currently studying for a degree in Computer Science at Basel University in Switzerland. Tim has traditionally focused on Football (Soccer) and NHL data but decided to expand to NFL data for the Big Data Bowl.

Joseph Armstrong - Joseph is a recent graduate from the University of Cincinnati with a master's degree in Information Technology. Joseph has focused on American Football and Baseball data in his sports data projects.

Brief Description:

nflvR is the beginning of an R package which allows users to look at the impact of individual players through player trades and free-agency acquisitions, as well as Head Coaches and Play Callers.

For the purpose of the competition, we focused on the Offensive Line. The value of an offensive lineman is not just how many sacks or pressures they prevent but how they affect the rest of the team. If a team has no faith in its Oline, its passing game is limited to quick throws or plays designed to help the Oline, which in turn affects how we evaluate QBs, Receivers and beyond. It affects not only players but also our opinions of Head Coaches (and/or playcallers). To determine a lineman's true value (positive or negative), we therefore need to be able to take individuals (or the whole Oline) out of their current team and place them in different situations. Simulators that do this exist, but they aren't available to the public. So, we are building one for the public.

Why?

Few would disagree that the NFL is one of the most complex sports to analyse with data. While some sports are more fluid (like Football, aka Soccer), having the same 22 players (plus some substitutes) on the field at all times makes them slightly easier to model. The NFL has three different units (Offense, Defense and Special Teams) that rotate depending on who has possession and the down. Teams can pass or run, and each option has its own benefits and drawbacks. Given this complexity, and the way mathematical models work, most sports analytics companies and teams have built simulators on which they can run A/B hypothesis tests to work out who provides the best value to their teams.
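
To make the A/B idea concrete, here is a toy Monte Carlo sketch in Python. Everything in it is a hypothetical placeholder rather than an nflvR model: the function names, the assumed 7-yard sack, the 0 to 15 yard gains and the sack rates. What it illustrates is the pattern: run the same simulated plays under two scenarios (here, two offensive lines with different sack rates) and compare a summary metric.

```python
import random
import statistics


def simulate_dropback(sack_rate: float, u: float, gain: int) -> int:
    """Toy stand-in for one simulated dropback (hypothetical, not an nflvR model).

    A sack (assumed to cost 7 yards) happens when the uniform draw `u` falls
    below the offensive line's sack rate; otherwise the play gains `gain` yards.
    """
    return -7 if u < sack_rate else gain


def ab_test(sack_rate_a: float, sack_rate_b: float, n: int = 10_000, seed: int = 42):
    """Simulate n dropbacks behind two offensive lines; return mean yards for each.

    Both arms reuse the same random draws (common random numbers), so the only
    difference between them is the sack rate being tested.
    """
    rng = random.Random(seed)
    draws = [(rng.random(), rng.randint(0, 15)) for _ in range(n)]
    mean_a = statistics.fmean(simulate_dropback(sack_rate_a, u, g) for u, g in draws)
    mean_b = statistics.fmean(simulate_dropback(sack_rate_b, u, g) for u, g in draws)
    return mean_a, mean_b
```

For example, `ab_test(0.05, 0.10)` returns the mean yards per dropback behind each line. Reusing the same draws for both arms is a deliberate choice: any gap between the two means comes from the sack rate, not from sampling noise.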

What does this mean for us, the public? It means that public analysis is significantly behind the curve because of the limited data available. nflscrapR and nflverse, along with the Big Data Bowl, have changed the landscape, but there is still a long way to go. The Big Data Bowl has now released data for the passing game (Big Data Bowl 2021 and 2023) and Special Teams (Big Data Bowl 2022), and partial data for the rushing game (Big Data Bowl 2020). As a result, with out-of-the-box thinking and problem-solving, we can build the first-ever public NFL simulator and take the next step in NFL Open Source Data Analysis.

How?

There are three parts to the simulator:

Backend
Models
Output

BACKEND:

For the Backend, many things that seem simple turn out to be complex. The idea is to replicate the nflverse play-by-play data frame, so that users can run the same analyses on our virtual games that they would run on real ones. For nflvR version 0.1.5, we have 118 fields of data, which can be split into nine categories:

nflvR IDs
Outcome Data
Team Coaching Data
Play-By-Play IDs
Time Data
Yards Data
Play Description Data
Stadium Data
Player Participation Data
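
As an illustration of how a simulated row could line up with nflverse play-by-play, here is a minimal Python sketch covering a few of the nine categories. The column names follow nflverse conventions as we understand them, but the selection is ours, `sim_id` is a hypothetical nflvR identifier, and the semicolon-separated participation format is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SimulatedPlay:
    """One simulated play row, loosely mirroring nflverse play-by-play columns.

    Only a handful of the 118 fields are shown; `sim_id` is hypothetical.
    """
    # nflvR IDs
    sim_id: int
    # Play-By-Play IDs
    game_id: str
    play_id: int
    # Team Coaching Data
    home_coach: str
    away_coach: str
    # Time Data
    qtr: int
    game_seconds_remaining: int
    # Yards Data
    yardline_100: int
    ydstogo: int
    # Play Description Data
    play_type: str
    desc: str
    # Stadium Data
    stadium: str
    # Outcome Data
    yards_gained: int
    # Yards Data (continued): None on plays without a down, such as kickoffs
    down: Optional[int] = None
    # Player Participation Data (assumed semicolon-separated player IDs)
    offense_players: str = ""
    defense_players: str = ""
```

Because each play is a plain dataclass, `[asdict(play) for play in plays]` yields a list of dicts that can be loaded into a pandas DataFrame (or exported for R) and analysed like real nflverse data.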

Some of these fields are much harder to populate than others. Simple ones, such as player participation, will come either from a model output or from teams selected by user inputs. Others are far more involved: yardline_100, for example, needs eight different formulas to work out where the ball should be after turnovers, punts, kickoffs, the end of the half, PATs and 2 Point conversions.
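
The sketch below shows how a few of these yardline_100 cases could be handled in Python. It is not nflvR's set of eight formulas: the event names are hypothetical, and the fixed spots (kickoff from the 35, touchbacks at the 25 after kickoffs and the 20 after punts, PAT from the 15, 2 Point conversion from the 2, free kick after a safety from the 20) are assumed rules for the seasons covered by Big Data Bowl data.

```python
# Fixed spots, written as yardline_100 for the team in possession.
# Assumed rules for the seasons covered by Big Data Bowl data.
KICKOFF_SPOT = 65           # kickoff from the kicking team's own 35
SAFETY_FREE_KICK_SPOT = 80  # free kick after a safety from the kicking team's own 20
KICKOFF_TOUCHBACK = 75      # receiving team starts at its own 25
PUNT_TOUCHBACK = 80         # receiving team starts at its own 20
PAT_SPOT = 15               # extra point snapped from the 15
TWO_POINT_SPOT = 2          # 2 Point conversion snapped from the 2


def next_yardline_100(yardline_100: int, event: str, yards_gained: int = 0) -> int:
    """Return yardline_100 for whichever team snaps (or kicks) next.

    yardline_100 is the distance to the opponent's end zone for the team in
    possession. `yards_gained` is the net gain for the team that had the ball
    at the snap (negative after a loss or an opponent's return).
    Event names are hypothetical.
    """
    spot = yardline_100 - yards_gained   # where the play ended, snap team's view
    if event == "play":                  # same team keeps the ball
        return spot
    if event == "change_of_possession":  # turnover, returned punt, turnover on downs
        return 100 - spot                # same spot, seen from the other side
    if event == "kickoff_touchback":
        return KICKOFF_TOUCHBACK
    if event == "punt_touchback":
        return PUNT_TOUCHBACK
    if event == "touchdown_pat":
        return PAT_SPOT
    if event == "touchdown_two_point":
        return TWO_POINT_SPOT
    if event == "safety":                # conceding team free kicks from its own 20
        return SAFETY_FREE_KICK_SPOT
    if event in ("field_goal", "after_try", "start_of_half"):
        return KICKOFF_SPOT              # next play is a kickoff
    raise ValueError(f"unknown event: {event!r}")
```

For example, a punt from the punting team's own 40 (yardline_100 = 60) with a net of 40 yards ends at the receiving team's own 20, so `next_yardline_100(60, "change_of_possession", 40)` gives 80.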

MODELS:

For Models, the work must be planned so that the models can interlink, because the end goal is for the simulator to run off two input groupings: User or Model. As of the Kaggle submission, we have identified 35 different models that need to be built. Two of them are currently Work In Progress (QB Decision and OLine Protection).
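
One way to picture the User or Model input groupings is a small resolver that every decision point calls: if the user has supplied a choice, it is used as-is; otherwise the choice is sampled from the model's probabilities. This is a hypothetical Python sketch; `resolve_decision` and its arguments are not part of nflvR.

```python
import random
from typing import Mapping, Optional


def resolve_decision(
    model_probs: Mapping[str, float],
    user_choice: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Use the user's input when one is given; otherwise sample the model's output.

    `model_probs` maps each option (e.g. "run", "pass") to the model's probability.
    """
    if user_choice is not None:
        if user_choice not in model_probs:
            raise ValueError(f"{user_choice!r} is not one of {list(model_probs)}")
        return user_choice
    if rng is None:
        rng = random.Random()
    options = list(model_probs)
    weights = [model_probs[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]
```

For example, `resolve_decision({"run": 0.45, "pass": 0.55})` samples a playcall from the model, while `resolve_decision({"run": 0.45, "pass": 0.55}, user_choice="run")` forces a run. If every decision point used an interface like this, a user could override any single decision and let the models handle the rest.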

As of submission, this is the list of models we want to create:

Offensive Personnel Decisions
Defensive Personnel Decisions
Offensive Depth Chart Decisions
Defensive Depth Chart Decisions
Offensive Formation
Defensive Formation Decision Shell Type
Defensive Formation Box Count
Offensive Playcall Run/Pass Decision
Offensive Playcall Routes Run
Defensive Playcall Blitzer Count
Defensive Playcall Coverage Type
Receiver Route Depth
Receiver Route Speed
Receiver Route Scramble Drill
Coverage Models for Man Coverage
Coverage Models for Zone Coverage
QB Decision
OLine Protection
Catch Point Outcome
Receiver YAC
Rushing Play Block Success
Rushing Play Yards
Time Out Decisions
4th Down Decisions
Field Goal Success
Punt Kick Yards
Punt Kick Angle
Punt Ball Bounce Angle
Punt Returner Decision
Punt Return Yards
Kickoff Kick Distance
Kickoff Ball Bounce Angle
Kickoff Returner Decision
Kickoff Return Yards
Madden Data to NGS data

OUTPUT:

The simulator's output sparked an interesting discussion within our team, because there are many possible approaches. We decided to leave the output to the user, since the goal is for the simulator to be used in the same way people already use nflverse data.

We therefore treated a packaged output as a "nice to have"; it is discussed further in Future Work.

Future Work:

As mentioned in the Foreword, we are continuing to develop the project into an R package, and that work is underway. We all acknowledge that there is a long way to go to make this project as good as we would like it to be.

For the Backend, we will continue to finish off the features the simulator needs to work properly. Some future-proofing work is also needed to ensure that the backend is ready to take in 35 (and counting) different models and that the simulator still functions as it should.

For the models, we need to work on all the models listed above. There is a lot of work on this front, and we hope to add more people to the development team to speed up the production of this package. The plan was to see how much we could build within the Big Data Bowl rules and then expand afterwards, and we are still on track with that.

For the output, we would eventually like to create a generic Shiny app that anyone can use. This app could cover both game-level and season-level analysis, comparing and contrasting the impact of certain players on both player and team stats.

Appendix:

The Appendix lists the models mentioned above and the output we hope each model will produce.

Offensive Personnel Decisions

When looking at Offensive Personnel Decisions, the output we seek is a percentage chance that the playcaller (either actual or user-inputted) would call each personnel option (mainly focusing on 11, 12, 21 and 22) given the situation that the team is in on a given play.
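
A simple starting point for this model is a frequency baseline: count how often each personnel grouping was used in similar situations and turn the counts into percentages. The Python sketch below assumes situations are defined only by down and a coarse distance bucket (with arbitrary bucket edges); the function names are hypothetical, and a real model would use far more onfield metadata and playcaller-specific tendencies.

```python
from collections import Counter, defaultdict


def distance_bucket(ydstogo: int) -> str:
    """Group yards-to-go into coarse buckets (the bucket edges are arbitrary)."""
    if ydstogo <= 3:
        return "short"
    if ydstogo <= 7:
        return "medium"
    return "long"


def personnel_probabilities(plays):
    """Turn historical plays into per-situation personnel percentages.

    `plays` is an iterable of (down, ydstogo, personnel) tuples, where personnel
    uses the usual RB/TE shorthand ("11" = 1 RB, 1 TE).
    Returns {(down, bucket): {personnel: share of plays}}.
    """
    counts = defaultdict(Counter)
    for down, ydstogo, personnel in plays:
        counts[(down, distance_bucket(ydstogo))][personnel] += 1
    probs = {}
    for situation, counter in counts.items():
        total = sum(counter.values())
        probs[situation] = {p: n / total for p, n in counter.items()}
    return probs
```

A frequency table like this is easy to audit and gives the fitted model something to beat.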

Defensive Personnel Decisions

When looking at Defensive Personnel Decisions, the output we seek is a percentage chance that the playcaller (either actual or user-inputted) would call each personnel option (mainly focusing on Base, Nickel and Dime) given the situation that the team is in on a given play. Ideally, the decision should also be influenced by depth chart options and onfield metadata.

Offensive Depth Chart Decisions

When looking at Offensive Depth Chart Decisions, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would decide that an individual player (mainly focusing on WRs and RBs) would be on the field on a given play. The decision needs to be influenced by the depth chart options each playcaller has at his disposal and, ideally, by onfield metadata.

Defensive Depth Chart Decisions

When looking at Defensive Depth Chart Decisions, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would decide that an individual player (mainly focusing on DL rotation) would be on the field on a given play. The decision needs to be influenced by the depth chart options each playcaller has at his disposal and, ideally, by onfield metadata.

Offensive Formation

When looking at Offensive Formation Decision, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would call each formation given the players on the field. The decision would also be impacted by onfield metadata of the given play.

Defensive Formation Decision Shell Type

When looking at Defensive Formation Decision Shell Type, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would call Single High or Two High Shell. The decision would also be impacted by onfield metadata of the given play.

Defensive Formation Box Count

When looking at Defensive Formation Box Count, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would have a given number of defenders in the box. The decision would also be impacted by onfield metadata of the given play.

Offensive Playcall Run/Pass Decision

When looking at Offensive Run/Pass Decision, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would call a run or a pass, given the onfield metadata for the given play.
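
A run/pass decision model like this is commonly framed as a logistic regression. The Python sketch below shows the shape of such a model; the coefficients are made up for illustration and would need to be fitted to real play-by-play data, and the choice of features (down, ydstogo and score_differential, named after nflverse columns) is an assumption.

```python
import math

# Hypothetical coefficients, for illustration only; a real model would be
# fitted to historical play-by-play data.
INTERCEPT = -0.5
COEF_DOWN = 0.35         # later downs -> more likely to pass
COEF_YDSTOGO = 0.08      # longer distance -> more likely to pass
COEF_SCORE_DIFF = -0.03  # trailing (negative differential) -> more likely to pass


def pass_probability(down: int, ydstogo: int, score_differential: int) -> float:
    """Logistic model: probability that the playcaller calls a pass."""
    z = (INTERCEPT + COEF_DOWN * down + COEF_YDSTOGO * ydstogo
         + COEF_SCORE_DIFF * score_differential)
    return 1.0 / (1.0 + math.exp(-z))


def run_probability(down: int, ydstogo: int, score_differential: int) -> float:
    """Run and pass are treated as the only two options, so they sum to 1."""
    return 1.0 - pass_probability(down, ydstogo, score_differential)
```

With these placeholder signs, 3rd and 10 yields a higher pass probability than 1st and 10, and trailing by 14 yields a higher one than leading by 14.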

Offensive Playcall Routes Run

When looking at Offensive Playcall Routes Run, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would call each playcall (mainly focusing on Receiver routes or Receiver blocking) given the onfield metadata.

Defensive Playcall Blitzer Count

When looking at Defensive Playcall Blitzer Count, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would use a playcall with a given number of pass rushers, given the onfield metadata.

Defensive Playcall Coverage Type

When looking at Defensive Playcall Coverage, the output we seek is the percentage chance that the playcaller (either actual or user-inputted) would use Man Coverage or Zone Coverage, given the onfield metadata.

Receiver Route Depth

When looking at Receiver Route Depth, the output we seek is the percentage chance that the Route Runner would run to a given depth, given the route he is running and the onfield metadata.

Receiver Route Speed

When looking at Receiver Route Speed, the output we seek is the percentage chance that the Route Runner would run at a given speed, given the route he is running and the onfield metadata.

Receiver Route Scramble Drill

When looking at Receiver Route Scramble Drill, the output we seek is the percentage chance that the Route Runner would react in a given way during a scramble drill, given the team's scheme, the route he is running and the onfield metadata.

Coverage Models for Man Coverage

When looking at Coverage Models for Man Coverage, the output we seek is how the player would react when in Man Coverage in a given situation.

Coverage Models for Zone Coverage

When looking at Coverage Models for Zone Coverage, the output we seek is how the player would react when in Zone Coverage in a given situation.

QB Decision

When looking at QB Decision, the output we seek is the percentage chance that, in a given situation on the field, the QB either:

Targets a receiver
Scrambles
Throws Away
Fails to make a decision (Sack)
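
One way to produce these four percentages is a multinomial model whose raw scores are turned into probabilities with a softmax. The Python sketch below shows only that final step; the outcome labels are ours, and the scores would come from a hypothetical fitted QB Decision model.

```python
import math

QB_OUTCOMES = ("targets_receiver", "scrambles", "throws_away", "sack")


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def qb_decision_probabilities(logits):
    """Map one score per outcome (in QB_OUTCOMES order) to a probability."""
    if len(logits) != len(QB_OUTCOMES):
        raise ValueError(f"expected {len(QB_OUTCOMES)} scores, got {len(logits)}")
    return dict(zip(QB_OUTCOMES, softmax(logits)))
```

Equal scores give each outcome 25%; raising one outcome's score shifts probability towards it while the four values always sum to 1.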

OLine Protection

When looking at OLine Protection, the output we seek is the probability, on a frame-by-frame basis, that the quarterback is limited in any way by the pass rush. By modelling this for every offensive lineman, we will create outputs for the entire line. Eventually, we also want to derive the time it takes the pass rush to get to the quarterback and how much extra time the offensive line, and each individual lineman, created.
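
The aggregation step described above can be sketched in Python. Assuming the per-lineman model outputs, for each frame, the probability that his matchup limits the quarterback, the line-level probability is the chance that at least one matchup does, and time to pressure is the first frame where that probability crosses a threshold. The independence assumption, the 0.5 threshold and the function names are illustrative choices, not the team's method; the 10 frames per second rate matches the Big Data Bowl tracking data.

```python
from math import prod

FRAMES_PER_SECOND = 10  # Big Data Bowl tracking data is recorded at 10 frames per second


def line_pressure_probability(lineman_probs):
    """Chance that at least one lineman's matchup limits the QB on this frame.

    Assumes the per-lineman probabilities are independent, which is a simplification.
    """
    return 1.0 - prod(1.0 - p for p in lineman_probs)


def time_to_pressure(frames, threshold=0.5):
    """Seconds after the snap until the line-level probability first reaches `threshold`.

    `frames` is a list (frame 0 = snap) of per-lineman probabilities for that frame.
    Returns None if the threshold is never reached.
    """
    for frame_index, lineman_probs in enumerate(frames):
        if line_pressure_probability(lineman_probs) >= threshold:
            return frame_index / FRAMES_PER_SECOND
    return None
```

Extra time created could then be estimated by comparing `time_to_pressure` for the actual line with the same calculation for a baseline line, although defining that baseline is itself an open modelling question.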

Catch Point Outcome

When looking at Catch Point Outcome, the output we seek is the percentage chance that in a given situation on the field, the catch point outcome is either:

Receiver Catches the Ball
Receiver Drops the Ball
Receiver doesn't make contact with the Ball
Defender Deflects the Ball
Defender Intercepts the Ball
Defender doesn't make contact with the Ball

Receiver YAC

When looking at Receiver YAC, the output we seek is the percentage chance that the receiver gains a given number of yards after the catch point in a given situation on the field.
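
For yardage models such as Receiver YAC (and, by extension, Rushing Play Yards and return yards), one way to express the output is an empirical distribution built from comparable historical plays. The Python sketch below assumes the caller has already selected those comparable plays; the function names are hypothetical.

```python
import random
from collections import Counter


def yards_distribution(observed_yards):
    """Empirical probability of each yardage value from comparable historical plays."""
    counts = Counter(observed_yards)
    total = sum(counts.values())
    return {yards: n / total for yards, n in sorted(counts.items())}


def prob_at_least(distribution, yards):
    """Chance of gaining at least `yards`."""
    return sum(p for y, p in distribution.items() if y >= yards)


def sample_yards(distribution, rng: random.Random):
    """Draw one simulated yardage value from the distribution."""
    values = list(distribution)
    return rng.choices(values, weights=list(distribution.values()), k=1)[0]
```

The same distribution answers both questions the simulator needs: "how likely is a gain of at least N yards?" for analysis, and "what happened on this play?" via sampling.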

Rushing Play Block Success

When looking at Rushing Play Block Success, the output we seek is the percentage chance that the Run Blocker successfully blocks the defender in the run game, given the situation on the given play and the onfield metadata.

Rushing Play Yards

When looking at Rushing Play Yards, the output we seek is the percentage chance that the rusher gains a given number of yards in a given situation on the field.

Time Out Decisions

When looking at Time Out Decisions, the output we seek is a percentage chance that the playcaller (either actual or user-inputted) would call a Time Out, given the onfield metadata.

4th Down Decisions

When looking at 4th Down Decisions, the output we seek is a percentage chance that the playcaller (either actual or user-inputted) would elect to go for it on 4th down (vs Field Goal and Punt) given the onfield metadata.

Field Goal Success

When looking at Field Goal Success, the output we seek is a percentage chance that the Field Goal Kicker would successfully score a Field Goal given the situation on the play.
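
A Field Goal Success model could start from kick distance. The Python sketch below uses the common rule of thumb that kick distance is yardline_100 plus 17 yards (roughly 7 yards for the snap and hold plus the 10-yard end zone) and a logistic curve whose intercept and slope are made-up placeholders, not fitted values.

```python
import math

# Rule of thumb: the kick is taken about 7 yards behind the line of scrimmage
# (snap and hold) and must also cover the 10-yard end zone.
SNAP_AND_END_ZONE_YARDS = 17


def field_goal_distance(yardline_100: int) -> int:
    """Approximate kick distance from the line of scrimmage."""
    return yardline_100 + SNAP_AND_END_ZONE_YARDS


def fg_success_probability(yardline_100: int, intercept: float = 6.0, slope: float = -0.11) -> float:
    """Logistic curve in kick distance; intercept and slope are hypothetical."""
    z = intercept + slope * field_goal_distance(yardline_100)
    return 1.0 / (1.0 + math.exp(-z))
```

A fitted version would likely add kicker identity and conditions such as the stadium's roof and surface, which are already part of the Stadium Data category.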

Punt Kick Yards

When looking at Punt Kick Yards, the output we seek is a percentage chance that the Punter would punt the ball a given distance, given the situation on the play.

Punt Returner Decision

When looking at Punt Returner Decision, the output we seek is the percentage chance that in a given situation on the field, the Punt Returner either:

Fair Catches
Attempts to Catch and Run
Lets the Ball bounce

Kickoff Returner Decision

When looking at Kickoff Returner Decision, the output we seek is the percentage chance that in a given situation on the field, the Kickoff Returner either:

Fair Catches
Attempts to Catch and Run
Lets the Ball bounce

Punt Return Yards

When looking at Punt Return Yards, the output we seek is the percentage chance that the Punt Returner gains a given number of yards after the catch point in a given situation on the field.

Kickoff Return Yards

When looking at Kickoff Return Yards, the output we seek is the percentage chance that the Kickoff Returner gains a given number of yards after the catch point in a given situation on the field.

Madden Data to NGS data

When looking at Madden Data to NGS data models, we seek to examine the correlation (and any causal link) between Madden data and NGS data, to better understand each player's potential metadata. Madden data gives us coverage of every player and season, even those not included in the NGS data we have.

Twitter Posting

This project is designed not only to win the prize money but also to present our talents to a wider audience. Most participants do this by posting their submission on Twitter. You can view the Twitter posting here: https://twitter.com/TheRogersUK/.