
End-to-End FPL Analysis

I've been working on expanding my skills, and the next step was to learn Python. The skills I focused on were basic data analysis (Pandas, NumPy, Plotly) and data extraction (API calls and HTML web scraping).


To showcase these skills, I decided to create an FPL (Fantasy Premier League) web app. It was a topic I hadn't explored before, but not so novel that there would be no documentation to follow if I needed it.


There is plenty of documentation on extracting FPL data. I used a guide by Frenzel Timothy (https://medium.com/@frenzelts/fantasy-premier-league-api-endpoints-a-detailed-guide-acbd5598eb19) to get the initial API links.


I used Google Colab as my IDE for a couple of reasons, the main one being that it is cloud-based, so I could work on the project while travelling whenever I had free time.


There were two main API endpoints I needed to request: the first returns general information (player, position and team data), and the second returns each player's game-by-game data.
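The two endpoints are commonly documented as `bootstrap-static/` (general information) and `element-summary/{player_id}/` (one player's game-by-game history). Below is a minimal standard-library sketch of requesting them, assuming those paths are still current. The helper names are hypothetical.

```python
import json
import urllib.request

# Base URL of the public FPL API (not officially documented, so paths may change).
BASE_URL = "https://fantasy.premierleague.com/api"


def bootstrap_url() -> str:
    """General information: players ("elements"), positions and teams."""
    return f"{BASE_URL}/bootstrap-static/"


def element_summary_url(player_id: int) -> str:
    """Game-by-game history for a single player."""
    return f"{BASE_URL}/element-summary/{player_id}/"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (needs network access, so left commented out here):
# general = fetch_json(bootstrap_url())
# history = fetch_json(element_summary_url(1))["history"]
```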

The team data consisted of `team_id`, `short_name` and `team`. For aesthetic reasons, I decided that `team` (the full name) works best for the team the player plays for, while `short_name` (the abbreviation) works best for the opponent in each fixture.
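Here is a minimal pandas sketch of building `team_mapping_df`. It assumes that the `teams` list in the general-information response carries `id`, `name` and `short_name` fields. The sample payload is made up for illustration.

```python
import pandas as pd


def build_team_mapping(bootstrap: dict) -> pd.DataFrame:
    """Keep each team's id, full name and abbreviation."""
    teams = pd.DataFrame(bootstrap["teams"])[["id", "name", "short_name"]]
    # Rename to the field names used in this write-up.
    return teams.rename(columns={"id": "team_id", "name": "team"})


# Tiny stand-in for the real payload (made-up values; extra fields are dropped).
sample = {"teams": [
    {"id": 1, "name": "Arsenal", "short_name": "ARS", "strength": 4},
    {"id": 2, "name": "Aston Villa", "short_name": "AVL", "strength": 3},
]}
team_mapping_df = build_team_mapping(sample)
print(team_mapping_df)
```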

The player data comes from the `elements` list in the API response, from which I brought in `element_type` (position), `id` (player_id), `first_name`, `second_name`, `web_name` and `team` (team_id). I also created columns holding the points awarded for a clean sheet and for a goal, as these depend on the player's position.
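The sketch below builds the player table with position-dependent scoring columns. The `goal_points` and `clean_sheet_points` column names are hypothetical. The points values are assumptions taken from FPL's scoring in recent seasons (goals 6/6/5/4 and clean sheets 4/4/1/0 for GK/DEF/MID/FWD), so check them against the season you are analysing.

```python
import pandas as pd

# element_type: 1 = GK, 2 = DEF, 3 = MID, 4 = FWD.
# Assumed scoring values; verify against the current season's rules.
GOAL_POINTS = {1: 6, 2: 6, 3: 5, 4: 4}
CLEAN_SHEET_POINTS = {1: 4, 2: 4, 3: 1, 4: 0}


def build_players(bootstrap: dict) -> pd.DataFrame:
    cols = ["id", "element_type", "first_name", "second_name", "web_name", "team"]
    players = pd.DataFrame(bootstrap["elements"])[cols]
    players = players.rename(columns={
        "id": "player_id", "element_type": "position_id", "team": "team_id"})
    # Points for a goal and for a clean sheet depend on position.
    players["goal_points"] = players["position_id"].map(GOAL_POINTS)
    players["clean_sheet_points"] = players["position_id"].map(CLEAN_SHEET_POINTS)
    return players


# Made-up sample rows.
sample = {"elements": [
    {"id": 10, "element_type": 3, "first_name": "Jane", "second_name": "Doe",
     "web_name": "Doe", "team": 1, "now_cost": 80},
    {"id": 11, "element_type": 1, "first_name": "John", "second_name": "Roe",
     "web_name": "Roe", "team": 2, "now_cost": 45},
]}
players = build_players(sample)
```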


The second request returns an individual player's game-by-game data. The first thing I needed to do was create a loop that extracted the data for each player who played this season and used `.append()` to combine the data frames in a `union` style. I then selected the columns I wanted and cleaned the field names (adding `_id` to `fixture` and `opponent_team`, and renaming `element` to `player_code` and `total_points` to `round_points`). This made it easier to join `team_mapping_df`, which brings in the actual team name of each fixture's opponent rather than its ID. I then joined `team_mapping_df` a second time, on the team each player plays for.
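The sketch below covers the loop and the two joins. `DataFrame.append()` was deprecated in pandas 1.4 and removed in 2.0, so this version collects the frames in a list and stacks them once with `pd.concat`. That gives the same `union`-style result. `fetch_history` is a hypothetical injected helper; in practice it would call the `element-summary` endpoint and return its `history` list. The kept columns are an assumed selection.

```python
import pandas as pd

# Columns kept from each history record (an assumed selection).
KEEP = ["element", "fixture", "opponent_team", "was_home", "round", "minutes",
        "goals_scored", "assists", "clean_sheets", "bonus", "total_points", "value",
        "expected_goals", "expected_assists", "expected_goals_conceded"]


def build_game_by_game(player_ids, fetch_history, team_mapping_df, players):
    """Stack every player's history, then attach opponent and own-team names."""
    frames = []
    for player_id in player_ids:
        history = fetch_history(player_id)
        if history:  # skip players with no matches this season
            frames.append(pd.DataFrame(history)[KEEP])
    # Stack once with pd.concat (DataFrame.append no longer exists in pandas 2.x).
    games = pd.concat(frames, ignore_index=True)
    games = games.rename(columns={
        "fixture": "fixture_id", "opponent_team": "opponent_team_id",
        "element": "player_code", "total_points": "round_points"})

    # Opponent abbreviation via opponent_team_id; it becomes the "fixture" text.
    opponents = team_mapping_df[["team_id", "short_name"]].rename(
        columns={"team_id": "opponent_team_id", "short_name": "fixture"})
    games = games.merge(opponents, on="opponent_team_id", how="left")

    # Player's own team: player_code -> team_id (player table) -> full name.
    games = games.merge(players[["player_id", "team_id"]],
                        left_on="player_code", right_on="player_id", how="left")
    return games.merge(team_mapping_df[["team_id", "team"]], on="team_id", how="left")


# --- Usage with made-up data (no network needed) ---
def fake_history(player_id):
    if player_id != 10:
        return []
    return [{"element": 10, "fixture": 1, "opponent_team": 2, "was_home": True,
             "round": 1, "minutes": 90, "goals_scored": 1, "assists": 0,
             "clean_sheets": 1, "bonus": 3, "total_points": 12, "value": 80,
             "expected_goals": "0.75", "expected_assists": "0.10",
             "expected_goals_conceded": "0.40", "kickoff_time": "dropped"}]


teams = pd.DataFrame({"team_id": [1, 2], "team": ["Arsenal", "Aston Villa"],
                      "short_name": ["ARS", "AVL"]})
players = pd.DataFrame({"player_id": [10, 11], "team_id": [1, 2]})
games = build_game_by_game([10, 11], fake_history, teams, players)
```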

I then cleaned the data: `position` (a case-when statement on `position_id`), `value` (dividing by 10 so it matches the displayed price, as 4.5 million is originally stored as 45), and the data types of `expected_goals`, `expected_assists` and `expected_goals_conceded`, which I changed from string to numeric. The final cleaning step was to finalise `fixture` by cleaning `home_or_away` and appending it after `fixture` (which, after the earlier cleaning, is the opponent team's abbreviation).
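Here is a sketch of the cleaning step. It assumes the position and venue columns are named `position_id` and `was_home`. The position labels and the `"AVL (H)"` fixture format are illustrative choices and may not match the dashboard's.

```python
import pandas as pd

# Hypothetical labels for position_id 1-4.
POSITIONS = {1: "Goalkeeper", 2: "Defender", 3: "Midfielder", 4: "Forward"}
EXPECTED_COLS = ["expected_goals", "expected_assists", "expected_goals_conceded"]


def clean_games(games: pd.DataFrame) -> pd.DataFrame:
    g = games.copy()
    # Case-when style lookup from position_id to a readable label.
    g["position"] = g["position_id"].map(POSITIONS)
    # Prices are stored in tenths of a million, so 45 becomes 4.5.
    g["value"] = g["value"] / 10
    # Expected stats arrive as strings such as "0.75"; convert them to floats.
    for col in EXPECTED_COLS:
        g[col] = pd.to_numeric(g[col], errors="coerce")
    # Append a venue label to the opponent abbreviation, e.g. "AVL (H)".
    g["home_or_away"] = g["was_home"].map({True: "H", False: "A"})
    g["fixture"] = g["fixture"] + " (" + g["home_or_away"] + ")"
    return g


# Usage with made-up rows.
raw = pd.DataFrame({
    "position_id": [1, 3], "value": [45, 80], "was_home": [False, True],
    "fixture": ["ARS", "AVL"], "expected_goals": ["0.00", "0.75"],
    "expected_assists": ["0.05", "0.10"], "expected_goals_conceded": ["1.20", "0.40"],
})
cleaned = clean_games(raw)
```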


The next step was to create `expected_fpl_points`, which I built from the following components:

  • Expected Goals (a field within the data frame)

  • Expected Assists (a field within the data frame)

  • Expected Minutes (round `minutes_played` to the nearest 10)

  • Expected Goals Conceded (round `expected_goals_conceded` to integer)

  • Expected Clean Sheets (when `Expected Goals Conceded` is equal to 0 and `Expected Minutes` is above 60 minutes)

  • Expected Bonus (the sum of the non-bonus expected components, ranked within each fixture: 1st gets 3 points, 2nd gets 2 points and 3rd gets 1 point).
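The list doesn't say how each component converts into points, so this sketch relies on stated assumptions. Goals use the per-position `goal_points` column, and assists are worth 3. Appearance points are 1 below 60 minutes and 2 from 60. Goalkeepers and defenders lose 1 point per 2 expected goals conceded. Ties in the bonus ranking are broken by row order (`method="first"`), which is a simplification. The clean-sheet test keeps the "above 60" threshold from the list, although FPL itself awards clean sheets from 60 minutes. Column names such as `expected_gc_rounded` are hypothetical.

```python
import numpy as np
import pandas as pd

ASSIST_POINTS = 3  # assumption: an assist is worth 3 points for every position


def add_expected_points(games: pd.DataFrame) -> pd.DataFrame:
    """Add expected_fpl_points under the assumptions stated in the lead-in."""
    g = games.copy()
    g["expected_minutes"] = g["minutes_played"].round(-1)  # nearest 10 minutes
    g["expected_gc_rounded"] = g["expected_goals_conceded"].round().astype(int)
    g["expected_clean_sheet"] = (g["expected_gc_rounded"] == 0) & (g["expected_minutes"] > 60)

    # Assumption: appearance points as in FPL (1 below 60 minutes, 2 from 60).
    appearance = np.select(
        [g["expected_minutes"] >= 60, g["expected_minutes"] > 0], [2, 1], default=0)
    # Assumption: goalkeepers/defenders lose 1 point per 2 goals conceded.
    conceded = np.where(g["position_id"].isin([1, 2]), -(g["expected_gc_rounded"] // 2), 0)

    g["expected_points_before_bonus"] = (
        g["expected_goals"] * g["goal_points"]
        + g["expected_assists"] * ASSIST_POINTS
        + appearance
        + g["expected_clean_sheet"] * g["clean_sheet_points"]
        + conceded
    )
    # Rank within each fixture: 1st -> 3, 2nd -> 2, 3rd -> 1, everyone else -> 0.
    rank = g.groupby("fixture_id")["expected_points_before_bonus"].rank(
        method="first", ascending=False)
    g["expected_bonus"] = (4 - rank).clip(lower=0)
    g["expected_fpl_points"] = g["expected_points_before_bonus"] + g["expected_bonus"]
    return g


# Usage: one made-up fixture with a GK, DEF, MID and FWD.
sample = pd.DataFrame({
    "fixture_id": [1, 1, 1, 1],
    "position_id": [1, 2, 3, 4],
    "goal_points": [6, 6, 5, 4],
    "clean_sheet_points": [4, 4, 1, 0],
    "minutes_played": [90, 90, 64, 20],
    "expected_goals": [0.0, 0.1, 0.5, 0.2],
    "expected_assists": [0.0, 0.2, 0.3, 0.0],
    "expected_goals_conceded": [0.4, 2.6, 0.2, 0.1],
})
out = add_expected_points(sample)
```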


The next step was to turn the game-by-game data into round-by-round data. Occasionally a team plays two games in a single round, and because FPL scoring is calculated per round, those games have to be combined. This was done in two parts (additional information and aggregatable data) so that each kind of field is combined correctly. The additional information covers fixtures (which need to be a concatenation of both fixtures) and value (which is constant within the round).

The other values are aggregated using a `groupby()` on `player_id`, the name fields (`first_name`, `second_name`, `web_name`, `player_name`), `team` and `round`.
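The two-part aggregation can be sketched as follows, assuming the grouping columns above plus a hypothetical set of additive columns (`SUM_COLS`). Part one joins a double gameweek's fixtures into one string and keeps the first `value`. Part two sums everything else. The two parts are then merged back on the grouping keys.

```python
import pandas as pd

KEYS = ["player_id", "first_name", "second_name", "web_name", "player_name",
        "team", "round"]
# Hypothetical additive columns.
SUM_COLS = ["minutes_played", "goals_scored", "assists", "clean_sheets",
            "round_points", "expected_fpl_points"]


def to_round_by_round(games: pd.DataFrame) -> pd.DataFrame:
    grouped = games.groupby(KEYS, as_index=False)
    # Part 1: additional information that must not be summed.
    info = grouped.agg(
        fixture=("fixture", lambda s: ", ".join(s)),  # e.g. both double-gameweek fixtures
        value=("value", "first"),                     # price treated as constant in a round
    )
    # Part 2: aggregatable stats.
    totals = grouped[SUM_COLS].sum()
    return info.merge(totals, on=KEYS)


# Usage: made-up data where player 10 has a double gameweek in round 5.
games = pd.DataFrame({
    "player_id": [10, 10, 11], "first_name": ["Jane", "Jane", "John"],
    "second_name": ["Doe", "Doe", "Roe"], "web_name": ["Doe", "Doe", "Roe"],
    "player_name": ["Jane Doe", "Jane Doe", "John Roe"],
    "team": ["Arsenal", "Arsenal", "Aston Villa"], "round": [5, 5, 5],
    "fixture": ["AVL (H)", "CHE (A)", "ARS (A)"], "value": [8.0, 8.0, 4.5],
    "minutes_played": [90, 75, 90], "goals_scored": [1, 0, 0], "assists": [0, 1, 0],
    "clean_sheets": [1, 0, 0], "round_points": [12, 5, 2],
    "expected_fpl_points": [7.5, 4.0, 2.0]})
rounds = to_round_by_round(games)
```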


The final step in the data extraction was to create the season-long data, which is essentially a `groupby()` of the round-by-round data without `fixture(s)` and `round`.
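The season-long step is sketched below with hypothetical column names. Dropping `round` from the grouping keys, and leaving out `fixture` and `value`, leaves one row per player with the additive columns summed.

```python
import pandas as pd

SEASON_KEYS = ["player_id", "first_name", "second_name", "web_name",
               "player_name", "team"]
# Hypothetical additive columns.
SUM_COLS = ["minutes_played", "goals_scored", "assists", "clean_sheets",
            "round_points", "expected_fpl_points"]


def to_season_long(rounds: pd.DataFrame) -> pd.DataFrame:
    """Collapse round-by-round rows into one row per player."""
    return rounds.groupby(SEASON_KEYS, as_index=False)[SUM_COLS].sum()


# Usage with made-up round-by-round rows.
rounds = pd.DataFrame({
    "player_id": [10, 10, 11], "first_name": ["Jane", "Jane", "John"],
    "second_name": ["Doe", "Doe", "Roe"], "web_name": ["Doe", "Doe", "Roe"],
    "player_name": ["Jane Doe", "Jane Doe", "John Roe"],
    "team": ["Arsenal", "Arsenal", "Aston Villa"], "round": [1, 2, 1],
    "fixture": ["AVL (H)", "CHE (A)", "ARS (A)"], "value": [8.0, 8.0, 4.5],
    "minutes_played": [90, 165, 90], "goals_scored": [1, 1, 0], "assists": [0, 1, 0],
    "clean_sheets": [1, 0, 0], "round_points": [12, 17, 2],
    "expected_fpl_points": [7.5, 11.5, 2.0]})
season = to_season_long(rounds)
```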


Next, I had to decide which visualisation tool to use, and I chose to keep developing my skills in Shiny (R). Because of the change in programming language, I exported the FPL data (round-by-round and season-long) as CSV files and put them in a GitHub repository, from which I then read them into R.


The dashboard has two tabs (one for round-by-round and one for season-long), with the round-by-round tab featuring a Manhattan-inspired chart. For those who don't know, a Manhattan chart is how cricket shows the runs scored in each over, alongside other information such as wickets, powerplay overs and the required run rate.

I adapted the approach to show Points for each Round as the bars, with Goals, Assists and Clean Sheets marked above each bar, and Expected FPL Points as a line.


For both the round-by-round and season-long tabs, I used a combination of three tables (Actual Points, Expected Points and Points over Expected), with an input called `Table Focus` so that only the table the user wants to see is shown. The tables can be sorted by any column and filtered on multiple fields.
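Here is a sketch of how a `Table Focus` value could select between the three tables. It is written in pandas rather than Shiny and is not the dashboard's code. The focus strings match the table names in the text, and "Points over Expected" is assumed to mean actual points minus expected points.

```python
import pandas as pd

ID_COLS = ["web_name", "team"]  # hypothetical identifying columns


def table_for_focus(df: pd.DataFrame, focus: str) -> pd.DataFrame:
    """Return the table for one 'Table Focus' value, sorted by points."""
    out = df[ID_COLS].copy()
    if focus == "Actual Points":
        out["points"] = df["round_points"]
    elif focus == "Expected Points":
        out["points"] = df["expected_fpl_points"]
    elif focus == "Points over Expected":
        # Assumption: actual minus expected.
        out["points"] = df["round_points"] - df["expected_fpl_points"]
    else:
        raise ValueError(f"Unknown focus: {focus!r}")
    return out.sort_values("points", ascending=False, ignore_index=True)


# Usage with made-up season totals.
season = pd.DataFrame({"web_name": ["Doe", "Roe"], "team": ["Arsenal", "Aston Villa"],
                       "round_points": [120, 80], "expected_fpl_points": [100.0, 95.0]})
poe = table_for_focus(season, "Points over Expected")
```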


I have a couple of ideas for where I would like to take the dashboard in the future, including fully automating the data updates, potentially looking at historical seasons, and creating a custom Dream Team function.


If you would like to look at the dashboard, the link is: https://zacrogers.shinyapps.io/FPLShiny/.
