As a primer for this post, I read the blog posts listed below to try to understand the term expected goals (xG). I have been reading StatsBomb for the last year or so and have been trying to get my head round how data is used to analyse football. In particular, I want to see whether I can use this to understand what has changed, if anything, at Boro this season.
- An expected goals model in 20 minutes (ish) – what I used to create my xG model for the Championship
- 11tegen11 – Expected goals model – 11tegen11’s xG model
- xG on MOTD – StatsBomb article in which Ted Knutson (a data analysis expert who has worked at Brentford) explains how xG has gone mainstream
- xG collection of posts – Michael Caley’s posts on a range of elements of the xG model
So, after getting my head around this, and completely changing my approach to what I wanted to include on the blog, I set about building an expected goals model for the Championship using the last 4 seasons of data, to see how well Boro were performing this season.
I used Paul Riley’s xG in 20 minutes model and found it pretty easy to follow. I collected the shot data for the last 4 Championship seasons from WhoScored. After some tinkering in Excel, I arrived at the following totals and an xG value for each of the three shot locations that WhoScored records.
| | All | Outside | Six Yard | Penalty Area |
|---|---|---|---|---|
The first row of data shows:
- total shots in all matches
- total shots outside the box
- total shots in the six-yard box
- total shots in the penalty area (outside the six-yard box)
The second row of data shows:
- total goals in all matches
- total goals outside the box
- total goals in the six-yard box
- total goals in the penalty area (outside the six-yard box)
The third row of data shows the expected goals (xG) value for each shot type, which is the number of goals divided by the number of shots. It is always less than 1 because not every shot results in a goal. Put simply, the outside-the-box xG means that about 4 out of every 100 shots from outside the box are scored.
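In this simple model, working out the xG for a zone is just a division of goals by shots. The sketch below uses made-up round numbers purely for illustration (they are not the WhoScored totals from my table), and the function name `zone_xg` is hypothetical.

```python
# Hypothetical shot and goal counts per zone: illustrative round numbers only,
# NOT the WhoScored Championship totals from the table above.
shots = {"Outside": 1000, "Six Yard": 200, "Penalty Area": 800}
goals = {"Outside": 40, "Six Yard": 70, "Penalty Area": 100}


def zone_xg(shots, goals):
    """xG per zone = goals scored from that zone / shots taken from that zone."""
    return {zone: goals[zone] / shots[zone] for zone in shots}


xg = zone_xg(shots, goals)
for zone, value in xg.items():
    print(f"{zone}: {value:.3f}")
```

With these toy numbers the outside-the-box value comes out at 0.04, the same "4 in every 100" rate described above.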
So I had data from 57,705 shots, which is a lot, but nowhere near the 100,000 or more that other models use. As Paul Riley notes in his post, the model is a reasonable approximation of what is out there at the moment, but by no means the most accurate. I will run some comparisons over the course of the season to see how it measures up to other models.
As I have said, there is a range of xG models that use a lot more data and will therefore be much more accurate than this one. For example, other models take into account how the shot was assisted and a more precise location for the shot, to name just two factors. Despite this, I will use this model, as well as the others listed at the end of the article, to inform some of my analysis of Boro’s performance this season.
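Once each zone has a value, a team's xG for a match is simply the sum of the values for every shot it took, which can then be set against the goals it actually scored. The sketch below assumes a list of shots tagged with the three WhoScored zones; the zone values, the match, and the names `ZONE_XG` and `match_xg` are all hypothetical placeholders rather than figures from my model.

```python
# Hypothetical per-zone xG values: placeholders, not the values from my model.
ZONE_XG = {"Outside": 0.04, "Six Yard": 0.35, "Penalty Area": 0.125}


def match_xg(shot_zones, zone_xg=ZONE_XG):
    """Sum the xG of every shot a team took, using only the zone of each shot."""
    return sum(zone_xg[zone] for zone in shot_zones)


# A made-up match: 10 shots from outside, 2 in the six-yard box, 5 in the penalty area.
boro_shots = ["Outside"] * 10 + ["Six Yard"] * 2 + ["Penalty Area"] * 5
boro_goals = 1

total = match_xg(boro_shots)
print(f"xG: {total:.2f}, goals: {boro_goals}")
```

This also shows why the model is coarse: every shot from the same zone gets the same value, whatever the assist type or exact position on the pitch.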
Here are some other blog posts and articles that explain the xG model and how it can be used to assess prior performance and predict future performance.
There are many more out there, and they make for interesting reading and offer a completely different way to think about shots and goals in general, especially when you see your centre midfielder sprinting towards a loose ball 25 yards out, ready to unleash a rocket into…the top row of the North Stand. Well, only 96 times out of 100 anyway.