My journey into expected goals (xG)

As a primer for this post I read the blog posts listed below to try to understand the term expected goals (xG). I have been reading StatsBomb for the last year or so, and have been trying to get my head round how data is used to analyse football. In particular, I want to see whether this kind of analysis can help me understand what, if anything, has changed at Boro this season.

An expected goals model in 20 minutes (ish) – what I used to create my xG model for the Championship

11tegen11 – Expected goals model – 11tegen11’s xG model

xG on MOTD – StatsBomb article by Ted Knutson (data analysis expert who has worked at Brentford) explains how xG has gone mainstream

xG collection of posts – Michael Caley’s posts on a range of elements of the xG model

So after getting my head around this, and totally changing my approach to what I wanted to include on the blog, I set about creating an expected goals model for the Championship from the last 4 seasons to see how well Boro were performing this season.

I used Paul Riley’s xG in 20 minutes model and it was pretty easy to follow. I collected the shot data from WhoScored for the last 4 seasons in the Championship. After some tinkering around in Excel I arrived at the following shot and goal totals, plus an xG value for each of the three shot types that WhoScored records.

Measure   All      Outside box   Six-yard box   Penalty area (excl. six-yard box)
Shots     57705    25145         3684           29347
Goals     5182     952           1170           3441
xG        0.09     0.04          0.32           0.12

The first row of data shows:

  • total shots in all matches
  • total shots outside the box
  • total shots in the six yard box
  • total shots in the penalty area (outside 6 yard box)

The second row of data shows:

  • total goals in all matches
  • total goals outside the box
  • total goals in the six yard box
  • total goals in the penalty area (outside 6 yard box)

The third row of data shows the expected goals (xG) value for each shot type, which is simply the goals divided by the shots for that type. This is always less than 1, as not every shot results in a goal. In the easiest terms, an xG of 0.04 for shots from outside the box means that roughly 4 out of every 100 of those shots are scored.
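For anyone who would rather check the arithmetic outside Excel, here is a minimal sketch in Python. It assumes nothing beyond the shot and goal totals in the table; the dictionary keys and the function name `zone_xg` are my own labels for illustration, not anything from WhoScored or Paul Riley's post.

```python
# Shot and goal totals copied from the table (WhoScored, four Championship seasons).
# The zone names are illustrative labels, not WhoScored field names.
SHOTS = {"all": 57705, "outside_box": 25145, "six_yard_box": 3684, "penalty_area": 29347}
GOALS = {"all": 5182, "outside_box": 952, "six_yard_box": 1170, "penalty_area": 3441}


def zone_xg(goals, shots):
    """Return goals per shot for each zone, i.e. the xG value of one shot from that zone."""
    return {zone: goals[zone] / shots[zone] for zone in shots}


XG = zone_xg(GOALS, SHOTS)

for zone, value in XG.items():
    # Rounded to two decimal places, the same precision as the table.
    print(f"{zone}: {value:.2f}")
```

Rounded to two decimal places, these come out the same as the xG row in the table.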


So I had data from 57,705 shots, which is a great deal, but well short of other models, which use upwards of 100,000. As Paul Riley notes in his post, the model is a decent equivalent to what is out there at the moment, but by no means the most accurate. I will run some comparisons over the course of the season to see how it measures up to other models.

As I have said, there is a range of xG models that use a lot more data and will therefore be much more accurate than this one. For example, other models include how the shot was assisted and a more specific location for the shot, to name just two. Despite this, I will use this model, as well as others listed at the end of the article, to inform some of my analysis of Boro’s performance this season.
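To give an idea of how a model this simple can be applied to a match, here is a rough sketch in Python. It adds up the xG of every shot a team takes, using the rounded values from the table. The shot counts in `example_shots` are made up purely for illustration and are not from any real Boro match, and the names `ZONE_XG` and `match_xg` are my own.

```python
# Rounded xG per shot for each zone, taken from the table earlier in the post.
ZONE_XG = {"outside_box": 0.04, "six_yard_box": 0.32, "penalty_area": 0.12}


def match_xg(shot_counts):
    """Sum the xG of every shot: number of shots in each zone times that zone's xG."""
    return sum(ZONE_XG[zone] * count for zone, count in shot_counts.items())


# Hypothetical shot counts for one team in one match (invented for this example).
example_shots = {"outside_box": 6, "six_yard_box": 1, "penalty_area": 5}

total = match_xg(example_shots)
print(f"Expected goals from these shots: {total:.2f}")
```

Comparing a total like this with the goals actually scored, match by match, is the kind of check I have in mind. Because the model only knows which zone a shot came from, it treats every shot in a zone as equally good, which is exactly the limitation described above.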

Here are some other blog posts and articles that explain the xG model and how it can be used to assess prior performance and predict future performance.

Why use xG?

Paul Riley’s xG model

There are many more out there and they make for interesting reading and a completely different way to think about shots and goals in general, especially when you see your centre midfielder sprinting towards a loose ball 25 yards out ready to unleash a rocket into…the top row of the North Stand. Well, only 96 times out of 100 anyway.
