Marseille, Lyon, and a dive into Ligue 1 pseudo-xPG data

In comparison to last season, Ligue 1 has had a banner campaign in terms of parity and storylines.

At this time last year, Paris Saint-Germain had a five point lead over second place Monaco and both Olympique Lyonnais and Olympique de Marseille were at least 15 points behind PSG in the title race.

Marseille and Lyon were also at least eight points behind third-place LOSC Lille, who held the final Champions League qualification spot.

Ligue 1 14

Fast forward to this season and the entire complexion of the league table is flipped on its head:

Ligue 1 15

Olympique Lyon have a 14-point improvement relative to last season and have seen their goal difference rocket to +26. Marseille led the table for the majority of the season and have scored 10 more goals than at the same point last season. Meanwhile, PSG are nine points behind the pace they set last season.

The standings in Ligue 1 have changed substantially, though goal scoring on the whole hasn’t moved as much as all the craziness would suggest.

Despite the offensive explosions of both Olympique Marseille and Olympique Lyon, Ligue 1 is averaging 2.43 goals per game which is a slight upgrade over the 2.395 mark last season, though it’s still the lowest among the top five European leagues.

Explaining the difference in team performance from one year to the next is something that football is striving for, a never-ending journey filled with narratives and deviation. As football has evolved over the last couple of seasons, so has the data for it, which has helped in getting past the cliches put forth by media pundits.

We’ve gone from metrics like TSR (total shots for / (total shots for + total shots against)) and PDO (conversion percentage + save percentage), pioneered by people like James Grayson, to much more intricate uses of data like Ben Pugsley’s measures of football teams in different game states, which build on what Grayson helped establish and give us a deeper look at how teams behave when down two or up two.
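For anyone who hasn’t met these two metrics before, here is a minimal sketch of the definitions given in the parentheses above. The function and parameter names are hypothetical, and the numbers in the example are made up. One thing the definitions leave open is which shot count sits under conversion and save percentage (all shots or shots on target), so the sketch simply takes whatever count is passed in.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """TSR: the share of all shots in a team's matches that the team took."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """PDO: conversion percentage plus save percentage, as fractions.

    Assumption: the caller decides whether the shot counts are all shots
    or shots on target; the text does not specify.
    """
    if shots_for == 0 or shots_against == 0:
        raise ValueError("shot counts must be positive")
    conversion = goals_for / shots_for
    save = 1 - goals_against / shots_against
    return conversion + save


# Made-up season totals for a hypothetical team.
tsr_example = total_shots_ratio(shots_for=500, shots_against=400)
pdo_example = pdo(goals_for=50, shots_for=200, goals_against=40, shots_against=160)
print(round(tsr_example, 3), round(pdo_example, 3))
```

A TSR above 0.5 means a team outshoots its opponents, and a PDO well away from 1.0 is usually read as a sign of a hot or cold streak rather than lasting quality.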

Another type of metric that has come up is perhaps the fanciest of the new data trends in football: expected goals data, which takes into account shot location and the type of pass that preceded the shot.

According to 11tegen11, Expected Goal Ratio is more repeatable than Total Shot Ratio. From Michael Caley to Colin Trainor to 11tegen11 to Paul Riley, ExPG has taken the baton from TSR and helped build upon it. But why listen to me blabber about it when you can just watch this articulate it so much better?

It’s something I’ve been intrigued by for quite a while. However, what makes ExPG what it is is how exhaustive the data behind it has to be. Shot location, the body part the shot was taken with, and how the pass was delivered are just some of the qualifiers to track down.

Quite honestly, I have neither the time nor the capability to do that at the moment, but I wanted to get some semblance of its power.

With the help of Ben (@stats_snakeoil on Twitter) and Seth Dobson, I created some pseudo ExPG data (the name probably needs some work) for Ligue 1 offenses this year via WhoScored’s detail tab.

I got the data by simply taking conversion rates for shots outside the box, shots inside the box and headed shots. To do that, I had to assume that there were no headers from outside the penalty area.

The obvious main flaw with the data is that it only accounts for offense and not the defensive side of the work, so comparisons with the explanatory and repeatable effects that shots on target ratio (SoTR) and TSR hold are pretty much futile.
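A minimal sketch of how that calculation could look, under some loud assumptions: the three input counts are shots outside the box, shots inside the box (headers included) and headed shots; conversion rates are league-wide goals divided by shots in each bucket; and a team’s pseudo ExPG is its shots in each bucket multiplied by the matching rate. The class, field and function names are hypothetical and every number below is made up.

```python
from dataclasses import dataclass


@dataclass
class ZoneCounts:
    """Counts of shots (or goals) per bucket; field names are hypothetical."""
    outside_box: int  # everything from outside the penalty area
    inside_box: int   # everything from inside the area, headers included (assumption)
    headers: int      # everything taken with the head


def split_categories(c: ZoneCounts) -> dict:
    # Assumption from the text: zero headers from outside the area,
    # so every header can be subtracted from the inside-box total.
    return {
        "outside_box": c.outside_box,
        "inside_box_foot": c.inside_box - c.headers,
        "header": c.headers,
    }


def conversion_rates(league_shots: ZoneCounts, league_goals: ZoneCounts) -> dict:
    """League-wide goals per shot in each bucket."""
    shots = split_categories(league_shots)
    goals = split_categories(league_goals)
    return {k: (goals[k] / shots[k] if shots[k] else 0.0) for k in shots}


def pseudo_expg(team_shots: ZoneCounts, rates: dict) -> float:
    """Expected goals for one team: shots in each bucket times the league rate."""
    shots = split_categories(team_shots)
    return sum(shots[k] * rates[k] for k in shots)


# Made-up league and team totals, purely for illustration.
league_shots = ZoneCounts(outside_box=4000, inside_box=5000, headers=1500)
league_goals = ZoneCounts(outside_box=120, inside_box=700, headers=150)
rates = conversion_rates(league_shots, league_goals)

team = ZoneCounts(outside_box=200, inside_box=300, headers=80)
team_xg = pseudo_expg(team, rates)
print(rates, round(team_xg, 2))
```

The no-headers-outside-the-box assumption is what lets the three buckets be separated cleanly: without it there would be no way to tell how many headers to take out of each zone.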

In the future I’ll hopefully get this corrected so I can see how the pseudo ExPG ratio numbers work with SoTR and TSR.

For now, I can still compare the data by evaluating it only against other offensive data in Shots on Target per game, % of shots that hit the target, and total shots per game.

What I did first was see how each of those metrics rated against the pseudo ExPG data in terms of its R2 correlation with open play goals within individual seasons.

The data used is from the 2013-14 season:





So going off of the 2013-14 Ligue 1 season, the ExPG data has the strongest correlation to open play goals, though shots per game was quite close. ExPG data had an 85/15 skill to variance breakdown while shots per game had an 82/18 ratio.

The percentage of shots that hit the target had a 25/75 breakdown while Shots on Target per game was 59/41. I won’t lie, at the time I was quite shocked to see that shots per game had a very strong correlation to the amount of goals scored last season and challenged pseudo ExPG data the way it did.

What I did afterwards was try a plain correlation of TSR to points from that season to see how strong it was. I’ll save you the graph but I got an R2 of 0.7479, which gave me an 87/13 breakdown. It’s about as close to a perfect 1 as one could get for doing stuff like this.

Sifting through the table made it clear that shot quality for that season wasn’t too big a factor at all. Teams that shot a lot and didn’t concede shots essentially got what they deserved, except for a couple of clubs in Reims and Rennes, who had high PDO numbers and near .400 TSR numbers.
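For readers who want to reproduce this kind of number, here is a minimal sketch of the two steps involved: computing R2 between a metric and an outcome, and turning R2 into a skill/variance split. It assumes the skill share is the square root of R2 (the absolute correlation), which is what the figures quoted in this post work out to. The function names are hypothetical.

```python
import math


def r_squared(xs: list, ys: list) -> float:
    """Squared Pearson correlation between two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the lists has no variation")
    return (sxy * sxy) / (sxx * syy)


def skill_variance_split(r2: float) -> tuple:
    """Assumed conversion: skill share = sqrt(R2), variance share = the rest."""
    if not 0 <= r2 <= 1:
        raise ValueError("R2 must be between 0 and 1")
    skill = math.sqrt(r2)
    return skill, 1 - skill


# The TSR-to-points R2 quoted in the text.
skill, variance = skill_variance_split(0.7479)
print(f"{skill:.3f} / {variance:.3f}")
```

Fed the 0.7479 above, the skill share comes out at roughly 0.865, in line with the breakdown quoted.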

I did the same thing with the 2012-13 season. Instead of showing graphs I’ll just post a table of the R2 values.

Data Type         R2
Pseudo ExPG       .5843
Shots per game    .3438
SoT per game      .3295
SoT%              .1841


We get a much bigger distinction between my ExPG data and shots per game, with a 17% gap between the two. Furthermore, there’s a big decrease in the single-season relationship between shots per game and goals going from 2013-14 back to 2012-13. In 2012-13, the skill share for shots per game to goals was only 59%. That isn’t the worst number, but in comparison to 2013-14 it loses a lot of its explanatory power.

It also lines up quite nicely with the significant decline in TSR/points: in 2012-13 the R2 was only 0.4255, which puts it at 65/35. I should temper that, though. Since I don’t yet have the ratio version of pseudo ExPG, it’d be unfair to keep assuming the two are linked until I do.

So the last thing I tried was to see how the data holds up from one year to the next. Showing the repeatability of a metric tells us how much confidence to place in its predictions of future performance. Here’s the table:

Data Type         R2
Pseudo ExPG       .6715
Shots per game    .4522
SoT per game      .542
SoT%              .318


So in the end, pseudo ExPG data does the best in predicting future outcomes from one year to the next, with an 8% gap in skill over shots on target per game.

Then again, shots per game was the better predictor of goals within an individual season than SoT, so although SoT is more repeatable, in those two seasons it didn’t beat SPG.
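A minimal sketch of one way to run a year-to-year check like this, assuming each season is stored as a mapping from team name to the metric’s value and that only teams present in both seasons are compared (promoted and relegated clubs drop out). The function names, team names and values are all hypothetical.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the lists has no variation")
    return sxy / math.sqrt(sxx * syy)


def year_to_year_r2(season_a: dict, season_b: dict) -> float:
    """R2 between a metric in one season and the same metric in the next."""
    # Only clubs that played in both seasons can be paired up.
    common = sorted(set(season_a) & set(season_b))
    if len(common) < 3:
        raise ValueError("need at least three teams in both seasons")
    r = pearson_r([season_a[t] for t in common], [season_b[t] for t in common])
    return r * r


# Made-up metric values for hypothetical teams.
first_season = {"Team A": 1.0, "Team B": 2.0, "Team C": 3.0, "Team D": 9.0}
second_season = {"Team A": 2.0, "Team B": 4.0, "Team C": 6.0, "Team E": 1.0}
repeatability = year_to_year_r2(first_season, second_season)
print(round(repeatability, 3))
```

With only two seasons and fewer than 20 common teams, any single R2 from a check like this is a small sample, so the ranking of the metrics matters more than the exact decimals.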

So what I can take out of this is that at least on offense, Pseudo ExPG data did a better job in both its repeatability from season to season and how it correlates to goals. Shots per game did very well last year but before that it didn’t do as well. SoT did around the same in both years and SoT% didn’t do much in either one.

I’m tentatively excited with how the results have held up so far. Having pseudo ExPG data give me 85% (2014) and 77% (2013) explanatory power in single seasons is really good considering I didn’t have to track the type of passes pre shot.

Hopefully in the near future I’ll be able to manually collect the data for the defensive work and do a proper comparison to TSR and SoTR. For now, I’ll leave you with how things stand in Ligue 1 so far.

xPG Ligue 1

The Author


Ligue 1 analyst/writer for Back Page Football. Data is often incorporated. Ligue 1 is really fun, just give it a chance!
