Monday 9 July 2012

Calculating Goal Expectancy


Way back in January, I did a post on how to understand Poisson (see entry HERE) and how it could be used to estimate various scorelines between two teams. This was just a basic get-you-started guide, but it was generally well received and generated a fair few comments.


The post, however, while being all well and good, did rather leave everyone hanging, because one of the cornerstones of calculating Poisson to any degree of accuracy is having each team's attack and defence parameters to hand. When you have the attack and defence parameters, you can then calculate each team's scoring ability - in other words, you will know the goal expectancy for each team.


Now that the Euros have come to an end, we have a natural, albeit brief, pause before all the European leagues kick off again in August and September, so this seems like a good point to pop in some more analytical posts, rather than my normal "ooh, this is what I won/lost" kind of entries.

Recap:
If you haven't read my post on Poisson and/or you haven't got a clue what I'm talking about, then do have a read of that post first. Briefly, however, given each team's scoring ability (or goal expectancy), we can use Poisson to generate probabilities for any scoreline. This is good for betting/trading in the correct score market, but can also be used for match odds, over/unders and Asian handicap bets.


But, as mentioned, the scoring ability bit was omitted from that post, so let's rectify that here. If you remember, in the horrible Poisson calculation we had this little symbol, "µ". This is from the Greek alphabet and is pronounced as "mu". In the Poisson calculation, this was used as a placeholder for the scoring rate of a particular team. So if we wanted to find out the actual scoring rate for Man City when playing at home last season, we would need to work out the actual value of µ. This is achieved thus:

µ = A1 x D2 x H

The value of µ is calculated as the attack rating of the home side (A1) times the defence rating of the away side (D2) times the home advantage value (H).
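
As a minimal sketch of that formula in Python: the function name and the three rating values below are made up purely to show the arithmetic, and are not real ratings for any team.

```python
def scoring_rate(attack_home: float, defence_away: float, home_advantage: float) -> float:
    """Return mu = A1 x D2 x H, the home side's expected goals."""
    return attack_home * defence_away * home_advantage


# Hypothetical ratings, chosen only to illustrate the calculation.
A1 = 1.5    # home side's attack rating
D2 = 0.75   # away side's defence rating
H = 1.25    # home advantage multiplier

mu = scoring_rate(A1, D2, H)
print(mu)
```

A defence rating below 1 pulls the home side's scoring rate down, while a home advantage above 1 pushes it back up.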


Oh dear, so we've now got to work out three different values to get to our scoring rate? Well, we should do really, yes... but we don't absolutely have to do so.


It's at this point that I have a decision to make because there are different methods for calculating attack and defence parameters, and some are easier than others. The best method (in my view) is also the most complicated to explain and the most difficult to implement, and that's a method called Maximum Likelihood Estimation (or MLE). MLE basically finds the best estimates for each attack and defence parameter, along with the home advantage parameter, by settling on the set of values that makes the real scorelines in the historical data most likely. In other words, MLE is a "best fit" technique.


The problem is that calculating MLE would involve some detailed maths and some medium to advanced Excel, which personally I feel may be at odds with the rather simplistic approach I took when explaining Poisson in the first place. After all, what's the point in a dummies' guide if the completion of that guide then moves on to advanced calculations?


You may strongly disagree with me here; you may think this is a cop-out or that I'm being outrageously patronising, but for now I'm going to leave MLE behind and show you a more "accessible" method for generating these values.

Before I do so, you may remember that I have already provided one method for calculating these values. I did it in my Fag Packet Calculations post, so have a look at that if you want a really rough-and-ready method.


Goal Expectancy Method:
Okay, let's get down to brass tacks:
  • CityOverallRate = Man City scored 93 goals in 38 Premier League matches last season. If we divide one figure by the other (93 ÷ 38) we get 2.45.
  • WiganOverallRate = Wigan scored 42 goals in 38 Premier League matches last season. If we divide one figure by the other (42 ÷ 38) we get 1.11.
  • CityHomeRate = Man City scored 55 goals at home and 38 goals when away. Again, if we divide one figure by the other (55 ÷ 38), we get a multiplication factor showing how much better City are at home than away. The result is 1.45.
  • WiganAwayRate = Wigan scored 22 goals at home and 20 goals when away. This time we reverse the division (away divided by home) because we want Wigan's multiplication factor when playing away. This is (20 ÷ 22) = 0.91.
Right, the home side's scoring rate can now be calculated as CityOverallRate X CityHomeRate, and the away side's scoring rate as WiganOverallRate X WiganAwayRate:

         Man City Scoring Rate = 2.45 X 1.45  = 3.55
         Wigan Scoring Rate    = 1.11 X 0.91  = 1.01
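
The same arithmetic can be sketched in a few lines of Python. The function name is illustrative only, and because the code keeps full precision rather than rounding each step to two decimal places, its results can differ from the hand-worked figures in the second decimal place.

```python
def goal_expectancy(goals_total, matches, goals_here, goals_there):
    """Overall goals per match scaled by the venue ratio described above.

    goals_here / goals_there is home/away for the home side
    and away/home for the away side.
    """
    overall_rate = goals_total / matches
    venue_factor = goals_here / goals_there
    return overall_rate * venue_factor


# 2011/12 Premier League goal counts quoted in the list above.
city_rate = goal_expectancy(93, 38, 55, 38)    # Man City playing at home
wigan_rate = goal_expectancy(42, 38, 20, 22)   # Wigan playing away

print(f"Man City: {city_rate:.2f}")
print(f"Wigan:    {wigan_rate:.2f}")
```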

So the goal count looks high. These scoring rate values can be plugged into Poisson now and used to calculate each individual scoreline. In Excel, Microsoft have kindly provided a Poisson function that we can use, so even though I explained the details behind how Poisson works, the truth is that you don't really need to know. Instead you can just plug the appropriate values into the function.

The POISSON() function in Excel takes three arguments; they are, "X", "Mean" and "Cumulative".

If you did revisit my Poisson post, then you'll recognise the "X" value, as that is the number of goals we're interested in. For example, if you want to know the likelihood of Man City scoring 2 goals, then "X" becomes 2.

The "Mean" figure is simply our scoring rate, which we calculated above - so for Man City that would be 3.55.

The "Cumulative" parameter is either TRUE or FALSE. If set to TRUE, the function returns the cumulative probability of anything from zero up to and including "X" goals, whereas FALSE returns the probability of exactly "X". For our purpose, we should set it to FALSE. The POISSON function for finding out how likely Man City are to score exactly two goals is then:

=POISSON( 2, 3.55, FALSE)

The answer is 0.181001136, or 18%.
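
For anyone working outside Excel, a rough Python equivalent might look like the sketch below. The names poisson_pmf and poisson_cdf are chosen here for illustration; they simply apply the standard Poisson formula, with one function for each setting of "Cumulative".

```python
import math


def poisson_pmf(x: int, mean: float) -> float:
    """Probability of exactly x goals; mirrors POISSON(x, mean, FALSE)."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def poisson_cdf(x: int, mean: float) -> float:
    """Probability of 0 to x goals inclusive; mirrors POISSON(x, mean, TRUE)."""
    return sum(poisson_pmf(k, mean) for k in range(x + 1))


# Man City scoring exactly two goals with a scoring rate of 3.55.
print(round(poisson_pmf(2, 3.55), 4))  # 0.181
```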

Ideally you should complete a full set of POISSON calculations for all scores from 0 to 10 goals for both the home and away sides. You can then do things like add up all the home win scorelines, draw scorelines and away win scorelines to produce match odds. The same can be done for overs and unders.
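
A rough sketch of that full grid in Python, assuming (as plain Poisson does) that the two teams' goal counts are independent. The function names are illustrative, and the scoring rates are rebuilt from the raw 2011/12 goal counts quoted in the Goal Expectancy Method section rather than from the rounded figures.

```python
import math


def poisson_pmf(x, mean):
    """Probability of exactly x goals for a side with the given scoring rate."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def match_probabilities(home_rate, away_rate, max_goals=10):
    """Sum scoreline probabilities into home win, draw, away win and over 2.5."""
    home_win = draw = away_win = over_2_5 = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence assumption: multiply the two sides' probabilities.
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
            if h + a > 2:
                over_2_5 += p
    return {"home": home_win, "draw": draw, "away": away_win, "over_2.5": over_2_5}


home_rate = (93 / 38) * (55 / 38)   # Man City at home
away_rate = (42 / 38) * (20 / 22)   # Wigan away

probs = match_probabilities(home_rate, away_rate)
for market, p in probs.items():
    # Fair decimal odds are 1 / probability, with no bookmaker margin.
    print(f"{market:9s} {p:.3f}  fair odds {1 / p:.2f}")
```

Capping the grid at 10 goals leaves a sliver of probability unaccounted for, so the three match-odds figures sum to fractionally less than 1.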

Outstanding Issue:
There is still one other issue that I mentioned in my original Poisson post back in January, and that's the weighting that needs to be applied to Poisson to prevent it from underforecasting score draws and low-score home wins. Judging from my normal output on this type of post, I'd say you only have six months or so to wait before I get round to talking about this. :-)

21 comments:

  1. Nice post about a very interesting subject. I think that the statistical approach to trading is by far the best way. It's the only way we can leave gut-feelings and other crazy stuff at the doorstep and get a fairly clear assessment of the game at hand. Looking forward to the post on preventing underforecasting due around christmas :-)

  2. Thanks Zteff

    I'm actually in two minds whether a purely statistical approach is the best one or not. On one hand, if a pure statistical approach can be successfully formulated, then this would open up the possibility of fully automating a method or strategy - and of course that is highly appealing.

    On the other hand, I also think that if we can take a proven mathematical system and enrich it with things like team news, injuries, match relevance and the like, then we should be able to lift our strike rate even higher, leading to even greater success.

    Either way, it's an interesting debate.

    I'll try my level best not to keep you waiting until Christmas on the Poisson adjustments!

  3. Great job! But...

    Shouldn't you consider the opponent's defensive parameter when calculating scoring rates?

    I tried to add time-dependence to my estimations, which did not perform very well...
    The downweighting of old data goes like this: I multiply the number of goals by "T" to the power of (date_now - date_of_match), where T is a number close to 1 (e.g. 0.997).
    Of course, when calculating GOALS/MATCHES the number of matches should also be weighted (by "T" to the power of (date_now - date_of_match), obviously).


    I started the weighting from season 2010/11, so by the time I arrived at 2011/12 I already had reasonable data, and I continued the estimation that way until the end of the season. The problem is that I couldn't find a value for T which gains more profit for every observed league than the default value of 1. What is your opinion about this kind of weighting? Have you ever tried something similar?

    I would be very happy to see a post about MLE! Even in private :)

    RK

  4. Hi RK

    As mentioned, in the post I have plumped for a quick-and-easy method, but you're right, the defence parameter should be considered.

    With regards to your weighting, I'm not sure I fully understand how you're going about it, but assigning greater importance to more recent matches is definitely a good idea. You can back-test your ideas with the data from football-data.

    I may eventually end-up doing a post on MLE if enough people are interested in it. In the meantime, I'm sure you can find out enough by googling it. Some of what you find may be a little cryptic, but it should give you a place to start.

    Cheers
    Eddie.

  5. Great post, Eddie.

    You probably won't be surprised to learn that I fall fairly and squarely in the camp that doesn't believe wholeheartedly in a totally stats based approach.

    I am quite happy now, thanks to your posts and others, to construct my own 'tissues' for games based on the correct score market. From that I can then work out my own prices for the goals markets and match odds. But I then temper my trade with a good old fashioned dollop of common sense and gut feel.

    I'm off to wrestle with MLE now - but am confident that you are able to put it in language which I can understand and am looking forward to that post. So no pressure :-)

    Dave

  6. Further to the above, I have now read the Wiki entry for MLE. I think the base language of the article was English, but as it made absolutely no sense to me whatsoever I really can't be sure....

  7. Hi Dave

    Thanks for your comments. I suspect you and me are probably in the same camp with regards to marrying stats with other criteria (although I'm not sure about the gut feel thing as that can be an "edge" eroder).

    On the MLE, you've managed to both make me laugh and also make me feel guilty about not posting the method up. Maybe I will do so now... and if I do, then I promise to write it in English too ;-)

  8. Oh, and by the way, I've just looked at the Wiki explanation, and I don't understand it either. Not exactly accessible!

  9. I like the content of your article, and several others. Hope you don't mind, but I have mentioned your writings and put a link to your blog on mine at http://sportstradinghobby.blogspot.co.uk/

    thanks

    Taff

  10. Great article, thanks.

    How would you calculate / compare two teams' statistics for last season if one of them was in the Premier League and the other was in a lower division?

    Would you still take their total goals and adjust it according to a league difference? If so, how can we calculate the league differences?

  11. Hi Earnest

    Apologies for not addressing your question earlier.

    The point you have raised is an important one, but would require quite a detailed answer. I already have quite a backlog of posts that I keep promising and invariably never deliver, so I won't make any promises on this one. However, it may be an interesting post to write-up, so who knows.

    Sorry that I haven't answered it here, but I do appreciate you bothering to stop by to read my blatherings.

    Cheers
    Eddie.

  12. hi, I have given some thoughts about this different division comparison and came up with this idea:

    Find promoted teams in the past few seasons and get a RATIO of goals for & against in the previous season vs next full season.
    Then we can apply this common ratio to all promoted teams in future seasons. The only problem with this approach is that it does not take new signings into account, but it should suffice.

    What do you think?

    And I would like to take this opportunity to recommend a post on "Calculation of Odds: Probability and Deviation" http://www.soccerwidow.com/betting-maths/tutorial/calculation-of-odds-probability-and-deviation/
    (particularly the part on value bets) because I wasn't so sure whether to compare my calculations with fair odds (betting chance of winning in the long terms) or with the bookmaker's overround included (to compare which team the market thinks is more likely to win)

  13. Hi Ernest

    I would say your approach is probably worth pursuing, but it may be worth back-testing it to see how this system fared on the historical results.

    Thanks for the link that you gave.

  14. hi, any chance of going over how to add defence quality into this?

    thanks

  15. Has anybody tested this prediction method?

  16. Hi there,

    Really enjoying these articles! You mention having to adjust Poisson as it over forecasts 1-0, is there any chance you can elaborate sooner than later, as I am that hooked I really hope the last piece of the jigsaw puzzle comes soon! It's like leaving a prime time tv drama on a cliffhanger at the end of the series.

    Keep up the good work mate, really good read

  17. Hi Kev

    Yes, I have left everyone hanging on that bit, I must admit. I will try and get around to cleaning this all up. There are at least two more posts I need to do on this subject but, as usual, it's a case of finding the time to do it.

    Bear with me. I'm sure they'll turn up sooner or later.

  18. Quick and easy posts. I'm waiting to read about the weighting that needs to be applied to Poisson to prevent it from under-predicting.

    Also, I've searched the internet for ways of calculating the probability of a player scoring in a match and I've found very little on it. Could you post about that, please?

  19. Awesome article!

    Will you do a walkthrough of MLE in excel? I've been trying to extract the parameters with bivariate poisson regressions and diagonally inflated models in the stat software R, but it's nowhere near as easy to use as excel (for non-software engineers anyway).

    So if you ever feel like bringing out an MLE in Excel template, I'm sure you'll be met with a lot of demand from guys like me trying to bridge the gap between academia and actual systematic betting strategies implemented in Excel.

    Great blog anyway, the Poisson article was spot on!

    Cheers,

    Martin

  20. Hi anon, it is indeed my plan to do a simplified walkthrough of how to use MLE using Excel. I did actually plan to have started this by now (if not finished), but I haven't done so.

    I do hope to do so in the near future.

    Cheers

  21. Good article. Note that taking the overall mean of each team and multiplying by the home/away rate, is equivalent to just using their home/away mean goals scored. If anyone's interested, http://scoreline.tips has a nice graphical tool that shows the poisson distribution of scorelines for any match.

    Cheers

