In my last post, I showed you how to collect the raw data from www.football-data.co.uk and to use that data to apply a basic strategy against every single match (all 380) in one fell swoop to see if that strategy was a winning idea. Sadly, it wasn't, but this was a reasonable first stab at some back-testing, and hopefully even that simple example will have triggered some ideas in you that you want to test.
As you get more into this, however, you will quickly start to see some problems with the approach I have outlined. Many strategies rely on looking at previous results to assess current form or to help with the selection process of a particular strategy. The back-test that I showed in my last post provides no method to select a team’s previous match or indeed their previous n matches.
There are a few considerations when looking to solve this problem:
- We don’t know the number of games a team has played so far.
- We don’t know if a team’s last match was played at home or played away (different column in the raw data).
- We don’t know how far back a team’s last match was (i.e, the last one could have been postponed).
Strategy Idea:
Let’s take a concrete example that we want to test. I have an amazing idea (yes, another one) to lay any team below odds of 3.00 who’s last match was at home and where they lost that match without scoring a goal. So, if a team lost their last match at home to nil, then they will become a lay target. Okay, so how do I achieve this? I need to:
- Check the odds in their current match (are they equal or less than 3.00?)
- Check if they were at home in their last match.
- Check if they lost their last match.
- Check if they scored a goal in their last match.
If all these qualifying criteria equate to true, then I must assess whether I would have won/lost money laying such a team. In other words, what would my P/L have been?
Previous matches:
So, how do we solve these problems? Well, as in most things in life, there is more than one way to skin a cat. I have a method that I have come up with to allow me to overcome these issues. Doubtless, my solution is not the most elegant or effective around as I’m certainly no Excel guru – but it seems to work okay and should allow you to satisfactorily test a wide variety of strategies that require the resolution of these issues.
Below you can download the latest version of my E0.xls spreadsheet. The original test is still in the workbook, but now I’ve added another sheet and called it “Test 2”.
In this “Test 2” sheet, you will see a whole new set of columns. For the columns G up to K, I have simply reproduced the data within the raw data sheet E0. This is merely a convenience and allows me to manipulate the data free of the actual raw data. As mentioned in the previous post, this is always good practice.
If I wanted to make this sheet into some kind of template, I would also be tempted to turn these various columns into “names”, but I won’t go into that right now as I'm already worried that I'm going to confuse the hell out of you all.
I do realise that, for those of you who are not Excel-orientated, what I'm going to describe below might be a bit too much, in which case you may want to pass on the whole thing – but for those of you who even have some small knowledge of Excel, this is probably all understandable. I hope so anyway.
I do realise that, for those of you who are not Excel-orientated, what I'm going to describe below might be a bit too much, in which case you may want to pass on the whole thing – but for those of you who even have some small knowledge of Excel, this is probably all understandable. I hope so anyway.
Meat and Potatoes:
Right let's crack on. Columns L and M are also copies of columns in the data sheet, and are the home and away match odds. Using these helper columns, I can now start to build-up my lookups properly.
In Columns C and D, I have added a game count for each team (just a simple COUNTIF). Then using that game count and the team name, in columns A and B, I have simply concatenated (joined) these two values together to create a string that I can subsequently use to lookup previous matches. The concatenation is very simple:
cell_ref & cell_ref
For example: =IF(G13="", "", G13&C13)
We are now pretty well setup to perform our previous match lookups. To assess each team’s previous match, we need to do this:
- Generate a lookup string “on-the-fly”.
- Try and find a match for that string in the previous match data.
We have the current game number for both teams, so we need to do something like this:
String = team_name + (current game number – 1)
This is cool, because we can now go back any number of games to pick up the result from previous games. If I want to find out the result of a team’s match 6 games ago, it’s simply (current game number – 6). Columns E and F use this lookup technique to find-out if the previous game was a home game or an away game.
=IF(C22<2, "", IF(ISNA(VLOOKUP(G22&(C22-1), $A$12:B607, 1, FALSE)), "A", "H"))
Hmm, what’s all this then. Looks a bit complicated. For columns E and F I’m saying:
If the number of matches is less than two, then put an empty string there, otherwise perform a vertical lookup (VLOOKUP) of the team_name+(game number -1) on the column containing the home name lookup tags (column A). The "ISNA" bit is saying, if there is an error (in other words, it cannot be found), then that must mean there previous game was an away match, so I can put an “A” in the cell, otherwise I did find it and I can plonk a “H” there.
So, columns E and F are themselves helper columns, but I’ve already used the lookup values in columns A and B to good effect. This is essentially the same kind of lookup I will be doing to test the actual strategy.
The real good news is that all of these columns A to M that I'm talking you through can be largely ignored and you don’t really need to know how they’re constructed. I just thought it would be helpful if you did know. The actual testing of our strategy can be found in columns O to T – but you can use this sheet to test other strategies and helper columns A to M should prove useful for numerous strategy tests.
Testing our strategy:
If you look at row 33 in the spreadsheet (where I have highlighted it in yellow), I have identified Everton as a team that last played at home and lost to nil (row 25). How did I do this?
=IF(OR(H33="",F33="A", C33<2, D33<2, M33>$B$4),"", IF(AND(VLOOKUP(H33&(D33-1),$A$12:K32,9,FALSE)=0,VLOOKUP(H33&(D33-1),$A$12:K32,10,FALSE)>0),1,""))
Again, this looks much scarier than it actually is. Let’s break it down. The first part
IF(OR(H33="",F33="A", C33<2, D33<2, M33>$B$4),""
Is a set of disqualifying criteria. It’s saying: “if H33 (away team name) is empty” OR “if their last match was an away game” OR “they have played less than two matches” OR “their match odds are greater than 3.00”, then leave the cell blank. If all those criteria are met, then I can continue. The next bit:
IF(AND(VLOOKUP(H33&(D33-1),$A$12:K32,9,FALSE)=0,VLOOKUP(H33&(D33-1),$A$12:K32,10,FALSE)>0),1,"")
Is saying: “if the team’s last match resulted in a home scoreline of 0” AND “the team’s last match resulted in an away scoreline greater than 0”, then plonk a value of “1” in the cell. This then acts as a flag that we have a qualifying match. Columns Q to T then work from the fact that this is a qualifying match, and they see if laying the team in question would result in a win or not – which is essentially the same as the simple checks from my first post.
Looking at the results, we appear to have found 16 qualifying matches throughout the season, and have made £63 profit, which is an ROI of 39%. Cool.
Before we get too excited, however, there are a couple of things to note. The data from football-data.co.uk provides average odds but presumably these are Back odds, where as my strategy involves laying teams. The odds would therefore not be as favourable as I am using. Also, the number of qualifying matches is extremely low. I would need to either play with my maximum odds (if you change the figure in cell B4 you can do this yourself) to help me pull in more games, or I would need to run this strategy against an awful lot more matches. With such a small sample of qualifying matches, I am at the mercy of variance. In other words, it could just be a small hill just before the unending gradient downwards.
But once again, I haven’t spent any of my children’s hard earned inheritance (as they guy off Dragon’s Den would say) and, based on the results, I would say I need to investigate and experiment further. Or you can :-)
Hopefully, however, all this meddling has shown those of you who presently don't get involved in testing that it's really not that difficult. Once you get your hands dirty with it and build a couple of tests of your own, you will start to to get an idea of how intriguing it all is.
Even if that amazing Holy Grail of betting is not really out there and waiting to be discovered, you can nevertheless come-up with a half-decent idea and give it a thorough test against a vast amount of historic data in about twenty minutes. This will give you a reasonably definitive answer one way or the other.
Just remember, dont' go trialling it the using the fun method first!
Excellent. Thanks pal, this is just whst I needed. Are you linking your excel to get live odds of any qualifying matche can you cover how to do that?
ReplyDeleteHi Al
ReplyDeleteThe last two posts were exclusively about back-testing, and coming up with strategies that, if proven profitable over the course of several seasons and, perhaps, several leagues, can be applied to the current season.
The method you actually choose (automated or manual betting) to apply any strategy to the current season is not covered at all, and not even touched upon in my posts.
I could do a post on this in the future, but presently don't have any plans to do so.
I'm glad, however, that you got some worth out of the last two posts. Cheers.
keep up the good work soccerdude.
ReplyDeleteim sure we can do lots of backtesting during the summer break! :-)
I am afraid the spreadsheet is not correct.
ReplyDelete1. The column K : Result column needs to look at the Full Time Result (and not part Time result). Your need to look at column G instead of column J.
2. The calculation of the return in incorrect. For example your first return for the match Blackburn3 Everton2. You lay Everton , and Everton wins 1-0; you should therefore lose your stake and not win.
3. The result and return columns were incorrect.
Unfortunately once all corrections are done, the return is
- £ 61.
Happy to send the amended spreadsheet to anyone. Just send me an email to jmf@btinternet.com