A journey into rugby’s numbers

Date published: July 10 2018

‘Rugby is stats,’ said one editor to me years ago, exasperated at his media mogul boss’ decision to cease paying what was, in truth, an exorbitant sum of money each month to have a vast swathe of numbers surrounding rugby delivered to his inbox every week.

This was in 2004. Moneyball had been published, but the film was still seven years away. Chris Anderson and David Sally’s excellent book on soccer, ‘The Numbers Game’, was still nine years away. Opta Sports was dominating the Premier League in soccer and branching out into other sports – Sky TV’s (then) Guinness Premiership coverage was adorned with all sorts of numbers bearing Opta’s name. Stats were in the ascendancy, yet it wasn’t until Moneyball entered consciousness this side of the pond that stats, sabermetrics, business and financial theory parallels, and deeper dives into the numbers started to be the subject of furtive conversations around commentary boxes, coaching booths and dimly-lit rugby club bar corners.

Even academia has been relatively slow to catch up. There are many sporting – largely soccer – studies on the outputs of teams relative to inputs off the field, such as money spent on players, but a paltry few talk about purely on-field inputs and outputs. One paper on soccer in Spain used shots, attacking plays made, balls kicked into the opposition’s ‘centre area’ and minutes of possession as inputs, contrasted these with the inverse of these four as defensive inputs and took goal difference as the output: the result.

Closer to home, a set of papers in 2009 from Surrey University’s Andy Adcroft and Jon Teckman used weighted indexing of rugby match output statistics (tries, drop goals, penalties, weighted relative to the teams’ rankings) to see which rugby teams competed and which merely performed at the 2007 Rugby World Cup, and which combined competitiveness and performance to best effect. Another, from Spain, by Ortega and others, covering four seasons of the Six Nations, pointed out the value of each output statistic to match results without examining strategic impacts in a game, noting: “…from these results, it cannot be determined whether one form of play is more successful than another. More research is necessary to determine which form of play is more beneficial or provides more technical and tactical game advantages.” In short, it doesn’t tell us how to play the game.

The price of acquiring deep statistics has not fallen – not without reason, as it is a painfully laborious and time-consuming task to capture the data. In ‘The Numbers Game’, the authors noted that in any one soccer match there are upwards of 2,500 ‘events’ to be recorded – and that’s before taking into account interpreting, contextualising or measuring them. One thing ‘Moneyball’s examination of the differing values of different statistics in ball games also proved quite conclusively was that simply owning the stats has enormous value in itself to all interested stakeholders. Little wonder that those who own the stats hold them to ransom.

But data in sports is becoming more and more prevalent, ransoms or not. Boston’s MIT holds the Sloan Sports Analytics Conference annually in February, an event which has affectionately become known as Geekapalooza for the type of person it attracts. KPMG hosts a Sports Analytics World Series seven times a year at different venues around the globe. Moneyballists are enjoying a population explosion.

At last year’s UK KPMG event, three things were quickly made apparent by several of the speakers. Firstly, rugby was notable for its absence. Secondly, those sports whose leagues are centrally run – meaning whose teams are franchises operated by a central body, as happens in the NFL, AFL and baseball among others, or whose competition is a single one, such as tennis’s ATP Tour – stand at an advantage, as they own their own stats across the entire premier level of their sport and as such can give fans, coaches and analysts all the undigested versions (and yes, it definitely helps that most of those sports are national leagues only). Thirdly, and most intriguingly, nearly all analysts remained aghast at the prevalent tendency of many coaches to ignore data and go with their gut, or more accurately, go with their admiration for one player’s certain skill or attribute and ignore the data demonstrating how irrelevant that particular skill might actually be. Distressingly, when rugby was brought up, many present – Australians particularly – were acidic about how far behind rugby and its coaches and analysts might be in this.

That’s not to say there’s nothing out there, or nobody doing good work. A good part of Saracens’ recent run of success can be attributed to the work of Professor Bill Gerrard, who was brought into the club by Dr. Brendan Venter to create the club’s own statistics rather than rely on info bought from a third party. Emphasising the ‘Doctor’ aspect of Venter’s name is important: here is a coach who actively embraced the scientific approach to all facets of his tenure. But his academic background gave him the capacity to do so; coaches without that studied understanding feel, in the words of one attendee at the World Series, “threatened by data, because it often points out things that they don’t see, so don’t trust.” As one of ‘The Numbers Game’s authors said at one of the KPMG events: “I don’t think football analytics has progressed as far as it could. In fact, in many ways it has stalled… It is in danger of becoming another fad… because nobody’s quite sure what to do (with it).”

That is also, in part, because its direction is not always right. Even in ‘Moneyball,’ one of the biggest lessons learned was that relying solely upon data only gets you so far. The subjects of the book, the Oakland A’s, generally made the play-offs but never won the World Series. Why? Well, data reliance will give you success over time, but in those one-off play-off games, single events can unfold which render long-term data useless. Because, in the words of Jonah Hill’s frustrated analyst in the film: “…the sample size (in a one-off game) isn’t big enough (for the data to give enough relevant guidance).”
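
To put rough numbers on that sample-size point, here is a minimal Python sketch. It assumes, purely for illustration, a team that wins any single game 60 per cent of the time, with every game independent of the last. Neither that figure nor the function name comes from ‘Moneyball’ or the A’s; they are invented for the example.

```python
from math import comb


def prob_wins_majority(p: float, games: int) -> float:
    """Chance that a team with per-game win probability p wins more than
    half of an odd number of independent games (exact binomial sum)."""
    if games % 2 == 0:
        raise ValueError("use an odd number of games so there are no ties")
    need = games // 2 + 1  # wins required for a majority
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


# Hypothetical 60 per cent team: one-off game, short series, long run.
for games in (1, 5, 7, 101):
    print(f"{games:>3} games: {prob_wins_majority(0.6, games):.3f}")
```

Under those assumptions the better team’s edge is close to decisive over a hundred-odd games, but it still loses a best-of-five series roughly a third of the time and a one-off game four times in ten.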

Analysts at the sports data conference also conceded this. The accepted maximum success rate for an entirely data-driven approach to on-pitch strategy was put by a sample of analysts at somewhere between 60 and 70 per cent at best. One analyst from the AFL looked at the points his team had accumulated over a season and put nearly 60 per cent of them down to luck – i.e. things that his team had no control over. Analysts had to confess to coaches that their statcrunch-based strategies would quite often be wrong, an admission that is the complete antithesis of the typical coach, who by nature is a raving control freak and accepts no failure.

But for those of us fans and observers who just love the game and whose livelihoods don’t rely upon winning and losing, the data filtered through media companies for ‘McAnalysis’ during televised matches frequently does nothing to enhance our understanding. We see stats on tackles, or metres made, or set-pieces won or lost during matches, but none of it is contextualised properly. Clubs such as Saracens keep their data close to their chests for good reason: it’s central to their strategies, so why would they risk opening it up to a public where someone is going to see through their secrets?

My colleague Sam Larner wrote a piece a month or so ago on going beyond the raw statistics, outlining the effort that goes into interpreting data. It illustrated with huge clarity just how much potential for analysis is there, just how many facets of the game can be analysed and just how many different methods there might be to chop, contextualise, dig through and interpret the vast swathes of data from a single game to an entire season. It’s a monumental task. And despite the advances and accelerations in big data technology, metric theory, video analysis technology and data capture, there still remains so much about rugby that data could tell us which we simply don’t know.

So over the coming weeks and months, we at Planet Rugby are going to open the conversation and see what ideas are out there. Once a fortnight, we’ll be looking at a particular statistical aspect of a particular game and seeing if we can find some data pointers to enhance our understanding of our beautiful game.

We’ll go from micro-aspects, such as trying to fathom whether there’s an optimal amount of time for the ball to be in a ruck, an ideal strategy for which line-out call to use in different areas of the pitch, or what constitutes a successful pass, to macro-aspects such as how teams can best use the ball in different situations, or whether there’s an ideal attacking shape in open play… the list of possibilities on both sides is large at the very least.

And we want to make this collaborative, as we recognise there are thousands of you reading this with deep rugby knowledge and with your own theories about the game that can – even with an irrelevant sample size such as one game – be vaguely tried and tested.

So email adam.kyriacou@planetsport.com your thoughts and ideas on this, your questions that could be asked and answered with data, even those pertaining to a particular match or a match involving your own non-elite team, and we’ll pick and choose one each fortnight and do our level best to observe, count, define, crunch and see what the data might provide as an answer.

Looking forward to it!

The Journey Into Numbers project is being run by Lawrence Nolan – Loose Pass will return next week and every other week after that.