GAVIN FLEIG is the head of performance analysis at the reigning Premier League champions, Manchester City. He heads up a full-time performance analysis staff of 10 people at City alone, with six focussing on developing players and four on first-team analytics. With ‘soccermetrics’ analysis still in its infancy, City are about to launch a ground-breaking project to take the subject well and truly mainstream. By making public vast reams of data – worth tens of thousands of pounds but given away free from Friday – Fleig and the analysts at City hope to take football’s understanding of this subject to a new level, and perhaps unearth the next Bill James. And rewards are on offer, as Fleig explains.
16 August 2012
On the eve of the 2012-13 Premier League Season, we are proud to launch MCFC Analytics, a game-changing approach to data sharing that aims to support and inspire the growing football analytics community.
Over the last three years it has become increasingly apparent that the field of performance analytics in football has been held back for one simple reason – the lack of publicly accessible data.
Analytics across American sports has flourished over the past two decades with myriad high-profile success stories, the most notable of which, that of the Oakland A’s baseball team, was dramatised in the film Moneyball.
Yet their story is the tip of the iceberg and many more similar successes from within the NBA, NFL, MLB and NHL have yet to be told.
Analytics and objective analysis are in the very fabric of those sports. There also lies within them a culture of analytics amongst the franchises themselves as well as the fans, the media, bloggers and students alike.
As far back as the 1970s, Bill James – one of the early pioneers of data analytics in baseball – was using open data to develop new ways of analysing player performance and to objectify the very make-up of the sport, by analysing the data and challenging long-held assumptions.
Much of the excellent work now done in American sports has stemmed from the pioneering approach of the people who work outside of the training ground walls. The reason was and still is that making the data publicly available drove forward the US sports as a collective.
Let’s not overlook the excellent work that has been taking place within English football, and indeed other national leagues, within the clubs themselves over the past decade.
There has been a very successful culture of performance analysis and use of data within the club structure, but the speed of growth for the discipline of performance analytics is essentially in the clubs’ hands – it is they who have bought the data at significant cost and the rest of the analytics community simply do not have access to the data at the same level.
Furthermore, the reality is that playing between 40 and 58 games per season means that the day-to-day demands around the team make it very difficult for a lot of clubs to spend any significant time in this space, further limiting the opportunity for this discipline to evolve.
The development of performance analytics is a passion of ours and one of the primary focuses of the Performance Analysis department at Manchester City Football Club. We see it as the responsibility of those in such a fortunate position to support the analytics community and share what tools, resources and insights we can in order to accelerate this growth.
There are many people in the analytics community right now who have the skills, desire and vision to make a difference in performance analytics, people who can add significant value such as Bill James did in baseball. (His work led to that of Billy Beane, played by Brad Pitt in Moneyball.)
But those people have no significant data to work with and, with the support of Opta, we hope to change this. Opta have been our data provider for a number of years and have one of the most extensive data sets and detailed coding templates available.
We are launching MCFC Analytics for this very reason, to share the data that we have within the analytics community and, further to that, engage directly with the resulting community to see where this collaboration can take us all.
Our focus is to not only provide the data and to create a community of analytic research, but to hopefully inspire a cultural change in the way that data is perceived, analysed and distributed.
When people see the value of open data sources and experience the speed of development that follows, we hope that it will lead to a new culture of open data in football. Only time will tell.
Everyone has a unique opportunity to be part of this community. Those that register their interest in this project (by visiting www.mcfc.co.uk/mcfcanalytics from Friday) will be sent our Opta data set for every ‘on the ball’ event for every Premier League player in every match in the entire 2011-12 Premier League season, free of charge.
This data is designed to be broken down, analysed, graphed and visualised however you see fit.
We have made it available to encourage and inspire the next level of analytics.
Our hope is that the data is used to create new performance measures, tools for player/team comparison and profiling, season-long analysis for benchmarking players’ performance and contextualising these performances based on playing position and opposition.
We will work directly with those people in the analytics community who come up with good concepts and more importantly, connect them with others who are working in the same research area. A collective approach will help the community to make more rapid progress and we want you to join us in being part of it.
The data is for everyone and anyone. Students – use it for your dissertation work. Bloggers – use it to write your analytics articles. Statisticians – use it to identify new modelling techniques. Armchair enthusiasts – use it to prove your mates wrong!
Once you ‘Join the Community’ you will be sent the data via email within 48-72 hours from Manchester City.
If you find value in the data, tell someone else about it, but please let them download the data themselves. Remember, the data is only one part of this concept. We want to grow an analytics community that can work together to drive development in this area, but to do that we need to know who that community is.
If you are a university course leader, we’d ask you don’t download the data and hand it out but instead ask your students to take 30 seconds to download it themselves.
To continue giving exposure to this discipline we will be running a research competition in the coming months whereby the work submitted to us will be reviewed by our Performance Analysis department and Opta, with the best projects being published on our MCFC Analytics page and the Opta Pro website to share with the world.
Furthermore, as recognition of the contribution to the performance analytics discipline we will invite those with the best projects to come to our training ground to present their work and to then share with us the match day experience, observing how the Performance Analysis department supports our first team during a Premier League game.
A full outline of the data set(s) available and details on how to join the community are on our www.mcfc.co.uk/mcfcanalytics homepage, which goes live on Friday 17 August.
You can email us at firstname.lastname@example.org
You can follow Gavin Fleig on Twitter at @MCFCGavinFleig