M.Sc. Thesis
Estimating Goal-Scoring Probabilities in Soccer, Based on Physical and Geometric Factors
M.Sc. Student: Koren Abir
Department of Industrial Engineering and Management
Professor Emeritus Avishai Mandelbaum
Professor Yaacov Ritov

Abstract

Soccer is considered by many to be the world's most popular sport. It is also a billion-dollar industry and, naturally, a highly competitive one. Accordingly, in recent years there have been many attempts to apply scientific approaches and methods to soccer and its analysis, in order to gain even the slightest advantage. Ultimately, the objective of the game is to score more goals than the opponent, so goal scoring is the name of the game. This is the motivation for our research, in which we apply scientific methods to estimate scoring probabilities.

It is reasonable to assume that different attempts yield different scoring probabilities. This is intuitive when comparing extreme cases, such as a penalty kick versus a long-distance attempt with many players within range. We therefore aim to investigate the effects of different characteristics on the scoring probability, and to quantify them. The variety of characteristics of scoring attempts, as well as the high variability within each characteristic, makes it challenging to compare different attempts, and thus to assess the scoring probability and the factors that actually affect it.

In our research, we focus on the physical and geometric characteristics of scoring attempts, and we overcome these challenges via a mathematical model that gives rise to a measure enabling a quantitative comparison of scoring attempts. This is done by modeling the scoring space formed by an attempt, according to its physical and geometric characteristics. Our scoring-space models are based on simple assumptions, supported by previous works that quantify the effects of physical and geometric characteristics on the scoring probability. We also draw on detailed physical equations of soccer-ball motion in order to calibrate the model and adjust its parameters, thereby increasing its reliability.
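The scoring-space models themselves are not spelled out in this abstract. As a rough, hypothetical stand-in, the sketch below computes one purely geometric quantity often used as a shot-quality proxy: the angle that the goal mouth subtends at the shot location. It ignores defenders, the goalkeeper and ball physics, so it is not the scoring space of this thesis; the coordinate convention and the function name `goal_angle` are assumptions made for illustration only.

```python
import math

GOAL_WIDTH_M = 7.32  # standard goal width: 7.32 m (8 yards)


def goal_angle(x: float, y: float, goal_width: float = GOAL_WIDTH_M) -> float:
    """Angle (radians) subtended by the goal mouth at shot location (x, y).

    Hypothetical convention: the goal line is x = 0, the goal is centred at
    y = 0, x is the distance from the goal line in metres (x > 0) and y is
    the lateral offset from the centre of the goal in metres.
    """
    if x <= 0:
        raise ValueError("shot must be taken in front of the goal line (x > 0)")
    half = goal_width / 2.0
    # Vectors from the shot to the two posts are (-x, half - y) and (-x, -half - y).
    # Their 2-D cross product is goal_width * x and their dot product is
    # x^2 + y^2 - half^2, so atan2(cross, dot) is the angle between them.
    cross = goal_width * x
    dot = x * x + y * y - half * half
    return math.atan2(cross, dot)


penalty = goal_angle(11.0, 0.0)      # straight-on shot from the penalty spot
long_range = goal_angle(30.0, 10.0)  # distant, off-centre attempt
```

For a straight-on shot from the penalty spot this gives about 0.64 rad (roughly 37°), while the distant, off-centre attempt sees a much narrower target. A fuller scoring-space measure would, among other things, remove the portions of this angle covered by the goalkeeper and by players within range.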

Our models were validated using a data set of 982 scoring attempts, manually collected from the 2012/13 English Premier League matches of London's Arsenal FC. The parameters required by the model were extracted from each attempt, and the respective scoring spaces were calculated. The statistical relation between the scoring space and the scoring probability was then evaluated using non-parametric estimators and logistic regression. The analyses found the scoring space to be significant in explaining the empirical scoring probability, and also allowed us to evaluate the effects of additional factors, such as type of play (set piece versus open play).
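To illustrate the two kinds of estimators named above, here is a minimal plain-Python sketch, assuming a single scoring-space covariate per attempt and a binary goal outcome. It fits P(goal) = sigmoid(b0 + b1·x) by maximum likelihood using Newton-Raphson, and computes a simple binned empirical goal rate as a non-parametric estimate. The function names, the bin edges and the toy values are hypothetical; they are not the thesis data, covariates or software.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 iters: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of P(goal) = sigmoid(b0 + b1 * x) via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = a = b = c = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) w.r.t. the intercept
            g1 += (yi - p) * xi     # score (gradient) w.r.t. the slope
            a += w                  # a, b, c: entries of the 2x2
            b += w * xi             # Fisher information matrix
            c += w * xi * xi
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det  # Newton step = information^-1 * score
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def binned_rates(x: Sequence[float], y: Sequence[int],
                 edges: Sequence[float]) -> List[Tuple[float, float, int, float]]:
    """Non-parametric estimate: (lo, hi, n_attempts, goal_rate) for each non-empty [lo, hi) bin."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        outcomes = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        if outcomes:
            out.append((lo, hi, len(outcomes), sum(outcomes) / len(outcomes)))
    return out


# Hypothetical toy values (scoring-space measure, goal = 1 / no goal = 0), NOT thesis data.
angles = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.65, 0.70]
goals = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(angles, goals)
rates = binned_rates(angles, goals, [0.0, 0.3, 0.6, 1.0])
```

The binned rates make no assumption about the shape of the relationship, while the logistic fit summarizes it in a single slope whose sign and significance can be tested. Comparing the two is a common check that the parametric form is reasonable. Additional factors, such as set piece versus open play, would enter the logistic model as further covariates.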

Lastly, we discuss the results and their practical significance. In addition to quantifying the effects of different factors, we apply our results in several ways that provide insights for game analysis and for the development of training routines. We also discuss how our approach and methods could be used to investigate related areas, such as passing or dribbling. We hope this contributes to a more complete scientific analysis of the game.