Surprises of World Cup 2022: Fascinating Maths

Surprises of World Cup 2022: Lessons from Mathematical Models

Keywords:

AWCuPreM, geometric synergy, GIS, mean marginal advantage, mentality premium, model conversion factor, serendipity, soccermatics

“Mathematician’s model suggesting France can bank on a knife-edge 1.1% mean marginal advantage over England in the World Cup Quarter-Finals, likely a one-goal difference. If serendipity favours England at 70%, then England can emerge victorious, even with a two-goal difference. If France snatches lady luck from England, then England will be soaked in bitter tears as France widens the gap, even by at least three goals.”

  1. The mathematical prediction model applied here was conceptualised, calibrated, and tested during the 2018 World Cup to establish nine crucial variables for winning matches. The model has been improved further using 24 group-stage matches of the 2022 World Cup to achieve a high prediction accuracy of 80%, which improves as the tournament progresses towards the final stage, owing to the sieving and reinforcement learning process involved.
  2. From the knock-out stage towards the Quarter-Final, the model mean marginal advantages have become more concrete, such that a mean marginal advantage of 1.7% is likely to result in a difference of one goal, based on the model conversion factor. The upper margin for a draw is 0.7% from the model.
  3. Quality coaching with leadership in attitude development, together with investments in attracting, skilling, and retaining key talents in the form of top scorers and goalkeepers, is the bare minimum required to disrupt the pre-quarter-final exit tradition that has been the tired story of African teams.
  4. The priority intervention areas for African teams are: mentality, score drive, tactical inventiveness, and team coherence.
  5. Setting and mapping out the critical spatio-temporal and geometric aspects of the training ground/a standard pitch and the practice sessions therein should benefit from applied geospatial technologies, as this will help enhance the precision and accuracy of shot conversion into goals. This, therefore, speaks to the need to engage geospatial expertise and apply GIS technology as part of football coaching.
  6. Going forward, policy development to effectively address the nine crucial variables of winning in the tournament is the critical challenge Sports Ministers in Africa must champion and steer with zeal and integrity.

A World Cup of Inordinate Surprises

The World Cup captures the imagination and dreams of nations. It brings us together in diversity. Isn’t the World Cup, therefore, a great opportunity to popularise and socialise mathematics, a subject that many otherwise dread and find boring within classroom walls? Are mathematics scholars modelling the World Cup to prove to the world that mathematics is the most powerful language for rational, data-based decision-making?

The author of Soccermatics, mathematics professor David Sumpter, has already shown the way in applying mathematics to model football outcomes. His approach is advanced, targeting postgraduate students. To demystify football modelling and present a simple but useful model, I tested a model during the 2018 World Cup, expanded it from six to nine variables, and am now applying the improved model to predict the outcomes of the more intriguing 2022 World Cup.

The 2022 World Cup has rightly earned the title of a tournament of inordinate surprises, and the model presented here confirms as much. The model was grossly off in four of the twenty-four group-stage matches it predicted. The four surprises, outliers par excellence, were:

1. Group D, Denmark against Australia: The model gave Denmark a 6.5% mean marginal advantage over Australia, but Australia won 1–0.

2. Group E, Spain against Japan: The model gave Spain a 5.5% mean marginal advantage over Japan, but Japan won 2–1. The model’s amusing claim was that Japan needed an ocean of good luck at every coordinate of the rectangular pitch, where Spain could draw only droplets of it.

3. Group G, Brazil against Cameroon: The model gave Brazil a 7.9% mean marginal advantage over Cameroon, but Cameroon won 1–0 and became the first African country to defeat Brazil at the tournament, ending Brazil’s 17-match unbeaten run at the group stage.

4. Group H, Portugal against South Korea: The model gave Portugal a 7.9% mean marginal advantage over South Korea, but South Korea won 2–1.

Soccer and Mathematics

The nexus between soccer and mathematics may look far-fetched. Nothing could be farther from the truth. Mathematical lessons from soccer are critical to decision-making, to help increase the winning chances of teams from Africa, which have been yearning for a place in the advanced stages of the World Cup for decades. Kenya’s Sports Minister, for example, has promised the country a place in the World Cup in 2030. There is a real and urgent need to involve mainly mathematicians, scientists, and surveyors in recalibrating Africa’s football to reach greater heights.

Soccer has been referred to as a game of chance. This statement must excite any mathematician, because it makes soccer the practical go-to arena for applying axiomatic probability. Prof. David Sumpter has gone further to show the way by developing Soccermatics as a postgraduate course in applied mathematics for the football industry. Creative application of knowledge knows no boundaries. Interesting!

Unlike the advanced approach by Prof. David Sumpter, the model shared here is simplified to be readily digestible to high school and college students. The goal is to inspire learning by application while exploiting the infectious, pervasive and global fascination of the World Cup moments. The model draws on real-world examples, which have been used to predict the outcome of the games played at both the 2018 World Cup and the ongoing 2022 World Cup. The model predicted France’s victory in the 2018 World Cup, giving it a 1.9% mean marginal advantage over Croatia. The 2018 World Cup experience has led to the establishment of nine variables of key importance in football modelling. The insights and lessons drawn from this experience are key to socialising and popularising STEM education while nurturing scientific curiosity, innovation, and creativity among young learners.

The mathematical prediction for the foreseen tough duel between France and England in the 2022 Quarter-Final scheduled for 10th December is an interesting case in point worth sharing here, scripted from the model as follows:

“Mathematician’s model suggesting France can bank on a knife-edge 1.1% mean marginal advantage over England in the World Cup Quarter-Finals, likely a one-goal difference. If serendipity favours England at 70%, then England can emerge victorious, even with a two-goal difference. If France snatches lady luck from England, then England will be soaked in bitter tears as France widens the gap, even by at least three goals.”

Soccer as a game involves balancing risks with rewards, the quality of scoring chances with their frequency, and defensive with offensive forays. With modern technology, we can measure and map, as distributions and heat maps, the team-specific events on the pitch and the geometric synergy among players. We can also measure the launch angle of a shot and how it influences the range within which the ball can land inside the net; in ideal projectile motion, that range is proportional to the sine of double the angle. Added to these are statistics on ball possession, pass accuracy, shot conversion, and historical precedent, an aspect of path dependence.
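The sine-of-double-the-angle relationship mentioned above comes from ideal projectile motion, where the range is R = v² sin(2θ) / g. The short Python sketch below illustrates it under simplifying assumptions that are not part of the model itself: no air resistance, no spin, and a ball struck from ground level. The function name and the 25 m/s shot speed are hypothetical choices for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def ideal_range(speed_mps: float, launch_angle_deg: float) -> float:
    """Horizontal range of a drag-free projectile launched from ground level.

    Uses R = v^2 * sin(2 * theta) / g, the textbook idealisation.
    """
    theta = math.radians(launch_angle_deg)
    return speed_mps ** 2 * math.sin(2 * theta) / G


# Hypothetical shot speed of 25 m/s at several launch angles.
for angle in (15, 30, 45, 60, 75):
    print(f"{angle:2d} deg -> {ideal_range(25.0, angle):5.1f} m")
```

Because sin(2θ) peaks at θ = 45 degrees, the ideal range is longest there, and complementary angles such as 30 and 60 degrees give the same range. A real shot departs from this because of drag, spin, and wind.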

Goal expectation (xG), also known as expected goals, is calculated as a mathematical expectation: the product of the probability of scoring from a shot and the number of shots taken. It is a probability-weighted count of goals, treated as successful random events on the rectangular football pitch. The football pitch is itself a product of applied geometry, making it an object of fascination for any surveyor.
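As a rough illustration of goal expectation as a mathematical expectation, the Python sketch below sums per-shot scoring probabilities. When every shot has the same probability, this reduces to probability multiplied by the number of shots, as described above. The function names and all shot probabilities are hypothetical; they are not taken from the model or from any match data.

```python
def expected_goals(shot_probabilities):
    """Goal expectation as the sum of per-shot scoring probabilities."""
    return sum(shot_probabilities)


def expected_goals_uniform(p_score, n_shots):
    """Special case: every shot has the same scoring probability."""
    return p_score * n_shots


# Hypothetical match: one penalty-like chance plus four harder shots.
shots = [0.76, 0.30, 0.08, 0.05, 0.05]
print(f"xG from mixed chances: {expected_goals(shots):.2f}")
print(f"xG from 12 shots at 10% each: {expected_goals_uniform(0.10, 12):.2f}")
```

The first form is the more general one, since it lets the quality of each chance differ, which is the balance of quality against frequency that the text refers to.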

Surveyor’s Perspective

A surveyor will naturally geo-reference the football pitch before any further analysis, then divide it into regular units rich in actionable location-based intelligence. With vector data, these units are polygons defined by a series of (x, y) coordinates, assuming plane surveying because of the small scale involved. With raster data, where the pitch is captured as an image file of a defined resolution, the units are pixels, which are also amenable to digital image processing based on unique signatures. On this stage, a team’s destiny is usually decided within two hours: either tears of joy or tears of regret.

The shooting angle matters, depending on the location of the striker on the pitch and the kinetic energy of the ball as it works against headwinds or is favoured by tailwinds. To a surveyor, the air resistance caused by these winds belongs among the sources of error, random errors in this case, whose effect can be reduced by taking mean values. This randomness in factors such as wind speed and direction explains the wisdom of having the teams swap ends for the second half.

Machine learning based on the football statistics generated from various techniques and sensors makes it even more interesting to simulate outcomes with impressive accuracy. The big data generated from these frenzied soccer dynamics promises an exciting future for football modelling.
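The surveyor’s idea of dividing the pitch into regular units and averaging out random errors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 105 m by 68 m pitch (a common standard size, not specified in this article), plane coordinates in metres measured from one corner, and a hypothetical 5 m cell size. The function names and sample observations are invented for the example.

```python
import math
from collections import defaultdict
from statistics import mean

PITCH_LENGTH_M = 105.0  # assumed pitch length
PITCH_WIDTH_M = 68.0    # assumed pitch width


def cell_index(x, y, cell_size=5.0):
    """Map a plane coordinate (metres from one corner) to a (col, row) cell."""
    if not (0 <= x <= PITCH_LENGTH_M and 0 <= y <= PITCH_WIDTH_M):
        raise ValueError("point lies outside the pitch")
    n_cols = math.ceil(PITCH_LENGTH_M / cell_size)
    n_rows = math.ceil(PITCH_WIDTH_M / cell_size)
    # Points exactly on the far boundary are assigned to the last cell.
    col = min(int(x // cell_size), n_cols - 1)
    row = min(int(y // cell_size), n_rows - 1)
    return col, row


def mean_by_cell(observations, cell_size=5.0):
    """Average repeated (x, y, value) observations within each grid cell."""
    buckets = defaultdict(list)
    for x, y, value in observations:
        buckets[cell_index(x, y, cell_size)].append(value)
    return {cell: mean(values) for cell, values in buckets.items()}


# Hypothetical measured shot speeds (m/s) at three locations on the pitch.
samples = [(52.0, 34.0, 20.0), (53.0, 33.0, 24.0), (10.0, 10.0, 5.0)]
print(mean_by_cell(samples))
```

Taking the mean per cell is the simplest way to dampen random effects such as gusts of wind; it does nothing for systematic errors, which would need a separate correction.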

Fundamentally, a model should be as simple as possible, yet as complex as necessary to contain all the key variables. Models are like working hypotheses, made better with time as learning from data progresses. After all, all models are wrong, but some models are useful, as the British statistician George Box rightly put it. The usefulness of the World Cup model shared here can be judged based on its purpose, which is predicting the most likely winner in a match.

Nine Crucial Soccer Variables

The modelling experience has identified nine variables which are key to delivering a win in this exciting game. From experience, key informant interviews, observation, expectation, and historical precedent, the modeller assigns the percentage score of a team on each variable.

The methodology used to come up with the model was based on the following fundamentals:

  1. Systems thinking skills
  2. Probability and statistics
  3. Numerical analysis
  4. Geometry and spatial thinking
  5. Projectile motion
  6. Trend analysis and pattern recognition
  7. Permutations and combinations
  8. Decision matrices
  9. Set theory
  10. Expert elicitation surveys
  11. Common sense

The climate variable goes beyond temperature and humidity to include how well the players fit into the new environment, home advantage included.

Resistive nucleus includes the quality of a team’s defence and quality of the goalkeeper, both key to preventing an opponent’s goal. Serendipity is about the stroke of luck inherent in random chances that can favour a team, especially the presumed underdog. These lucky factors tilt the scales and cannot be underestimated in this game of chance.

Mentality is key to performance, not only in soccer. Some teams enter a match with a mindset of possibility, the mindset of winners. If such a mentality does not degenerate into overconfidence, the rewards are always evident. The other variables to watch and score for each team are: tactical inventiveness in exploiting subtle opportunities with high probabilities of scoring (think of Messi’s mellifluous dribble); honed skillset; tenacity gradient, so called because tenacity tends to wane with time as the game advances; score drive, seen in gifted scorers through the share of attempts that are likely to be converted into goals, penalty shots included; and team coherence, evident in the accuracy of passes, ball possession, and geometric synergy in the field, among others.

Supported by statistics on goal expectation, weighted averages of all the scores on the variables tend to reveal the team that is likely to win, with or without serendipity. This is how it can be predicted that France can bank on a knife-edge mean marginal advantage of 1.1% over England in the upcoming Quarter-Final. With luck on the side of England, the outcome can flip and reward England with up to a two-goal difference. If luck shifts to France, then France can open up a gap of at least three goals.
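The weighted-average step can be sketched in Python as follows. The article does not publish the weights, the per-team scores, or the exact formula, so everything numerical here is hypothetical, and reading the mean marginal advantage as the difference between two teams’ weighted mean scores is an assumption made for illustration. The variable names follow the nine variables as named in this article.

```python
VARIABLES = (
    "climate",
    "resistive nucleus",
    "serendipity",
    "mentality",
    "tactical inventiveness",
    "honed skillset",
    "tenacity gradient",
    "score drive",
    "team coherence",
)


def weighted_mean_score(scores, weights):
    """Weighted average of a team's percentage scores on the nine variables."""
    total_weight = sum(weights[name] for name in VARIABLES)
    weighted_sum = sum(scores[name] * weights[name] for name in VARIABLES)
    return weighted_sum / total_weight


def mean_marginal_advantage(scores_a, scores_b, weights):
    """Assumed reading: team A's weighted mean minus team B's, in percentage points."""
    return weighted_mean_score(scores_a, weights) - weighted_mean_score(scores_b, weights)


# Hypothetical inputs: equal weights and invented percentage scores.
equal_weights = {name: 1.0 for name in VARIABLES}
team_a = dict(zip(VARIABLES, (80, 85, 50, 90, 85, 88, 80, 86, 84)))
team_b = dict(zip(VARIABLES, (80, 84, 50, 88, 84, 87, 80, 85, 82)))

advantage = mean_marginal_advantage(team_a, team_b, equal_weights)
print(f"Mean marginal advantage of team A: {advantage:.2f}%")
```

Changing the weights lets the modeller express which variables matter most; with unequal weights, a small edge on a heavily weighted variable can outweigh larger deficits elsewhere.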

Progressive Model Prediction Accuracy

The earlier stages of the tournament are more challenging to predict, but they help in calibrating the model for validation in the subsequent stages. As witnessed here, the model performed much better as the tournament advanced. Out of the 24 group-stage matches used for calibration, the model predicted 20 matches, with an estimated accuracy of 80% on a scale from “1 = grossly off” to “5 = spot on”.

Progressing to the knock-out stage and subsequent ones, the model becomes more laser-focused in predicting outcomes. This is because of the sieving and learning process, making it easier to arrive at a more nuanced and accurate parameterisation of the key variables. For example, at the time of writing this article, six out of the eight knock-out matches had been played, and the model got all the predictions spot on (level 5). The two matches remaining followed suit as confirmed in the updated version of this article. A summary of the knock-out stage predictions is shown below. All the model predictions and interpretation of possible scenarios can be accessed from this link: Nashon Adero | Facebook.

Model predictions and actual outcomes at the World Cup 2022 knock-out stage

Model Insights and Lessons

The simplified nine-variable model is a practical soccer prediction alternative. The intelligence derived from this modelling experience can inform parametric strategy and policy design for transforming national soccer.

Towards the Quarter-Final, the mean marginal advantages in the model become more concrete, such that 1.7% is likely to result in a difference of one goal, based on the model conversion factor. The upper margin for a likely tie/draw tends to be 0.7%.

In the case of Brazil against South Korea, the model’s 8% difference in favour of Brazil easily translated to four goals in the first half. In the case of Croatia against Japan, the model’s 2.8% difference in favour of Croatia corresponds to a goal difference of just more than one, two in this case, counting the 3–1 penalty shoot-out that followed a 1–1 draw. In the case of England against Senegal, the former’s 4.1% mean marginal advantage resulted in a goal difference of just more than two, three in this case.
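The conversion from mean marginal advantage to a likely goal difference can be sketched in Python using the two figures stated in the text: a conversion factor of 1.7% per goal and an upper margin of 0.7% for a draw. Rounding up to the next whole goal is an assumption made here because it matches the Croatia and England examples above; it is not the model’s published set of inequalities, and the function name is hypothetical.

```python
import math

GOAL_CONVERSION_PCT = 1.7    # advantage per goal, as stated in the text
DRAW_UPPER_MARGIN_PCT = 0.7  # upper margin for a likely draw, as stated


def likely_goal_difference(advantage_pct):
    """Convert a mean marginal advantage (%) into a likely goal difference.

    Assumption: margins above the draw threshold are rounded up to the
    next whole goal.
    """
    margin = abs(advantage_pct)
    if margin <= DRAW_UPPER_MARGIN_PCT:
        return 0
    return math.ceil(margin / GOAL_CONVERSION_PCT)


# Margins quoted in this article for France, Croatia, and England.
for label, margin in (("France", 1.1), ("Croatia", 2.8), ("England", 4.1)):
    print(f"{label}: {margin}% -> {likely_goal_difference(margin)} goal(s)")
```

Under this reading, 1.1% maps to one goal, 2.8% to two, and 4.1% to three. The Brazil example is described more loosely in the text, so it is not used to check the rule.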

We can now reflect on the performance of the African teams against the model shared here. It is now evident that the teams must work harder on these nine variables to match the rest of the world. The priority intervention areas are mentality, score drive, tactical inventiveness, and team coherence. Setting and mapping out the critical spatio-temporal and geometric aspects of the training ground/pitch and practice sessions should benefit from applied geospatial technologies, as this will help enhance the precision and accuracy of shot conversion into goals. This, therefore, speaks to the need to engage geospatial expertise and apply GIS technology as part of football coaching. Quality coaching with leadership in attitude development, together with investments in attracting, skilling, and retaining key talents in the form of top scorers and goalkeepers, is the bare minimum required to disrupt the pre-quarter-final exit tradition of African teams.

From the experience shared here, there is a legitimate and compelling challenge for scholars to make the 2026 World Cup even more informative with richer modelling techniques that utilise the growing advantages availed through 5G, big data, AI, and GIS technology to share interactive visual maps of this electrifying game with a borderless appeal. Knowledge will have increased substantially by then, according to the Knowledge Doubling Curve. Positive externalities are expected in endearing mathematics and surveying to young learners and the public in general.

Conclusion and Outlook

For easy reference and taking of responsibility, the World Cup Prediction model shared here can now be referred to as Adero’s World Cup Prediction Model (AWCuPreM). The model was conceptualised in 2018 and calibrated satisfactorily using the 2018 World Cup statistics. The more radical recalibration conducted in 2022 using 24 group-stage matches has enhanced the model’s predictive power, with excellent performance at predicting the 2022 knock-out stage. The following mathematical inequalities have been established from the model as a guide to interpreting the mean marginal advantages in terms of the expected goal differences.

Key to interpreting mean marginal advantages in World Cup Prediction Model (AWCuPreM):

The application of the model to the next stages of the 2022 World Cup (Quarter-Finals, Semi-Finals, and Final) should help reveal more insights for improvement. There is a huge opportunity for applying modern data-driven and visualisation technologies, such as GIS and Data Engineering, to enhance the accuracy of the model and advance its application range to crucial decision-making and policy simulations that can help make soccer more exciting, educative, and rewarding to STEM students, players, fans, and nations. Going forward in the African soccer agenda, policy development to address the nine crucial variables effectively is the critical challenge Sports Ministers in Africa must champion and steer with zeal and integrity.

Acknowledgement

Wilson Kibe, a young Kenyan graduate from Kenyatta University and a mentee to the author, collected and collated the key datasets that have informed the 2022 World Cup prediction model. The author has been mentoring the youth in skills and career development. He developed COVID-19 models for predicting the rising numbers in 2020 and 2021, which are now part of the book he co-edited in 2021, entitled The Future of Africa in the Post-COVID-19 World. He is also a co-author of a modern book entitled Project Design for Geomatics Engineers and Surveyors (2nd edition).

Nashon J. Adero

Nashon, a geospatial expert, lecturer, and trained policy analyst, applies dynamic models to complex adaptive systems. He is a youth mentor on career development and the founder of Impact Borderless Digital.