Statistical Analysis: Soccer Premier League

Introduction

The competent use of statistical methods is a sound strategy for handling quantitative data. As a rule, it allows not only tracking measures of the central tendency of a data set but also finding hidden patterns. Statistical analysis thus supports high-quality data processing and offers convenient visualization, which simplifies the interpretation of complex tables. The present project used statistical analysis to process numerical data and shows how statistical tools make working with data sets easier.

The research interest of this paper was the application of statistical analysis techniques to sports data. In particular, data on twenty soccer clubs were prepared in advance: the number of matches played, wins, losses, and draws, as well as the number of goals scored and conceded. This information is freely available online, but on its own it gives only an ambiguous description of a particular club. In the present project, therefore, statistical tools were used, including measures of central tendency and charts, which make it possible to compare and contrast the collected data (Bhandari, 2022). This addresses the main question of the research project, namely, which soccer club or clubs are the strongest. My personal motivation for obtaining this answer stems from my desire to become a professional soccer player, so this exercise serves as an analysis of strengths and weaknesses among Premier League clubs.

Methodology

The methodological basis for this project was statistical analysis aimed at answering the question in which I was interested. In particular, data had previously been collected on twenty soccer clubs in the Premier League (Appendix A), including the name of the team, the number of wins, losses, and draws, the number of goals scored and conceded, and the difference between them. In addition, the number of points assigned to each club depending on the outcome of each game is used: three points are added if the team wins, one point for a draw, and zero points if the team loses (Furniss, 2021). This is consistent with the totals in Table 1, where 157 wins and 114 draws yield 3 × 157 + 114 = 585 points. Thus, the Points variable is the standard criterion for ranking teams, as the team with the most points leads the table and can be regarded as the strongest at that moment.
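The points rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original Excel workflow: the function name `league_points` is hypothetical, and the scheme of three points for a win and one for a draw is assumed because it reproduces the Points total in Table 1.

```python
# Points awarded per match outcome (assumed: 3 for a win, 1 for a draw, 0 for a loss).
POINTS_PER_WIN = 3
POINTS_PER_DRAW = 1
POINTS_PER_LOSS = 0


def league_points(won: int, drawn: int, lost: int) -> int:
    """Return the points total for a win/draw/loss record (hypothetical helper)."""
    return (won * POINTS_PER_WIN
            + drawn * POINTS_PER_DRAW
            + lost * POINTS_PER_LOSS)


# League-wide totals from Table 1 (all twenty clubs combined).
total_points = league_points(won=157, drawn=114, lost=157)
print(total_points)  # 585, the Points total reported in Table 1
```

The same function can be applied to a single club's record to reproduce its entry in the Points column.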

The data for each game were collected using Wikipedia, so they are secondary in nature. It should be emphasized that all data were current as of February 10, 2022, and values may have changed since then (Premier League, 2022). All of the variables used are quantitative, so the mean, standard deviation, maximum, and minimum were easily calculated for them. In addition, frequency histograms were constructed for these data, making it possible to identify general trends visually and to draw conclusions directly from the graphs.

Calculations and Graphs

Because all of the data used in this project were quantitative, it was relatively easy to calculate the measures of central tendency for them. Specifically, formula (1) was used to calculate the mean and formula (2) to estimate the standard deviation. The calculations were not done manually but with the built-in MS Excel functionality; nevertheless, it was important to understand the processes behind the specific program functions in order to interpret the results critically.
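The same calculations can be sketched outside Excel. The example below is only an illustration using Python's standard `statistics` module: the totals are taken from Table 1, while the per-club list `example_wins` is hypothetical, since the club-level data in Appendix A are not reproduced here. The text does not state whether the sample or the population standard deviation was used, so both are shown.

```python
import statistics

# Totals from Table 1; dividing by the number of clubs gives the "Average" row.
NUM_CLUBS = 20
totals = {"MP": 428, "Won": 157, "Drawn": 114, "Lost": 157,
          "GF": 605, "GA": 605, "GD": 0, "Points": 585}
averages = {name: total / NUM_CLUBS for name, total in totals.items()}
print(averages["MP"])  # 21.4

# HYPOTHETICAL per-club win counts, used only to show the function calls.
example_wins = [18, 14, 12, 11, 9, 8, 6, 5, 3, 1]

mean_wins = statistics.mean(example_wins)
sample_sd = statistics.stdev(example_wins)       # divides by n - 1
population_sd = statistics.pstdev(example_wins)  # divides by n

print(mean_wins, sample_sd, population_sd)
```

The sample version divides by n - 1 and is therefore always slightly larger than the population version for the same data; with twenty clubs the difference is small but visible at one decimal place.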

Table 1. Summary statistics for all clubs (MP = matches played, GF = goals scored, GA = goals conceded, GD = goal difference)

                     MP      Won     Drawn   Lost    GF      GA      GD      Points
Totals               428.0   157.0   114.0   157.0   605.0   605.0   0.0     585.0
Average              21.4    7.9     5.7     7.9     30.3    30.3    0.0     29.3
Standard Deviation   1.4     4.3     2.8     3.4     11.9    9.0     18.5    12.0
Maximum              24.0    18.0    12.0    14.0    58.0    45.0    41.0    57.0
Minimum              18.0    1.0     2.0     2.0     13.0    14.0    -32.0   12.0

Figure 1. Pie chart for the top four Premier League clubs (in terms of points); the criteria are win shares
Figure 2. Histogram of wins, draws, and losses for each Premier League club as of February 10, 2022.

Analysis of Data

One of the first results of the calculations is summary Table 1, which shows totals, means, standard deviations, and extreme values for each of the variables. Taken in isolation from individual clubs, these data are not highly informative, but some conclusions can be drawn from them. First, the scatter was large for the number of goals scored: the SD value for this variable was 11.9. From this, it can be concluded that the distribution of this variable is quite uneven, as some of the values deviate significantly from the mean of 30.3 (Zach, 2021). Second, the mean numbers of wins and losses across all clubs were equal (7.9), which is expected because every win for one club is a loss for another. The data were more tightly packed for losses (SD = 3.4) than for wins (SD = 4.3), meaning that the distribution of losses is less scattered; a formal comparison of the two distributions would, however, require a statistical test. Finally, there is a negative number in the table, in the GD column. This variable is the difference between goals scored and goals conceded, and the minimum value of -32 indicates that one club conceded 32 more goals than it scored, which points to a weak performance.
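How the GD column is derived can be shown with a short sketch. The helper name `goal_difference` is hypothetical. The values 13 and 45 are the minimum GF and the maximum GA from Table 1; that they belong to the same club is an assumption made for illustration, although it does reproduce the minimum GD of -32.

```python
def goal_difference(goals_for: int, goals_against: int) -> int:
    """Goal difference: goals scored minus goals conceded (hypothetical helper)."""
    return goals_for - goals_against


# Minimum GF and maximum GA from Table 1 (ASSUMED to belong to one club).
print(goal_difference(13, 45))    # -32

# League-wide totals from Table 1: every goal scored is also a goal conceded,
# so the differences across all clubs sum to zero.
print(goal_difference(605, 605))  # 0
```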

The second step in the statistical analysis was to construct charts and histograms from which conclusions about the current data set could be drawn. Figure 1 shows the distribution of wins among the top four teams in the Premier League. It is clear that Manchester City had the largest share (32%) of wins among these four teams, while Manchester United had the smallest, with only 20% of their combined wins. These data alone are tentatively enough to identify the best team.
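The win shares behind a pie chart of this kind are each club's wins divided by the combined wins of the clubs shown. The sketch below illustrates the arithmetic only: the club labels and win counts are hypothetical placeholders (apart from 18, the maximum number of wins in Table 1), not the values behind Figure 1.

```python
def win_shares(wins_by_club: dict) -> dict:
    """Return each club's percentage share of the combined wins (hypothetical helper)."""
    total = sum(wins_by_club.values())
    if total == 0:
        raise ValueError("at least one win is needed to compute shares")
    return {club: 100 * wins / total for club, wins in wins_by_club.items()}


# HYPOTHETICAL win counts for four clubs, for illustration only.
example_wins = {"Club A": 18, "Club B": 15, "Club C": 12, "Club D": 11}

shares = win_shares(example_wins)
for club, share in shares.items():
    print(f"{club}: {share:.0f}%")
```

Because the shares are computed relative to the four clubs only, they always sum to 100% and say nothing about the remaining sixteen clubs.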

However, a more comprehensive analysis is needed to obtain more reliable results. Therefore, it was decided to create a histogram of the distribution of wins, draws, and losses, shown in Figure 2. From this figure, we can see that the number of wins is uneven but decreases rapidly toward the right of the chart; the number of losses shows, in general, the inverse trend (Yi, 2022). Thus, Manchester City did have the highest number of wins and the lowest number of losses, while Watford, along with Norwich, had the highest number of losses.

At this point, it became interesting to see whether there was a correlation between two of the variables, namely the number of wins and the number of losses. This pair was expected to show a strong inverse correlation (as can be predicted from Figure 2). Using the correlation tools built into MS Excel, it was found that the Pearson correlation coefficient for wins and losses was -0.7114 (Fernando, 2021). This value supports the hypothesis of a moderately strong negative correlation between these variables: clubs with more wins tend to have fewer losses. The relationship is not perfect because draws also account for some of the matches played.
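For readers who want to see what the Excel correlation function computes, the following is a minimal sketch of the Pearson coefficient in plain Python. The function name `pearson_r` and the win and loss lists are hypothetical; the lists are illustrative values, not the Appendix A data, so the result will not equal -0.7114.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL wins and losses for six clubs, for illustration only.
example_wins = [18, 14, 12, 9, 6, 3]
example_losses = [2, 4, 7, 8, 11, 14]

r = pearson_r(example_wins, example_losses)
print(round(r, 4))  # negative: more wins go with fewer losses
```

A value close to -1 means the points lie almost on a falling straight line, while a value near 0 means no linear relationship.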

Conclusion

To summarize, statistical tools can be used to handle quantitative data effectively, as the Premier League example demonstrated. Measures of central tendency were calculated and graphs were created, which made the tabular data easier to work with and to draw conclusions from. Manchester City was the strongest club as of February 10, 2022, with the highest number of wins and the lowest number of losses. Consequently, it is reasonable to assume that some of the strongest players play for this team.

Reference List

Bhandari, P. (2021) Standard deviation | a step by step guide with formulas. Web.

Bhandari, P. (2022) Central tendency | understanding the mean, median and mode. Web.

Fernando, J. (2021) Correlation coefficient. Web.

Furniss, M. (2021) Most points in a premier league season: the leaders. Web.

Premier League (2022) Web.

Thakur, M. (2019) Mean formula. Web.

Yi, M. (2022) A complete guide to histograms. Web.

Zach (2021) Why is standard deviation important? (explanation + examples). Web.

Appendix A

Data
