Do you need this or any other assignment done for you from scratch?
We have qualified writers to help you.
We assure you a quality paper that is 100% free from plagiarism and AI.
You can choose either format of your choice ( Apa, Mla, Havard, Chicago, or any other)
NB: We do not resell your papers. Upon ordering, we do an original paper exclusively for you.
NB: All your data is kept safe from the public.
Introduction
What is the best way to understand what a dataset says about the matter of interest? Instead of inspecting raw data, it is easier to look at a summary of the values in the dataset. In statistics, such a summary is presented in the form of descriptive statistics, a quantitative description of the main features of a collection of information (Tanner, 2016). Descriptive statistics may include various calculations, among which are the mean, standard error, median, mode, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum, and count. The present paper analyzes a dataset of the annual income of 20 households using descriptive statistics and reports the results of the analysis.
Dataset Overview and Central Tendency
The dataset under analysis includes 20 values with a mean of 31.85 and a standard deviation of 19.95. The dataset is presented in Table 1, while descriptive statistics are shown in Table 2.
Table 1. Dataset of Annual Household Income
Table 2. Dataset Descriptive Statistics
The key characteristic of a dataset is its central tendency. There are three ways to describe central tendency: the mean, the median, and the mode of the dataset. The mean is used most frequently; however, there are cases where the median or the mode is preferred. The mode is used for data measured on a nominal scale, while the median is favored when a dataset has several extreme values (Tanner, 2016). As can be seen from Table 1, the dataset includes two extreme values, 78 and 99, which implies that the median is the most appropriate measure of central tendency here.
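The three measures named above can be computed with Python's standard `statistics` module. The sketch below is only an illustration: the function name `central_tendency` is hypothetical, and the sample values are invented to show one high extreme value; they are not the household incomes from Table 1.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return the mean, median, and all modes of a sequence of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns every most-frequent value, so ties are not hidden
        "modes": multimode(values),
    }


# Illustrative values only; NOT the data from Table 1.
sample = [20, 25, 25, 25, 30, 90]
summary = central_tendency(sample)
print(summary)
```

In this invented sample the single high value pulls the mean well above the median and the mode, which is the situation in which the median is usually the safer summary.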
According to Table 2, the median and the mode are approximately the same, while the mean differs considerably. Differences between the measures of central tendency indicate an asymmetrical distribution of values. According to Tanner (2016), “when the measures of central tendency do not agree, it is because some scores on one side of the distribution are not counterbalanced by scores a similar distance from the mean on the other side of the distribution” (chapter 2.3). This imbalance suggests that the skewness of the dataset differs noticeably from zero; because the extreme values lie at the high end, the skew is expected to be positive. The only mode of the dataset is 25, as this value appears most frequently (4 times).
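The link between unbalanced extreme values and skewness can be checked numerically. The sketch below uses the adjusted Fisher-Pearson sample skewness, the form that spreadsheet descriptive-statistics tools commonly report; whether Table 2 was produced with exactly this formula is an assumption. The function name `sample_skewness` and the sample values are illustrative and are not taken from Table 1.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (requires at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Illustrative values only; NOT the data from Table 1.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [20, 25, 25, 25, 30, 90]
print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

A symmetric sample gives a skewness of about zero, while a sample with an uncompensated high value gives a positive result.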
Interquartile Range and Outliers
As mentioned in the previous section, the dataset includes two extreme values, which may influence the mean considerably. In statistics, extreme values are called outliers, and they are often excluded because they can distort the results of the analysis, especially in small datasets. There are two common ways to identify outliers. The first method is to exclude values that lie more than two standard deviations below or above the mean. Since the mean is 31.85 and the standard deviation is 19.95, the limits are calculated as follows:
Lower limit = 31.85 - 2 × 19.95 = -8.05
Upper limit = 31.85 + 2 × 19.95 = 71.75
The calculations show that outliers lie below -8.05 or above 71.75, which implies that the household incomes of 78 and 99 should not be included in the analysis.
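The two-standard-deviation rule can be reproduced in a few lines. This is a minimal sketch that uses only the mean and standard deviation reported in the text; the function name `sd_limits` and its parameter `k` are hypothetical.

```python
def sd_limits(mean_value, std_dev, k=2.0):
    """Return (lower, upper) limits lying k standard deviations from the mean."""
    spread = k * std_dev
    return mean_value - spread, mean_value + spread


# Mean and standard deviation as reported in the text.
lower, upper = sd_limits(31.85, 19.95)

# The two extreme incomes named in the text, checked against the limits.
flagged = [x for x in (78, 99) if x < lower or x > upper]
print(round(lower, 2), round(upper, 2), flagged)
```

Because household income cannot be negative, the lower limit has no practical effect here; only the upper limit can flag values.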
The second method is to exclude all values that lie more than 1.5 interquartile ranges (IQRs) above the 75th percentile (Q3) or more than 1.5 IQRs below the 25th percentile (Q1). The IQR is calculated by subtracting Q1 from Q3. Given that Q1 = 24.5 and Q3 = 28.5, IQR = 28.5 - 24.5 = 4. Therefore, the limits are calculated as follows:
Lower limit = 24.5 - 1.5 × 4 = 18.5
Upper limit = 28.5 + 1.5 × 4 = 34.5
According to the second method, the dataset includes three outliers, 18, 78, and 99, which is one more than the first method identified.
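The interquartile-range rule can be sketched the same way. The quartiles are taken as given in the text rather than recomputed, because different quartile conventions can produce slightly different values of Q1 and Q3 for the same data. The function name `iqr_fences` and its parameter `k` are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences lying k IQRs outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported in the text.
lower, upper = iqr_fences(24.5, 28.5)

# The three values the text names as outliers, checked against the fences.
flagged = [x for x in (18, 78, 99) if x < lower or x > upper]
print(lower, upper, flagged)
```

The fences are much narrower than the two-standard-deviation limits because the IQR ignores the extreme values, whereas the standard deviation is inflated by them.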
Discussion
I believe that the most useful descriptive statistics are the mean, the count, and the standard deviation. The mean and the standard deviation allow the researcher to understand the average value of the data points and their dispersion. In other words, knowing only these two values allows the observer to understand how the data is distributed. The total count is also vital, as it helps to determine whether the results are reliable or whether additional research may be required. However, in the present dataset it is vital to identify the median as well, to understand that the dataset is skewed and that some extreme values may be present.
Conclusion
Descriptive statistics is a powerful method of describing a dataset using a small number of summary values. It provides an appreciation of the central tendency and distribution of data points. However, researchers need to be aware that extreme values may distort the results of the analysis; therefore, every dataset should be checked for outliers before conducting the analysis.
Reference
Tanner, D. (2016). Statistics for the behavioral & social sciences (2nd ed.). Bridgepoint Education.