Introduction
Measures of central tendency are crucial for summarizing quantitative data. A measure of central tendency is a summary statistic that describes the center point, or typical value, of a dataset. In effect, it is an estimate of a "typical" value. These measures, also called measures of central location, indicate where most of the values in a distribution fall. You can think of them as describing the central value around which the data tend to cluster.
The mean, median, and mode are the three most popular measures of central tendency in statistics. Each of these calculations uses a different method to determine the position of the central point.
Mean or Average
The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is calculated by dividing the sum of all the values in the data set by the number of values in the data set. For a sample of $n$ values $x_1, x_2, \ldots, x_n$, this can be written as:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
The equation above gives the sample mean, written $\bar{x}$ ("x bar"). When our data come from the entire population, we use the Greek lowercase letter $\mu$ ("mu") to indicate that we are calculating the population mean rather than the sample mean. For a population of $N$ values, the formula is:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Many tools can compute the mean from raw data; in this blog, I'll show you how to do it in Python.
Example:
Consider the following hypothetical data set and calculate the sample mean.
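The original data set is not reproduced here, so the sketch below uses a small made-up list (`data` is purely illustrative, not the data set from this example). It applies the sample-mean formula directly, summing the values and dividing by their count, and cross-checks the result against `statistics.mean` from Python's standard library. The helper name `sample_mean` is my own.

```python
import statistics

# Illustrative values only; not the data set from the original example
data = [4, 8, 6, 5, 3, 7, 9, 6]

def sample_mean(values):
    """Sample mean: the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

x_bar = sample_mean(data)
print(x_bar)                  # 6.0
print(statistics.mean(data))  # same value, computed by the standard library
```

The same arithmetic gives the population mean $\mu$; only the notation changes when the data cover the whole population.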
Properties of the Mean:
The mean of your data set is essentially a model of the data, and it is the most widely used measure of central tendency. Note, however, that the mean is not always one of the values actually observed in your data. One of its most important qualities is that it minimizes the error in predicting any single value in your data set, in the least-squares sense: of all possible values, the mean gives the smallest sum of squared differences from the data points. Another important property is that every value in your data set is included in the computation. The mean is also the only measure of central tendency for which the deviations of the values from it always sum to zero.
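Both properties can be checked numerically. The sketch below, again using made-up values, confirms that the deviations from the mean sum to zero and that, over a grid of candidate values, the sum of squared errors is smallest at the mean. The grid search and the helper name `sse` are illustrative choices of mine; the exact result follows from calculus rather than from searching.

```python
# Illustrative values only
data = [2, 3, 5, 7, 13]
mean = sum(data) / len(data)  # 30 / 5 = 6.0

# Property 1: deviations from the mean always sum to zero
deviations = [x - mean for x in data]
print(sum(deviations))  # 0.0

# Property 2: the mean minimizes the sum of squared errors (SSE)
def sse(c, values):
    """Sum of squared differences between a candidate value c and each data point."""
    return sum((x - c) ** 2 for x in values)

candidates = [i / 10 for i in range(0, 151)]  # 0.0, 0.1, ..., 15.0
best = min(candidates, key=lambda c: sse(c, data))
print(best)  # 6.0, the mean
```

Note that 6 does not appear in `data`, which also illustrates that the mean need not be an observed value.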
Limitations of the mean:
Because categorical values cannot be summed, the mean cannot be computed for categorical data. And because the mean includes every value in the distribution, outliers and skewed distributions can pull it away from where most of the data lie.
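To see how a single outlier pulls the mean, the sketch below compares a small made-up list with and without one extreme value, printing the median alongside for contrast. The values are illustrative only; `statistics.mean` and `statistics.median` are from Python's standard library.

```python
import statistics

# Illustrative values only; the second list adds one extreme value
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]

# Without the outlier, mean and median are both 35
print(statistics.mean(typical), statistics.median(typical))

# With the outlier, the mean jumps to about 95.8 while the median only moves to 36.5
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```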
Median
The median is the value in the middle. It's the number that divides the data in half. Order your data from smallest to largest, then select the data point with an equal number of values above and below it to obtain the median. Depending on whether your dataset has an even or odd number of items, the process for locating the median differs slightly.
To find the median, arrange the data in order of magnitude (smallest first, or largest first). If the data set has an odd number of values, the median is the single middle value; if it has an even number of values, there are two middle values, and the median is their average. Outliers and skewed data have little effect on the median.
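The procedure above translates almost line for line into code. The sketch below is a minimal implementation with a helper name of my own (`median`) and made-up data; it sorts the values, returns the middle one for an odd count, averages the two middle ones for an even count, and compares the result with `statistics.median`.

```python
import statistics

def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # smallest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two

# Illustrative values only
odd_data = [7, 1, 5, 3, 9]        # sorted: 1, 3, 5, 7, 9
even_data = [7, 1, 5, 3, 9, 11]   # sorted: 1, 3, 5, 7, 9, 11

print(median(odd_data))               # 5
print(median(even_data))              # 6.0
print(statistics.median(even_data))   # 6.0
```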
Mode
The mode is the value that appears most frequently in your data set. On a bar chart, the mode is the category with the tallest bar. When several values are tied for the most frequent occurrence, the distribution is multimodal. If no value repeats, the data have no mode.
Normally, the mode is used with categorical data to determine which category is the most common.
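The sketch below finds the mode of some made-up categorical data with `collections.Counter`. The helper `modes` is hypothetical, written to follow the conventions described above: it returns every value tied for the highest count (so a multimodal data set yields several values) and returns an empty list when no value repeats.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once, so the data have no mode
    return [v for v, c in counts.items() if c == top]

# Illustrative categorical data, e.g. favourite colours in a small survey
colours = ["blue", "red", "blue", "green", "red", "blue"]
print(modes(colours))                                # ['blue']
print(modes(["cat", "dog", "cat", "dog", "fish"]))   # ['cat', 'dog'] (bimodal)
print(modes(["x", "y", "z"]))                        # [] (no value repeats)
```

Python's standard `statistics.multimode` also returns all tied values, but when no value repeats it returns every value rather than an empty list, which is why the sketch uses its own check.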
Thank you for reading!
I'd love to hear your thoughts about measures of central tendency. Feel free to leave a comment in the section below.