Mean, variance, and standard deviation are among the most fundamental concepts in statistics, and they come up constantly in machine learning. This post explains each of them with a worked example.
Mean:
The mean is the average of the data: the sum of all the data points divided by the total number of data points. Let's consider an example:
Let the speeds of the angry birds be: 0, 5, 10, 15, 20, 15, 5, 15, 10, 5
Mean = (0+5+10+15+20+15+5+15+10+5) / 10 = 100 / 10 = 10
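The same calculation can be sketched in a few lines of Python. The variable names (`speeds`, `mean`) are only illustrative choices, not part of any library.

```python
# Speeds of the angry birds from the example above.
speeds = [0, 5, 10, 15, 20, 15, 5, 15, 10, 5]

# Mean = sum of all data points / number of data points.
total = sum(speeds)   # 100
count = len(speeds)   # 10
mean = total / count

print(mean)  # 10.0
```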
There are two types of mean:
1. Sample mean:
The sample mean is the mean of a few data points (a sample) drawn from the whole population.
You may wonder why we take a sample rather than the whole population. If the population contains millions of data points, gathering information from all of them can be very time-consuming and expensive. To avoid this, we work with a sample of the data.
Formula of sample mean is:
Sample mean = (Sum of all the items in the sample) / (Number of items in the sample)
$$\bar{x} = \frac{\sum x}{n}$$
Where x is a data point and n is the number of data points in the sample.
Note that the sum is divided by n, not by (n-1). The sample mean computed this way is already an unbiased estimate of the population mean; the (n-1) divisor belongs to the sample variance, not to the sample mean.
2. Population mean:
The population mean is the sum of all the data points in the population divided by the total number of data points.
Formula of population mean is:
Population mean = (Sum of all the items) / (Total number of items)
$$\mu = \frac{\sum x}{n}$$
Where x is a data point and n is the number of data points in the population.
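To see how a sample mean relates to a population mean, here is a small sketch that uses a made-up population. The population (100,000 uniformly random speeds between 0 and 20), the sample size of 500, and the seed are all assumptions chosen for illustration.

```python
import random

# Hypothetical population: 100,000 random speeds between 0 and 20.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.uniform(0, 20) for _ in range(100_000)]

# Draw 500 data points without replacement as our sample.
sample = rng.sample(population, 500)

# Both means divide the sum by the number of data points used.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
```

The two printed values should be close to each other, even though the sample uses only a small fraction of the data.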
Variance:
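Variance is a measure of the spread of the data points, i.e. of how far the data points lie from the mean. It is the average of the squared differences between each data point and the mean.
Formula of variance is:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$
Where x is a data point, μ is the mean and n is the number of data points.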
Example:
We will continue with the above example,
data points are : 0, 5, 10, 15, 20, 15, 5, 15, 10, 5
mean(μ) = 10
variance = [(0-10)^2+(5-10)^2+(10-10)^2+(15-10)^2+(20-10)^2+(15-10)^2+(5-10)^2+(15-10)^2+(10-10)^2+(5-10)^2] / 10
= (100+25+0+25+100+25+25+25+0+25) / 10
= 350 / 10
= 35
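The steps of this worked example can be reproduced in Python as a rough sketch. The names `squared_diffs` and `variance` are illustrative only.

```python
speeds = [0, 5, 10, 15, 20, 15, 5, 15, 10, 5]

# Step 1: the mean of the data.
mean = sum(speeds) / len(speeds)  # 10.0

# Step 2: squared difference between each data point and the mean.
squared_diffs = [(x - mean) ** 2 for x in speeds]

# Step 3: average of the squared differences.
variance = sum(squared_diffs) / len(speeds)

print(sum(squared_diffs))  # 350.0
print(variance)            # 35.0
```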
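Like the mean, variance comes in two types:
1. Sample variance:
Sample variance is calculated on a sample of the dataset.
Formula for sample variance is:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where x is a data point, x̄ is the sample mean and n is the number of data points in the sample. Dividing by (n-1) rather than n corrects for the tendency of a sample to underestimate the spread of the population.
2. Population variance:
Population variance is calculated on the whole population of the dataset.
Formula for population variance is:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$
Where x is a data point, μ is the population mean and n is the number of data points in the population.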
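Python's standard `statistics` module offers both versions: `pvariance` divides by n and `variance` divides by (n-1). As a sketch, we can apply both to the ten speeds from the example, treating them first as a whole population and then as a sample.

```python
import statistics

speeds = [0, 5, 10, 15, 20, 15, 5, 15, 10, 5]

# Population variance: sum of squared differences divided by n.
population_variance = statistics.pvariance(speeds)  # 350 / 10

# Sample variance: sum of squared differences divided by (n - 1).
sample_variance = statistics.variance(speeds)       # 350 / 9

print(population_variance)
print(round(sample_variance, 2))
```

The sample variance comes out slightly larger because the same sum of squared differences is divided by 9 instead of 10.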
Standard Deviation:
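Standard deviation is the square root of the variance. Like variance, it measures how far the data points are spread from the mean, but it is expressed in the same units as the data.
Formula for standard deviation is:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$
For the example above, σ = √35 ≈ 5.92.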
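As a quick check on the angry birds data, the standard deviation can be obtained either by taking the square root of the variance by hand or with `statistics.pstdev` from the standard library. This sketch treats the ten speeds as a whole population.

```python
import math
import statistics

speeds = [0, 5, 10, 15, 20, 15, 5, 15, 10, 5]

# By hand: square root of the population variance.
mean = sum(speeds) / len(speeds)
variance = sum((x - mean) ** 2 for x in speeds) / len(speeds)
std_by_hand = math.sqrt(variance)

# Using the standard library (population standard deviation).
std_from_library = statistics.pstdev(speeds)

print(round(std_by_hand, 2))
print(round(std_from_library, 2))
```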
If the data points are normally distributed (Gaussian distribution), they are spread around the mean as follows:
Fig. Gaussian distribution graph
About 68.3% of the data points lie within one standard deviation of the mean, i.e. between μ-1σ and μ+1σ; about 95.5% lie within two standard deviations, i.e. between μ-2σ and μ+2σ; and about 99.7% lie within three standard deviations, i.e. between μ-3σ and μ+3σ.
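These percentages can be checked approximately with simulated data. The sketch below draws 100,000 values from a normal distribution with mean 0 and standard deviation 1 (both assumed for illustration) and counts how many fall within 1, 2 and 3 standard deviations of the mean. The helper `fraction_within` is a hypothetical name introduced here.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
mu, sigma = 0.0, 1.0    # assumed mean and standard deviation
data = [rng.gauss(mu, sigma) for _ in range(100_000)]

def fraction_within(values, mu, sigma, k):
    """Fraction of values lying between mu - k*sigma and mu + k*sigma."""
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)

for k in (1, 2, 3):
    share = fraction_within(data, mu, sigma, k)
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Because the data is random, the printed shares will only be close to 68.3%, 95.5% and 99.7%, not exactly equal to them.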