Descriptive Statistics [Week 1]
1. Descriptive Statistics
This is where every data analyst’s journey with statistics begins. Descriptive statistics summarize the main features of a dataset: they describe what is happening in the data.
Concepts to Master:
Measures of Central Tendency:
- Mean: The average value of the dataset.
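- Median: The middle value of the sorted dataset, with half the values below it and half above. With an even number of values, it is the average of the two middle values.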
- Mode: The most frequently occurring value.
Real-life Example:
Imagine you’re analyzing the salaries of employees in a company. The mean salary gives you the arithmetic average, but if a few executives earn very large salaries, they pull the mean upward. In that case the median salary (the middle value) is often a better representation of what most employees earn.
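To make the salary example concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures are hypothetical and chosen only to show how two large values pull the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries in thousands; the last two are executives.
salaries = [40, 42, 45, 45, 48, 50, 52, 250, 400]

mean_salary = mean(salaries)      # arithmetic average, inflated by the two executives
median_salary = median(salaries)  # middle value of the sorted list
mode_salary = mode(salaries)      # most frequently occurring value

print(f"mean={mean_salary}, median={median_salary}, mode={mode_salary}")
# prints "mean=108, median=48, mode=45"
```

Seven of the nine employees earn 52 or less, so the median of 48 describes a typical salary far better than the mean of 108.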
Measures of Spread:
- Range: The difference between the highest and lowest values.
- Variance: The average squared deviation of the values from the mean.
- Standard Deviation: The square root of the variance. Because it is expressed in the same units as the data, it is easier to interpret as a typical distance from the average.
Real-life Example:
In the same salary example, the standard deviation will tell you how widely salaries vary in the company. A large standard deviation means there’s a big difference between what people earn, while a small standard deviation means salaries are more uniform.
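The sketch below compares the spread of two hypothetical teams that share the same mean salary. The helper name `describe_spread` and the figures are assumptions for illustration. It uses the population formulas (`pvariance`, `pstdev`), which divide by n; for a sample drawn from a larger group, `variance` and `stdev` (which divide by n - 1) would be the usual choice.

```python
from statistics import pstdev, pvariance

def describe_spread(values):
    """Return the range, population variance, and population standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }

# Hypothetical salaries in thousands; both teams have a mean of 50.
team_a = [48, 49, 50, 51, 52]  # salaries are nearly uniform
team_b = [20, 35, 50, 65, 80]  # salaries differ widely

spread_a = describe_spread(team_a)
spread_b = describe_spread(team_b)

print(spread_a)
print(spread_b)
```

Both teams have the same average, but team B's range, variance, and standard deviation are all much larger, which is exactly the difference the mean alone cannot show.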
Quartiles and Percentiles:
- Quartiles divide the sorted data into four equal parts, and percentiles divide it into 100, so you can rank a value relative to the rest of the data.
Real-life Example:
When analyzing exam scores, you might want to know the 90th percentile: the score below which 90% of students fall. This is useful in competitive exams like the SAT or GRE.
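As a rough illustration, quartiles and the 90th percentile can be computed with `statistics.quantiles` from Python's standard library. The exam scores are hypothetical, and the `inclusive` method is one of several interpolation conventions, so other tools may report slightly different cut points on small datasets.

```python
from statistics import median, quantiles

# Hypothetical exam scores for ten students.
scores = [55, 60, 62, 68, 70, 74, 78, 85, 91, 97]

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
quartiles = quantiles(scores, n=4, method="inclusive")

# n=100 returns 99 cut points; index 89 is the 90th percentile.
percentiles = quantiles(scores, n=100, method="inclusive")
p90 = percentiles[89]

print("Quartiles:", quartiles)
print("Median:", median(scores))
print("90th percentile:", p90)
```

A student scoring above `p90` has outperformed roughly 90% of the group.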