Statistics for Data Analytics [Weeks 1-4]

Descriptive Statistics [Week 1]

1. Descriptive Statistics

This is where every data analyst’s journey with statistics begins. Descriptive statistics summarize the main features of a dataset: what a typical value looks like, how much the values vary, and how they are distributed. They describe the data you actually have, without drawing conclusions beyond it.

Concepts to Master:

  • Measures of Central Tendency:

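    • Mean: The arithmetic average, i.e. the sum of all values divided by the number of values.
    • Median: The middle value of the sorted data, which splits the dataset in half. With an even number of values, it is the average of the two middle values.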
    • Mode: The most frequently occurring value.

    Real-life Example:
    Imagine you’re analyzing the salaries of employees in a company. The mean salary is easy to compute, but a few very large executive salaries can pull it well above what most employees earn. The median salary (the middle value) is barely affected by those extremes, so it is often a better representation of a typical employee’s pay.
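
    Code Example:
    A minimal sketch of the salary example using Python’s built-in statistics module. The ten salaries are made-up numbers for illustration, with the last one standing in for a highly paid executive.

```python
import statistics

# Hypothetical annual salaries for 10 employees (made-up data for illustration).
# The last value plays the role of a highly paid executive.
salaries = [40_000, 42_000, 45_000, 45_000, 48_000,
            50_000, 52_000, 55_000, 58_000, 365_000]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # middle of the sorted values (average of the two middle ones, since n is even)
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print(f"Mean:   {mean_salary:,.0f}")
print(f"Median: {median_salary:,.0f}")
print(f"Mode:   {mode_salary:,.0f}")
```

    Here the single executive salary pulls the mean up to 80,000, above what 9 of the 10 employees earn, while the median stays at 49,000.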

  • Measures of Spread:

    • Range: The difference between the highest and lowest values.
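    • Variance: The average of the squared differences between each value and the mean. It measures how spread out the data is around the mean.
    • Standard Deviation: The square root of the variance. Because it is expressed in the same units as the data (e.g. dollars rather than dollars squared), it is easier to interpret as roughly how far values typically fall from the mean.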

    Real-life Example:
    In the same salary example, the standard deviation will tell you how widely salaries vary in the company. A large standard deviation means there’s a big difference between what people earn, while a small standard deviation means salaries are more uniform.
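
    Code Example:
    A minimal sketch comparing two hypothetical companies that have the same mean salary (50,000) but a different spread. The salary figures are invented, and the helper name spread() is just for this example. It uses the population formulas (statistics.pvariance and statistics.pstdev). For a sample drawn from a larger population, statistics.variance and statistics.stdev divide by n - 1 instead of n.

```python
import statistics

# Two hypothetical companies with the same mean salary (50,000) but different spread.
# The numbers are made up for illustration.
company_a = [45_000, 45_000, 55_000, 55_000]  # salaries fairly uniform
company_b = [30_000, 30_000, 70_000, 70_000]  # salaries widely spread

def spread(salaries):
    """Return (range, population variance, population standard deviation)."""
    value_range = max(salaries) - min(salaries)
    # Population variance: mean of the squared deviations from the mean.
    variance = statistics.pvariance(salaries)
    # Standard deviation: square root of the variance, in the same units as the data.
    std_dev = statistics.pstdev(salaries)
    return value_range, variance, std_dev

for name, data in [("A", company_a), ("B", company_b)]:
    r, var, sd = spread(data)
    print(f"Company {name}: range={r:,}  variance={var:,.0f}  std dev={sd:,.0f}")
```

    Company A has a standard deviation of 5,000 and Company B has one of 20,000, even though their averages match. The variances (25,000,000 and 400,000,000) are in squared currency units, which is why the standard deviation is the easier number to read.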

  • Quartiles and Percentiles:

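    • Quartiles split the sorted data into four equal parts: Q1 (the 25th percentile), Q2 (the 50th percentile, i.e. the median) and Q3 (the 75th percentile). More generally, the pth percentile is the value below which p% of the observations fall, which lets you rank any value within the dataset.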

    Real-life Example:
    When analyzing exam scores, you might want to know the 90th percentile: the score below which 90% of students fall. Competitive exams such as the SAT or GRE use percentiles in this way to show where a test-taker stands relative to others.
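
    Code Example:
    A minimal sketch using statistics.quantiles on a small set of hypothetical exam scores (made-up data). It uses method="inclusive", which interpolates linearly between sorted data points. Percentile conventions differ between tools, and the default "exclusive" method can give slightly different values on small datasets.

```python
import statistics

# Hypothetical exam scores for 11 students (made-up data, unsorted on purpose;
# statistics.quantiles sorts the data internally).
scores = [78, 52, 90, 65, 85, 61, 96, 70, 58, 81, 74]

# Quartiles: the three cut points that split the sorted data into four equal parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Deciles: nine cut points; the last one is the 90th percentile.
p90 = statistics.quantiles(scores, n=10, method="inclusive")[-1]

print(f"Q1={q1}, median (Q2)={q2}, Q3={q3}")
print(f"90th percentile: {p90}")
```

    With this data, Q1 = 63.0, the median is 74.0, Q3 = 83.0 and the 90th percentile is 90.0.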