Statistics for Data Analytics [Week 1-4]

Inferential Statistics [Week 2]

Once you’ve summarized your data with descriptive statistics, you’ll want to generalize beyond it. This is where inferential statistics comes into play. It helps you draw conclusions from a sample of data and apply them, with a stated level of uncertainty, to a larger population.

Concepts to Master:

  • Sampling and Sampling Distribution:
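    You can't always collect data from every individual in a population. Instead, you collect a sample and compute a statistic from it, such as the sample mean. The sampling distribution describes how that statistic varies from one sample to the next, which tells you how precise your estimate of the population value is.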

    Real-life Example:
    Imagine you're working for a retail company and can only collect purchase data from 1,000 customers, but you want to generalize those insights to millions of customers. As long as the 1,000 customers are chosen at random, sampling allows you to estimate characteristics of the entire population using your sample.
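
    To make the idea of a sampling distribution concrete, here is a minimal simulation sketch. The population is synthetic (randomly generated purchase amounts, not real retail data), and the population size, sample size, and number of repeated samples are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: purchase amounts for 100,000 customers (synthetic data).
population = [random.expovariate(1 / 50) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_mean(data, n):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.fmean(random.sample(data, n))


# Draw many samples of 1,000 customers; their means form the sampling distribution.
sample_means = [sample_mean(population, 1_000) for _ in range(500)]

mean_of_means = statistics.fmean(sample_means)
standard_error = statistics.stdev(sample_means)  # spread of the sampling distribution

print(f"Population mean:        {population_mean:.2f}")
print(f"Mean of sample means:   {mean_of_means:.2f}")
print(f"Spread of sample means: {standard_error:.2f}")
```

    The sample means cluster around the population mean, and their spread (the standard error) should come out close to the population standard deviation divided by the square root of the sample size.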

  • Confidence Intervals: These give you a range of plausible values for a population parameter (e.g., the mean), together with a confidence level such as 95% that describes how reliable the method is.

    Real-life Example:
    If you survey 1,000 customers about their satisfaction and find that the average satisfaction score is 7.5, you might calculate a confidence interval that says you’re 95% confident the true average satisfaction score lies between 7.2 and 7.8.
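
    A confidence interval for a mean can be sketched as "sample mean plus or minus a margin of error". The example below uses the normal approximation, which is reasonable for a large sample like 1,000 responses; for small samples a t-distribution would normally replace it. The survey scores are synthetic, and the function name `mean_confidence_interval` is a hypothetical helper, not part of any library.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    mean = statistics.fmean(data)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


random.seed(0)
# Synthetic satisfaction scores on a 1-10 scale (assumed data, for illustration only).
scores = [min(10, max(1, random.gauss(7.5, 2.0))) for _ in range(1_000)]

low, high = mean_confidence_interval(scores)
print(f"Sample mean: {statistics.fmean(scores):.2f}")
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

    A larger sample or less variable data shrinks the standard error and therefore narrows the interval.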

  • Hypothesis Testing: This lets you check a claim about a population parameter (like the mean) by asking whether your sample provides enough evidence to reject a default assumption, called the null hypothesis.

    Types of Hypothesis Tests:

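    • t-tests (one-sample, two-sample): Compare a sample mean to a reference value (one-sample) or compare the means of two groups (two-sample).
    • z-tests: Compare proportions or means when the sample size is large or the population standard deviation is known.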
    • ANOVA (Analysis of Variance): Compare means between three or more groups.

    Real-life Example:
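    Say you’re testing two versions of a website landing page (A/B testing) and want to see if version B increases conversions. Because a conversion rate is a proportion, you can use a two-proportion z-test to see if the difference in conversion rates is statistically significant or just due to chance. With large samples, a t-test on the 0/1 conversion outcomes gives nearly the same answer.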
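
    Since conversion rates are proportions, the A/B comparison can be sketched as a two-proportion z-test, one of the z-tests listed above. This is a minimal sketch under stated assumptions: the visitor and conversion counts are made up, the test is one-sided (does B convert better than A?), and `two_proportion_z_test` is a hypothetical helper written for this example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided test of H0: rate_b <= rate_a against H1: rate_b > rate_a.

    Returns the z statistic and the one-sided p-value.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate, used because H0 assumes both versions share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical results: 5,000 visitors per version (illustrative numbers only).
z, p_value = two_proportion_z_test(conv_a=200, n_a=5_000, conv_b=250, n_b=5_000)

alpha = 0.05  # conventional significance level
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: version B's higher conversion rate is statistically significant.")
else:
    print("Fail to reject H0: the difference could plausibly be due to chance.")
```

    A p-value below the chosen significance level means a difference this large would be unlikely if both versions truly converted at the same rate.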