Statistics for Data Analytics [Week 1-4]

Probability Basics [Week 1]

Real-world data contains random variation, and probability gives us a way to describe and model that variation. For a data analyst, understanding probability is essential for interpreting data correctly, judging how much to trust a result, and making predictions.

Concepts to Master:

  • Basic Probability Rules:

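    • Addition Rule: The probability that at least one of two events occurs. For mutually exclusive events (events that cannot happen together), P(A or B) = P(A) + P(B); in general, P(A or B) = P(A) + P(B) - P(A and B).
    • Multiplication Rule: The probability that two events both occur. For independent events, P(A and B) = P(A) × P(B); in general, P(A and B) = P(A) × P(B | A).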

    Real-life Example:
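    If you’re a marketing analyst and want to know the probability that a customer will both open an email and click a link in it, you can use the multiplication rule. Because a customer usually has to open the email before clicking, the two events are not independent, so use the general form: P(open and click) = P(open) × P(click | open).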
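A minimal sketch of both rules in Python. The helper names `addition_rule` and `multiplication_rule` are hypothetical, and the email rates (30% open rate, 10% click rate among openers) are made-up illustrative numbers, not data from any real campaign. Exact fractions are used for the die and coin cases so the results are easy to check by hand.

```python
from fractions import Fraction

def addition_rule(p_a, p_b, p_a_and_b=0):
    """P(A or B) = P(A) + P(B) - P(A and B); the overlap is 0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b

def multiplication_rule(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A); for independent events, pass P(B) as P(B | A)."""
    return p_a * p_b_given_a

# Mutually exclusive: one die roll shows a 1 or a 2.
p_one_or_two = addition_rule(Fraction(1, 6), Fraction(1, 6))  # 1/3

# Independent: a die shows 6 AND a separate coin lands heads.
p_six_and_heads = multiplication_rule(Fraction(1, 6), Fraction(1, 2))  # 1/12

# Dependent (hypothetical rates): a customer opens the email, then clicks given it was opened.
p_open = 0.30
p_click_given_open = 0.10
p_open_and_click = multiplication_rule(p_open, p_click_given_open)

print(p_one_or_two, p_six_and_heads, round(p_open_and_click, 4))
```

Passing the conditional probability P(B | A) into one function covers both cases: for independent events it simply equals P(B).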

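  • Conditional Probability:
    The probability of an event A given that another event B has already occurred: P(A | B) = P(A and B) / P(B).

    • Bayes’ Theorem: A way to update the probability of an event when new evidence arrives, starting from prior knowledge of conditions that might be related to the event: P(A | B) = P(B | A) × P(A) / P(B).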

    Real-life Example:
    If you’re analyzing whether a customer will make a purchase based on their past behavior, Bayes’ Theorem lets you update a baseline (prior) purchase probability using what you know about the customer’s previous purchases.
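To make the update concrete, here is a small Python sketch of Bayes’ Theorem. The function name `bayes_posterior` and all three input probabilities are hypothetical, chosen only to illustrate the arithmetic. P(E) in the denominator is expanded with the law of total probability.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) = P(E | H) * P(H) / P(E).

    P(E) is expanded with the law of total probability:
    P(E) = P(E | H) * P(H) + P(E | not H) * (1 - P(H)).
    """
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# Hypothetical numbers for illustration only:
prior_purchase = 0.05          # P(purchase) before looking at the customer's history
p_repeat_if_buyer = 0.60       # P(has bought before | purchases now)
p_repeat_if_not_buyer = 0.20   # P(has bought before | does not purchase now)

posterior = bayes_posterior(prior_purchase, p_repeat_if_buyer, p_repeat_if_not_buyer)
print(f"P(purchase | previous buyer) = {posterior:.3f}")  # 0.136
```

With these inputs, knowing that the customer has bought before raises the estimated purchase probability from 5% to roughly 13.6%: 0.03 / (0.03 + 0.19).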

  • Probability Distributions:
    These are functions that describe how probability is spread across the possible outcomes of a random variable.

    Key Distributions to Learn:

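    • Normal Distribution: The famous bell curve; symmetric around the mean, with most values close to it (about 68% fall within one standard deviation).
    • Binomial Distribution: Models the number of successes in a fixed number of independent trials, where each trial has exactly two outcomes (success/failure) and the same probability of success.
    • Poisson Distribution: Models the number of times an event happens in a fixed interval of time or space, when events occur independently at a constant average rate.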

    Real-life Example:
    The normal distribution is often used to model quantities like test scores or height, which are roughly bell-shaped, with most people clustered around the average. The Poisson distribution might help you model how many times a server crashes in a month.
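The three distributions can be evaluated with nothing more than Python's standard `math` module, as in the sketch below. The helper names (`normal_cdf`, `binomial_pmf`, `poisson_pmf`) are hypothetical, and the scenario numbers (20 emails with a 10% click rate, a server averaging 1.5 crashes per month) are invented for illustration. In practice, `scipy.stats` provides equivalent objects (`norm`, `binom`, `poisson`).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma), computed via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval whose average event count is lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Normal: share of values within one standard deviation of the mean (about 68%).
within_1sd = normal_cdf(1) - normal_cdf(-1)

# Binomial (hypothetical): 20 emails, each clicked independently with p = 0.1.
p_exactly_2_clicks = binomial_pmf(2, 20, 0.1)

# Poisson (hypothetical): a server averages 1.5 crashes per month.
p_no_crash = poisson_pmf(0, 1.5)
p_at_most_2 = sum(poisson_pmf(k, 1.5) for k in range(3))

print(f"{within_1sd:.4f} {p_exactly_2_clicks:.4f} {p_no_crash:.4f} {p_at_most_2:.4f}")
```

The normal distribution is continuous, so the sketch works with its cumulative distribution function (the probability of falling in a range). The binomial and Poisson distributions are discrete, so their probability mass functions give the probability of an exact count.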